Jim Huang
@cequencer_gitlab

I backed out of the brew install gdal and went with conda install gdal, which solved the issue and gives me a much better runtime environment to manage. Now I no longer have any dependency issues with the tiny experiment example.

However, I am running into a permission issue where the code wants to create a directory called /opt/data, despite the fact that I set that directory somewhere else via with_root_uri.

Traceback (most recent call last):
  File "//anaconda3/envs/colab_jupy_raster/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "//anaconda3/envs/colab_jupy_raster/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "//anaconda3/envs/colab_jupy_raster/lib/python3.7/site-packages/rastervision/__main__.py", line 17, in <module>
    rv.main()
  File "//anaconda3/envs/colab_jupy_raster/lib/python3.7/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "//anaconda3/envs/colab_jupy_raster/lib/python3.7/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "//anaconda3/envs/colab_jupy_raster/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "//anaconda3/envs/colab_jupy_raster/lib/python3.7/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "//anaconda3/envs/colab_jupy_raster/lib/python3.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "//anaconda3/envs/colab_jupy_raster/lib/python3.7/site-packages/rastervision/cli/main.py", line 294, in run_command
    rv.runner.CommandRunner.run(command_config_uri)
  File "//anaconda3/envs/colab_jupy_raster/lib/python3.7/site-packages/rastervision/runner/command_runner.py", line 11, in run
    CommandRunner.run_from_proto(msg)
  File "//anaconda3/envs/colab_jupy_raster/lib/python3.7/site-packages/rastervision/runner/command_runner.py", line 17, in run_from_proto
    command.run()
  File "//anaconda3/envs/colab_jupy_raster/lib/python3.7/site-packages/rastervision/command/train_command.py", line 21, in run
    task.train(tmp_dir)
  File "//anaconda3/envs/colab_jupy_raster/lib/python3.7/site-packages/rastervision/task/task.py", line 137, in train
    self.backend.train(tmp_dir)
  File "//anaconda3/envs/colab_jupy_raster/lib/python3.7/site-packages/rastervision/backend/pytorch_semantic_segmentation.py", line 228, in train
    self.train_opts.model_arch, num_labels, pretrained=True)
  File "//anaconda3/envs/colab_jupy_raster/lib/python3.7/site-packages/rastervision/backend/torch_utils/semantic_segmentation/model.py", line 6, in get_model
    'deeplabv3', 'resnet50', num_labels, False, pretrained_backbone=True)
  File "//anaconda3/envs/colab_jupy_raster/lib/python3.7/site-packages/torchvision/models/segmentation/segmentation.py", line 22, in _segm_resnet
    replace_stride_with_dilation=[False, True, True])
  File "//anaconda3/envs/colab_jupy_raster/lib/python3.7/site-packages/torchvision/models/resnet.py", line 255, in resnet50
    **kwargs)
  File "//anaconda3/envs/colab_jupy_raster/lib/python3.7/site-packages/torchvision/models/resnet.py", line 217, in _resnet
    progress=progress)
  File "//anaconda3/envs/colab_jupy_raster/lib/python3.7/site-packages/torch/hub.py", line 470, in load_state_dict_from_url
    os.makedirs(model_dir)
  File "//anaconda3/envs/colab_jupy_raster/lib/python3.7/os.py", line 211, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "//anaconda3/envs/colab_jupy_raster/lib/python3.7/os.py", line 211, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "//anaconda3/envs/colab_jupy_raster/lib/python3.7/os.py", line 221, in makedirs
    mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/opt/data'
make: *** [2] Error 1
2019-10-21 09:35:22:rastervision.utils.misc: DEBUG - Terminating 66156...
I grepped through all of the files (source and generated) and nowhere have I defined or used /opt/data. Any clue where it is statically coded?
Or is there some environment variable I need to override?
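The traceback above ends inside torch.hub.load_state_dict_from_url while it creates the cache directory for pretrained weights, so one plausible knob (an assumption based on the traceback, not something confirmed in this thread) is PyTorch's TORCH_HOME environment variable rather than anything in the raster-vision config. A minimal sketch, assuming /opt/data comes from that cache path:

```python
# Sketch of a possible workaround, assuming the os.makedirs('/opt/data') call
# comes from torch.hub's pretrained-weights cache (as the traceback suggests)
# rather than from raster-vision itself. TORCH_HOME controls that cache path.
import os

# Set this before torch downloads any pretrained weights, e.g. at the top of
# the process, or export TORCH_HOME in the shell before running raster-vision.
os.environ['TORCH_HOME'] = os.path.expanduser('~/rv-torch-cache')

# Newer PyTorch versions also expose torch.hub.set_dir() for the hub cache.
```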
Jim Huang
@cequencer_gitlab
ah. Thanks for the pointer. I will go ahead and create this subdirectory on my local machine.
Lewis Fishgold
@lewfish
It sounds like there are some permissions issues on the instance you're on that won't allow /opt/data to be created.
Jim Huang
@cequencer_gitlab
I am not using the Docker image. I am installing all the dependencies via conda and created a conda virtual environment to manage the runtime environment and dependencies.
Lewis Fishgold
@lewfish
Ok, does it work after manually creating the directory?
Jim Huang
@cequencer_gitlab
it complained I didn't install absl, so I just installed it via conda install absl-py.
I tried to rerun the entire pipeline and it looks like I need to rm -rf ..... to force a rebuild of this pipeline. Is there a nicer flag to use in this situation?
I just set the with_root_uri to a new data directory so we can rebuild the entire pipeline with all the dependencies now in place.
Jim Huang
@cequencer_gitlab
it's doing training now.
Jim Huang
@cequencer_gitlab
Training  [####################################]  100%
2019-10-21 10:09:50:rastervision.backend.pytorch_semantic_segmentation: INFO - train loss: 0.2515265792608261
Validating  [####################################]  100%
2019-10-21 10:10:42:rastervision.backend.pytorch_semantic_segmentation: INFO - validation metrics: {'precision': 0.8326506614685059, 'recall': 0.8441998958587646, 'f1': 0.8383855223655701}
2019-10-21 10:10:42:rastervision.backend.pytorch_semantic_segmentation: INFO - epoch elapsed time: 0:04:03.569688
No statistics for prediction step.
Running evaluator: SemanticSegmentationEvaluator...
2019-10-21 10:11:05:rastervision.evaluation.semantic_segmentation_evaluator: INFO - Computing evaluation for scene val_scene...
2019-10-21 10:11:05:rastervision.evaluation.semantic_segmentation_evaluation: DEBUG - Evaluating window: [   0.    0. 1000. 1000.]
2019-10-21 10:11:05:rastervision.data.raster_source.rasterized_source: DEBUG - Rasterizing window: [   0.    0. 1000. 1000.]
2019-10-21 10:11:05:rastervision.data.raster_source.rasterized_source: DEBUG - Cropping shapes to window...
2019-10-21 10:11:05:rastervision.data.raster_source.rasterized_source: DEBUG - # of shapes in window: 17
2019-10-21 10:11:05:rastervision.data.raster_source.rasterized_source: DEBUG - rasterio.rasterize()...
2019-10-21 10:11:08:rastervision.utils.misc: DEBUG - Terminating 67663...
Is this the expected result?
Lewis Fishgold
@lewfish
yes, the eval metrics will be in the generated eval directory.
Jim Huang
@cequencer_gitlab
cool. I saw the json file under the eval directory. Everything looks good!
Lewis Fishgold
@lewfish
nice!
Jim Huang
@cequencer_gitlab
Now I have a working example in my local environment to learn how to use this framework correctly and efficiently. The Quickstart guide is an important hello-world for making sure the dependencies and runtime environment are solid. Thank you all for the help!
Lewis Fishgold
@lewfish
you're welcome!
Jim Huang
@cequencer_gitlab
I see the Quickstart example shows AWS as a runtime option. I currently have Azure access; is it difficult to get this code base working on Azure?
Jim Huang
@cequencer_gitlab
Is it a lot of effort? Is there some reference documentation that potentially highlights the level of effort and work involved?
Rob Emanuele
@lossyrob
@cequencer_gitlab it would take writing a new ExperimentRunner, which is a medium-to-hard level of effort, depending on how comfortable you are with the codebase and on how easy it is to get an AWS-Batch-like service to run things on Azure.
There's not too much code there, which says more about how easy AWS Batch is to use than anything else. There's also some configuration that's set up outside of that code, and the AWS resources have to be deployed in the first place - which happens through https://github.com/azavea/raster-vision-aws
However, writing one for Kubernetes that would run on Azure's Kube setup (I think they have one, right?) might be the way to go. I'd be interested in helping you think this through if you would be interested in taking it on.
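For a concrete sense of what a new runner involves, here is a rough skeleton. It is a sketch under stated assumptions: the ExperimentRunner base class and a _run_experiment(command_dag) hook are inferred from how the local and AWS Batch runners appear to be organized, and the DAG accessor is hypothetical, so check the rastervision.runner source for your version before building on it.

```python
# Assumption-heavy skeleton of an Azure-backed runner. The base class import
# and the _run_experiment(command_dag) hook are inferred, not confirmed; the
# get_sorted_commands() accessor is purely hypothetical.
from rastervision.runner import ExperimentRunner


class AzureBatchExperimentRunner(ExperimentRunner):
    def __init__(self, job_queue, job_definition):
        self.job_queue = job_queue
        self.job_definition = job_definition

    def _run_experiment(self, command_dag):
        # Submit one remote job per command, preserving DAG ordering so that
        # chip -> train -> predict -> eval run in the right sequence.
        for command in command_dag.get_sorted_commands():  # hypothetical accessor
            self._submit_job(command)

    def _submit_job(self, command):
        # Translate a raster-vision command into an Azure Batch (or AKS) task.
        raise NotImplementedError
```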
Lewis Fishgold
@lewfish
@cequencer_gitlab Just to clarify something: the AWS experiment runner has the ability to submit jobs within an experiment to an AWS Batch queue. There are some advantages to using this (like picking CPU vs. GPU instances, and starting and stopping instances automatically), but it's also possible to just log into a cloud instance (AWS, Azure, or whatever), get the Docker image off Quay, and then use the local experiment runner on that instance. That approach wouldn't involve extending the experiment runner.
Jim Huang
@cequencer_gitlab

@lewfish & @lossyrob Thank you for getting back to me and for laying out several ways to go about running raster-vision on Azure. (I was out on a trip for a few days and am back this week.)

I am not familiar with the AWS ecosystem and I am relatively new to Azure services, so I do not have a 1-to-1 mapping of what “AWS Batch queue” corresponds to on Azure. But it sounds like this approach is the easiest to try and the shortest effort?

I am unclear whether there are 2 or 3 approaches when I try to group your combined feedback. Here is my attempt to simplify; please correct me if I am wrong:

  1. Port the “AWS Batch queue” integration to its Azure equivalent
  2. Build a pipeline using Docker container and services on Azure
  3. Build a kubernetes pipeline using AKS (Azure Kubernetes Services)

I do have some experience with Kubernetes (K8s) deployments. Does raster-vision have Kubernetes operator code that can be used for a raster-vision K8s deployment? If not, what is the current state of running raster-vision within a K8s stack?

Is this the right forum to continue this discussion?

Lewis Fishgold
@lewfish
@cequencer_gitlab I think the easiest thing to try is to just start an Azure instance, ssh into it, and run RV locally there. I think this is what you meant by option #2 (and is what I was saying in my previous message). Then, if RV is providing some value and you want to put in the effort to write your own ExperimentRunner that uses Azure, plus the code to set up the Azure resources (option #1 or #3), you could try that. RV doesn't have any Kubernetes-related functionality -- you would have to add an ExperimentRunner that uses it.
This is the right venue
I don't have any experience with Azure, but I found this page for Azure Batch which seems similar to AWS Batch. https://azure.microsoft.com/en-us/services/batch/
Jim Huang
@cequencer_gitlab
Thank you @lewfish for the helpful feedback and reference. I am going to try out the path you recommended so I can gain more familiarity with Azure. I will ping back here if I get stuck on some parts or need some quick triage to spend my limited time wisely.
lmbak
@lmbak
@cequencer_gitlab I'm actually doing what Lewis is proposing above. Just ssh into Azure and run locally. Works like a charm.
David Ketchum
@dgketchum
Hello @lewfish . I've put in an issue re: instance segmentation. azavea/raster-vision#854
Lewis Fishgold
@lewfish
Hello @dgketchum . I responded to your issue.
Christos
@Charmatzis
Hello guys, I was wondering if there is an article on how to take the next step and use the model created with Raster Vision in production, i.e. develop a prediction service API. Let's say a scenario like:
  1. take the zip predict package
  2. deploy it to AWS S3
  3. create serverless functions using this package
  4. develop the right API
lmbak
@lmbak
How exactly can I run multiple experiments on a single dataset without having separate chips made for each experiment? The Directed Acyclic Graph looks at the input and output of a command, right? So two experiments with different experiment_ids would be seen as having different inputs/outputs, since a different folder is created for each experiment, and the same chips would be made for each experiment (so double work done), right? I am using the same dataset object when configuring the .with_dataset() method of each experiment, but that still seems to result in a separate chipping process being started for each experiment.
Christos
@Charmatzis
@lmbak I think a workaround is to modify the config files (JSON) to point to the same directory
Lewis Fishgold
@lewfish
@Charmatzis We don't have an article on this. Assuming you know how to set up an API using something like Flask, you would just use the Predictor class to load the model and make a prediction over a GeoTIFF file. https://docs.rastervision.io/en/latest/api.html#predictor
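A minimal sketch of that Flask-plus-Predictor idea, assuming Predictor takes a predict-package URI and a temp directory and that predict() takes an input image URI and an output label URI; the bucket paths are placeholders, and the exact signature should be checked against the API docs linked above.

```python
# Minimal sketch of a prediction API around the Predictor class. Assumes
# Predictor(predict_package_uri, tmp_dir) and predict(image_uri, output_uri);
# check the linked API docs for the exact signature in your version.
from flask import Flask, jsonify, request
from rastervision.predictor import Predictor

app = Flask(__name__)

# Load the model once at startup rather than per request.
predictor = Predictor('s3://my-bucket/predict_package.zip', '/tmp/rv')  # placeholder URI


@app.route('/predict', methods=['POST'])
def predict():
    body = request.get_json()
    image_uri = body['image_uri']      # GeoTIFF to predict on
    output_uri = body['output_uri']    # where to write the labels
    predictor.predict(image_uri, output_uri)
    return jsonify({'status': 'done', 'output_uri': output_uri})


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)
```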
Lewis Fishgold
@lewfish
@lmbak This is an example of a set of experiments with a branching structure, where the chips are shared between the experiments. https://github.com/azavea/raster-vision-examples#spacenet-vegas-hyperparameter-search
The key thing is to use the same root_uri and chip_key across the experiments.
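A small sketch of that pattern, modeled on the linked hyperparameter-search example; the builder methods, with_chip_key in particular, are taken from that example, so confirm them against the raster-vision version in use.

```python
# Sketch of several experiments sharing one set of chips: same root_uri and
# same chip_key, different experiment ids. Builder method names follow the
# linked hyperparameter-search example; verify against your RV version.
import rastervision as rv


def build_experiments(task, backends, dataset, root_uri):
    experiments = []
    for exp_id, backend in backends.items():
        exp = rv.ExperimentConfig.builder() \
                                 .with_id(exp_id) \
                                 .with_task(task) \
                                 .with_backend(backend) \
                                 .with_dataset(dataset) \
                                 .with_root_uri(root_uri) \
                                 .with_chip_key('shared-chips') \
                                 .build()
        experiments.append(exp)
    return experiments
```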
lmbak
@lmbak
@lewfish Great, thanks. That was the missing piece of information. I'll have a look!
Christos
@Charmatzis
@lewfish thx, I was thinking of creating an AWS Lambda service, but I am skeptical about: 1. the requirements that raster-vision needs installed, 2. the processing of large images
I am focusing more on using AWS Batch as a service called by a Lambda. What do you think?
Lewis Fishgold
@lewfish
@Charmatzis So something triggers the lambda function which takes a reference to a GeoTIFF, and that kicks off a Batch job which calls the Predictor? That seems to make sense. I think you're right that it would be hard to create a lambda function with all the crazy dependencies that RV has. Also, the jobs can take too long for a lambda fn.
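A minimal sketch of the Lambda-to-Batch handoff using boto3; the job queue, job definition, and container command are placeholders, and it assumes the Batch job's container image has raster-vision installed and runs prediction (for example via the Predictor class) on the referenced GeoTIFF.

```python
# Minimal sketch of a Lambda handler that kicks off a Batch job for prediction.
# Job queue, job definition, and the container command are placeholders; the
# job's container is assumed to have raster-vision installed and to run
# prediction on the referenced GeoTIFF.
import boto3

batch = boto3.client('batch')


def handler(event, context):
    # Assume the event carries the S3 URIs, e.g. from an S3 trigger or an
    # API Gateway request body.
    image_uri = event['image_uri']
    output_uri = event['output_uri']

    response = batch.submit_job(
        jobName='rv-predict',
        jobQueue='my-gpu-queue',                 # placeholder
        jobDefinition='raster-vision-predict',   # placeholder
        containerOverrides={
            'command': ['python', 'predict.py', image_uri, output_uri]
        })
    return {'jobId': response['jobId']}
```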
Christos
@Charmatzis
@lewfish yes, I was thinking the same. Right now I am trying to design this system. If you have time maybe we can arrange a call for more details on that...
David Ketchum
@dgketchum

Transfer learning question: In my instance segmentation RV fork I'm trying to use the pretrained torchvision mask rcnn COCO model with the resnet-50 fpn backbone to re-train the 'head' of the model on the 1 m RGB of NAIP. I'm unsure exactly what parameters to freeze (requires_grad_(False)) and what to train.

In RV, the object detection code in rastervision/backend/torch_utils/model.py freezes (requires_grad_(False)) the parameters where 'layer2' not in name and 'layer3' not in name and 'layer4' not in name, i.e. the first layer. Have RV developers found this to be more effective for transfer learning compared to freezing more of the model? Do you have experience with varying the 'depth' of transfer based on the size of the training data or some other constraint? I'd love to hear the reasoning behind this decision. Very fascinating; excuse me if something obvious went over my head. See my experiment https://github.com/dgketchum/raster-vision/blob/instance_seg/rastervision/inseg.py, and what I'm freezing: https://github.com/dgketchum/raster-vision/blob/instance_seg/rastervision/backend/torch_utils/instance_segmentation/model.py

Lewis Fishgold
@lewfish
@dgketchum I am pretty confident that you will want to fine-tune more of the model than just the head, especially since satellite imagery is quite different from COCO-like datasets. We always stick with the simple approach of fine-tuning the entire model using the same learning rate for all layers. We use a learning rate that is 1/10 the default value provided by the library when fine-tuning. This provides a big advantage over just training from scratch. However, some people have reported a small benefit from using differential learning rates (i.e. higher rates in layers closer to the head), or from just freezing the first few layers. They discuss this at some point in the fastai course.
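To make the two strategies concrete, a short PyTorch sketch using torchvision's Mask R-CNN as in the question: option A fine-tunes the whole model at a single reduced learning rate, option B freezes the early backbone layers while leaving layer2-4 trainable, mirroring the object-detection code referenced above. The optimizer choice and learning-rate values are illustrative, not raster-vision's exact settings.

```python
# Sketch of the two fine-tuning strategies discussed above, using torchvision's
# Mask R-CNN as in the question. Optimizer and learning rates are illustrative.
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

model = maskrcnn_resnet50_fpn(pretrained=True)

# Option A: fine-tune the entire model with one reduced learning rate
# (the simple approach described above).
optimizer_a = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

# Option B: freeze the ResNet stem and layer1 in the backbone body, keeping
# layer2-4 (plus FPN and heads) trainable, similar to the referenced code.
for name, param in model.backbone.body.named_parameters():
    if not any(layer in name for layer in ('layer2', 'layer3', 'layer4')):
        param.requires_grad_(False)

trainable = [p for p in model.parameters() if p.requires_grad]
optimizer_b = torch.optim.SGD(trainable, lr=0.01, momentum=0.9)
```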