storyofblue
@storyofblue
Hi @lewfish, I communicated with AWS China for a while. They do not currently offer the AWS Batch service. Are there any alternatives?
goriliukasbuxton
@goriliukasbuxton
Docker on Ubuntu 18.04.2 is not starting when I run: sudo systemctl start docker
error: System has not been booted with systemd as init system (PID 1). Can't operate.
lmbak
@lmbak
@lewfish Great, thanks
Lewis Fishgold
@lewfish
@storyofblue You might be able to use the US East 1 AWS region from China. Otherwise, you can start a GPU EC2 instance via the console using an AMI that has Nvidia drivers on it (spot instances are cheaper) and then SSH into it; I'm sure there are instructions out there on how to get to that point. Then you can install nvidia-docker and use Raster Vision as if it were installed locally. You will probably want to store your data on S3 and pass RV S3-based URIs, which it can handle.
That won't provide the same functionality as Batch, but at least you'll be able to use a GPU in the cloud.
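As a rough sketch of that last point, S3 URIs can generally be passed into RV wherever a local path would go; something like the following, where the bucket paths are illustrative and the builder calls are assumptions based on the 0.9 SceneConfig API:

# Hypothetical example: s3:// URIs used in place of local paths.
import rastervision as rv

# task is assumed to be a previously built TaskConfig (e.g. semantic segmentation).
scene = rv.SceneConfig.builder() \
    .with_task(task) \
    .with_id('example-scene') \
    .with_raster_source('s3://my-bucket/imagery/scene.tif') \
    .with_label_source('s3://my-bucket/labels/scene.geojson') \
    .build()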
storyofblue
@storyofblue
@lewfish thanks
Laurence Watson
@Rabscuttler
@lewfish thanks. I've been using the fastai-plugin. One issue: when I use the predict package, RV complains that the FASTAI_SEMANTIC_SEGMENTATION backend is not in the plugin registry, presumably because it hasn't been registered. I guess I should write an experiment using the Predict class rather than the CLI... but I'm not sure what this would look like.
Can I hard register the backend somehow?
Laurence Watson
@Rabscuttler
Ok, I found the right bit of the docs - should have looked harder: https://docs.rastervision.io/en/0.9/plugins.html#registering-the-plugin
Laurence Watson
@Rabscuttler
Right, I'm a fool: I forgot to set the fastai profile flag (-p fastai) so that it registers the plugin with rastervision. :thumbsup:
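For reference, the -p fastai flag makes RV load a ~/.rastervision/fastai profile file, and the plugin is registered there under a [PLUGINS] section. A rough sketch of what that section might look like; the exact module path is an assumption, so use whatever the plugin's README specifies:

[PLUGINS]
files=[]
modules=["fastai_plugin.semantic_segmentation_backend_config"]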
Lewis Fishgold
@lewfish
@Rabscuttler Yeah, I've done that too.
Glad to hear someone is trying the fastai plugin. The current plan is to incorporate it into the RV main repo and deprecate the Tensorflow backends. In the long run it'll be too much work to maintain both sets of backends.
Laurence Watson
@Rabscuttler
That makes sense. It seems good, although I couldn't get the oversampling working. In September (past my deadline) I'm going to put some time into the FAQs! (azavea/raster-vision#623)
Lewis Fishgold
@lewfish
@Rabscuttler That would be awesome!
lmbak
@lmbak
I would like to create an additional augmentor that adds 'random snow' and 'random rotations' to the chips. I feel confident building that for my own application, and I think it would add value to rastervision, but I have limited experience contributing to a project (though I'm eager to learn). So would there be time to 'coach' me a bit, or should I just develop it locally for my own use case? If there is time: my understanding so far is that I have to create a "rotate_and_add_snow.py" and a "rotate_and_add_snow_config.py" in the augmentor folder, and the rotate_and_add_snow class should then be called in chip_command_config.py, right?
Rob Emanuele
@lossyrob
Hi @lmbak, we’d be happy to help you contribute!
Thanks for being willing to take the time to do so
You’re on the right path - if you add the Config/ConfigBuilder in rotate_and_add_snow_config.py, and the “entity” in rotate_and_add_snow.py (the code that does the actual work), you’ll pretty much be there.
The only other step after that is to make sure the Augmentor is registered.
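To make the entity side concrete, here is a rough, hypothetical sketch of what rotate_and_add_snow.py could look like, modeled on the built-in nodata augmentor; the process(training_data, tmp_dir) hook and the import paths are assumptions about the 0.9 API rather than verified against it:

# Hypothetical sketch; only the "snow" part is shown, since a rotation
# augmentor would also need to transform the labels/windows and is probably
# cleaner as a separate Augmentor.
import numpy as np

from rastervision.augmentor import Augmentor
from rastervision.core.training_data import TrainingData

class AddSnowAugmentor(Augmentor):
    def __init__(self, snow_prob=0.01):
        self.snow_prob = snow_prob

    def process(self, training_data, tmp_dir):
        augmented = TrainingData()
        for chip, window, labels in training_data:
            # Set a random fraction of pixels to white to simulate snow.
            mask = np.random.rand(*chip.shape[:2]) < self.snow_prob
            new_chip = chip.copy()
            new_chip[mask] = 255
            augmented.append(new_chip, window, labels)
        return augmented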
Rob Emanuele
@lossyrob
From there, you should be able to add it as a regular Augmentor - e.g.
dataset = rv.DatasetConfig.builder() \
                                  .with_train_scene(scene_config) \
                                  .with_validation_scene(scene_config) \
                                  .with_augmentor(rv.AugmentorConfig.builder(rv.ROTATE).with_config(…).build())  \
                                  .build()
(if your Augmentor was a rotator, and was registered with a key accessible via rv.ROTATE)
If you have augmentations that add rotation and add snow, it might be best to add two different augmentors - you can add multiple augmentors to a dataset.
Rob Emanuele
@lossyrob
Let us know if you have any questions about how to approach it!
Laurence Watson
@Rabscuttler
Quick question: what is the loss function in the fastai plugin / PyTorch? I'm trying to follow the dependencies but can't see where it is specified.
Lewis Fishgold
@lewfish
@Rabscuttler I'm pretty sure it's just the usual categorical cross entropy.
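For reference, a minimal PyTorch sketch of how categorical cross entropy is typically applied to semantic segmentation outputs; the shapes here are illustrative, not taken from the plugin:

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()               # expects raw (unnormalized) logits
logits = torch.randn(4, 5, 256, 256)            # (batch, num_classes, height, width)
targets = torch.randint(0, 5, (4, 256, 256))    # per-pixel class indices
loss = criterion(logits, targets)
print(loss.item())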
Laurence Watson
@Rabscuttler
@lewfish thanks! :)
Drew Bollinger
@drewbo
Hi @lewfish, I wasn't using Raster Vision directly, but I see that you have a fastai/PyTorch plugin coming into core soon. I'm wondering if you've run these experiments on AWS Batch yet?
Lewis Fishgold
@lewfish
I have.
Drew Bollinger
@drewbo
I ask because I tried something similar and wasn't able to expand the --shm-size/sharedMemorySize on Batch, and had some errors with larger batch sizes.
I'm using the same AMI but a slightly different CloudFormation template than the raster-vision-aws one, though I tried to mimic it as closely as possible.
Specifically: RuntimeError: DataLoader worker (pid 51) is killed by signal: Bus error.
I'm trying batch size 16 on 256-pixel square chips.
Lewis Fishgold
@lewfish
I dealt with that too. I think you need to modify the job definition to give it more shared memory: https://github.com/azavea/raster-vision-fastai-plugin/blob/master/scripts/gpu_job_def.json#L17-L22
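The relevant part of that job definition mounts the host's /dev/shm into the container. A trimmed sketch of the idea; the field names follow the AWS Batch job definition schema, but the values here are illustrative rather than copied from the linked file:

{
  "containerProperties": {
    "volumes": [
      { "name": "shm", "host": { "sourcePath": "/dev/shm" } }
    ],
    "mountPoints": [
      { "sourceVolume": "shm", "containerPath": "/dev/shm", "readOnly": false }
    ]
  }
}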
Drew Bollinger
@drewbo
awesome, this is perfect
Lewis Fishgold
@lewfish
I created the job definition using this template, which is outside the CloudFormation workflow.
Drew Bollinger
@drewbo
I'm trying it now, but I think this is exactly what I needed; I tried almost exactly this but forgot to add it to the volumes when adding the mount points :disappointed:
thanks @lewfish
Lewis Fishgold
@lewfish
Glad I could help! I remember it took me a while to figure this out.
Drew Bollinger
@drewbo
I tried to go the --shm-size route for a while before realizing it couldn't be passed with Batch (even though it's available in ECS task definitions) and switching to this "workaround" (mounting /dev/shm).
Drew Bollinger
@drewbo
Related follow-up, @lewfish: I've been leaving off the GPU resource requirement, and it looks like you do too; I wasn't able to get it to place tasks with that requirement. Unfortunately this means it won't place single-GPU tasks on separate GPUs on a p3.8xlarge, for example. Do you know a way around that?
Blessings Hadebe
@blessings-h
Bit of a rookie question, but from experience, what have you found to be a good train/validation split for object detection? I have ~5300 small objects labelled across ~500 images. Some images have one object in them, others up to 120 objects.
Lewis Fishgold
@lewfish
@drewbo I've never used the multi-GPU instances before. I thought that if your job def specifies 8 vCPUs and you run it on an 8xlarge, then several single-GPU tasks can run simultaneously on the instance. That's how it works with the CPU instances.
Drew Bollinger
@drewbo
they run simultaneously on the instance but I think they try to run on the same GPU
so they all get CUDA memory errors
Lewis Fishgold
@lewfish
I think there's a way to tell PyTorch which GPU should be used. I guess you'll have to check which ones are already being used to pick the index. Maybe PyTorch has some built-in way of doing this, but I'm not sure.
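A minimal sketch of pointing a PyTorch process at a particular GPU; how the index is chosen (e.g. by inspecting nvidia-smi) is up to you and isn't something RV does for you:

import os
import torch

gpu_index = 0  # assumed to be chosen by checking which GPUs are already busy

# Option 1: hide the other GPUs before CUDA is initialized.
os.environ['CUDA_VISIBLE_DEVICES'] = str(gpu_index)

# Option 2: select the device explicitly and move the model and tensors to it.
device = torch.device('cuda:{}'.format(gpu_index) if torch.cuda.is_available() else 'cpu')
# model = model.to(device)  # assumes a model defined elsewhere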
@blessings-h It sounds like enough data to train a model. A typical rule of thumb for picking a training / validation split is 80%/20%. Since there is so much variation in how many objects there are per image, it would be best to divide the images into the splits, so that approximately 80% of the objects are in the training split and 20% in the other. We've written scripts before to do this sort of thing, but never for object detection.
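A minimal sketch of that kind of split script (not an RV utility): greedily assign images so that roughly 80% of the labeled objects land in the training split.

import random

def split_by_object_count(object_counts, train_frac=0.8, seed=0):
    # object_counts: dict mapping image id -> number of labeled objects.
    items = list(object_counts.items())
    random.Random(seed).shuffle(items)
    total = sum(object_counts.values())
    train, val, train_objects = [], [], 0
    for image_id, count in items:
        if train_objects < train_frac * total:
            train.append(image_id)
            train_objects += count
        else:
            val.append(image_id)
    return train, val

train_ids, val_ids = split_by_object_count({'img1': 120, 'img2': 1, 'img3': 40})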
Drew Bollinger
@drewbo
Last one, I think, @lewfish: do y'all have problems with the instances' storage filling up? Especially if they are used for multiple runs, it seems like artifacts saved outside Docker can quickly fill the instance (which also has some unnecessary libs, given that it's generally running Docker images).