hi
i'm running raster vision 0.12 with python 3.6.2 on an nvidia geforce rtx 2080 (6 GB, 1920 cores).
the input image is 15000x15000, predict batch size is 1, num workers is 4 (the default), and prediction takes 7 min.
when i change the predict batch size to 8 or 16 it takes the same time (it uses more GPU memory, but the time doesn't change).
can you help me reduce the prediction time? which parameters should i change?
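If a bigger batch doesn't change the wall time, the run is usually bottlenecked by reading and stitching the 15000x15000 image rather than by the GPU. A quick way to check is to time pure inference and compare it against the full 7 minutes (a generic PyTorch sketch with a placeholder model and chip size, not RV's actual predict loop):
import time
import torch
import torchvision

# Placeholder segmentation model; swap in something comparable to your backend.
model = torchvision.models.segmentation.deeplabv3_resnet50(
    pretrained=False, num_classes=2).cuda().eval()
x = torch.randn(8, 3, 300, 300).cuda()  # a fake batch of eight 300x300 chips

with torch.no_grad():
    for _ in range(3):  # warm-up so CUDA startup doesn't skew the timing
        model(x)
    torch.cuda.synchronize()
    t0 = time.time()
    for _ in range(20):
        model(x)
    torch.cuda.synchronize()
    print('sec per batch:', (time.time() - t0) / 20)
If this per-batch time multiplied by the number of windows is far below 7 minutes, the GPU isn't the limit and raising the batch size can't help; look at the I/O side (num_workers, how the source image is read) instead.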
I'm having trouble running the rastervision examples with the test.py script from the github repo. Here's what I do from the cloned repo:
docker run --rm -it raster-vision-pytorch /bin/bash
python "rastervision_pytorch_backend/rastervision/pytorch_backend/examples/test.py" \
run "spacenet-rio-cc" \
--remote
Then I get an error saying: FileNotFoundError: [Errno 2] No such file or directory: 'rastervision': 'rastervision'
I can, however, navigate to the directory /opt/src/rastervision_pytorch_backend/rastervision, so I'm confused why the directory can't be found. I've also tried without the --remote flag and I get the same error.
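That FileNotFoundError format ('rastervision': 'rastervision') is what Python's subprocess module raises when the executable it is asked to run isn't on PATH, which suggests test.py is shelling out to the rastervision CLI and not finding it. A quick check from inside the container (plain Python, nothing RV-specific):
import shutil

# Prints the full path of the CLI if it is on PATH, otherwise None.
print(shutil.which('rastervision'))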
i'm running rv 0.12. in object detection my batch size is 1 and chip size is 300. when i run the predict command it occupies 1.7 GB on the GPU, but when i change the chip size to 500 or 1000 it still occupies the same 1.7 GB. how are chips stored on the GPU? can you help me with this?
is it possible to change some settings so that my GPU utilization is higher and the prediction time goes down?
docker run --rm -it raster-vision-pytorch /bin/bash
and then I forward the aws credentials with aws configure. When I try to run the test.py script with python rastervision_pytorch_backend/rastervision/pytorch_backend/examples/test.py run "spacenet-rio-cc" --remote, I get an error: rastervision.pipeline.file_system.file_system.NotReadableError: Could not read s3://raster-vision/examples/0.13/processed-data/spacenet-rio-cc/train-scenes.csv. Full traceback in comments. Any idea why this file isn't readable?
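To separate an access problem from an RV problem, you can probe the object directly (assumes boto3 is installed and the same credentials are active inside the container; bucket and key are copied from the error message):
import boto3

s3 = boto3.client('s3')
# Raises a ClientError (403/404) if the object is missing or unreadable
# with these credentials; returns metadata if access works.
resp = s3.head_object(
    Bucket='raster-vision',
    Key='examples/0.13/processed-data/spacenet-rio-cc/train-scenes.csv')
print(resp['ContentLength'])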
i'm running rv 0.12 for object detection. my input image is 6000x6000 and the chip size is 300x300. can you give me an understanding of how these parameters affect what is stored on the GPU?
how is the GPU used by the predict command?
can you explain the GPU usage?
if my parameters change, how can i estimate the GPU memory usage?
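For a rough sense of scale (a back-of-envelope sketch, not RV internals): the input chips themselves are tiny next to the model's weights and activations, and if the backend resizes chips to a fixed img_sz before inference (an assumption worth verifying for your config), GPU memory won't move with chip size at all.
# Back-of-envelope memory for one batch of float32 input chips.
def chip_batch_mb(batch, channels, chip):
    return batch * channels * chip * chip * 4 / 1024**2

print(chip_batch_mb(1, 3, 300))   # ~1 MB
print(chip_batch_mb(1, 3, 1000))  # ~11.4 MB, still small next to the
                                  # ~100 MB of ResNet-50 weights alone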
Hello Rastervision team! Again, thanks a lot for this very nice tool. It is very useful! However, I have an issue with AOIs, and maybe it is linked to the question that has been asked above (?).
I trained a model with nochip = True. However, I noticed that some planes were wrongly labelled as "car", so I wanted to add some "plane" images to my dataset, more specifically AOIs that contain planes. I am on version 0.13, and as far as I can tell aoi_geometry does not work with version 0.13 (this has been fixed in the latest version, according to an issue), so I used aoi_uris in the SceneConfig. This works well: I can see in the dataloader that the chips come from the AOIs I selected (I took very small and specific AOIs just to check on a few images).
I then hit StopIteration: Failed to find random window within scene AOI. and realised that my AOIs were sometimes too small relative to the entire image. After looking at this, I increased max_sample_attempts and it worked, although it was much slower of course. (NB: I also saw that you made the sampling process more efficient in the new version - nice triangulation algorithm btw! - so I'm considering moving to the latest image in the future.)
However, training now fails with:
File "/opt/src/rastervision_pytorch_learner/rastervision/pytorch_learner/learner.py", line 1136, in train_epoch
    metrics = self.train_end(outputs, num_samples)
File "/opt/src/rastervision_pytorch_learner/rastervision/pytorch_learner/learner.py", line 734, in train_end
    for k in outputs[0].keys():
IndexError: list index out of range
/opt/data/output/Makefile:6: recipe for target '0' failed
make: *** [0] Error 1
ERROR: 2
Thanks a lot for your help!
All the best, Laetitia
Hi again, everyone! I am trying to do object detection using rasters from a map located on a WMS server. Just to test it out, I drew some boxes around some pools, so I am trying to predict pools for now. The AOI I labeled is very small compared to the whole map, so I am using the latest (development) version of RV along with pytorch-latest, to make use of the efficient AOI sampling. I believe that this should work, as during my troubleshooting I have managed to print out chips that come from the AOI and are correctly labeled. However, I am still deeply underwater in errors and the more I try to troubleshoot, the more I sink in a pit of despair. I would like to keep trying to make this work so that I can submit a PR and help whoever may be interested in using WMS in the future, but I need some help or some advice on how to properly troubleshoot.
Here is how I run a docker image after I clone the master branch of rastervision:
docker run --rm -it -v ${RV_QUICKSTART_CODE_DIR}:/opt/src/code -v ${RV_QUICKSTART_OUT_DIR}:/opt/data/output quay.io/azavea/raster-vision:pytorch-latest /bin/bash
These are: the xml to get the map, the aoi, the labels and the main code, respectively:
https://github.com/ltociu/object_detection_pools/blob/main/raster.xml
https://github.com/ltociu/object_detection_pools/blob/main/aoi.geojson
https://github.com/ltociu/object_detection_pools/blob/main/labels_0.geojson
https://github.com/ltociu/object_detection_pools/blob/main/object_detection_aoi.py
The error I am getting is related to bbox_params:
...
File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataset.py", line 416, in __getitem__
print(self.datasets[dataset_idx][sample_idx])
File "/opt/src/rastervision_pytorch_learner/rastervision/pytorch_learner/dataset/dataset.py", line 390, in __getitem__
return super().__getitem__(window)
File "/opt/src/rastervision_pytorch_learner/rastervision/pytorch_learner/dataset/dataset.py", line 56, in __getitem__
x, y = self.transform(val)
File "/opt/src/rastervision_pytorch_learner/rastervision/pytorch_learner/dataset/dataset.py", line 46, in <lambda>
self.transform = lambda inp: tf_func(inp, transform)
File "/opt/src/rastervision_pytorch_learner/rastervision/pytorch_learner/dataset/transform.py", line 147, in object_detection_transformer
out = transform(image=x, bboxes=boxes, category_id=class_ids)
File "/opt/conda/lib/python3.7/site-packages/albumentations/core/composition.py", line 195, in __call__
self._check_args(**data)
File "/opt/conda/lib/python3.7/site-packages/albumentations/core/composition.py", line 278, in _check_args
raise ValueError("bbox_params must be specified for bbox transformations")
ValueError: bbox_params must be specified for bbox transformations
However, I have printed out the boxes variable right before out = transform(image=x, bboxes=boxes, category_id=class_ids) is called, and it exists and has four entries, so I am not sure why this error shows up.
I have done a lot of troubleshooting, mainly by going into the load_window and get_chip functions in /opt/src/rastervision_core/rastervision/core/data/raster_source/rasterio_source.py and printing out the chips, and they look good. But I can't figure out the pipeline they go through until they are transformed, or why the transformation fails with that bbox_params error. Sometimes the chips end up empty because of an HTTP error when the WMS server doesn't respond; that should be handled somehow, and it's definitely a change that needs to be made in the code to support WMS, but the bbox_params error shows up even when the chips are not empty.
Please let me know if you have any idea what's going wrong, and what's the best way to go about troubleshooting this.
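For comparison, albumentations raises that exact ValueError whenever a Compose containing spatial transforms receives bboxes but was built without bbox_params, regardless of whether the boxes themselves are valid. A minimal standalone sketch of the working pattern (not RV's code; the transform and sizes are placeholders):
import numpy as np
import albumentations as A

# bbox_params is what tells Compose how to interpret the bboxes argument.
transform = A.Compose(
    [A.HorizontalFlip(p=1.0)],
    bbox_params=A.BboxParams(
        format='albumentations',  # normalized (x_min, y_min, x_max, y_max)
        label_fields=['category_id']))

out = transform(
    image=np.zeros((100, 100, 3), dtype=np.uint8),
    bboxes=[(0.1, 0.1, 0.3, 0.3)],
    category_id=[1])
print(out['bboxes'])
So the thing to check is whether the transform object that reaches object_detection_transformer was constructed with bbox_params; a custom transform supplied through the data config without them would plausibly trigger exactly this error.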
Training [------------------------------------] /opt/data/output/Makefile:12: recipe for target '2' failed
make: *** [2] Killed
import os

# Imports in the style of the bundled Raster Vision examples (0.13):
from rastervision.core.rv_pipeline import *
from rastervision.core.data import *
from rastervision.pytorch_backend import *
from rastervision.pytorch_learner import *


def build_scene(uri, id, channel_order=None):
    # TODO: make these paths dynamic
    label_uri = os.path.join(uri, 'BBs', id + '.geojson')
    print("Label uri: ", label_uri)
    image_uri = os.path.join(uri, 'RGB', id + '.tif')
    print("Image uri: ", image_uri)
    raster_source = RasterioSourceConfig(
        uris=[image_uri],
        channel_order=channel_order,
        transformers=[StatsTransformerConfig()])
    vector_source = GeoJSONVectorSourceConfig(
        uri=label_uri,
        default_class_id=0,
        ignore_crs_field=True,
        # Changes every time according to the labelling
        class_id_to_filter={
            1: ['==', 'Class', 1],
            2: ['==', 'Class', 2],
            3: ['==', 'Class', 3]
        })
    label_source = SemanticSegmentationLabelSourceConfig(
        raster_source=RasterizedSourceConfig(
            vector_source=vector_source,
            rasterizer_config=RasterizerConfig(background_class_id=0)))
    # Changes every time
    label_store = SemanticSegmentationLabelStoreConfig(
        vector_output=[
            PolygonVectorOutputConfig(class_id=1),
            PolygonVectorOutputConfig(class_id=2),
            PolygonVectorOutputConfig(class_id=3)
        ])
    return SceneConfig(
        id=id,
        raster_source=raster_source,
        label_source=label_source,
        label_store=label_store)


def get_config(runner, raw_uri, root_uri, test=True):
    # Get scene IDs by stripping the .tif extension; image and label files
    # share the same name and differ only in extension.
    images = os.listdir(os.path.join(raw_uri, 'RGB'))
    scene_ids = [file[:-4] for file in images]
    print("Scene ID:", scene_ids[0])
    split_ratio = 0.8
    num_train_ids = round(len(scene_ids) * split_ratio)
    train_ids = scene_ids[:num_train_ids]
    val_ids = scene_ids[num_train_ids:]
    if test:
        train_ids = train_ids[:4]
        val_ids = val_ids[:2]
    # Changes every time
    class_config = ClassConfig(
        names=['background', 'motorcycle', 'car', 'ghost'],
        colors=['black', 'orange', 'blue', 'red'])
    # Changes every time
    channel_order = [0, 1, 2]
    train_scenes = [build_scene(raw_uri, id, channel_order) for id in train_ids]
    val_scenes = [build_scene(raw_uri, id, channel_order) for id in val_ids]
    dataset = DatasetConfig(
        class_config=class_config,
        train_scenes=train_scenes,
        validation_scenes=val_scenes)
    chip_sz = 325
    img_sz = chip_sz
    chip_options = SemanticSegmentationChipOptions(
        window_method=SemanticSegmentationWindowMethod.sliding, stride=img_sz)
    num_epochs = 1
    data = SemanticSegmentationImageDataConfig(img_sz=img_sz, num_workers=0)
    backend = PyTorchSemanticSegmentationConfig(
        data=data,
        model=SemanticSegmentationModelConfig(backbone=Backbone.resnet50),
        solver=SolverConfig(
            lr=1e-4,
            num_epochs=num_epochs,
            test_num_epochs=2,
            batch_sz=8,
            one_cycle=True))
    return SemanticSegmentationConfig(
        root_uri=root_uri,
        dataset=dataset,
        backend=backend,
        train_chip_sz=img_sz,
        predict_chip_sz=img_sz,
        chip_options=chip_options)
{
  "type": "FeatureCollection",
  "name": "C:\\Users\\john\\Downloads\\AutoMap\\HAGDAVS\\BBs\\El_Retiro_BBs_1_4.geojson",
  "crs": { "type": "name", "properties": { "name": "urn:ogc:def:crs:OGC:1.3:CRS84" } },
  "features": [
    { "type": "Feature", "properties": { "OBJECTID": "310", "Class": "2", "Shape_Leng": "9.86674582747e-005", "Shape_Area": "5.27914776648e-010" }, "geometry": { "type": "Polygon", "coordinates": [ [ [ -75.501211422868835, 6.064028267090919 ], [ -75.501225410024631, 6.063997672154869 ], [ -75.501239681366144, 6.064004195837015 ], [ -75.501225695109724, 6.064034792571647 ], [ -75.501211422868835, 6.064028267090919 ] ] ] } }
  ]
}
vector_source = GeoJSONVectorSourceConfig(
    uri=label_uri,
    default_class_id=0,
    ignore_crs_field=True,
    # Changes every time according to the labelling
    class_id_to_filter={
        1: ['==', 'Class', 1],
        2: ['==', 'Class', 2],
        3: ['==', 'Class', 3]
    })
may I ask, what does this error mean?
ERROR 1: TIFFReadDirectory:Cannot handle different values per sample for "SampleFormat"
ERROR 1: TIFFFetchDirectory:Sanity check on directory count failed, this is probably not a valid IFD offset
ERROR 1: TIFFReadDirectory:Failed to read directory at offset 192
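Those GDAL messages usually indicate a corrupt or truncated GeoTIFF: a failed IFD-offset sanity check means the reader ran off the end of the file's directory structure, and the SampleFormat complaint means the bands declare inconsistent sample types. A quick probe (the path is a placeholder):
import rasterio

# Fails loudly if GDAL can't parse the file at all; otherwise shows the
# dtypes/band layout so a mixed SampleFormat is easy to spot.
with rasterio.open('path/to/problem.tif') as ds:
    print(ds.profile)
    print(ds.dtypes)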
I ran into a weird error when running rastervision with docker (pytorch-v0.13.1). When I set the root_uri to an S3 path, the following error showed up and training stopped:
make: s3://<random_bucket_name>/predictions/Makefile: No such file or directory
make: *** No rule to make target 's3://<random_bucket_name>/predictions/Makefile'. Stop.
The error disappeared when I set root_uri to a local path.
There are more details in azavea/raster-vision#1452.
What can be a workaround for this? I want my results to be stored in S3, but because of this they'll be saved locally. My AWS profile has full S3 read/write rights.
Any inputs will be appreciated! :)
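One stopgap until the issue is resolved: keep root_uri local for the run, then push the output tree to S3 afterwards (a sketch using plain boto3; the bucket and prefix names are placeholders):
import os
import boto3

def upload_dir(local_dir, bucket, prefix):
    # Walk the output directory and mirror it under s3://bucket/prefix/.
    s3 = boto3.client('s3')
    for root, _, files in os.walk(local_dir):
        for name in files:
            path = os.path.join(root, name)
            key = os.path.join(prefix, os.path.relpath(path, local_dir))
            s3.upload_file(path, bucket, key)

upload_dir('/opt/data/output', 'my-bucket', 'predictions')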
Also, can anyone tell me how I can tune the parameters to get a more precise model? Here's a part of my experiment script:
backend = PyTorchSemanticSegmentationConfig(
    data=data,
    model=SemanticSegmentationModelConfig(backbone=Backbone.resnet50),
    solver=SolverConfig(
        lr=1e-4,
        num_epochs=num_epochs,
        test_num_epochs=2,
        batch_sz=2,
        one_cycle=True))
return SemanticSegmentationConfig(
    root_uri=root_uri,
    dataset=dataset,
    backend=backend,
    train_chip_sz=img_sz,
    predict_chip_sz=img_sz,
    chip_options=chip_options)
I don't know how to optimize the model.
After 5 epochs, these are the evaluations I'm getting:
'epoch': 4, 'train_loss': 0.06035802581093528, 'train_time': '7:52:40.221138', 'val_loss': 0.05281096324324608, 'avg_precision': 0.9869782328605652, 'avg_recall': 0.9901804327964783, 'avg_f1': 0.9885767102241516, 'background_precision': 0.9881935715675354, 'background_recall': 0.9960909485816956, 'background_f1': 0.9921265244483948, 'motorcycle_precision': 0.0, 'motorcycle_recall': 0.0, 'motorcycle_f1': 0.0, 'car_precision': 0.8847261667251587, 'car_recall': 0.3847511410713196, 'car_f1': 0.5362827777862549, 'ghost_precision': 0.0, 'ghost_recall': 0.0, 'ghost_f1': 0.0, 'null_precision': 0.9942737221717834, 'null_recall': 0.9997745752334595, 'null_f1': 0.9970165491104126, 'valid_time': '0:37:06.015921'}
As you can see, nothing for the motorcycle & ghost classes.
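Zero precision and recall for a class usually means the model never predicts it, which is expected with few epochs and heavy class imbalance. A first knob to turn is the solver (a sketch reusing the SolverConfig fields from the script above; the values are illustrative, not tuned):
solver = SolverConfig(
    lr=1e-4,        # try a small sweep, e.g. 1e-3 down to 1e-5
    num_epochs=30,  # 5 epochs is rarely enough for rare classes
    batch_sz=2,     # raise this if GPU memory allows
    one_cycle=True)
It is also worth double-checking that the rare classes survive the label filter at all: the GeoJSON above stores "Class" as strings ("2"), while class_id_to_filter compares against integers, and if that comparison is applied literally every feature would fall back to default_class_id=0.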
I wanted to run a batch job but I'm pretty confused. I've already deployed a stack using CloudFormation.
Here are some questions:
1) How do I choose the desired/minimum/maximum number of CPUs?
2) How do I set the stack information in the environment file for rastervision to read? (see the sketch below)
3) How can I save my AWS profile along with credentials in the config file so that rastervision can read it from there?
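For (2) and (3): as far as I know, Raster Vision reads an INI-style config at ~/.rastervision/default (or ~/.rastervision/<profile>, selected with rastervision --profile <profile>). A sketch of the Batch section, with section and key names as in the 0.13 AWS Batch setup docs and placeholder values that you copy from the CloudFormation stack outputs:
[AWS_BATCH]
cpu_job_queue=MyCpuJobQueue
cpu_job_def=MyCpuJobDef
gpu_job_queue=MyGpuJobQueue
gpu_job_def=MyGpuJobDef
attempts=5
For (1), the CPU counts are a property of the Batch compute environment the CloudFormation stack creates (its min/desired/max vCPU parameters), not of Raster Vision itself; and for credentials, RV goes through boto3's normal chain (AWS_PROFILE, ~/.aws/credentials), so I don't believe there is an RV-specific credentials field.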
Can't wait to perform my first batch training :)