Yingyi Wu
@JenniferYingyiWu2020
The above result is due to my having modified the code to:
miguelgcg
@miguelgcg

Hi all, first of all, thank you for the great work on RasterFrames. I have just taken my first steps into EO data processing with PySpark and RasterFrames, and I have a couple of conceptual questions so I can get a clear idea of how all of this comes together.

In my case study, I have several images I want to use as input for a supervised classification. I have been reading the RasterFrames documentation and exploring the Docker image you provide with the examples folder. From the supervised-learning notebook I can clearly see the steps, but my questions are:

  • Once I produce my output (the classification), I need not only the model accuracy but also the classified raster. I see there is an rf.write.geotiff() function, but the documentation states that this is not a smart move. How should I then create an output that I can open and visualize in regular GIS software such as QGIS?

  • I intend to start using a cloud provider, submitting my application with spark-submit and running it there. I do not have a clear idea of this step. I guess I have to produce a .py file plus dependencies (.yml), and in that .py file write the PySpark code that uses RasterFrames to run my analysis. Is this right? Am I missing something?
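On the spark-submit question: spark-submit takes the driver .py file plus options such as --master, --py-files (extra .py/.zip files shipped to the executors), --jars (JVM dependencies) and --conf key=value. It does not read a conda .yml, so Python packages either need to be installed on the cluster nodes or shipped with the job. The helper below is a hypothetical sketch that only assembles the command line; the script name, zip, jar name and bucket paths are placeholders, not names taken from RasterFrames or any cloud provider.

```python
import shlex
from typing import Dict, List, Optional, Sequence


def build_spark_submit(
    app: str,
    master: str = "yarn",
    py_files: Sequence[str] = (),
    jars: Sequence[str] = (),
    conf: Optional[Dict[str, str]] = None,
    app_args: Sequence[str] = (),
) -> List[str]:
    """Assemble a spark-submit argument list for a PySpark application."""
    cmd = ["spark-submit", "--master", master]
    if py_files:
        # Comma-separated .py/.zip/.egg files shipped to the executors.
        cmd += ["--py-files", ",".join(py_files)]
    if jars:
        # Comma-separated jars added to the driver and executor classpaths.
        cmd += ["--jars", ",".join(jars)]
    for key, value in (conf or {}).items():
        cmd += ["--conf", f"{key}={value}"]
    # The driver script goes after all options; anything after it is passed to the script.
    cmd.append(app)
    cmd += list(app_args)
    return cmd


if __name__ == "__main__":
    cmd = build_spark_submit(
        "classify.py",                       # hypothetical driver script
        py_files=["helpers.zip"],            # hypothetical zipped helper modules
        jars=["rasterframes-assembly.jar"],  # hypothetical jar name
        conf={"spark.executor.memory": "4g"},
        app_args=["s3://my-bucket/input", "s3://my-bucket/output"],  # placeholders
    )
    print(shlex.join(cmd))
```

Returning a list rather than a single string keeps the arguments safe to pass to subprocess.run without shell quoting issues; shlex.join is only used to print a copy-pasteable version.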

Jason T Brown
@vpipkt
@miguelgcg @JenniferYingyiWu2020 There have been several questions lately regarding writing GeoTIFFs. If your use case is visualization (rather than further raster processing in the GIS system), you might consider the slippy map writing capability in EarthAI Notebook from Astraea (the main developers behind RasterFrames). https://docs.astraea.earth/hc/en-us/articles/360049952271-How-to-Create-Interactive-Maps-from-a-RasterFrame
"Slippy maps" have wide support both in web map frameworks like Leaflet and in desktop GIS systems. https://www.spatialbias.com/2018/02/qgis-3.0-xyz-tile-layers/ https://enterprise.arcgis.com/en/portal/10.4/use/tile-layers.htm
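For context on how slippy maps are addressed: tiles are requested by a {z}/{x}/{y} index in the Web Mercator tiling scheme, which is what a QGIS XYZ Tiles connection or a Leaflet tile layer fills in. A minimal sketch of the standard lon/lat to tile-index conversion (a generic formula, not part of RasterFrames or EarthAI) looks like this:

```python
import math
from typing import Tuple


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> Tuple[int, int]:
    """Return the XYZ (x, y) index of the Web Mercator tile containing lon/lat.

    Tile (0, 0) is the north-west corner; the grid is 2**zoom tiles on a side.
    """
    n = 2 ** zoom
    x = math.floor((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = math.floor((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp edge cases (lon == 180, latitudes beyond the Mercator limit) into the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


if __name__ == "__main__":
    z = 10
    x, y = lonlat_to_tile(-78.46, 38.66, z)  # arbitrary example point
    # The same {z}/{x}/{y} pattern a desktop GIS or web map requests.
    print(f"tiles/{z}/{x}/{y}.png")
```

Because each zoom level doubles the grid in both directions, a tile set is a directory tree of small images that a viewer fetches on demand, rather than one large GeoTIFF that has to be assembled in memory.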
Yingyi Wu
@JenniferYingyiWu2020
Hi @vpipkt, I have created a new project named "rasterframes-GeoTIFFs" containing the supervised machine learning code, the error logs from omitting "raster_dimensions", and the gdalinfo output when using "raster_dimensions=(558, 507)", as well as the ".tiff" file generated by supervised machine learning and the "show" plot.
(locationtech/rasterframes#539)

@vpipkt, I have adopted your suggestion and omitted the "raster_dimensions" parameter; however, "java.lang.OutOfMemoryError: Java heap space" occurred. I also tried replacing "raster_dimensions=(558, 507)" with "raster_dimensions=(5580, 5070)" and with "raster_dimensions=(1558, 1507)", but the same "java.lang.OutOfMemoryError: Java heap space" error appeared.
I have created a new project named "rasterframes-GeoTIFFs" on my GitHub page and uploaded the "unsupervised machine learning" and "supervised machine learning" (https://github.com/JenniferYingyiWu2020/rasterframes-GeoTIFFs/blob/main/machine-learning/supervised_machine_learning.py) code, as well as the error logs (https://github.com/JenniferYingyiWu2020/rasterframes-GeoTIFFs/tree/main/error-logs) from changing "raster_dimensions".
The "show" output of the GeoTIFF is: "https://github.com/JenniferYingyiWu2020/rasterframes-GeoTIFFs/blob/main/show-output-result/supervised-machine-learning/show.png".

Lastly, the gdalinfo output for the .tiff generated by "supervised machine learning", with "raster_dimensions=(558, 507)", is here:
(https://github.com/JenniferYingyiWu2020/rasterframes-GeoTIFFs/blob/main/show-output-result/supervised-machine-learning/gdalinfo.txt)
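A rough way to reason about the OutOfMemoryError is to estimate the uncompressed size of the output raster for each raster_dimensions value tried above. The sketch below assumes the GeoTIFF writer assembles the whole output raster in driver memory (one reading of the documentation's caution mentioned earlier in this room) and that cells are 64-bit floats; actual heap usage adds JVM overhead and intermediate copies, so treat these numbers as lower bounds, not predictions.

```python
from typing import Tuple

# Bytes per cell for some common raster cell types.
CELL_BYTES = {"uint8": 1, "int16": 2, "uint16": 2, "int32": 4, "float32": 4, "float64": 8}


def raster_bytes(dims: Tuple[int, int], bands: int = 1, cell_type: str = "float64") -> int:
    """Uncompressed size in bytes of a raster with dims = (cols, rows)."""
    cols, rows = dims
    return cols * rows * bands * CELL_BYTES[cell_type]


if __name__ == "__main__":
    # The raster_dimensions values tried in this thread.
    for dims in [(558, 507), (1558, 1507), (5580, 5070)]:
        mib = raster_bytes(dims) / 2 ** 20
        print(f"raster_dimensions={dims}: ~{mib:,.1f} MiB per float64 band")
```

If the estimate for your band count and cell type approaches the driver heap, raising spark.driver.memory (set before the driver JVM starts) or writing a smaller cell type are options to try; a single-band classification stored as uint8 needs one eighth of the float64 figure.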

image.png
@vpipkt, my question is why the GeoTIFF output of supervised machine learning looks abnormal. When I open the ".tiff" on my computer, it looks like this:
Jason T Brown
@vpipkt
I have filed a PR for improving the documentation about the GeoTIFF writing. I hope it will clear things up for everyone. Please see https://github.com/locationtech/rasterframes/pull/540/files
miguelgcg
@miguelgcg

Hi, I am using the s22s/rasterframes Docker image to practice, and when trying to open a GeoJSON file to clip a raster, I am getting a permission denied error.

So far I am doing:
df_geojson = spark.read.geojson('path/to/file.geojson')

The geojson file is stored in a local folder that I linked to the container with the -v parameter when running it ( -v my/local/folder:/home/jovyan/work)

Py4JJavaError: An error occurred while calling o179.load. : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1, localhost, executor driver): java.io.FileNotFoundException: /home/jovyan/work/SA_geojson.geojson (Permission denied)

Any idea what I could be missing?

Simeon H.K. Fitch
@metasim
@miguelgcg If you launch a terminal from Jupyter, are you able to see that file?
miguelgcg
@miguelgcg
@metasim, yes, I can cd to that folder and see the file with ls.
Simeon H.K. Fitch
@metasim
Does a cell with !ls /home/jovyan/work/SA_geojson.geojson also work?
miguelgcg
@miguelgcg
@metasim, after reading a bit about permissions in Docker, I managed to solve the issue by running the container with the --user root and -e GRANT_SUDO=yes flags. After that, I could modify the file/folder permissions and avoid the permission denied error. Would this be a reasonable solution, or did you have another approach in mind?
Simeon H.K. Fitch
@metasim
Yeah, I don't think you should have to do that; I use the same technique all the time without it. Are you on macOS or Windows?
miguelgcg
@miguelgcg
Working on a Linux Ubuntu machine
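One way to narrow down a "Permission denied" on a bind-mounted file is to compare the file's owner and mode with the UID the notebook process runs as (in local mode the Spark JVM is launched by that same process). The helper below is a hypothetical, standard-library-only diagnostic. It assumes the image follows the Jupyter Docker Stacks convention where the jovyan user is UID 1000 by default, so a host file owned by a different UID with restrictive permissions would show readable=False here.

```python
import os
import stat
from typing import Any, Dict


def diagnose_access(path: str) -> Dict[str, Any]:
    """Report whether this process can read `path` and who owns it (POSIX only)."""
    # exists is also False when a parent directory cannot be searched.
    info: Dict[str, Any] = {"path": path, "exists": os.path.exists(path)}
    if not info["exists"]:
        return info
    st = os.stat(path)
    parent = os.path.dirname(os.path.abspath(path))
    info.update(
        readable=os.access(path, os.R_OK),
        # Directories on the path need the execute (search) bit; check the parent.
        parent_searchable=os.access(parent, os.X_OK),
        mode=stat.filemode(st.st_mode),
        owner_uid=st.st_uid,
        owner_gid=st.st_gid,
        process_uid=os.getuid(),
        process_gid=os.getgid(),
    )
    return info


if __name__ == "__main__":
    # Path from this thread; adjust to your own mount point.
    print(diagnose_access("/home/jovyan/work/SA_geojson.geojson"))
```

Running it from a notebook cell shows the permission bits and UIDs side by side, which is usually enough to tell whether the fix belongs on the host (chmod/chown) or in how the container user is configured.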
miguelgcg
@miguelgcg

When trying to run the supervised-learning notebook from the s22s/rasterframes Docker image, I am getting the following error in cell 3:

crses = df.select('crs.crsProj4').distinct().collect()

Py4JJavaError: An error occurred while calling o142.collectToPython. : org.apache.spark.SparkException: Job aborted due to stage failure: Task 37 in stage 1.0 failed 1 times, most recent failure: Lost task 37.0 in stage 1.0 (TID 207, localhost, executor driver): java.lang.IllegalArgumentException: Error fetching data for one of: GDALRasterSource(s3://s22s-test-geotiffs/luray_snp/SCL.tif), GDALRasterSource(s3://s22s-test-geotiffs/luray_snp/B01.tif), GDALRasterSource(s3://s22s-test-geotiffs/luray_snp/B02.tif), GDALRasterSource(s3://s22s-test-geotiffs/luray_snp/B03.tif), GDALRasterSource(s3://s22s-test-geotiffs/luray_snp/B04.tif), GDALRasterSource(s3://s22s-test-geotiffs/luray_snp/B05.tif), GDALRasterSource(s3://s22s-test-geotiffs/luray_snp/B06.tif), GDALRasterSource(s3://s22s-test-geotiffs/luray_snp/B07.tif), GDALRasterSource(s3://s22s-test-geotiffs/luray_snp/B08.tif), GDALRasterSource(s3://s22s-test-geotiffs/luray_snp/B09.tif), GDALRasterSource(s3://s22s-test-geotiffs/luray_snp/B11.tif), GDALRasterSource(s3://s22s-test-geotiffs/luray_snp/B12.tif)
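The failing sources are s3:// URIs read through GDAL. To check whether the container can reach them at all outside Spark, one option is to run gdalinfo on GDAL's /vsis3/ virtual path for the same object. The helper below only converts the URI. Whether this bucket allows anonymous reads (in which case GDAL's AWS_NO_SIGN_REQUEST=YES option applies) or needs AWS credentials is an assumption to verify; the error message itself does not say.

```python
from urllib.parse import urlparse


def s3_to_vsis3(uri: str) -> str:
    """Convert s3://bucket/key into GDAL's /vsis3/bucket/key virtual path."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {uri!r}")
    return f"/vsis3/{parsed.netloc}{parsed.path}"


if __name__ == "__main__":
    for band in ["SCL", "B01", "B02"]:
        path = s3_to_vsis3(f"s3://s22s-test-geotiffs/luray_snp/{band}.tif")
        # Print a shell command to try from a terminal inside the container.
        print(f"AWS_NO_SIGN_REQUEST=YES gdalinfo {path}")
```

If gdalinfo also fails from a terminal, the problem is network access or credentials rather than anything in the notebook code.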

Narges Takhtkeshha
@narges-tk
image.png
Hi all, I switched from Windows to Ubuntu, and now I get this error on the "spark = create_rf_spark_session()" line. Could you help me, please?
Yingyi Wu
@JenniferYingyiWu2020
Hi @vpipkt, I have followed your suggestions on "#539" and modified the "supervised machine learning" code (https://github.com/JenniferYingyiWu2020/rasterframes-GeoTIFFs/blob/main/machine-learning/supervised_machine_learning.py), but my output ".tiff" is a little different from yours on "#539".
I have uploaded my output ".tiff" to "https://github.com/JenniferYingyiWu2020/rasterframes-GeoTIFFs/blob/main/show-output-result/supervised-machine-learning/show_version%200.2.png" for comparison with the output on "#539", which is "https://github.com/JenniferYingyiWu2020/rasterframes-GeoTIFFs/blob/main/show-output-result/supervised-machine-learning/vpipkt/github-reply.png".
image.png
image.png
The difference between the two ".tiff" images seems to be that mine is missing "the label GeoJSON shapes (pink)". On "locationtech/rasterframes#539" you said that "A further refinement would be to apply masking to the data, as on lines 113-120 before the join.", but I am confused about how to modify the code.
Could you please give me some suggestions? Thanks!
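The masking refinement can be pictured cell by cell: wherever the scene-classification tile holds an unwanted class, the band value becomes NoData, and only afterwards are cells paired with labels, so masked pixels never become training rows. The sketch below shows just that logic on plain Python lists, with None standing in for NoData and made-up sample values; it is not the RasterFrames API (in the notebook the same idea is expressed on tile columns, e.g. with rf_local_is_in).

```python
from typing import List, Optional, Sequence, Tuple

Tile = List[List[Optional[float]]]  # None marks a NoData cell


def mask_by_values(band: Tile, scl: Sequence[Sequence[int]], bad_values: Sequence[int]) -> Tile:
    """Copy `band`, setting cells to None where the co-located SCL cell
    holds one of `bad_values`. Both tiles must share the same shape."""
    bad = set(bad_values)
    return [
        [None if s in bad else v for v, s in zip(band_row, scl_row)]
        for band_row, scl_row in zip(band, scl)
    ]


def training_rows(band: Tile, labels: Sequence[Sequence[Optional[int]]]) -> List[Tuple[float, int]]:
    """Pair each remaining (non-NoData) cell with its label, skipping unlabeled cells."""
    return [
        (v, lab)
        for band_row, label_row in zip(band, labels)
        for v, lab in zip(band_row, label_row)
        if v is not None and lab is not None
    ]


if __name__ == "__main__":
    band = [[0.1, 0.2], [0.3, 0.4]]
    scl = [[4, 9], [0, 5]]      # 9 and 0 are among the masked classes
    labels = [[1, 1], [2, None]]
    masked = mask_by_values(band, scl, [0, 1, 8, 9, 10])
    print(training_rows(masked, labels))
```

Masking first means cloudy or no-data cells are dropped before the label join, so they cannot leak into the training set under a valid label.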
Yingyi Wu
@JenniferYingyiWu2020
image.png
@vpipkt, the screenshot above shows the Python code I added to supervised machine learning ("https://github.com/JenniferYingyiWu2020/rasterframes-GeoTIFFs/blob/main/machine-learning/supervised_machine_learning.py"), but the output ".tiff" I got is a little different from yours at "https://github.com/locationtech/rasterframes/issues/539#issuecomment-762926009".
Yingyi Wu
@JenniferYingyiWu2020
image.png
So, could you please give me some suggestions on how to show "the label GeoJSON shapes (pink)" in the output, and how to implement the refinement you mentioned: "A further refinement would be to apply masking to the data, as on lines 113-120 before the join."?
(locationtech/rasterframes#541)
Narges Takhtkeshha
@narges-tk
What's the latest version of Spark for pyrasterframes? Is it 2.4.5?
tosen1990
@tosen1990
You are right. The build docs I checked said RF 0.9 is built with Spark 2.4.5, but I use Spark 2.4.7 and it currently works fine.
Simeon H.K. Fitch
@metasim
Narges Takhtkeshha
@narges-tk
@tosen1990 Thx
Narges Takhtkeshha
@narges-tk
I finally solved my issue with "create_rf_spark_session". I was using Spark 3.0.1! Thank you, everyone.
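Since the fix here turned out to be the Spark version (RF 0.9 built against Spark 2.4.5 per the build docs mentioned above, 2.4.7 reported to work, 3.0.1 failing), a quick check before creating the session can save some head-scratching. This is a hypothetical helper, not part of pyrasterframes; it only tests for a 2.4.x version string.

```python
import re
from typing import Tuple


def spark_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '2.4.7' or '3.0.1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_spark_24(version: str) -> bool:
    """True for any Spark 2.4.x release, the line discussed here for RasterFrames 0.9."""
    return spark_major_minor(version) == (2, 4)


if __name__ == "__main__":
    try:
        import pyspark
    except ImportError:
        print("pyspark is not installed")
    else:
        if not is_spark_24(pyspark.__version__):
            print(f"Spark {pyspark.__version__} found; RasterFrames 0.9 expects 2.4.x")
```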
Simeon H.K. Fitch
@metasim
:clap:
Simeon H.K. Fitch
@metasim

📣📣 If you're a user of RasterFrames (or GeoMesa) and would like to see it kept up to date with the latest versions of Spark and JVM technologies, there's a small favor you could do us to help. Go to issue SPARK-7768 and vote for it.

The TL;DR of it is that RasterFrames and GeoMesa (and other frameworks built on Spark using UDTs) use a non-sustainable hack to register the types with Spark. This hack would no longer be required if the Spark committers changed literally one line of code. This ticket has been open since 2015 against Spark 1.5 and keeps getting pushed to the next release.

Regrettably, to vote for the issue you have to create an account on the Spark Jira system, but my hope is that collectively overcoming that small bit of friction will reap larger rewards.

Yingyi Wu
@JenniferYingyiWu2020
@vpipkt, I now need to construct my own data set for supervised machine learning; however, when I run it, the error “org.apache.spark.SparkException: ML algorithm was given empty dataset.” occurs.
Before that, the supervised machine learning code ran successfully on the data set of “eleven bands of 60 meter resolution Sentinel-2 imagery” (https://sentinel.esa.int/web/sentinel/user-guides/sentinel-2-msi/resolutions/spatial). Because I want to run the code on my own data set, I have prepared 11 band “.tiff” images. I also generated “SCL.tif” (scene classification (SCL) data) after carefully reading about the Scene Classification (SC) of the Level-2A algorithm (https://sentinel.esa.int/web/sentinel/technical-guides/sentinel-2-msi/level-2a/algorithm). However, the following error occurred:
image.png
Moreover, I noticed the code “rf_local_is_in('scl', [0, 1, 8, 9, 10])”. Could you please tell me why the integer array is “[0, 1, 8, 9, 10]”? If I use my own data set, how should I define this array?
image.png
My SCL.tif looks like this:
image.png
So, could you please give me some suggestions on how to resolve the error “ML algorithm was given empty dataset.”? Also, could you please explain why the array is “[0, 1, 8, 9, 10]” in “rf_local_is_in('scl', [0, 1, 8, 9, 10])”? Thanks!
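On the “[0, 1, 8, 9, 10]” question: in the standard Sentinel-2 Level-2A scene classification, those values are no data, saturated or defective pixels, cloud (medium probability), cloud (high probability) and thin cirrus, i.e. cells unusable for training, so the filter removes them. The sketch below lists the SCL classes and computes how much of an SCL tile the filter would discard. If a hand-made SCL.tif uses a different coding (for example, 0 for every cell), the same filter could remove every row; that is one plausible, unconfirmed route to “ML algorithm was given empty dataset.”

```python
from collections import Counter
from typing import Dict, Iterable, Sequence

# Sentinel-2 Level-2A Scene Classification (SCL) class values.
SCL_CLASSES: Dict[int, str] = {
    0: "No data",
    1: "Saturated or defective",
    2: "Dark area pixels",
    3: "Cloud shadows",
    4: "Vegetation",
    5: "Not vegetated",
    6: "Water",
    7: "Unclassified",
    8: "Cloud medium probability",
    9: "Cloud high probability",
    10: "Thin cirrus",
    11: "Snow",
}

# The values filtered out by rf_local_is_in('scl', [0, 1, 8, 9, 10]).
MASKED_VALUES = [0, 1, 8, 9, 10]


def masked_fraction(scl_values: Iterable[int], masked: Sequence[int] = MASKED_VALUES) -> float:
    """Fraction of SCL cells that the mask would discard."""
    counts = Counter(scl_values)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no SCL cells given")
    return sum(counts[v] for v in set(masked)) / total


if __name__ == "__main__":
    for value in MASKED_VALUES:
        print(value, SCL_CLASSES[value])
```

Feeding the flattened cell values of your own SCL.tif into masked_fraction would show whether the mask leaves anything to train on; a result at or near 1.0 means the mask list (or the SCL coding) needs to change for that data set.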
Hi @vpipkt, my own image data set for "supervised machine learning" is at "https://github.com/JenniferYingyiWu2020/rasterframes-GeoTIFFs/tree/main/image-dataset/20200613clip". I have also modified the "supervised machine learning" code (https://github.com/JenniferYingyiWu2020/rasterframes-GeoTIFFs/blob/main/machine-learning/supervised_machine_learning.py), and the error log from running it is at "https://github.com/JenniferYingyiWu2020/rasterframes-GeoTIFFs/blob/main/error-logs/ML_algorithm_given_empty_dataset.log".
So, could you please give me suggestions on how to resolve the error "ML algorithm was given empty dataset."? Thanks!
(locationtech/rasterframes#542)
Yingyi Wu
@JenniferYingyiWu2020
Hi @vpipkt, what do you think about the above issue? Is it caused by a bad data set ("https://github.com/JenniferYingyiWu2020/rasterframes-GeoTIFFs/tree/main/image-dataset/20200613clip") that does not meet the requirements of "supervised machine learning"? Or do you think we need to modify the "supervised machine learning" code (https://github.com/JenniferYingyiWu2020/rasterframes-GeoTIFFs/blob/main/machine-learning/supervised_machine_learning.py) to adapt it to my own data set ("https://github.com/JenniferYingyiWu2020/rasterframes-GeoTIFFs/tree/main/image-dataset/20200613clip")?
This could lead to a NullPointerException.
Having an array with values that may be null is probably not a good idea in the first place.
.filterNot(_._2.contains(null)) can be a workaround to avoid this problem.
tosen1990
@tosen1990
But I think the best choice would be to create a NoData tile rather than just filtering out the null tiles. Otherwise, the tiles along the bottom and right-most edges will be misplaced.
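To make the NoData-tile suggestion concrete, here is a hypothetical Python sketch (the snippet above is Scala, and this is not RasterFrames internals). It assumes each record is a grid key plus one tile per band, some possibly null, which is an assumption about the array being filtered. Filtering drops the whole key, while filling keeps every grid position and substitutes a NoData tile for each missing band, so no position in the layout goes missing.

```python
from typing import List, Optional, Sequence, Tuple

Tile = List[List[Optional[float]]]             # None marks a NoData cell
Key = Tuple[int, int]                          # (col, row) of a tile in the layout
Record = Tuple[Key, Sequence[Optional[Tile]]]  # one tile per band; None = missing


def nodata_tile(cols: int, rows: int) -> Tile:
    """A tile of the given shape with every cell set to NoData."""
    return [[None] * cols for _ in range(rows)]


def drop_incomplete(records: Sequence[Record]) -> List[Record]:
    """Python analogue of .filterNot(_._2.contains(null)): the whole key disappears."""
    return [(key, bands) for key, bands in records if all(t is not None for t in bands)]


def fill_incomplete(records: Sequence[Record], cols: int, rows: int) -> List[Record]:
    """Keep every key, substituting a NoData tile for each missing band tile."""
    return [
        (key, [t if t is not None else nodata_tile(cols, rows) for t in bands])
        for key, bands in records
    ]
```

Filling costs a little memory for the placeholder tiles but keeps the record count equal to the number of grid cells, which is what any downstream step that stitches tiles back into a raster relies on.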
image.png
Jason T Brown
@vpipkt
@tosen1990 would you be comfortable filing an issue about this on the locationtech/rasterframes repo?