Yingyi Wu
@JenniferYingyiWu2020
I have encountered the errors below:
[screenshot: error output]
As you know, in the first case, i.e. "test_df = spark.read.raster("hdfs://...")", the ".tif" image that is reported as not found can in fact be read successfully:
[screenshot: successful read]

To summarize, I have set "HADOOP_USER_NAME" in my Python code:
import os
from pyrasterframes.utils import create_rf_spark_session

os.environ["HADOOP_USER_NAME"] = 'geotrellis'
spark = create_rf_spark_session(**{
    'HADOOP_USER_NAME': 'geotrellis'
})
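I am not sure whether passing "HADOOP_USER_NAME" as a Spark option actually reaches the executors; a variant I have been considering (the "spark.executorEnv" key is my assumption here) looks like:

import os
from pyrasterframes.utils import create_rf_spark_session

# Set the HDFS user for the driver process...
os.environ["HADOOP_USER_NAME"] = 'geotrellis'

# ...and also export it to each executor's environment, since every executor
# JVM resolves HADOOP_USER_NAME from its own environment variables:
spark = create_rf_spark_session(**{
    'spark.executorEnv.HADOOP_USER_NAME': 'geotrellis'
})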

So, could you please give me some suggestions on how to resolve the errors below? Thanks!
"java.lang.IllegalArgumentException: Error fetching data for one of: HadoopGeoTiffRasterSource(hdfs://192.168.101.181:9000/Jennifer_hadoop/Yunyao_Data_Set/split_20200613clip/SCL.tif), HadoopGeoTiffRasterSource(hdfs://192.168.101.181:9000/Jennifer_hadoop/Yunyao_Data_Set/split_20200613clip/B1.tif), HadoopGeoTiffRasterSource(hdfs://192.168.101.181:9000/Jennifer_hadoop/Yunyao_Data_Set/split_20200613clip/B2.tif), HadoopGeoTiffRasterSource(hdfs://192.168.101.181:9000/Jennifer_hadoop/Yunyao_Data_Set/split_20200613clip/B3.tif), HadoopGeoTiffRasterSource(hdfs://192.168.101.181:9000/Jennifer_hadoop/Yunyao_Data_Set/split_20200613clip/B4.tif), HadoopGeoTiffRasterSource(hdfs://192.168.101.181:9000/Jennifer_hadoop/Yunyao_Data_Set/split_20200613clip/B5.tif), HadoopGeoTiffRasterSource(hdfs://192.168.101.181:9000/Jennifer_hadoop/Yunyao_Data_Set/split_20200613clip/B6.tif), HadoopGeoTiffRasterSource(hdfs://192.168.101.181:9000/Jennifer_hadoop/Yunyao_Data_Set/split_20200613clip/B7.tif), HadoopGeoTiffRasterSource(hdfs://192.168.101.181:9000/Jennifer_hadoop/Yunyao_Data_Set/split_20200613clip/B9.tif), HadoopGeoTiffRasterSource(hdfs://192.168.101.181:9000/Jennifer_hadoop/Yunyao_Data_Set/split_20200613clip/B10.tif), HadoopGeoTiffRasterSource(hdfs://192.168.101.181:9000/Jennifer_hadoop/Yunyao_Data_Set/split_20200613clip/B11.tif)"
Yingyi Wu
@JenniferYingyiWu2020
Moreover, if I run the command "spark.read.raster("hdfs://...")" as the Hadoop user on my local computer, then the following errors appear:
[screenshot: error output]
[screenshot: error output]
However, the ".tiff" image does exist:
[screenshot]
So, @metasim , @pomadchin , could you please help analyse the above issues for me? Thanks!
Yingyi Wu
@JenniferYingyiWu2020

@vpipkt , I need to run the "supervised machine learning" example (https://rasterframes.io/supervised-learning.html) and replace the "uri_base" with "/vsihdfs/hdfs://192.168.101.181:9000/.../{}.tif" or "hdfs://192.168.101.181:9000/.../{}.tif". However, I have encountered the issues and errors above.
So, could you please give me some suggestions on how to use a Hadoop data set path in the "uri_base" of "supervised machine learning"? Thanks!
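Concretely, what I am trying looks roughly like this (the catalog construction follows the pattern on that page; the band list comes from my data set):

from pyspark.sql import Row

# Band files present in the HDFS directory (see the error message above):
bands = ['SCL', 'B1', 'B2', 'B3', 'B4', 'B5', 'B6', 'B7', 'B9', 'B10', 'B11']

# uri_base pointing at HDFS instead of the S3 location used on the docs page:
uri_base = 'hdfs://192.168.101.181:9000/Jennifer_hadoop/Yunyao_Data_Set/split_20200613clip/{}.tif'

# One-row catalog: one column per band, each cell holding that band's URI.
catalog_df = spark.createDataFrame([Row(**{b: uri_base.format(b) for b in bands})])

df = spark.read.raster(catalog_df, catalog_col_names=bands)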
By the way, I have set "os.environ["HADOOP_USER_NAME"] = 'geotrellis'" and

spark = create_rf_spark_session(**{
    'HADOOP_USER_NAME': 'geotrellis'
})
Also, I have installed GDAL and the RasterFrames environment on all the workers of the Hadoop cluster.

pbebbo
@pbebbo

when trying to load the previously trained pipeline

pipeline = PipelineModel.load(f"s3a://models/{TAGET_NAME}.model/").cache()

I get the following issue:

Traceback (most recent call last):
  File "/home/jovyan/ml_script.py", line 205, in <module>
    pipeline = PipelineModel.load("s3a://nrcan-data/models/" + str(TAGET_NAME) + ".model/").cache()
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/ml/util.py", line 362, in load
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/ml/pipeline.py", line 244, in load
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/ml/pipeline.py", line 378, in load
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/ml/util.py", line 612, in loadParamsInstance
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/ml/util.py", line 362, in load
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/ml/util.py", line 300, in load
  File "/usr/local/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
  File "/usr/local/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o228.load.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 14.0 failed 4 times, most recent failure: Lost task 2.3 in stage 14.0 (TID 454, 10.4.9.4, executor 25): java.io.InvalidClassException: org.apache.spark.sql.execution.FileSourceScanExec; local class incompatible: stream classdesc serialVersionUID = -3589590085483687218, local class serialVersionUID = 1920947604238219635

could you advise on what the cause of this is?

Simeon H.K. Fitch
@metasim
@pbebbo Not seen that before. This is a hack (back it up first), but try a search/replace of -3589590085483687218 with 1920947604238219635 on the files inside the *.model directory. Just something to try. My guess is that the version of Java or Spark used when saving the model was different from what is being used now. Why the model serialization would ever serialize that, I don't know.
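If a plain-text search doesn't find the number, the UID may be embedded in Java-serialized binary as an 8-byte big-endian long; a rough, untested sketch of that byte-level replace on a local copy of the model directory (the path is a placeholder):

import pathlib
import struct

# The two serialVersionUIDs from the exception, encoded the way Java
# serialization writes a long: 8 bytes, big-endian, signed.
old = struct.pack('>q', -3589590085483687218)
new = struct.pack('>q', 1920947604238219635)

model_dir = pathlib.Path('local-copy.model')  # back the original up first!
for f in model_dir.rglob('*'):
    if f.is_file():
        data = f.read_bytes()
        if old in data:
            f.write_bytes(data.replace(old, new))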
Yingyi Wu
@JenniferYingyiWu2020
Hi @metasim , how do I resolve "CPLE_OpenFailed(4) "Open failed." /vsihdfs/hdfs://[server_ip]:9000/Jennifer_hadoop/Yunyao_Data_Set/split_20200613clip/B1.tif: No such file or directory"? (locationtech/rasterframes#550)

java.lang.IllegalArgumentException: Error fetching data for one of: GDALRasterSource(hdfs://192.168.101.201:9000/Jennifer_hadoop/Yunyao_Data_Set/split_20200613clip/B1.tif)

Caused by: geotrellis.raster.gdal.MalformedDataException: Unable to construct a RasterExtent from the Transformation given. GDAL Error Code: 4

Yingyi Wu
@JenniferYingyiWu2020
Hi @metasim , "spark.read.raster" has bee used in supervised machine learning (https://rasterframes.io/supervised-learning.html), if now I would like to use "hdfs://....tif", then how to modify the codes?
Michał Gołębiewski
@mjgolebiewski
hi guys, how does RF use the GeoMesa vector functions (the st_ ones)? is it possible to import only the GeoMesa functions with RasterFrames installed?
James Hughes
@jnh5y
Do you mean "without" RasterFrames by chance?
Michał Gołębiewski
@mjgolebiewski
actually no, i meant it - I'd like to import the GeoMesa functions without importing the rest of RasterFrames, is that possible?
James Hughes
@jnh5y
Both projects call the Spark bits for registering the functions. You'd have to take over that registration yourself.
Michał Gołębiewski
@mjgolebiewski
thanks, I'll look into it further :)
Yingyi Wu
@JenniferYingyiWu2020
@jamesmcclain , I have read the supervised machine learning page ("https://rasterframes.io/supervised-learning.html"), and now I would like to modify the "uri_base" and upload the data set to the Hadoop cluster. However, when I set the code to "uri_base = '/vsihdfs/hdfs://192.168.101.201:9000/Jennifer_hadoop/.../{}.tif'" and execute it, the following errors occur:
crses = df.select('crs.crsProj4').distinct().collect()
[Stage 4:=================================================> (180 + 8) / 200][3 of 1000] FAILURE(3) CPLE_OpenFailed(4) "Open failed." /vsihdfs/hdfs://192.168.101.201:9000/Jennifer_hadoop/Yunyao_Data_Set/split_20200613clip/SCL.tif: No such file or directory
[4 of 1000] FAILURE(3) CPLE_OpenFailed(4) "Open failed." /vsihdfs/hdfs://192.168.101.201:9000/Jennifer_hadoop/Yunyao_Data_Set/split_20200613clip/SCL.tif: No such file or directory
21/03/25 09:52:47 ERROR Executor: Exception in task 37.0 in stage 4.0 (TID 415)
java.lang.IllegalArgumentException: Error fetching data for one of: GDALRasterSource(/vsihdfs/hdfs://192.168.101.201:9000/Jennifer_hadoop/Yunyao_Data_Set/split_20200613clip/SCL.tif), GDALRasterSource(/vsihdfs/hdfs://192.168.101.201:9000/Jennifer_hadoop/Yunyao_Data_Set/split_20200613clip/B1.tif), GDALRasterSource(/vsihdfs/hdfs://192.168.101.201:9000/Jennifer_hadoop/Yunyao_Data_Set/split_20200613clip/B2.tif), GDALRasterSource(/vsihdfs/hdfs://192.168.101.201:9000/Jennifer_hadoop/Yunyao_Data_Set/split_20200613clip/B3.tif), GDALRasterSource(/vsihdfs/hdfs://192.168.101.201:9000/Jennifer_hadoop/Yunyao_Data_Set/split_20200613clip/B4.tif), GDALRasterSource(/vsihdfs/hdfs://192.168.101.201:9000/Jennifer_hadoop/Yunyao_Data_Set/split_20200613clip/B5.tif), GDALRasterSource(/vsihdfs/hdfs://192.168.101.201:9000/Jennifer_hadoop/Yunyao_Data_Set/split_20200613clip/B6.tif), GDALRasterSource(/vsihdfs/hdfs://192.168.101.201:9000/Jennifer_hadoop/Yunyao_Data_Set/split_20200613clip/B7.tif), GDALRasterSource(/vsihdfs/hdfs://192.168.101.201:9000/Jennifer_hadoop/Yunyao_Data_Set/split_20200613clip/B9.tif), GDALRasterSource(/vsihdfs/hdfs://192.168.101.201:9000/Jennifer_hadoop/Yunyao_Data_Set/split_20200613clip/B10.tif), GDALRasterSource(/vsihdfs/hdfs://192.168.101.201:9000/Jennifer_hadoop/Yunyao_Data_Set/split_20200613clip/B11.tif)
So, @jamesmcclain , could you please give me some suggestions?
Yingyi Wu
@JenniferYingyiWu2020

By the way, I have installed GDAL and RasterFrames on the Hadoop cluster, and the data set has been uploaded to the cluster as well.
After several tries, I still cannot resolve the above issue.
On the other hand, I can successfully read a GeoTIFF, but I cannot use "spark.read.raster()":

rf = spark.read.format('GeoTiff').load('hdfs://192.168.101.201:9000/Jennifer_hadoop/Yunyao_Data_Set/split_20200613clip/B1.tif')
rf.printSchema()
root
|-- spatial_key: struct (nullable = false)
| |-- col: integer (nullable = false)
| |-- row: integer (nullable = false)
|-- extent: struct (nullable = true)
| |-- xmin: double (nullable = false)
| |-- ymin: double (nullable = false)
| |-- xmax: double (nullable = false)
| |-- ymax: double (nullable = false)
|-- crs: struct (nullable = true)
| |-- crsProj4: string (nullable = false)
|-- metadata: map (nullable = true)
| |-- key: string
| |-- value: string (valueContainsNull = false)
|-- tile: tile (nullable = false)

rf.select(rf_crs("crs").alias("value")).first()

21/03/24 16:47:42 WARN TaskSetManager: Stage 0 contains a task of very large size (6580 KB). The maximum recommended task size is 100 KB.
Row(value=Row(crsProj4='+proj=utm +zone=52 +datum=WGS84 +units=m +no_defs '))

rf= spark.read.geotiff('hdfs://192.168.101.201:9000/Jennifer_hadoop/Yunyao_Data_Set/split_20200613clip/B1.tif')
rf.select(rf_crs("crs").alias("value")).first()
rf.printSchema()
root
|-- spatial_key: struct (nullable = false)
| |-- col: integer (nullable = false)
| |-- row: integer (nullable = false)
|-- extent: struct (nullable = true)
| |-- xmin: double (nullable = false)
| |-- ymin: double (nullable = false)
| |-- xmax: double (nullable = false)
| |-- ymax: double (nullable = false)
|-- crs: struct (nullable = true)
| |-- crsProj4: string (nullable = false)
|-- metadata: map (nullable = true)
| |-- key: string
| |-- value: string (valueContainsNull = false)
|-- tile: tile (nullable = false)

21/03/24 16:59:23 WARN TaskSetManager: Stage 2 contains a task of very large size (6580 KB). The maximum recommended task size is 100 KB.
Row(value=Row(crsProj4='+proj=utm +zone=52 +datum=WGS84 +units=m +no_defs '))

rf = spark.read.raster('/vsihdfs/hdfs://192.168.101.201:9000/Jennifer_hadoop/Yunyao_Data_Set/split_20200613clip/B1.tif')
rf.select(rf_crs("proj_raster").alias("value")).first()

[3 of 1000] FAILURE(3) CPLE_OpenFailed(4) "Open failed." /vsihdfs/hdfs://192.168.101.201:9000/Jennifer_hadoop/Yunyao_Data_Set/split_20200613clip/B1.tif: No such file or directory
[4 of 1000] FAILURE(3) CPLE_OpenFailed(4) "Open failed." /vsihdfs/hdfs://192.168.101.201:9000/Jennifer_hadoop/Yunyao_Data_Set/split_20200613clip/B1.tif: No such file or directory
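To narrow it down, I think one can also check, outside Spark, whether the GDAL build itself can open a /vsihdfs path (a sketch; this assumes GDAL was compiled with HDFS support):

from osgeo import gdal

gdal.UseExceptions()  # raise a Python exception instead of returning None

# If GDAL was not built with libhdfs support, this fails with the same
# CPLE_OpenFailed error that the Spark executors report.
ds = gdal.Open('/vsihdfs/hdfs://192.168.101.201:9000/'
               'Jennifer_hadoop/Yunyao_Data_Set/split_20200613clip/B1.tif')
print(ds.RasterXSize, ds.RasterYSize)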

silnykamcio
@silnykamcio

Hello,

How should I use RasterFrames with PySpark to access data from an S3 datastore with credentials? I cannot find any way to provide my access and secret keys in the Python API. Is such a thing possible?
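For what it's worth, I have been experimenting with passing the standard Hadoop S3A credential settings through as Spark configs; I am not sure this is the intended way (a sketch, with placeholder keys):

from pyrasterframes.utils import create_rf_spark_session

# Standard Hadoop S3A credential properties, forwarded as Spark configs.
spark = create_rf_spark_session(**{
    'spark.hadoop.fs.s3a.access.key': 'MY_ACCESS_KEY',  # placeholder
    'spark.hadoop.fs.s3a.secret.key': 'MY_SECRET_KEY',  # placeholder
})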

Michał Gołębiewski
@mjgolebiewski
hi, does the current RasterFrames (0.9.0) support pandas_udf?
Simeon H.K. Fitch
@metasim
No, unfortunately. PySpark's Arrow support doesn't handle UDTs or the other tensor-like constructs required to pass the raster data to the pandas UDF context. :(
I'd have to dig to provide more details, but we tried implementing it last year.
It's actually a major problem that I'm at an impasse on how to support, because otherwise Python UDFs are extremely slow.
Michał Gołębiewski
@mjgolebiewski
thanks for quick reply
Simeon H.K. Fitch
@metasim
:balloon: :balloon: RasterFrames 0.9.1 is Released! :balloon: :balloon:
Release notes: https://github.com/locationtech/rasterframes/releases/tag/0.9.1
Artifacts deployed to Maven Central, PyPi, and Docker Hub.
tosen1990
@tosen1990
Congrats!
Yingyi Wu
@JenniferYingyiWu2020
@metasim , Warmest congratulations on the release!
Chamin Nalinda
@0xchamin
hi devs, I'd like to ask what methods are available in RasterFrames to split a .tif file into a set of tiles (with a configurable size). I'm particularly interested in the Scala way of doing it. many thanks!
Simeon H.K. Fitch
@metasim
@0xchamin When you use the spark.read.raster(...) function on a GeoTIFF, by default it splits the raster into 256x256 tiles.
You can change that size via a parameter to the function.
While the examples in the documentation are in Python, the Scala methods are very similar.
One difference with the spark.read.raster construct is that Scala uses a builder pattern instead of Python's untyped key-value parameters:
https://github.com/locationtech/rasterframes/blob/6bd49e093cd80dd78999d2898e66e76f47d07463/datasource/src/test/scala/org/locationtech/rasterframes/datasource/raster/RasterSourceDataSourceSpec.scala#L102-L105
So you may have to look at the source for the few cases where there are differences. Most of the columnar functions have the same signatures (Python just wraps the Scala implementations).
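In Python the override looks roughly like this (a sketch; tile_dimensions is the documented parameter, the path is hypothetical):

# Split each scene into 512x512 tiles instead of the default 256x256:
df = spark.read.raster(
    'hdfs://host:9000/some/path/B1.tif',  # hypothetical path
    tile_dimensions=(512, 512)
)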
msmahasm
@msmahasm:matrix.org
How do I tag building footprint data to a specific location using RasterFrames?
Assume I have a TIFF file for a specific location, and existing OSM data as well.
Kindly share your thoughts on this.
Simeon H.K. Fitch
@metasim
You do a spatial join.
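In rough terms it could look like this (a sketch; the column names and the footprints DataFrame are assumptions, not your schema):

from pyrasterframes.rasterfunctions import (
    rf_geometry, rf_crs, rf_mk_crs, st_reproject, st_intersects)

# rasters: result of spark.read.raster(...) on the TIFF (has a proj_raster column)
# footprints: DataFrame of OSM building polygons with a geometry column in EPSG:4326

# Reproject each tile's footprint geometry into the OSM CRS, then join on intersection.
tiles = rasters.withColumn(
    'tile_geom',
    st_reproject(rf_geometry('proj_raster'), rf_crs('proj_raster'), rf_mk_crs('EPSG:4326')))

joined = tiles.join(footprints, st_intersects(tiles.tile_geom, footprints.geometry))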