kembles5
@kembles5
@pomadchin downgraded Java to version 11 and sbt worked. Any instructions on how to build the assembly with shaded dependencies would be appreciated. Thanks for all your help!
Grigory
@pomadchin
I think the pip-installed RF is already shaded.
I would recommend following the RF quick start instructions.
A good question is how to deploy it on a cluster with no k8s; that's a question for @metasim, but I'm pretty sure it is doable via conda.
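For reference, the pip-installed route avoids manual package resolution entirely; a minimal bootstrap sketch, assuming pyrasterframes and a matching Spark 3.1.x pyspark are pip-installed (version pins are illustrative):

```python
# Assumes: pip install 'pyspark>=3.1,<3.2' pyrasterframes
from pyrasterframes.utils import create_rf_spark_session

# Wires the bundled (shaded) assembly jar into the JVM classpath,
# so no --packages / --repositories flags are needed.
spark = create_rf_spark_session()
```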
kembles5
@kembles5
@pomadchin Thanks for the info. The getting started guide is where I'm having issues, specifically the "Using pyspark shell" section. I tried following the guide with 0.10.0 and 0.10.1 on Spark 3.1. Both initially failed with "modules not found", and adding --repositories https://jitpack.io had no effect. 0.10.1 eventually started working despite no changes, but 0.10.0 continues to have issues finding the geotrellis module. With 0.10.1 I then ran into the shapeless conflict.
zoneyuan
@zoneyuan
Hello, when I use pyrasterframes to write a GeoTIFF with raster_dimensions=(5000, 5000), it fails with an OutOfMemoryError. The raster data I'm trying to write is not that large (just 16 MB on disk), and my Spark worker memory is 2 GB; why isn't that sufficient?
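For a rough sense of scale (a back-of-envelope sketch, not RF internals): the 16 MB on disk is compressed, but a 5000 x 5000 raster has to be materialized uncompressed when the GeoTIFF is assembled. Assuming float64 cells:

```python
# Back-of-envelope only: in-memory size of one uncompressed band.
# float64 cells are an assumption; adjust bytes_per_cell for your data.
cols, rows, bytes_per_cell = 5000, 5000, 8

band_bytes = cols * rows * bytes_per_cell
print(round(band_bytes / 1024 ** 2), "MiB per band")  # -> 191 MiB per band
```

Multiple bands, intermediate copies made while collecting and encoding, and general JVM overhead multiply this figure, so a 2 GB worker can plausibly run out of memory even though the file is small on disk.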
jterry64
@jterry64
Just checking: has anyone come up with a good bootstrap script/configuration/example for running pyrasterframes on AWS EMR? I've still never been able to do it successfully.
Grigory
@pomadchin
Hey @jterry64 I think it can be some version of https://github.com/pomadchin/vlm-performance/blob/feature/gt-3.x/emr/bootstrap.sh but with an extra RasterFrames deps
jterry64
@jterry64
Woo, I did finally get it to work! This ended up working as a bootstrap script:
set -ex

# Install Conda
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
sudo sh Miniconda3-latest-Linux-x86_64.sh -b -p /usr/local/miniconda

source ~/.bashrc
export PATH=/usr/local/miniconda/bin:$PATH

# Install GDAL
sudo /usr/local/miniconda/bin/conda config --add channels conda-forge
sudo /usr/local/miniconda/bin/conda install -c conda-forge libnetcdf gdal=3.5.0 -y
sudo /usr/local/miniconda/bin/pip install pyrasterframes geopandas boto3 s3fs

echo "export PATH=/usr/local/miniconda/bin:$PATH" >> ~/.bashrc
echo "export LD_LIBRARY_PATH=/usr/local/miniconda/lib/:/usr/local/lib:/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib" >> ~/.bashrc
echo "export PROJ_LIB=/usr/local/miniconda/share/proj" >> ~/.bashrc
echo "export PYSPARK_PYTHON=/usr/local/miniconda/bin/python" >> ~/.bashrc
echo "export PYSPARK_DRIVER_PYTHON=/usr/local/miniconda/bin/python" >> ~/.bashrc
jterry64
@jterry64

But now to hammer you with more questions... I've been using the polygonal summary method in GeoTrellis over blocks from rasters, and it seems like GeoTrellis has an optimization where it rasterizes only the parts of the polygon that fall within the extent of the raster. Is there a good way to replicate this behavior with RasterFrames?

I.e., if I do a big join of a bunch of polygons and raster blocks, and I now want to rasterize the polygons to use as a mask, how do I rasterize so that the zone raster is aligned with just that block? The example in the documentation rasterizes using the dimensions of the raster, but I'm unclear how this actually aligns correctly with the raster block: https://rasterframes.io/zonal-algebra.html
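The GeoTrellis-style optimization being described (rasterize only the part of the polygon inside a tile's extent) is ultimately just grid arithmetic. A pure-Python sketch of that alignment, with hypothetical names, not a RasterFrames API:

```python
import math

def clipped_grid_bounds(tile_extent, cols, rows, poly_bbox):
    """Return (col_min, row_min, col_max, row_max): the region of the
    tile's pixel grid covered by poly_bbox, or None if they don't overlap.
    Extents are (xmin, ymin, xmax, ymax); row 0 is the top of the tile."""
    txmin, tymin, txmax, tymax = tile_extent
    pxmin, pymin, pxmax, pymax = poly_bbox
    cw = (txmax - txmin) / cols  # cell width
    ch = (tymax - tymin) / rows  # cell height
    # Clip the polygon's bbox to the tile extent.
    xmin, xmax = max(txmin, pxmin), min(txmax, pxmax)
    ymin, ymax = max(tymin, pymin), min(tymax, pymax)
    if xmin >= xmax or ymin >= ymax:
        return None
    # Snap the clipped bbox outward to the tile's own cell grid.
    col_min = int(math.floor((xmin - txmin) / cw))
    col_max = min(cols - 1, int(math.ceil((xmax - txmin) / cw)) - 1)
    row_min = int(math.floor((tymax - ymax) / ch))
    row_max = min(rows - 1, int(math.ceil((tymax - ymin) / ch)) - 1)
    return (col_min, row_min, col_max, row_max)

# 10x10-cell tile over (0,0)-(100,100); polygon bbox (25,35)-(47,80):
print(clipped_grid_bounds((0, 0, 100, 100), 10, 10, (25, 35, 47, 80)))
# -> (2, 2, 4, 6)
```

As I read the zonal-algebra docs, passing the tile's own dimensions and geometry into rf_rasterize is what makes the resulting zone raster line up per block: each polygon is evaluated against that tile's grid, which is this same snapping done inside the expression.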

zoneyuan
@zoneyuan
Could someone please help with a problem using spark-submit?
/export/server/spark/bin/spark-submit \
--master yarn \
--num-executors 6 \
--jars /export/server/anaconda3/envs/pyspark/lib/python3.8/site-packages/pyrasterframes/jars/pyrasterframes-assembly-0.10.1.jar \
/tmp/pycharm_project_255/raster_slope.py
Using spark-submit causes the following error:
Traceback (most recent call last):
File "/tmp/pycharm_project_255/raster_slope.py", line 9, in <module>
spark = (SparkSession.builder
File "/export/server/anaconda3/envs/pyspark/lib/python3.8/site-packages/pyrasterframes/__init__.py", line 44, in _rf_init
spark_session.rasterframes = RFContext(spark_session)
File "/export/server/anaconda3/envs/pyspark/lib/python3.8/site-packages/pyrasterframes/rf_context.py", line 45, in __init__
self._jrfctx = self._jvm.org.locationtech.rasterframes.py.PyRFContext(jsess)
File "/export/server/spark/python/lib/py4j-0.10.9.2-src.zip/py4j/java_gateway.py", line 1573, in __call__
File "/export/server/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 111, in deco
File "/export/server/spark/python/lib/py4j-0.10.9.2-src.zip/py4j/protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.locationtech.rasterframes.py.PyRFContext.
: java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.expressions.objects.Invoke$.apply$default$5()Z
at frameless.RecordEncoder.$anonfun$toCatalyst$2(RecordEncoder.scala:154)
at scala.collection.immutable.List.map(List.scala:293)
at frameless.RecordEncoder.toCatalyst(RecordEncoder.scala:153)
at frameless.TypedExpressionEncoder$.apply(TypedExpressionEncoder.scala:28)
at org.locationtech.rasterframes.encoders.TypedEncoders.typedExpressionEncoder(TypedEncoders.scala:22)
at org.locationtech.rasterframes.encoders.TypedEncoders.typedExpressionEncoder$(TypedEncoders.scala:22)
at org.locationtech.rasterframes.package$.typedExpressionEncoder(package.scala:39)
at org.locationtech.rasterframes.encoders.StandardEncoders.spatialKeyEncoder(StandardEncoders.scala:68)
at org.locationtech.rasterframes.encoders.StandardEncoders.spatialKeyEncoder$(StandardEncoders.scala:68)
at org.locationtech.rasterframes.package$.spatialKeyEncoder$lzycompute(package.scala:39)
at org.locationtech.rasterframes.package$.spatialKeyEncoder(package.scala:39)
at org.locationtech.rasterframes.StandardColumns.$init$(StandardColumns.scala:42)
at org.locationtech.rasterframes.package$.<init>(package.scala:39)
at org.locationtech.rasterframes.package$.<clinit>(package.scala)
at org.locationtech.rasterframes.py.PyRFContext.<init>(PyRFContext.scala:49)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.lang.Thread.run(Thread.java:748)
Running the script directly doesn't cause any error, but I want to set num-executors.
Eugene Cheipesh
@echeipesh

Trying to debug a GDAL reading problem that results in the following exception (on EMR with GDAL 3.1.2 installed):

Caused by: java.lang.UnsupportedOperationException: Reading 'gdal://vsis3/bucket/some.tif not supported
  at org.locationtech.rasterframes.ref.RFRasterSource$.$anonfun$apply$1(RFRasterSource.scala:119)
  at scala.compat.java8.functionConverterImpls.AsJavaFunction.apply(FunctionConverters.scala:262)
  at com.github.benmanes.caffeine.cache.LocalCache.lambda$statsAware$0(LocalCache.java:139)


In spark-shell on master with the job jar I’m able to reproduce but not explain:

Raster Reads:

val url = "gdal://vsis3/bucket/some.tif"


scala> val rs = RFRasterSource(new java.net.URI(url))
rs: org.locationtech.rasterframes.ref.RFRasterSource = GDALRasterSource(gdal://vsis3/...)

scala> rs.read(GridBounds(0,0,10,10), List(0))
res1: geotrellis.raster.Raster[geotrellis.raster.MultibandTile] = Raster(ArrayMultibandTile(11,11,1,float32ud-3.4028234663852886E38),Extent(4031670.65908466, 3215321.1233700267, 4031725.65908466, 3215376.1233700267))

scala> RFRasterSource.IsGDAL.unapply(new java.net.URI(url))
res2: Boolean = true

scala> spark.read.raster.from(url).load().show()
22/06/09 19:26:23 WARN TaskSetManager: Lost task 986.0 in stage 1.0 (TID 988) (ip-172-31-19-30.eu-west-1.compute.internal executor 7): java.lang.IllegalArgumentException: Error fetching data for one of:
    at org.locationtech.rasterframes.expressions.generators.RasterSourceToRasterRefs.eval(RasterSourceToRasterRefs.scala:83)
    at org.apache.spark.sql.execution.GenerateExec.$anonfun$doExecute$3(GenerateExec.scala:95)
    at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
    at scala.collection.Iterator$ConcatIterator.hasNext(Iterator.scala:222)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
    at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:275)
    at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$.$anonfun$prepareShuffleDependency$10(ShuffleExchangeExec.scala:400)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
    at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
    at org.apache.spark.scheduler.Task.run(Task.scala:131)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.UnsupportedOperationException: Reading 'gdal://vsis3/...
Grigory
@pomadchin
@echeipesh what does IsGDAL.unapply print on the executors?
Eugene Cheipesh
@echeipesh
@metasim Related to the question above, I've been noticing that the GeoTiff reader is just really, really slow compared to GDAL when reading tiles; much worse than I would have expected based on previous experience with it. I wonder if that lines up with what you've seen and if you, perchance, have a theory as to what is going on. It'd be really nice to be able to handle COGs without GDAL, especially now that we have EMR Serverless as an option.
Simeon H.K. Fitch
@metasim

@echeipesh That surprises me too. I've not done any RasterFrames work for some time, so I've not been tracking it. Debugging performance issues in Spark is just so damn difficult. A Java version difference? If I had $$ to work on it, I'd put in dropwizard/prometheus metrics and maybe even tracing on tile ops, in hopes the instrumentation would help.

I've been wondering if we should do a new release with just this flag flipped:
https://github.com/locationtech/rasterframes/blob/6bd49e093cd80dd78999d2898e66e76f47d07463/core/src/main/resources/reference.conf#L3

In short, I don't know what's going on or what to do about it without a project to support the work.
Eugene Cheipesh
@echeipesh
Yeah, I know that feeling. I hope it's ok to talk about issues without the expectation that you have answers; it's FOSS and all. But still, I don't want to stress you out :)
It seems that when using GeoTiffRasterSource it's just spending a lot of time re-reading the TiffTags... not great.
Screen Shot 2022-06-27 at 3.35.33 PM.png
Grigory
@pomadchin
@echeipesh oof, and GDAL caches them :/
Eugene Cheipesh
@echeipesh
Yes, I think there is something suspect going on with the delegate/cache situation. I'll try to dig into it when I have a minute. Aside from finding out what is happening, it shouldn't be a hard fix.
wxmimperio
@imperio-wxm

@pomadchin Hi, how do I convert a RasterFrames DataFrame to a GeoTrellis RDD[(K, V)] with Metadata[M]?

val df = spark.read.raster
      .withLazyTiles(true)
      .withSpatialIndex(2)
      .withTileDimensions(512, 512)
      .load(filePath)

 df.printSchema()
 df.asLayer.show(false)

I get this error:

Caused by: java.lang.IllegalArgumentException: requirement failed: A RasterFrameLayer requires a column identified as a spatial key
    at scala.Predef$.require(Predef.scala:281)
    at org.locationtech.rasterframes.extensions.DataFrameMethods.asLayer(DataFrameMethods.scala:234)
    at org.locationtech.rasterframes.extensions.DataFrameMethods.asLayer$(DataFrameMethods.scala:229)
    at org.locationtech.rasterframes.extensions.Implicits$WithDataFrameMethods.asLayer(Implicits.scala:59)
wxmimperio
@imperio-wxm
@metasim How do I implement image mosaicking with RasterFrames?
I see in the source code that a left join is used and the intersection is taken, which gives the black part; I actually want the red part.
image.png
def apply(left: DataFrame, right: DataFrame, leftExtent: Column, leftCRS: Column, rightExtent: Column, rightCRS: Column, resampleMethod: GTResampleMethod, 
    fallbackDimensions: Option[Dimensions[Int]]): DataFrame = {
        val leftGeom = st_geometry(leftExtent)
        val rightGeomReproj = st_reproject(st_geometry(rightExtent), rightCRS, leftCRS)
        val joinExpr = new Column(SpatialRelation.Intersects(leftGeom.expr, rightGeomReproj.expr))
        apply(left, right, joinExpr, leftExtent, leftCRS, rightExtent, rightCRS, resampleMethod, fallbackDimensions)
  }

.......
.withColumn(id, monotonically_increasing_id())
      // 2. Perform the left-outer join
      .join(right, joinExprs, joinType = "left")
      // 3. Group by the unique ID, reestablishing the LHS count
      .groupBy(col(id))
......
zoneyuan
@zoneyuan
I want to create a function similar to rf_slope; it needs to compute each cell's result from the 8 cells around it. I did not find rf_slope's algorithm code. Could someone please show me how to rewrite it, or how to develop a new function that works on neighborhood cells?
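For the neighborhood math itself: slope is commonly computed with Horn's method (used by GDAL's gdaldem and by GeoTrellis focal operations, though I haven't verified rf_slope's exact implementation). A plain-numpy sketch of the 3x3-neighborhood computation, separate from any RF UDF plumbing:

```python
import numpy as np

def horn_slope(dem, cellsize):
    """Slope in degrees via Horn's method: each output cell is computed
    from its 3x3 neighborhood. Border cells are left NaN for simplicity."""
    rows, cols = dem.shape
    out = np.full((rows, cols), np.nan)
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Neighborhood layout:  a b cc
            #                       d . f
            #                       g h i
            a, b, cc = dem[r-1, c-1], dem[r-1, c], dem[r-1, c+1]
            d, f = dem[r, c-1], dem[r, c+1]
            g, h, i = dem[r+1, c-1], dem[r+1, c], dem[r+1, c+1]
            dzdx = ((cc + 2 * f + i) - (a + 2 * d + g)) / (8 * cellsize)
            dzdy = ((g + 2 * h + i) - (a + 2 * b + cc)) / (8 * cellsize)
            out[r, c] = np.degrees(np.arctan(np.hypot(dzdx, dzdy)))
    return out

# A plane rising 1 unit per cell in x should slope at ~45 degrees:
plane = np.fromfunction(lambda r, c: c, (4, 4))
print(horn_slope(plane, cellsize=1.0)[1, 1])  # ~45 degrees
```

The loop form is deliberately literal to show the neighborhood access pattern; a real implementation would vectorize it (e.g. with shifted array slices or a convolution).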
Solomon Negusse
@solomon-negusse

Hello, I'm new to RasterFrames and encountering the error below while running a zonal statistics prototype using pyrasterframes in the AWS EMR Serverless environment. I'm a bit lost on the Java error and hoping someone can point me in the right direction.
The environment:
EMR v6.6 (the only version available in EMR Serverless)
Spark v3.2.0
Pyrasterframes v0.10.1

And here is the error:

Traceback (most recent call last):
  File "/tmp/spark-515422fd-4ed2-4ca6-9f2d-160cd422df73/zonal_statistics.py", line 25, in <module>
    spark = create_rf_spark_session()
  File "/home/hadoop/rasterframes/lib/python3.8/site-packages/pyrasterframes/utils.py", line 88, in create_rf_spark_session
    spark.withRasterFrames()
  File "/home/hadoop/rasterframes/lib/python3.8/site-packages/pyrasterframes/__init__.py", line 44, in _rf_init
    spark_session.rasterframes = RFContext(spark_session)
  File "/home/hadoop/rasterframes/lib/python3.8/site-packages/pyrasterframes/rf_context.py", line 45, in __init__
    self._jrfctx = self._jvm.org.locationtech.rasterframes.py.PyRFContext(jsess)
  File "/usr/lib/spark/python/lib/py4j-0.10.9.2-src.zip/py4j/java_gateway.py", line 1573, in __call__
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 111, in deco
  File "/usr/lib/spark/python/lib/py4j-0.10.9.2-src.zip/py4j/protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.locationtech.rasterframes.py.PyRFContext.
: java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.expressions.objects.Invoke$.apply$default$5()Z
    at frameless.RecordEncoder.$anonfun$toCatalyst$2(RecordEncoder.scala:154)
    at scala.collection.immutable.List.map(List.scala:293)
    at frameless.RecordEncoder.toCatalyst(RecordEncoder.scala:153)
    at frameless.TypedExpressionEncoder$.apply(TypedExpressionEncoder.scala:28)
    at org.locationtech.rasterframes.encoders.TypedEncoders.typedExpressionEncoder(TypedEncoders.scala:22)
    at org.locationtech.rasterframes.encoders.TypedEncoders.typedExpressionEncoder$(TypedEncoders.scala:22)
    at org.locationtech.rasterframes.package$.typedExpressionEncoder(package.scala:39)
    at org.locationtech.rasterframes.encoders.StandardEncoders.spatialKeyEncoder(StandardEncoders.scala:68)
    at org.locationtech.rasterframes.encoders.StandardEncoders.spatialKeyEncoder$(StandardEncoders.scala:68)
    at org.locationtech.rasterframes.package$.spatialKeyEncoder$lzycompute(package.scala:39)
    at org.locationtech.rasterframes.package$.spatialKeyEncoder(package.scala:39)
    at org.locationtech.rasterframes.StandardColumns.$init$(StandardColumns.scala:42)
    at org.locationtech.rasterframes.package$.<init>(package.scala:39)
    at org.locationtech.rasterframes.package$.<clinit>(package.scala)
    at org.locationtech.rasterframes.py.PyRFContext.<init>(PyRFContext.scala:49)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:238)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
    at java.lang.Thread.run(Thread.java:750)
Grigory
@pomadchin
hey @solomon-negusse, RF works with Spark 3.1.x only; 3.2.x is binary-incompatible with 3.1.x.
See the WIP draft PR locationtech/rasterframes#587
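Until that PR lands, pinning the Spark side to 3.1.x is the usual workaround for pip-managed environments (this won't help on EMR Serverless itself, where the Spark version is fixed by the EMR release); version pins here are illustrative:

```shell
# RasterFrames 0.10.x targets Spark 3.1.x; EMR 6.6 ships Spark 3.2.0,
# which is binary-incompatible (hence the NoSuchMethodError above).
# Check the RF release notes for the exact supported versions.
pip install 'pyspark>=3.1,<3.2' pyrasterframes==0.10.1
```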
LPontes
@LPontes
Hello folks! Any advances on using RasterFrames in Databricks? I already read the topics about it here; however, it looks like it doesn't work well.
Grigory
@pomadchin
hey @LPontes I think it works well if you prebuild the assembly jar with all shaded deps inside and upload it to the DB classpath
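For context, building such a jar is typically done from a RasterFrames checkout with sbt-assembly; a sketch only, since the exact task name should be verified against the project's build definition:

```shell
git clone https://github.com/locationtech/rasterframes.git
cd rasterframes
# Task name assumes the standard sbt-assembly plugin; check the
# project's build.sbt for the actual assembly task before relying on it.
sbt assembly
# Upload the resulting *-assembly-*.jar via the Databricks cluster's
# Libraries UI so it lands on the driver/executor classpath.
```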
yang162132
@yang162132
Could we get a RasterFrameLayer from a Raster or Tile?
Yingyi Wu
@JenniferYingyiWu2020
Hi @pomadchin, I'd like to make PyRasterFrames read from a Spark RDD in "Supervised Machine Learning (https://rasterframes.io/supervised-learning.html)" and "Unsupervised Machine Learning (https://rasterframes.io/unsupervised-learning.html)", instead of from HDFS. Could you please give me some suggestions? Thanks!
image.png
image.png
Grigory
@pomadchin
hey @JenniferYingyiWu2020 I think @metasim can cover the Python part of it better; but what error are you getting?