If you still need GT snapshots, they are available on the Sonatype Maven snapshots repository, e.g.: https://oss.sonatype.org/content/repositories/snapshots/org/locationtech/geotrellis/geotrellis-spark_2.12/
Check out the GT README badges: https://github.com/locationtech/geotrellis#geotrellis
Thanks @pomadchin. I tried RasterFrames 0.10.1 and the GT errors were resolved. Now there's only one module not found:
com.github.everit-org.json-schema#org.everit.json.schema;1.12.2: not found
I'm simply going through the RasterFrames getting started guide (https://rasterframes.io/getting-started.html), following the "Using pyspark shell" section, and I get the above error. The getting started guide doesn't appear to work with 0.10.0 or 0.10.1.
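Side note: that everit artifact is published through JitPack (the com.github.everit-org coordinates are the giveaway), so the failure may simply be that https://jitpack.io is missing from the resolver list. A rough sketch of the workaround from Python, using Spark's standard spark.jars.repositories setting; combine it with whatever --packages coordinates the getting started guide specifies:

from pyspark.sql import SparkSession

# Add JitPack to the resolver chain so the everit json-schema
# transitive dependency can be located (sketch, untested).
spark = (SparkSession.builder
         .config("spark.jars.repositories", "https://jitpack.io")
         .getOrCreate())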
Hi @jpolchlo, thank you. I'm able to get to the pyspark shell now, but I get the following error when I run spark = spark.withRasterFrames(). From what I've read this looks like a Scala version mismatch, but I verified I'm using Spark 3.1, which uses Scala 2.12:
java.lang.NoSuchMethodError: shapeless.DefaultSymbolicLabelling$.instance(Lshapeless/HList;)Lshapeless/DefaultSymbolicLabelling;
  at org.locationtech.rasterframes.encoders.StandardEncoders.spatialKeyEncoder(StandardEncoders.scala:68)
  at org.locationtech.rasterframes.encoders.StandardEncoders.spatialKeyEncoder$(StandardEncoders.scala:68)
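If it helps, the Scala version the running JVM was actually built with can be checked straight from pyspark through the Py4J gateway (scala.util.Properties is part of the Scala standard library); a minimal sketch:

# Print Spark's version and the Scala version of the running JVM (sketch).
print(spark.version)
print(spark.sparkContext._jvm.scala.util.Properties.versionNumberString())
# Anything other than 2.12.x here would be consistent with the shapeless error.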
set -ex

# Install Miniconda
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
sudo sh Miniconda3-latest-Linux-x86_64.sh -b -p /usr/local/miniconda
source ~/.bashrc
export PATH=/usr/local/miniconda/bin:$PATH

# Install GDAL and the Python dependencies from conda-forge
sudo /usr/local/miniconda/bin/conda config --add channels conda-forge
sudo /usr/local/miniconda/bin/conda install -c conda-forge libnetcdf gdal=3.5.0 -y
sudo /usr/local/miniconda/bin/pip install pyrasterframes geopandas boto3 s3fs

# Persist PATH, native library paths, PROJ data, and Spark's Python for later shells
echo "export PATH=/usr/local/miniconda/bin:$PATH" >> ~/.bashrc
echo "export LD_LIBRARY_PATH=/usr/local/miniconda/lib/:/usr/local/lib:/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib" >> ~/.bashrc
echo "export PROJ_LIB=/usr/local/miniconda/share/proj" >> ~/.bashrc
echo "export PYSPARK_PYTHON=/usr/local/miniconda/bin/python" >> ~/.bashrc
echo "export PYSPARK_DRIVER_PYTHON=/usr/local/miniconda/bin/python" >> ~/.bashrc
But now to hammer you with more questions: I've been using the polygonal summary method in GeoTrellis over blocks from rasters, and it seems like GeoTrellis has an optimization where it rasterizes only the parts of the polygon that fall within the extent of the raster. Is there a good way to replicate this behavior with RasterFrames?
That is, if I do a big join of a bunch of polygons and raster blocks, and I then want to rasterize the polygons to use as a mask, how do I rasterize so that the zone raster is aligned with just that block? The example in the documentation rasterizes using the dimensions of the raster, but I'm unclear how this actually aligns correctly with the raster block: https://rasterframes.io/zonal-algebra.html
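My reading of the zonal algebra example is that rf_rasterize burns the polygon onto the grid defined by each tile's own extent and dimensions, which is what keeps the zone raster aligned per block. A minimal sketch, assuming a joined DataFrame df with hypothetical columns geom (a polygon already in the raster CRS), zone_id, and a proj_raster column tile:

from pyspark.sql.functions import col
from pyrasterframes.rasterfunctions import rf_rasterize, rf_geometry, rf_dimensions

# rf_geometry yields the tile's extent as a geometry and rf_dimensions its
# cols/rows, so the zone tile is burned onto exactly that block's grid;
# pixels outside the polygon become NoData, giving a per-block mask.
zones = df.withColumn(
    "zone",
    rf_rasterize(col("geom"), rf_geometry(col("tile")), col("zone_id"),
                 rf_dimensions(col("tile")).cols, rf_dimensions(col("tile")).rows))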
Trying to debug a GDAL reading problem that results in the following exception (on EMR with GDAL 3.1.2 installed):
Caused by: java.lang.UnsupportedOperationException: Reading 'gdal://vsis3/bucket/some.tif not supported
at org.locationtech.rasterframes.ref.RFRasterSource$.$anonfun$apply$1(RFRasterSource.scala:119)
at scala.compat.java8.functionConverterImpls.AsJavaFunction.apply(FunctionConverters.scala:262)
at com.github.benmanes.caffeine.cache.LocalCache.lambda$statsAware$0(LocalCache.java:139)
(see RFRasterSource.scala at develop in locationtech/rasterframes on GitHub)
In spark-shell on the master node, with the job jar, I'm able to reproduce but not explain:
Raster Reads:
scala> val url = "gdal://vsis3/bucket/some.tif"
scala> val rs = RFRasterSource(new java.net.URI(url))
rs: org.locationtech.rasterframes.ref.RFRasterSource = GDALRasterSource(gdal://vsis3/...)
scala> rs.read(GridBounds(0,0,10,10), List(0))
res1: geotrellis.raster.Raster[geotrellis.raster.MultibandTile] = Raster(ArrayMultibandTile(11,11,1,float32ud-3.4028234663852886E38),Extent(4031670.65908466, 3215321.1233700267, 4031725.65908466, 3215376.1233700267))
scala> RFRasterSource.IsGDAL.unapply(new java.net.URI(url))
res2: Boolean = true
scala> spark.read.raster.from(url).load().show()
22/06/09 19:26:23 WARN TaskSetManager: Lost task 986.0 in stage 1.0 (TID 988) (ip-172-31-19-30.eu-west-1.compute.internal executor 7): java.lang.IllegalArgumentException: Error fetching data for one of:
at org.locationtech.rasterframes.expressions.generators.RasterSourceToRasterRefs.eval(RasterSourceToRasterRefs.scala:83)
at org.apache.spark.sql.execution.GenerateExec.$anonfun$doExecute$3(GenerateExec.scala:95)
at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
at scala.collection.Iterator$ConcatIterator.hasNext(Iterator.scala:222)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:275)
at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$.$anonfun$prepareShuffleDependency$10(ShuffleExchangeExec.scala:400)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.UnsupportedOperationException: Reading 'gdal://vsis3/...
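Since the same URI reads fine in spark-shell on the master but fails inside tasks, one hypothesis is that the GDAL native libraries the bootstrap installs are missing (or not on LD_LIBRARY_PATH) on the executor nodes. A rough probe from pyspark; a sketch only, since importing the Python osgeo bindings per node is merely a proxy for the JVM-side GDAL bindings RasterFrames actually uses:

import socket

def probe(_):
    # Try importing the conda-installed GDAL Python bindings on this node.
    try:
        from osgeo import gdal
        ver = gdal.__version__
    except Exception as e:
        ver = "unavailable: %r" % (e,)
    return [(socket.gethostname(), ver)]

# One small task per partition; distinct() collapses results per host.
print(spark.sparkContext.parallelize(range(32), 32)
      .mapPartitions(probe).distinct().collect())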