Hi @jpolchlo, thank you. I'm able to get to the pyspark shell now, but I get the following error when I run spark = spark.withRasterFrames(). From what I've read this looks like a Scala version mismatch, but I verified I'm using Spark 3.1, which uses Scala 2.12:
java.lang.NoSuchMethodError: shapeless.DefaultSymbolicLabelling$.instance(Lshapeless/HList;)Lshapeless/DefaultSymbolicLabelling;
  at org.locationtech.rasterframes.encoders.StandardEncoders.spatialKeyEncoder(StandardEncoders.scala:68)
  at org.locationtech.rasterframes.encoders.StandardEncoders.spatialKeyEncoder$(StandardEncoders.scala:68)
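One way to confirm a mismatch from the same pyspark session is to ask the driver JVM which jar the shapeless class was actually loaded from. This is just a diagnostic sketch using py4j reflection; it assumes the RasterFrames assembly already made it onto the driver classpath:

# Ask the JVM where the loaded shapeless class came from; a shapeless jar
# built for a different Scala/shapeless version (or a duplicate shapeless
# jar shadowing the one bundled with RasterFrames) would explain the
# NoSuchMethodError above.
klass = spark._jvm.java.lang.Class.forName("shapeless.DefaultSymbolicLabelling")
print(klass.getProtectionDomain().getCodeSource().getLocation().toString())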
#!/bin/bash
set -ex

# Install Miniconda
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
sudo sh Miniconda3-latest-Linux-x86_64.sh -b -p /usr/local/miniconda
export PATH=/usr/local/miniconda/bin:$PATH

# Install GDAL and the Python dependencies from conda-forge
sudo /usr/local/miniconda/bin/conda config --add channels conda-forge
sudo /usr/local/miniconda/bin/conda install -c conda-forge libnetcdf gdal=3.5.0 -y
sudo /usr/local/miniconda/bin/pip install pyrasterframes geopandas boto3 s3fs

# Persist the environment for later shells
echo "export PATH=/usr/local/miniconda/bin:$PATH" >> ~/.bashrc
echo "export LD_LIBRARY_PATH=/usr/local/miniconda/lib/:/usr/local/lib:/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib" >> ~/.bashrc
echo "export PROJ_LIB=/usr/local/miniconda/share/proj" >> ~/.bashrc
echo "export PYSPARK_PYTHON=/usr/local/miniconda/bin/python" >> ~/.bashrc
echo "export PYSPARK_DRIVER_PYTHON=/usr/local/miniconda/bin/python" >> ~/.bashrc
But now to hammer you with more questions... I've been using the polygonal summary methods in GeoTrellis over blocks from rasters, and GeoTrellis has an optimization where it rasterizes only the parts of the polygon that fall within the extent of the raster. Is there a good way to replicate this behavior with RasterFrames?
That is, if I do a big join of a bunch of polygons and raster blocks, and I then want to rasterize the polygons to use as a mask, how do I rasterize so that the zone raster is aligned with just that block? The example in the documentation rasterizes using the dimensions of the raster, but I'm unclear how that actually aligns correctly with the raster block: https://rasterframes.io/zonal-algebra.html
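A sketch of how that per-block alignment could look in pyrasterframes, patterned on the zonal algebra docs: pass each tile's own extent and dimensions to rf_rasterize, so the zone tile is burned on exactly that block's grid. The joined DataFrame and the polygon_geom / zone_id / proj_raster column names are assumptions for illustration:

from pyrasterframes.rasterfunctions import rf_dimensions, rf_geometry, rf_rasterize

# joined: polygons spatially joined against raster tiles, one row per
# polygon/tile pair. rf_geometry yields the tile's extent as a geometry and
# rf_dimensions its (cols, rows); using those as the bounds and grid size
# rasterizes only the part of the polygon covering this block, mirroring
# the GeoTrellis polygonal-summary optimization.
dims = rf_dimensions("proj_raster")
zones = joined.withColumn(
    "zone_tile",
    rf_rasterize(
        "polygon_geom",              # geometry to burn
        rf_geometry("proj_raster"),  # this tile's extent as the zone bounds
        "zone_id",                   # value written into covered cells
        dims.cols,
        dims.rows,
    ),
)

One caveat: the polygons need to be in the tiles' CRS before rasterizing (e.g. via st_reproject with rf_crs), or the burned mask will be misaligned.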
Trying to debug a GDAL reading problem that results in the following exception (on EMR, with GDAL 3.1.2 installed):
Caused by: java.lang.UnsupportedOperationException: Reading 'gdal://vsis3/bucket/some.tif' not supported
at org.locationtech.rasterframes.ref.RFRasterSource$.$anonfun$apply$1(RFRasterSource.scala:119)
at scala.compat.java8.functionConverterImpls.AsJavaFunction.apply(FunctionConverters.scala:262)
at com.github.benmanes.caffeine.cache.LocalCache.lambda$statsAware$0(LocalCache.java:139)
Relevant code: rasterframes/RFRasterSource.scala at develop (locationtech/rasterframes on GitHub)
In spark-shell on master with the job jar, I'm able to reproduce it but not explain it.
Direct raster reads work:
scala> val url = "gdal://vsis3/bucket/some.tif"
scala> val rs = RFRasterSource(new java.net.URI(url))
rs: org.locationtech.rasterframes.ref.RFRasterSource = GDALRasterSource(gdal://vsis3/...)
scala> rs.read(GridBounds(0,0,10,10), List(0))
res1: geotrellis.raster.Raster[geotrellis.raster.MultibandTile] = Raster(ArrayMultibandTile(11,11,1,float32ud-3.4028234663852886E38),Extent(4031670.65908466, 3215321.1233700267, 4031725.65908466, 3215376.1233700267))
scala> RFRasterSource.IsGDAL.unapply(new java.net.URI(url))
res2: Boolean = true
scala> spark.read.raster.from(url).load().show()
22/06/09 19:26:23 WARN TaskSetManager: Lost task 986.0 in stage 1.0 (TID 988) (ip-172-31-19-30.eu-west-1.compute.internal executor 7): java.lang.IllegalArgumentException: Error fetching data for one of:
at org.locationtech.rasterframes.expressions.generators.RasterSourceToRasterRefs.eval(RasterSourceToRasterRefs.scala:83)
at org.apache.spark.sql.execution.GenerateExec.$anonfun$doExecute$3(GenerateExec.scala:95)
at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
at scala.collection.Iterator$ConcatIterator.hasNext(Iterator.scala:222)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:275)
at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$.$anonfun$prepareShuffleDependency$10(ShuffleExchangeExec.scala:400)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.UnsupportedOperationException: Reading 'gdal://vsis3/...
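Since the direct RFRasterSource read succeeds in the driver's spark-shell but the tasks die on the executors, one thing worth ruling out is that GDAL only got installed on the master node (e.g. the install script above not running on the core nodes). A per-node probe from pyspark, using only the paths from that script; the probe helper is just for illustration:

import glob
import os

def probe(_):
    # Report hostname, whether libgdal exists at the expected conda path,
    # and the LD_LIBRARY_PATH the executor process actually sees.
    yield (
        os.uname().nodename,
        bool(glob.glob("/usr/local/miniconda/lib/libgdal*")),
        os.environ.get("LD_LIBRARY_PATH", ""),
    )

# One partition per parallelism slot so the probe lands on every executor;
# distinct() collapses the results to one row per node/environment combo.
report = (
    sc.parallelize(range(sc.defaultParallelism), sc.defaultParallelism)
    .mapPartitions(probe)
    .distinct()
    .collect()
)
for row in report:
    print(row)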