Grigory
@pomadchin

hey @imperio-wxm

I think at this point we can only help by inspecting those rasters directly

could you share them, maybe?
Simeon H.K. Fitch
@metasim
@imperio-wxm What happens if you remove the withCRS(LatLng)?
The fact that there's a maximum NDVI value of 75 is just bizarre.
Maybe try reading and then immediately writing a single band in RF and comparing that with gdalinfo/qgis.
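In pyspark terms, that round-trip experiment might look like the sketch below. This is only an illustration: the file paths are hypothetical placeholders, it assumes a working pyrasterframes install, and the GeoTiff writer's keyword arguments may differ between versions.

```python
# Sketch of the suggested round trip, assuming a pyrasterframes setup.
# "input.tif" / "roundtrip.tif" are hypothetical placeholder paths.
from pyrasterframes.utils import create_rf_spark_session
from pyrasterframes.rasterfunctions import rf_agg_stats

spark = create_rf_spark_session()

# Read the single-band raster into a RasterFrame.
df = spark.read.raster("input.tif")

# Check the cell statistics RasterFrames sees before any write,
# to compare against gdalinfo -stats on the source file.
df.select(rf_agg_stats(df.proj_raster)).show(truncate=False)

# Write the band straight back out; the GeoTiff datasource requires
# a destination CRS (hence the "destination CRS must be provided" error).
df.write.geotiff("roundtrip.tif", crs="EPSG:4326")
```

Opening roundtrip.tif in gdalinfo/QGIS next to the original should show whether the odd max value appears on read or on write.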
wxmimperio
@imperio-wxm
@metasim Hi, if I remove withCRS I get an error:
Caused by: java.lang.IllegalArgumentException: A destination CRS must be provided
    at org.locationtech.rasterframes.datasource.geotiff.GeoTiffDataSource.$anonfun$createRelation$7(GeoTiffDataSource.scala:73)
    at scala.Option.getOrElse(Option.scala:189)
    at org.locationtech.rasterframes.datasource.geotiff.GeoTiffDataSource.createRelation(GeoTiffDataSource.scala:73)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
zoneyuan
@zoneyuan
Hey, can someone help me? I installed pyrasterframes successfully, but when I try the example, I can't read the raster correctly
[screenshots attached]
It seems like spark.read doesn't include the raster; what should I do?
Simeon H.K. Fitch
@metasim
@zoneyuan The column df.tile doesn't exist. Print the schema to discover what columns are available for that image file.
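For reference, that check is a one-liner in the pyspark shell. A minimal sketch, assuming pyrasterframes is installed and the file path is a placeholder; the column name shown in the comment is what the raster reader typically produces, but the whole point is to confirm it from the schema:

```python
# Sketch: discover the actual column names before referencing df.tile.
from pyrasterframes.utils import create_rf_spark_session

spark = create_rf_spark_session()
df = spark.read.raster("image.tif")  # hypothetical path

# Print the schema to see which columns exist for this file.
df.printSchema()
# The raster column is typically named `proj_raster` when read this way,
# so selections would look like df.select(df.proj_raster), not df.tile.
```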
zoneyuan
@zoneyuan
@metasim got it, thanks :)
kembles5
@kembles5
Hello, I'm attempting to use pyrasterframes from the pyspark shell. I set the py-files config to the zip file I downloaded. I've tried zip files for both 2.11 (which fails because I'm using Spark 3) and 2.12, which fails to find the GeoTrellis dependency. Digging a little deeper, it looks as if GeoTrellis does not run on Spark 3. If that's the case, does RasterFrames also not run on Spark 3?
jpolchlo
@jpolchlo
@kembles5 GeoTrellis does run on Spark 3. That's not to say I know what problem you're running into, but GT 3.6.0 and up are Spark 3 compatible.
You may want to check which version is included in the zip file you want to use.
I've been running RasterFrames 0.10.1 (in Scala, though) on Spark 3.1.2 without a problem.
kembles5
@kembles5
@jpolchlo Good to hear that it runs on Spark 3. I'm getting the following module not found: org.locationtech.geotrellis#geotrellis-spark_2.12;3.6.1-SNAPSHOT
which may be related to an issue on the GeoTrellis side
Glancing at the GeoTrellis Gitter, there's mention today of a build/dependency issue
which I do not get if I use the 2.12-0.9.1 zip file
sorry, meant 2.11-0.9.1
jpolchlo
@jpolchlo
@kembles5 OK, that makes sense. Bintray shut down, so our snapshots are being hosted elsewhere. I think https://repo.eclipse.org/content/repositories/geotrellis-snapshots/org/locationtech/geotrellis/ ? I'm not 100% on how to use this information to solve your problem. Hopefully one of the rasterframes engineers can speak to that.
In the past I've used the --repositories flag to spark-submit to download packages from nonstandard locations. Perhaps there's a way to do that with pyspark?
kembles5
@kembles5
@jpolchlo Thank you, I'll see if I can add a --repositories flag. FWIW, here's the command I'm running:
pyspark --archives hub_env.tar.gz --py-files pyrasterframes_2.12-0.10.0-python.zip --packages org.locationtech.rasterframes:rasterframes_2.12:0.10.0,org.locationtech.rasterframes:pyrasterframes_2.12:0.10.0,org.locationtech.rasterframes:rasterframes-datasource_2.12:0.10.0 --conf spark.serializer=org.apache.spark.serializer.KryoSerializer --conf spark.kryo.registrator=org.locationtech.rasterframes.util.RFKryoRegistrator --conf spark.kryoserializer.buffer.max=500m
Grigory
@pomadchin

hey @kembles5 I am not sure why you need a snapshot; the most recent GT release is 3.6.2 and the most recent RF release is 0.10.1, which depends on GT 3.6.1 transitively;

RF of versions prior to 0.10.0 are not compatible with Spark 3.x

if you still need GT snapshots, then they are available on the Maven nexus, i.e.: https://oss.sonatype.org/content/repositories/snapshots/org/locationtech/geotrellis/geotrellis-spark_2.12/

Check out the GT README badges: https://github.com/locationtech/geotrellis#geotrellis

kembles5
@kembles5

Thanks @pomadchin. I tried rasterframes 0.10.1 and the GT errors were resolved. Now there's only one module not found.

com.github.everit-org.json-schema#org.everit.json.schema;1.12.2: not found

I'm simply going through the getting started guide for rasterframes (https://rasterframes.io/getting-started.html) and trying to follow the "Using pyspark shell" section, and I get the above error. The getting started guide doesn't appear to work with 0.10.0 or 0.10.1.

Grigory
@pomadchin
@kembles5 yeah, unfortunately we use a schema validator that is published at https://jitpack.io
^ try adding https://jitpack.io as a resolver => it should work then
kembles5
@kembles5
@pomadchin thanks for the response. Forgive me for the noob questions :). I'm a Python developer and have very little understanding of what's happening behind the scenes with the RF zip file. What would I need to add to the command from the "Using pyspark shell" section to add a resolver?
jpolchlo
@jpolchlo
@kembles5 I think you need to add --repositories https://jitpack.io. From the command line, run pyspark --help for more details.
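Putting that together with the command quoted earlier in the thread, the amended invocation would look roughly like this; the flags are copied verbatim from that command, and the only addition is the --repositories line:

```shell
pyspark \
  --archives hub_env.tar.gz \
  --py-files pyrasterframes_2.12-0.10.0-python.zip \
  --repositories https://jitpack.io \
  --packages org.locationtech.rasterframes:rasterframes_2.12:0.10.0,org.locationtech.rasterframes:pyrasterframes_2.12:0.10.0,org.locationtech.rasterframes:rasterframes-datasource_2.12:0.10.0 \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.kryo.registrator=org.locationtech.rasterframes.util.RFKryoRegistrator \
  --conf spark.kryoserializer.buffer.max=500m
```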
kembles5
@kembles5

Hi @jpolchlo, thank you. I'm able to get to the pyspark shell now, but get the following error when I run spark = spark.withRasterFrames(). From what I've read this looks like a Scala version mismatch, but I verified I'm using Spark 3.1, which uses Scala 2.12

java.lang.NoSuchMethodError: shapeless.DefaultSymbolicLabelling$.instance(Lshapeless/HList;)Lshapeless/DefaultSymbolicLabelling;
    at org.locationtech.rasterframes.encoders.StandardEncoders.spatialKeyEncoder(StandardEncoders.scala:68)
    at org.locationtech.rasterframes.encoders.StandardEncoders.spatialKeyEncoder$(StandardEncoders.scala:68)

jpolchlo
@jpolchlo
@kembles5 Yes, that looks like some kind of version mismatch, but the origin of those can be hard to track down. Are you setting up a very vanilla, minimal environment, or are there other dependencies getting thrown in? I haven't worked much with pyrasterframes, so I'm possibly not the most useful here. I wonder if @metasim has any advice?
kembles5
@kembles5
Thanks @jpolchlo. Looks like the latest version of Spark is using shapeless 2.3.7 (per pom.xml on the master branch), and the version that 0.10.1 depends on is 2.12. Is there another version of the rasterframes zip file (maybe a shaded version) that I should be using with Spark 3? I'm using pyrasterframes_2.12-0.10.0-python.zip
Grigory
@pomadchin
@kembles5 @jpolchlo one option is to use an assembly jar with shaded deps; another is to upgrade shapeless on the Spark classpath
I think we need to move to Spark 3.2.x to get rid of this error and be fully compatible with everything
kembles5
@kembles5
@pomadchin I can't upgrade shapeless on the Spark classpath and I can't move to Spark 3.2. That leaves using the assembly jar with shaded deps. Where can I get this jar? I've searched high and low. I also tried to build the project but ran into the following error (macOS, OpenJDK 18):
java.lang.ClassCastException: class java.lang.UnsupportedOperationException cannot be cast to class xsbti.FullReload (java.lang.UnsupportedOperationException is in module java.base of loader 'bootstrap'; xsbti.FullReload is in unnamed module of loader 'app')
sbt script version: 1.6.2
Grigory
@pomadchin
@kembles5 what OS are you on?
kembles5
@kembles5
@pomadchin macOS Monterey 12.2.1
That's my personal laptop, but at work, where I'm using Spark, I'm on RHEL 7
kembles5
@kembles5
@pomadchin Downgraded Java to version 11 and sbt worked. Any instructions on how to build the assembly with shaded dependencies would be appreciated. Thanks for all your help
Grigory
@pomadchin
I think the pip-installed RF is already shaded
I would recommend following the RF quick start instructions
A good question is how to deploy it on a cluster with no k8s; that's a question for @metasim, but I'm pretty sure it is doable via conda
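The quick-start route mentioned here boils down to something like the following; the package name is from PyPI, and per the message above the pip distribution should already bundle the shaded assembly, so no manual jar hunting is needed:

```shell
# Install the pip-distributed pyrasterframes (ships its own shaded jar)
pip install pyrasterframes

# Then, from Python, the session helper wires up Spark with RasterFrames:
#   from pyrasterframes.utils import create_rf_spark_session
#   spark = create_rf_spark_session()
```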
kembles5
@kembles5
@pomadchin Thanks for the info. The getting started guide is where I'm having issues, specifically the "Using pyspark shell" section. I tried following the guide with 0.10.0 and 0.10.1 on Spark 3.1. Both initially failed with "module not found" errors, and adding --repositories https://jitpack.io had no effect. Although 0.10.1 eventually started working despite no changes, 0.10.0 continues to have issues finding the GeoTrellis module. For 0.10.1 I then ran into the shapeless conflict.