Simeon H.K. Fitch
@metasim
I'm hoping Spark 3.0 makes progress with this. But until then, given my limited resources, I don't see a workaround.
@mjgolebiewski Actually, it looks like I did do some experimentation with converting tiles to arrays.
Michał Gołębiewski
@mjgolebiewski
@metasim thank you, i will try to base my aggregations on your code. i was also wondering, is there any method of getting the square root of a tile?
Jason T Brown
@vpipkt
@mjgolebiewski see locationtech/rasterframes#460 about square root
we are thinking about it...
if the dataset is small you might consider a UDF that uses numpy to do it.
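As I understand it, a tile's cells in the Python API behave like a 2-D array, so such a UDF would boil down to an element-wise square root over the cells (after masking NoData). A dependency-free sketch of just that element-wise step, with illustrative names (in real code this would be a single numpy.sqrt call over the cell array):

```python
import math

def sqrt_cells(cells):
    """Element-wise square root over a 2-D grid of cell values.
    Stand-in for the core of a numpy-based UDF; names are illustrative."""
    return [[math.sqrt(c) for c in row] for row in cells]

# e.g. sqrt_cells([[0.0, 4.0], [9.0, 16.0]])
```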
and @mjgolebiewski i think we may have missed answering your question about kurtosis
in your snip above where is avg defined?
did you check to see the NaN behavior of just selecting the tile_kurtosis without wrapping in avg?
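For context on how a kurtosis aggregate can come back NaN even when other statistics look fine: excess kurtosis divides the fourth central moment by the squared variance, so any group whose valid cells are all equal has zero variance and an undefined kurtosis. A dependency-free sketch (the function is illustrative, not the RasterFrames implementation, and uses population moments):

```python
import math

def excess_kurtosis(values):
    """Sample excess kurtosis (population moments). Returns NaN when the
    variance is zero, which is one way constant-valued cells can surface
    as NaN from an aggregate."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    if m2 == 0:
        return math.nan
    return m4 / m2 ** 2 - 3.0
```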
Michał Gołębiewski
@mjgolebiewski
@vpipkt i was trying to use the avg aggregate function, so i guess it's just a native spark method. the other one i tried was the collect_list method, which, again, returned only NaN.
Jason T Brown
@vpipkt
I would guess it's pyspark.sql.functions.avg... anyways, the next thing to do would be to check the rf_cell_type of the tiles and make sure the nodata handling makes sense in your use case, and take a look at the data cell and no data cell counts in your stats... do you see any data cells?
@mjgolebiewski ... also, we merged #460
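The suggested check matters because an unmasked NoData cell represented as NaN poisons any aggregate it touches. A toy illustration with plain floats (NaN standing in for an unhandled NoData cell):

```python
import math

cells = [1.0, 2.0, math.nan, 4.0]  # NaN standing in for unmasked NoData

# A naive average lets the NaN propagate through the sum.
naive = sum(cells) / len(cells)

# Masking the NoData cells first gives a meaningful result.
valid = [c for c in cells if not math.isnan(c)]
masked = sum(valid) / len(valid)
```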
Michał Gołębiewski
@mjgolebiewski
@vpipkt thank you so much. i checked the data cells and other statistics and they all look all right; kurtosis is the only problem
Michał Gołębiewski
@mjgolebiewski
hello guys, is there any way to use maprfs uri prefix to get data?
Jason T Brown
@vpipkt
i would guess so
i have not done it or seen it done
does maprfs implement the hadoop file system?
(that question may be imprecise)
Jason T Brown
@vpipkt
and seems like probably so. My guess is that you will need to make sure relevant JARs for maprfs are available to the app. From the python API you can do something like this,
from pyrasterframes.utils import *

spark = create_rf_spark_session(**{'spark.jars': f'{find_pyrasterframes_assembly()}, {maprfs_jar}'})
then if all is well it should be able to recognize these URLs
depending on the details you may have to do some more involved work building custom assembly JAR or tweaking some META-INF/services files in the assembly jar
Michał Gołębiewski
@mjgolebiewski
from pyrasterframes.utils import *

maprfs_jar = '/opt/mapr/lib/maprfs-6.1.0-mapr.jar'
spark = create_rf_spark_session(**{'spark.jars': f'{find_pyrasterframes_assembly()}, {maprfs_jar}'})
Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.net.URISyntaxException: Illegal character in path at index 0:  /opt/mapr/lib/maprfs-6.1.0-mapr.jar
    at java.net.URI$Parser.fail(URI.java:2848)
    at java.net.URI$Parser.checkChars(URI.java:3021)
    at java.net.URI$Parser.parseHierarchical(URI.java:3105)
    at java.net.URI$Parser.parse(URI.java:3063)
    at java.net.URI.<init>(URI.java:588)
    at org.apache.spark.SparkContext.addJar(SparkContext.scala:1859)
    at org.apache.spark.SparkContext$$anonfun$12.apply(SparkContext.scala:458)
    at org.apache.spark.SparkContext$$anonfun$12.apply(SparkContext.scala:458)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:458)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:238)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)
thank you for your answer. i am not sure what the error above means
Jason T Brown
@vpipkt
maybe not have the jars separated by spaces in the config value?
Michał Gołębiewski
@mjgolebiewski
oof, i will check it
Jason T Brown
@vpipkt
**{'spark.jars': f'{find_pyrasterframes_assembly()},{maprfs_jar}'}
?
just because it said index 0 - maybe that's relative to the string, which would be /opt/...
uh, there is supposed to be a space in there
Michał Gołębiewski
@mjgolebiewski
**{'spark.jars': f'{find_pyrasterframes_assembly()}, <space> {maprfs_jar}'}
here?
i am kind of confused now
Jason T Brown
@vpipkt
ah i think it should omit the space
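To summarize the fix being suggested: spark.jars takes a comma-separated list with no spaces around the commas; the space after the comma is what made Spark see a path starting with " /opt/...", hence the "Illegal character in path at index 0" error above. A minimal sketch, with illustrative paths standing in for find_pyrasterframes_assembly():

```python
# Illustrative paths; in the real session the first comes from
# find_pyrasterframes_assembly() and the second is the installed maprfs jar.
assembly_jar = "/opt/rasterframes/pyrasterframes-assembly.jar"
maprfs_jar = "/opt/mapr/lib/maprfs-6.1.0-mapr.jar"

# Join with a bare comma: "jar1, jar2" would give Spark a path with a
# leading space, producing the URISyntaxException at index 0.
jars = ",".join([assembly_jar, maprfs_jar])
```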
Michał Gołębiewski
@mjgolebiewski
okay, i think i got it - it's added as a resource in the Spark UI - but when i try to use the prefix i still get an error:
Py4JJavaError: An error occurred while calling o240.parquet.
: java.io.IOException: No FileSystem for scheme: maprfs
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2660)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary$1.apply(DataSource.scala:547)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary$1.apply(DataSource.scala:545)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
    at scala.collection.immutable.List.flatMap(List.scala:355)
    at org.apache.spark.sql.execution.datasources.DataSource.org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary(DataSource.scala:545)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:359)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
    at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:645)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)
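For what it's worth, a common workaround for "No FileSystem for scheme" is to name the implementation class explicitly instead of relying on ServiceLoader discovery: Hadoop honors an fs.&lt;scheme&gt;.impl property, and spark.hadoop.* session config is copied into the Hadoop Configuration. The class name below is the one MapR documents for maprfs; treat both it and this approach as assumptions to verify against your versions:

```python
# Sketch of extra session config: spark.hadoop.* keys pass through to the
# Hadoop Configuration, and fs.<scheme>.impl pins the FileSystem class
# used for that URI scheme.
extra_conf = {
    'spark.hadoop.fs.maprfs.impl': 'com.mapr.fs.MapRFileSystem',
}
```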
Jason T Brown
@vpipkt
progress!
Jason T Brown
@vpipkt
yes i believe this is maybe related to the META-INF/services
inside your maprfs jar is there a file in that path?
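For context on what to look for: Hadoop's ServiceLoader discovery reads META-INF/services/org.apache.hadoop.fs.FileSystem from each jar on the classpath, a plain text file listing one implementation class per line. If the maprfs jar registers its filesystem that way, the file would contain something like the following (the class name is an assumption, not verified against the jar):

```
com.mapr.fs.MapRFileSystem
```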
Michał Gołębiewski
@mjgolebiewski
Manifest-Version: 1.0
Implementation-Title: MapR FileSystem Client
Implementation-Vendor-Id: com.mapr.hadoop
this was inside MANIFEST.MF
are we looking for something in particular inside the maprfs jar? it would be helpful to know
Simeon H.K. Fitch
@metasim
@mjgolebiewski If that jar is indeed in the runtime classpath, and has the appropriate file in the META-INF/services directory inside it, then it is supposed to be picked up when the ServiceLoader requests a handler for the filesystem type.
I would confirm that the jar is making it into the executors' classpath. A quick check would be to attempt to load some class from the mapr jar in an RDD.map call and confirm that it completes.
Then take a look at the "Environment" page of the Spark UI and make sure it shows up in the classpath listed there.
Michał Gołębiewski
@mjgolebiewski
[screenshot: obraz.png]
is that correct?
Simeon H.K. Fitch
@metasim
@mjgolebiewski That's what I'd expect.
Michał Gołębiewski
@mjgolebiewski
okay, so i got the jar loaded - but i still get the error. could you point me to the classes that rasterframes uses to resolve uris? i am not sure how to proceed, so i'd be grateful for any tips
Simeon H.K. Fitch
@metasim
Can you provide a representative/redacted URI?... so I can make sure I understand what you're sending to the reader? Maybe the scheme is throwing something off.
Or a small chunk of code to test with. I have to leave for the day, but can look at this tomorrow.
Michał Gołębiewski
@mjgolebiewski
i tried to read test parquet file:
spark.read.parquet(f"maprfs://dp-01/volumes/home/dpuser/testpq")
Simeon H.K. Fitch
@metasim
@mjgolebiewski Have you tried reading that file without RasterFrames in the mix? A generic spark session with just the mapr jar?
Simeon H.K. Fitch
@metasim
Since you're reading parquet, I'd not expect RasterFrames to affect the driver resolution process.