Simeon H.K. Fitch
@tosen1990 Yes, sure it works with 0.8.4, but assumes test environment where images can be loaded from the classpath. Try replacing the href function with this:
  def href(name: String) =  "https://raw.githubusercontent.com/locationtech/rasterframes/develop/core/src/test/resources/" + name
Hold up... I think I see an issue...
Simeon H.K. Fitch
@tosen1990 Here's a fixed version:
Here's what I did to test it:
docker run -p 8888:8888 s22s/rasterframes-notebook:0.8.4
open http://localhost:8888
# From jupyter in browser, open a terminal session for the following:
wget https://gist.githubusercontent.com/metasim/32795c419c60f9b9e2ace539ba44eaeb/raw/b2a71a70b6dfc65190a197a35b1904aac5a07e1e/ClassificationRasterSource.scala
unzip /usr/local/rasterframes/pyrasterframes-0.8.4-py3-none-any.whl
/usr/local/spark/bin/spark-shell --jars ./pyrasterframes/jars/pyrasterframes-assembly-0.8.4.jar 
# From scala REPL:
:load ClassificationRasterSource.scala
# After job finishes, open "classified.png"
Apologies for the messed up example before.
If you need the other example updated let me know, but I might have to wait until tomorrow to do it.
@vpipkt Thanks for your hint, Jason. It works now.

Hey @metasim, the classification program finally works fine.
I'm wondering why you deleted the cross-validation part in your previous version.
It seems like it doesn't matter much.
Also, I added the evaluation to your newer version:

  // Configure how we're going to evaluate our model's performance.
  val evaluator = new MulticlassClassificationEvaluator()

  // Push the "go" button
  val model = pipeline.fit(abt)

  // Score the original data set, including cells
  // without target values.
  val prediction_df  = model.transform(abt)
  val accuracy: Double = evaluator.evaluate(prediction_df)
  println("accuracy: " + accuracy )

The accuracy is 1.0, different from the one in the Python version in the RasterFrames documentation.
I guess this is because I don't mask the cloud pixels.

Jason T Brown
@tosen1990, @metasim and I were pair programming on it some. I think the cross validation was resulting in folds having all the label values null. This was causing an error in the fitting.
So the choice was to reduce complexity in the example. Alternative would be to filter for nulls in the dataframe, before passing to fit.
@tosen1990 As far as the evaluation, your evaluator is using the f1 metric, try accuracy for a direct comparison of the two results.
Another note, the python doc page uses a different set of data from the latest scala gist
So certainly expect different results in that case
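To make the metric difference concrete, here is a minimal, self-contained sketch. The toy label/prediction values and column names are illustrative stand-ins, not the actual `abt` data:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("evaluator-metric-demo")
  .getOrCreate()
import spark.implicits._

// Toy stand-in for model.transform(abt): 3 of 4 predictions are correct.
val predictionDf = Seq((1.0, 1.0), (0.0, 1.0), (0.0, 0.0), (1.0, 1.0))
  .toDF("label", "prediction")

// MulticlassClassificationEvaluator defaults to the "f1" metric; set
// "accuracy" explicitly to compare directly with the Python docs.
val evaluator = new MulticlassClassificationEvaluator()
  .setLabelCol("label")
  .setPredictionCol("prediction")
  .setMetricName("accuracy")

val accuracy = evaluator.evaluate(predictionDf) // 0.75 for this toy data

println(s"accuracy: $accuracy")
spark.stop()
```

With the default `"f1"` metric left in place, the same call would return a different number, which is why the two results above aren't directly comparable.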
@vpipkt Got it. Thanks for your clear explanation.
Jason T Brown
You're very welcome!
Would anyone add the GeoJSON rasterize part to the classification program (Scala version), like in the Python version?
It has taken me about a day, but I still haven't made any progress.
  val test: DataFrame = spark.read.geojson.load("L8-Labels-Elkton-VA.geojson")
  val label_df: DataFrame = test
    .select($"id", st_reproject(rf_geometry($"geometry"), LatLng, LatLng).alias("geometry"))
  val df_joined = abt.join(label_df, st_intersects(st_geometry($"extent"), $"geometry"))
  val df_labeled: DataFrame = df_joined.withColumn("label",
    rf_rasterize($"geometry", st_geometry($"extent"), $"id", $"dims.cols", $"dims.rows"))
1: spark.read.geojson.load throws the exception:
Exception in thread "main" java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.intellij.rt.execution.CommandLineWrapper.main(CommandLineWrapper.java:66)
Caused by: java.lang.IllegalAccessError: tried to access method com.google.common.base.Stopwatch.<init>()V from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:262)
    at org.apache.spark.input.WholeTextFileInputFormat.setMinPartitions(WholeTextFileInputFormat.scala:52)
    at org.apache.spark.rdd.WholeTextFileRDD.getPartitions(WholeTextFileRDD.scala:54)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
    at org.apache.spark.rdd.ZippedWithIndexRDD.<init>(ZippedWithIndexRDD.scala:44)
    at org.apache.spark.rdd.RDD$$anonfun$zipWithIndex$1.apply(RDD.scala:1304)
    at org.apache.spark.rdd.RDD$$anonfun$zipWithIndex$1.apply(RDD.scala:1304)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
    at org.apache.spark.rdd.RDD.zipWithIndex(RDD.scala:1303)
    at org.locationtech.rasterframes.datasource.geojson.GeoJsonDataSource$GeoJsonRelation.<init>(GeoJsonDataSource.scala:77)
    at org.locationtech.rasterframes.datasource.geojson.GeoJsonDataSource.createRelation(GeoJsonDataSource.scala:55)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
    at org.air.ebds.organize.algorithm.classification.ClassificationWithGeoJson$.delayedEndpoint$org$air$ebds$organize$algorithm$classification$ClassificationWithGeoJson$1(ClassificationWithGeoJson.scala:54)
    at org.air.ebds.organize.algorithm.classification.ClassificationWithGeoJson$delayedInit$body.apply(ClassificationWithGeoJson.scala:22)
    at sc
2: val crses: Array[Row] = abt.select("crs.crsProj4").distinct().collect()
How do I get the right CRS instance from this?
Jason T Brown
As for item 2, you should use the function rf_crs($"proj_raster"); the result is a CRS object that you can then pass into st_reproject.
Don't forget to change your type annotation to val crses: Array[CRS].
You can then pass crses.head into st_reproject.
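Putting that advice together, a sketch might look like this. Assumptions: `abt` has a `proj_raster` column and `geojsonDf` is the loaded GeoJSON DataFrame from the snippet above; the function name is hypothetical:

```scala
import geotrellis.proj4.{CRS, LatLng}
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.locationtech.rasterframes._

// Sketch: pull the raster's CRS out as a typed value with rf_crs, then use
// it as the destination CRS when reprojecting the GeoJSON label geometries.
def reprojectLabels(spark: SparkSession, abt: DataFrame, geojsonDf: DataFrame): DataFrame = {
  import spark.implicits._

  // Typed CRS values instead of Row("crs.crsProj4") strings.
  val crses: Array[CRS] = abt
    .select(rf_crs($"proj_raster"))
    .distinct()
    .as[CRS]
    .collect()

  // Reproject LatLng -> raster CRS. (The snippet above reprojected
  // LatLng -> LatLng, which is a no-op.)
  geojsonDf.select(
    $"id",
    st_reproject($"geometry", LatLng, crses.head).alias("geometry")
  )
}
```

The key change is that the destination CRS comes from the raster itself rather than being hard-coded, so the label geometries line up with the tile extents at join time.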
Just a guess here about number 1 (read.geojson)... there may be some Guava library conflict in the assembly JAR. Is this running on a Spark local master?
Yep. Just running it in Spark local mode.
Jason T Brown
Yeah, there may be some build dependency problems with Hadoop versions and Guava dependencies: https://stackoverflow.com/questions/36427291/illegalaccesserror-to-guavas-stopwatch-from-org-apache-hadoop-mapreduce-lib-inp
I don't really have the expertise to do the surgery on the build system to manage the Guava versions, but that would seem to be what is needed.
Simeon H.K. Fitch
I think it goes away with Hadoop 2.8.
Still haven't fixed it; I'll work on it tomorrow.
@vpipkt @metasim Luckily, I fixed this by adding case PathList("com", "google", xs @ _*) => MergeStrategy.first to assembly.sbt.
The Guava dependency version in my build project is 15.0.
Thanks for your help. I have to admit sbt is too hard to master.
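For anyone hitting the same Stopwatch IllegalAccessError, the workaround in context might look like this. This is a sketch of an assembly.sbt based on the merge strategy described above; adapt it to your sbt-assembly version:

```scala
// assembly.sbt (sbt-assembly) -- keep the first copy of any duplicate
// com.google.* classes so conflicting Guava versions don't collide.
assemblyMergeStrategy in assembly := {
  case PathList("com", "google", xs @ _*) => MergeStrategy.first
  case other =>
    // Fall back to the default strategy for everything else.
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(other)
}
```

Note this only resolves the packaging conflict by picking one Guava copy; whether the chosen version is actually compatible with your Hadoop version still depends on your dependency tree.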
Simeon H.K. Fitch
@tosen1990 Would you be willing to file a ticket with that info, and we can get it into an upcoming release?
@metasim I'd like to do that tomorrow. Time to hit the sack.
Simeon H.K. Fitch
Thanks for working through this all
Simeon H.K. Fitch
:balloon: :balloon: RasterFrames 0.8.5 is Released! :balloon: :balloon:
Several nice goodies in this one: https://rasterframes.io/release-notes.html#0-8-5
Artifacts released to Maven Central, PyPI, & Docker Hub.
Jason T Brown
Spatial indexing in the raster reader is a big one. In the Python API, look at help(spark.read.raster) for more.
Michał Gołębiewski

Hi guys, I'm trying to update RasterFrames to the new version via the Docker image, but I get some errors:

Exception                                 Traceback (most recent call last)
<ipython-input-1-d7e73cfb9fde> in <module>
---> 30 spark = create_rf_spark_session()

/opt/conda/lib/python3.7/site-packages/pyrasterframes/utils.py in create_rf_spark_session(master, **kwargs)
     93              .config('spark.jars', jar_path)
     94              .withKryoSerialization()
---> 95              .config(conf=conf)  # user can override the defaults
     96              .getOrCreate())

/usr/local/spark/python/pyspark/sql/session.py in getOrCreate(self)
    171                     for key, value in self._options.items():
    172                         sparkConf.set(key, value)
--> 173                     sc = SparkContext.getOrCreate(sparkConf)
    174                     # This SparkContext may be an existing one.
    175                     for key, value in self._options.items():

/usr/local/spark/python/pyspark/context.py in getOrCreate(cls, conf)
    365         with SparkContext._lock:
    366             if SparkContext._active_spark_context is None:
--> 367                 SparkContext(conf=conf or SparkConf())
    368             return SparkContext._active_spark_context

/usr/local/spark/python/pyspark/context.py in __init__(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc, profiler_cls)
    131                     " note this option will be removed in Spark 3.0")
--> 133         SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
    134         try:
    135             self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,

/usr/local/spark/python/pyspark/context.py in _ensure_initialized(cls, instance, gateway, conf)
    314         with SparkContext._lock:
    315             if not SparkContext._gateway:
--> 316                 SparkContext._gateway = gateway or launch_gateway(conf)
    317                 SparkContext._jvm = SparkContext._gateway.jvm

/usr/local/spark/python/pyspark/java_gateway.py in launch_gateway(conf)
     44     :return: a JVM gateway
     45     """
---> 46     return _launch_gateway(conf)

/usr/local/spark/python/pyspark/java_gateway.py in _launch_gateway(conf, insecure)
    107             if not os.path.isfile(conn_info_file):
--> 108                 raise Exception("Java gateway process exited before sending its port number")
    110             with open(conn_info_file, "rb") as info:

Exception: Java gateway process exited before sending its port number

This error seems to happen when I run:

from IPython.display import display
import pyrasterframes.rf_ipython
from shapely.geometry import MultiPolygon
import pandas as pd
import geopandas, os, re, json, numpy, glob
from pyrasterframes.utils import create_rf_spark_session
from pyrasterframes.rasterfunctions import *
from pyspark.sql.functions import *
from zipfile import ZipFile
from pyspark import SparkFiles
from pyspark.sql.functions import col
from pyspark.sql import functions as F
from pyspark.sql.types import DateType, TimestampType
from functools import reduce
from pyspark.sql import DataFrame
from osgeo import osr, gdal
import xml.etree.ElementTree as et
from xml.dom import minidom

spark = create_rf_spark_session()
Jason T Brown
and this is with version 0.8.5? To be clear, are you using the RF notebook image?
Jason T Brown
@mjgolebiewski I tried this again, creating the Spark session with a much smaller set of imports, and it worked successfully.
I will have a look and see if I can figure out where the conflict is.
Michał Gołębiewski
@vpipkt hi, i am using latest image:
s22s/rasterframes-notebook latest cbc6ce228c8e 2 days ago 5.23GB
Jason T Brown
OK, that is the same as 0.8.5.
So when I comment out the line import geopandas, os, re, json, numpy, glob, it is able to create the Spark session.
Unsure why, but investigating further.
You may try creating the Spark session first, then doing that import, and see if it works.
@mjgolebiewski it seems to be the geopandas import, but I don't know what the root cause is. I'll have a look.
Michał Gołębiewski
@vpipkt thank you so much, its working now :smile_cat:
Jason T Brown
that is a real stumper... see issue #452
Simeon H.K. Fitch
:balloon: :balloon: RasterFrames 0.9.0-RC2 is Released! :balloon: :balloon:
This update synchronizes the 0.9.0 release with the 0.8.5 features and upgrades GeoTrellis to 3.2.0.
@mjgolebiewski I'd suggest using s22s/rasterframes-notebook:0.8.5 instead of latest, as it's not a stable reference. (It's now pointing to 0.9.0-RC2).
Jason T Brown
@mjgolebiewski the fix is in for issue #452 ... you should be able to work around this by creating the Spark session before importing geopandas or rtree. The version of rtree in the container has some bad code for resolving library locations that incorrectly changes the PATH environment variable, which breaks pyspark.