yang162132
@yang162132

How does RasterFrameLayer work in Spark ML? I found that this even takes more time:

    import org.apache.spark.ml.Pipeline
    import org.apache.spark.ml.clustering.KMeans
    import org.apache.spark.ml.feature.VectorAssembler
    import org.locationtech.rasterframes._

    val explodedata = df.select($"spatial_key", rf_explode_tiles($"B1", $"B2", $"B3", $"B4"))
    val valuedata = explodedata.na.drop("any")
    val sampledata = valuedata.sample(0.3).limit(100)
    val assembler = new VectorAssembler()
      .setInputCols(Array("B1", "B2", "B3", "B4"))
      .setOutputCol("features")
      .setHandleInvalid("skip")
    val k = 3
    val km = new KMeans()
      .setK(k)
      .setFeaturesCol("features")
    val pipeline1 = new Pipeline().setStages(Array(assembler, km))
    val model1 = pipeline1.fit(sampledata)

than this:

    import org.locationtech.rasterframes.ml.TileExploder

    val texp = new TileExploder()
    val pipeline2 = new Pipeline().setStages(Array(texp, assembler, km))
    val model2 = pipeline2.fit(df)

model1 only needs to fit 100 points, yet it takes more time than model2, which has to fit the whole TIFF of 7751 × 7891 pixels.

3 replies
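A minimal sketch of one likely factor, assuming the intent is to materialize the sample once: Spark evaluates lazily, so the explode/drop/sample chain behind sampledata may be recomputed on each pass the K-means fit makes unless the result is cached first.

    // Sketch only: cache and force the 100-row sample so pipeline1.fit
    // does not re-run rf_explode_tiles over the full raster each pass.
    val cached = valuedata.sample(0.3).limit(100).cache()
    cached.count() // materialize before fitting
    val model1 = pipeline1.fit(cached)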
Yuri de Abreu
@yurigba

Hey guys,

I remember that it was a problem to use Apache Arrow in RasterFrames due to the lack of multidimensional array support. Was this solved in Spark 3.1.2?

3 replies
yang162132
@yang162132

I use this approach to load a series of GeoTrellis catalog layers:

    import geotrellis.layer.LayerId // GeoTrellis 3.x; earlier versions import geotrellis.spark.LayerId

    val bandarray = Array("B1", "B2", "B3", "B4")
    val zoom = 12

    // Load one layer per band, renaming its tile column to the band name.
    val catalogData: Array[RasterFrameLayer] = bandarray.map { value =>
      spark.read.geotrellis.loadLayer(catalogdir, LayerId(value, zoom))
        .asLayer
        .withRFColumnRenamed("tile_1", value)
    }

    // Spatially join the layers pairwise, dropping the duplicated geometry.
    var df = catalogData(0)
    for (i <- 1 until catalogData.length) {
      df = df.spatialJoin(catalogData(i)).drop(catalogData(i)("geometry")).asLayer
    }

Is there a more elegant way to load multiple GeoTrellis catalog layers?

Simeon H.K. Fitch
@metasim
@yang162132 Have you tried rasterJoin?
If both layers have the same grid, you could just join on SpatialKey.
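A minimal sketch of that suggestion, assuming all four band layers share the same layout so rows can be matched on spatial_key directly (the catalogData and column names come from the snippet above):

    // Sketch only: with identical grids, a plain equi-join on spatial_key
    // replaces the spatialJoin fold; drop other duplicated columns as needed.
    val joined = catalogData
      .map(_.toDF)
      .reduce((a, b) => a.join(b.drop("geometry"), Seq("spatial_key")))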
Yuri de Abreu
@yurigba

Hey guys,

I am using pyrasterframes for machine learning and I'd like to know how we can use something like TileExploder and NoDataFilter, but instead of getting all pixels, get a sample of N pixels from each row.

Is there any direct way of doing this?

4 replies
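One generic Spark approach, sketched in Scala under the assumption that a random sample of at most n exploded pixels per tile (keyed by spatial_key) is acceptable; exploded is a hypothetical name for the TileExploder output, and the same Window/row_number trick works from pyspark:

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions._

    val n = 100 // hypothetical per-tile sample size
    // Shuffle pixels within each tile, then keep the first n of each.
    val byTile = Window.partitionBy(col("spatial_key")).orderBy(rand())
    val sampled = exploded
      .withColumn("rn", row_number().over(byTile))
      .where(col("rn") <= n)
      .drop("rn")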
evanatomicmaps
@evanatomicmaps

I posted a question in GeoTrellis but was told this might be a better place for it. I've got a Scala DataFrame raster catalog, with DEM raster tiles loaded like so:

    val rf = spark.read.raster.fromCatalog(df2, "rast").load()
      .withColumn("geom", st_geometry(rf_extent(col("rast"))))
      .withColumn("cent", st_centroid(st_geometry(rf_extent(col("rast")))))

I'm trying to calculate slope on parcel data that intersects these tiles, but I'm lost on passing a tile to the slope function.
I was able to create a DoubleArrayTile and test things.

Simeon H.K. Fitch
@metasim
@evanatomicmaps Are you using the new (but undocumented) slope functions?
@pomadchin Had a notebook floating around showing an example...
Grigory
@pomadchin
@metasim @evanatomicmaps is using the old RF….
evanatomicmaps
@evanatomicmaps
@metasim @pomadchin yeah, I'm still on 0.8.4. I'm attempting to write a PySpark UDF to hit this in GeoTrellis:
https://github.com/locationtech/geotrellis/blob/master/raster/src/main/scala/geotrellis/raster/mapalgebra/focal/Slope.scala
To learn/test, we created a DoubleArrayTile (original question), which was successful.
I've got the raster catalog above; I can't figure out how to pass tiles to the function/UDF.
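A rough Scala sketch of wrapping GeoTrellis's focal slope in a UDF, assuming the RasterFrames Tile UDT is registered (spark.withRasterFrames) and using a hypothetical fixed cell size; the rf and rast names come from the earlier snippet, and for real data the cell size should be derived from the tile's extent and dimensions:

    import geotrellis.raster._
    import org.apache.spark.sql.functions.{col, udf}

    // Hypothetical fixed cell size; compute it from rf_extent and
    // rf_dimensions for real tiles.
    val cs = CellSize(30.0, 30.0)

    // tile.slope comes from GeoTrellis's focal method extensions.
    val slopeUdf = udf((t: Tile) => t.slope(cs))

    val withSlope = rf.withColumn("slope", slopeUdf(col("rast")))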
Simeon H.K. Fitch
@metasim
The last time I released was with Python 3.7. Conda dropped 3.7, so now I'm trying 3.8. This is the error I get when running sbt package twice, something I used to be able to do fine (sometimes it gets run twice when building docs, etc., because change tracking isn't super refined). I'm wondering if any Python experts might know why this started happening (it also happens with 3.9):
[info] Running 'python setup.py build bdist_wheel' in '/Users/sfitch/Coding/OSS/locationtech-rasterframes/pyrasterframes/target/python'
/opt/homebrew/Caskroom/miniconda/base/envs/rf-3.8/lib/python3.8/site-packages/setuptools/installer.py:27: SetuptoolsDeprecationWarning: setuptools.installer is deprecated. Requirements should be satisfied by a PEP 517 installer.
  warnings.warn(
setup.py sees the version as 0.10.1.dev
Traceback (most recent call last):
  File "setup.py", line 174, in <module>
    setup(
  File "/opt/homebrew/Caskroom/miniconda/base/envs/rf-3.8/lib/python3.8/site-packages/setuptools/__init__.py", line 154, in setup
    _install_setup_requires(attrs)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/rf-3.8/lib/python3.8/site-packages/setuptools/__init__.py", line 148, in _install_setup_requires
    dist.fetch_build_eggs(dist.setup_requires)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/rf-3.8/lib/python3.8/site-packages/setuptools/dist.py", line 826, in fetch_build_eggs
    resolved_dists = pkg_resources.working_set.resolve(
  File "/opt/homebrew/Caskroom/miniconda/base/envs/rf-3.8/lib/python3.8/site-packages/pkg_resources/__init__.py", line 777, in resolve
    dist = best[req.key] = env.best_match(
  File "/opt/homebrew/Caskroom/miniconda/base/envs/rf-3.8/lib/python3.8/site-packages/pkg_resources/__init__.py", line 1062, in best_match
    return self.obtain(req, installer)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/rf-3.8/lib/python3.8/site-packages/pkg_resources/__init__.py", line 1074, in obtain
    return installer(requirement)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/rf-3.8/lib/python3.8/site-packages/setuptools/dist.py", line 921, in fetch_build_egg
    return fetch_build_egg(self, req)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/rf-3.8/lib/python3.8/site-packages/setuptools/installer.py", line 87, in fetch_build_egg
    wheel.install_as_egg(dist_location)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/rf-3.8/lib/python3.8/site-packages/setuptools/wheel.py", line 95, in install_as_egg
    self._install_as_egg(destination_eggdir, zf)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/rf-3.8/lib/python3.8/site-packages/setuptools/wheel.py", line 103, in _install_as_egg
    self._convert_metadata(zf, destination_eggdir, dist_info, egg_info)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/rf-3.8/lib/python3.8/site-packages/setuptools/wheel.py", line 124, in _convert_metadata
    os.mkdir(destination_eggdir)
FileExistsError: [Errno 17] File exists: '/Users/sfitch/Coding/OSS/locationtech-rasterframes/pyrasterframes/target/python/.eggs/rasterio-1.2.10-py3.8-macosx-11.0-arm64.egg'
Grigory
@pomadchin
File exists: oO
@metasim don’t you have VMs anywhere; do you have enough space?
Simeon H.K. Fitch
@metasim
I have plenty of space. If the file exists, it should skip installing it. Literally nothing has changed in the build definition since the last release. This is one of those mysterious changes in the Python universe.
It's a serious impediment because I'm having to manually delete the .eggs dir between the package and test stages, because python setup.py test wants to reinstall stuff again.
1 reply
If I could throw code out the window, I would right now. setuptools is a complete abomination.
Repeatability is the first principle in package and dependency management.
Every time I try to do a little work on RF with my limited free time, I end up spending all of it fighting Python.
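A hedged sbt sketch of automating the manual workaround described above (the task name is hypothetical; the path mirrors the .eggs directory from the traceback):

    // Delete the stale .eggs directory before re-running setup.py, so
    // setuptools does not trip over the already-extracted rasterio egg.
    lazy val cleanEggs = taskKey[Unit]("Delete pyrasterframes/target/python/.eggs")
    cleanEggs := IO.delete(target.value / "python" / ".eggs")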
Simeon H.K. Fitch
@metasim
The ecosystem is fundamentally flawed.
If there are any Python experts out there in the community, I sure could use your help.
Simeon H.K. Fitch
@metasim
:balloon: :balloon: RasterFrames 0.10.1 is Released! :balloon: :balloon:
Release notes: https://github.com/locationtech/rasterframes/releases/tag/0.10.1
Artifacts deployed to Maven Central and PyPI.
Documentation has not been updated :disappointed:
Manrique Vargas
@mv1742

Hello, I installed pyrasterframes version 0.8.5 (compatible with Spark 3.*) in Databricks (runtime 6.*) and I was able to read a TIF file from modis-pds.s3.amazonaws.com successfully (see https://rasterframes.io/getting-started.html), e.g.

    df = spark.read.raster('https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059/MCD43A4.A2019059.h11v08.006.2019072203257_B02.TIF')

However, I am not able to load other single raster files in other locations, or my own .tif files in DBFS or in S3 buckets. For example, when reading this single raster from a public S3 bucket (see https://rasterframes.io/raster-read.html) using the command rf = spark.read.raster('https://rasterframes.s3.amazonaws.com/samples/luray_snp/B02.tif'), I get the error below:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 61 in stage 287.0 failed 4 times, most recent failure: Lost task 61.3 in stage 287.0 (TID 5589, 10.130.180.94, executor 0): java.lang.IllegalArgumentException: Error fetching data for one of: JVMGeoTiffRasterSource(https://rasterframes.s3.amazonaws.com/samples/luray_snp/B02.tif)

Similarly when reading from dbfs I get the error below:

Caused by: java.lang.UnsupportedOperationException: Reading 'dbfs:/tmp/rasterframes/raster_problem/it_template_no0.tif' not supported

Any ideas how to load a single raster in Databricks using pyrasterframes?

3 replies
Henry Rodman
@hrodmn

Hi all, I am attempting to send a STAC query via spark.read.stacapi on v0.10.1, and the filters that I am sending do not seem to be applied to the resulting dataframe (the rows show bounding boxes that do not overlap with my supplied bounding box). Do you have any suggestions for applying the filters in a different way?

Here is what I tried:

    from pyrasterframes.utils import create_rf_spark_session

    bbox = [-92.2646, 46.6930, -92.0276, 46.9739]
    uri = 'https://earth-search.aws.element84.com/v0'
    query_params = {
        'collections': ['sentinel-s2-l2a-cogs'],
        'datetime': '2021-06-01/2021-06-30',
        'bbox': bbox,
    }

    df = spark.read.stacapi(uri, filters=query_params)
    df.select(df.id, df.bbox).limit(5)

When I run the same query using pystac_client I get 24 images back:

    import pystac_client

    catalog = pystac_client.Client.open(uri)
    all_items = catalog.search(**query_params).get_all_items()
    len(all_items)

I found the stacapi read method on GitHub but am not sure where to look next.

Grigory
@pomadchin
hey @hrodmn let me have a look
@hrodmn do you have a full pystac example?
Henry Rodman
@hrodmn
@pomadchin here is the full pystac example:
    import pystac_client

    uri = 'https://earth-search.aws.element84.com/v0'
    query_params = {
        'collections': ['sentinel-s2-l2a-cogs'],
        'datetime': '2021-06-01/2021-06-30',
        'bbox': [-92.2646, 46.6930, -92.0276, 46.9739],
    }

    catalog = pystac_client.Client.open(uri)

    all_items = catalog.search(**query_params).get_all_items()

    # to check on the results
    for item in all_items:
        print(item)
Grigory
@pomadchin
@hrodmn :+1: will look in a bit
Grigory
@pomadchin
@hrodmn 1. the datetime is in the incorrect format, it should be ISO; 2. there is indeed some weird bug with bbox propagation? I need to clarify that
thanks for reporting
try using polygon intersection for now
however, I'm really surprised by the output
Grigory
@pomadchin
@hrodmn there is a bug in pagination, it doesn’t quite work with https://earth-search.aws.element84.com/v0
as a workaround, try something like
    {
        "collections": ["sentinel-s2-l2a-cogs"],
        "datetime": "2021-06-01T19:09:23.735395Z/2021-06-30T19:09:23.735395Z",
        "bbox": [-92.2646, 46.6930, -92.0276, 46.9739],
        "limit": 30  // increasing the limit to avoid pagination
    }
Grigory
@pomadchin
oh, it is not a bug, we don’t support the pagination format used by https://earth-search.aws.element84.com/v0
Henry Rodman
@hrodmn
Thank you @pomadchin! Your filter parameters yield the expected result.
Grigory
@pomadchin
@hrodmn I filed an issue here azavea/stac4s#495 but honestly don’t know when we’ll have time to address it
Grigory
@pomadchin
@hrodmn I fixed the behavior; it turned out to be not that complicated. It will be fixed in the next RF release :tada:
evanatomicmaps
@evanatomicmaps
Can someone tell me if/how I can pass a RasterFrames masked tile to rasterio.features.shapes to get geometries of cells with data?
Adrian Klink
@aklink
Off-topic question: has anyone ever tested combining RasterFrames with Horovod?
https://horovod.readthedocs.io/en/stable/spark_include.html
1 reply
wxmimperio
@imperio-wxm
How do I customize a datasource? Is there any documentation or instructions for reference?
Grigory
@pomadchin
hey @imperio-wxm, which datasource do you want to customize? In general there are none; I don't think Spark has official intros/docs for the DataSources API
1 reply
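For orientation, a minimal sketch of a custom read-only source against Spark's DataSource V1 interfaces (RelationProvider, TableScan); every name here is hypothetical:

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.{Row, SQLContext}
    import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider, TableScan}
    import org.apache.spark.sql.types._

    // Hypothetical datasource registered under the short name "example",
    // usable as spark.read.format("example").load().
    class ExampleDataSource extends DataSourceRegister with RelationProvider {
      override def shortName(): String = "example"
      override def createRelation(
          sqlContext: SQLContext,
          parameters: Map[String, String]): BaseRelation =
        new ExampleRelation(sqlContext)
    }

    // A relation supplies a schema and, via TableScan, the rows themselves.
    class ExampleRelation(val sqlContext: SQLContext) extends BaseRelation with TableScan {
      override def schema: StructType =
        StructType(StructField("value", IntegerType, nullable = false) :: Nil)
      override def buildScan(): RDD[Row] =
        sqlContext.sparkContext.parallelize(1 to 3).map(Row(_))
    }

Listing the class in META-INF/services/org.apache.spark.sql.sources.DataSourceRegister is what makes the short name resolvable.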
DonjetaR
@DonjetaR
Hi, I am interested in working with RasterFrames version >0.10 because of the Spark 3 support. However, I see that the code is on the "develop" branch on GitHub. Do you have a plan for when it will be merged to main or released as an official stable version? Is RasterFrames 0.10.1 a stable release?
Grigory
@pomadchin
Hey @DonjetaR it is a stable release
wxmimperio
@imperio-wxm
I ran the STAC API Spark reader test on the develop branch and got this error:

java.lang.ArrayIndexOutOfBoundsException: 28499
    at com.thoughtworks.paranamer.BytecodeReadingParanamer$ClassReader.accept(BytecodeReadingParanamer.java:532)
    at com.thoughtworks.paranamer.BytecodeReadingParanamer$ClassReader.access$200(BytecodeReadingParanamer.java:315)
    at com.thoughtworks.paranamer.BytecodeReadingParanamer.lookupParameterNames(BytecodeReadingParanamer.java:102)
    at com.thoughtworks.paranamer.CachingParanamer.lookupParameterNames(CachingParanamer.java:76)
    at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.getCtorParams(BeanIntrospector.scala:45)
    at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.$anonfun$apply$1(BeanIntrospector.scala:59)
    at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.$anonfun$apply$1$adapted(BeanIntrospector.scala:59)
    at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:293)
    at scala.collection.Iterator.foreach(Iterator.scala:943)
    at scala.collection.Iterator.foreach$(Iterator.scala:943)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
    at scala.collection.IterableLike.foreach(IterableLike.scala:74)
    at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
    at scala.collection.TraversableLike.flatMap(TraversableLike.scala:293)
    at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:290)
    at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
    at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.findConstructorParam$1(BeanIntrospector.scala:59)
    at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.$anonfun$apply$19(BeanIntrospector.scala:181)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
    at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
    at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
    at scala.collection.TraversableLike.map(TraversableLike.scala:286)
    at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
    at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:198)
    at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.$anonfun$apply$14(BeanIntrospector.scala:175)
    at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.$anonfun$apply$14$adapted(BeanIntrospector.scala:174)
    at scala.collection.immutable.List.flatMap(List.scala:366)
    at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.apply(BeanIntrospector.scala:174)
    .......
    .......
    at org.apache.spark.rdd.RDDOperationScope.toJson(RDDOperationScope.scala:52)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:142)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:132)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:131)
    at org.apache.spark.sql.Dataset.rdd$lzycompute(Dataset.scala:3241)
    at org.apache.spark.sql.Dataset.rdd(Dataset.scala:3239)
    at org.locationtech.rasterframes.datasource.stac.api.StacApiDataSourceTest.$anonfun$new$4(StacApiDataSourceTest.scala:67)
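A hedged aside on this failure mode: BytecodeReadingParanamer is known to throw ArrayIndexOutOfBoundsException when parsing class files emitted by a JDK newer than it understands, so running the tests on an older JDK, or forcing a newer jackson-module-scala, are the usual workarounds (the version below is illustrative):

    // Hypothetical sbt override; pick a version matching your Spark build.
    dependencyOverrides += "com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.13.3"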
Grigory
@pomadchin
hey @imperio-wxm, how do you run the tests?
is it via IntelliJ IDEA?