How does RasterFrameLayer work in Spark ML? I found that this pipeline even costs more time:
val explodedata = df.select($"spatial_key", rf_explode_tiles($"B1", $"B2",$"B3", $"B4"))
val valuedata = explodedata.na.drop("any")
val sampledata = valuedata.sample(0.3).limit(100)
val assembler = new VectorAssembler()
.setInputCols(Array("B1","B2","B3","B4"))
.setOutputCol("features")
.setHandleInvalid("skip")
val k = 3
val km = new KMeans()
.setK(k)
.setFeaturesCol("features")
val pipeline1 = new Pipeline().setStages(Array( assembler, km))
val model1 = pipeline1.fit(sampledata)
than this one:
val texp = new TileExploder()
val pipeline2 = new Pipeline().setStages(Array(texp, assembler, km))
val model2 = pipeline2.fit(df)
model1 only needs to fit 100 points, but it takes more time than model2, which has to fit the whole TIFF with 7751*7891 points.
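One thing I'm considering trying, just a sketch, on the assumption (not verified) that the explode -> dropna -> sample -> limit lineage gets recomputed during the fit:
// Sketch only: materialize the sample once before fitting,
// so the exploded lineage isn't recomputed on each pass (assumption).
val cachedSample = valuedata.sample(0.3).limit(100).cache()
cachedSample.count() // force evaluation once
val model1b = pipeline1.fit(cachedSample)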
I use this approach to load a series of GeoTrellis catalogs:
val bandarray = Array("B1", "B2", "B3", "B4")
val zoom = 12
val catalogddata: Array[RasterFrameLayer] = bandarray.map(value => {
spark.read.geotrellis.loadLayer(catalogdir, LayerId(value, zoom)).asLayer.withRFColumnRenamed("tile_1", value)
})
val datalength = catalogddata.length
var df = catalogddata(0)
for (i <- 1 until datalength) {
  df = df.spatialJoin(catalogddata(i)).drop(catalogddata(i)("geometry")).asLayer
}
Is there a more elegant way to load multiple GeoTrellis catalog layers?
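For reference, here is the same series of joins written as a reduce instead of the var/for loop; just a sketch using the identical spatialJoin/drop/asLayer calls, not tested:
// Fold the loaded layers together with the same spatial join as the loop above.
val joined = catalogddata.reduce { (acc, layer) =>
  acc.spatialJoin(layer).drop(layer("geometry")).asLayer
}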
I posted a question in geotrellis but was told this might be a better place for it. I've got a Scala DataFrame raster catalog, with DEM raster tiles loaded like this:
val rf = spark.read.raster.fromCatalog(df2, "rast").load()
.withColumn("geom", st_geometry(rf_extent(col("rast")))).withColumn("cent", st_centroid(st_geometry(rf_extent(col("rast")))))
I'm trying to calculate slope on parcel data that intersects these tiles, but I'm lost on passing a tile to the slope function.
I was able to create a DoubleArrayTile and test things.
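For context, the intersect step I mean looks roughly like this; it's only a sketch, and parcels / parcel_geom are placeholder names for my parcel DataFrame and its geometry column, not the real ones:
// Hypothetical join: keep tiles whose extent geometry intersects a parcel.
// `parcels` and `parcel_geom` are placeholder names, not from my actual data.
val tilesWithParcels = rf.join(parcels, st_intersects(col("geom"), col("parcel_geom")))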
This fails when I run sbt package twice, something I used to be able to do fine (sometimes it gets run twice when building docs, etc. because change tracking isn't super refined). I'm wondering if there are any Python experts that might know why this started happening (it also happens with 3.9):
[info] Running 'python setup.py build bdist_wheel' in '/Users/sfitch/Coding/OSS/locationtech-rasterframes/pyrasterframes/target/python'
/opt/homebrew/Caskroom/miniconda/base/envs/rf-3.8/lib/python3.8/site-packages/setuptools/installer.py:27: SetuptoolsDeprecationWarning: setuptools.installer is deprecated. Requirements should be satisfied by a PEP 517 installer.
warnings.warn(
setup.py sees the version as 0.10.1.dev
Traceback (most recent call last):
File "setup.py", line 174, in <module>
setup(
File "/opt/homebrew/Caskroom/miniconda/base/envs/rf-3.8/lib/python3.8/site-packages/setuptools/__init__.py", line 154, in setup
_install_setup_requires(attrs)
File "/opt/homebrew/Caskroom/miniconda/base/envs/rf-3.8/lib/python3.8/site-packages/setuptools/__init__.py", line 148, in _install_setup_requires
dist.fetch_build_eggs(dist.setup_requires)
File "/opt/homebrew/Caskroom/miniconda/base/envs/rf-3.8/lib/python3.8/site-packages/setuptools/dist.py", line 826, in fetch_build_eggs
resolved_dists = pkg_resources.working_set.resolve(
File "/opt/homebrew/Caskroom/miniconda/base/envs/rf-3.8/lib/python3.8/site-packages/pkg_resources/__init__.py", line 777, in resolve
dist = best[req.key] = env.best_match(
File "/opt/homebrew/Caskroom/miniconda/base/envs/rf-3.8/lib/python3.8/site-packages/pkg_resources/__init__.py", line 1062, in best_match
return self.obtain(req, installer)
File "/opt/homebrew/Caskroom/miniconda/base/envs/rf-3.8/lib/python3.8/site-packages/pkg_resources/__init__.py", line 1074, in obtain
return installer(requirement)
File "/opt/homebrew/Caskroom/miniconda/base/envs/rf-3.8/lib/python3.8/site-packages/setuptools/dist.py", line 921, in fetch_build_egg
return fetch_build_egg(self, req)
File "/opt/homebrew/Caskroom/miniconda/base/envs/rf-3.8/lib/python3.8/site-packages/setuptools/installer.py", line 87, in fetch_build_egg
wheel.install_as_egg(dist_location)
File "/opt/homebrew/Caskroom/miniconda/base/envs/rf-3.8/lib/python3.8/site-packages/setuptools/wheel.py", line 95, in install_as_egg
self._install_as_egg(destination_eggdir, zf)
File "/opt/homebrew/Caskroom/miniconda/base/envs/rf-3.8/lib/python3.8/site-packages/setuptools/wheel.py", line 103, in _install_as_egg
self._convert_metadata(zf, destination_eggdir, dist_info, egg_info)
File "/opt/homebrew/Caskroom/miniconda/base/envs/rf-3.8/lib/python3.8/site-packages/setuptools/wheel.py", line 124, in _convert_metadata
os.mkdir(destination_eggdir)
FileExistsError: [Errno 17] File exists: '/Users/sfitch/Coding/OSS/locationtech-rasterframes/pyrasterframes/target/python/.eggs/rasterio-1.2.10-py3.8-macosx-11.0-arm64.egg'
The leftover .egg dir between the package and test stages seems to be the issue, because python setup.py test wants to reinstall stuff again.
Hello, I installed pyrasterframes version 0.8.5 (compatible with Spark 3.x) in Databricks (runtime 6.x) and I was able to read a TIF file from modis-pds.s3.amazonaws.com successfully, e.g.
df = spark.read.raster('https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059/MCD43A4.A2019059.h11v08.006.2019072203257_B02.TIF')
(see https://rasterframes.io/getting-started.html).
However I am not able to load other single raster files in other locations or my own .tif files in DBFS or in S3 buckets. For example, when reading this single raster from a public S3 bucket (see https://rasterframes.io/raster-read.html) using the command
rf = spark.read.raster('https://rasterframes.s3.amazonaws.com/samples/luray_snp/B02.tif')
I get the error below:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 61 in stage 287.0 failed 4 times, most recent failure: Lost task 61.3 in stage 287.0 (TID 5589, 10.130.180.94, executor 0): java.lang.IllegalArgumentException: Error fetching data for one of: JVMGeoTiffRasterSource(https://rasterframes.s3.amazonaws.com/samples/luray_snp/B02.tif)
Similarly when reading from dbfs I get the error below:
Caused by: java.lang.UnsupportedOperationException: Reading 'dbfs:/tmp/rasterframes/raster_problem/it_template_no0.tif' not supported
Any ideas how to load a single raster in Databricks using pyrasterframes?
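One thing I have not verified: whether the DBFS FUSE mount path behaves differently from the dbfs:/ URI. A sketch of what I mean, assuming the reader can open plain local paths (which may not hold):
# Same file as above, but via the /dbfs FUSE mount instead of the dbfs:/ URI
# (assumption: the underlying raster reader accepts plain local paths).
rf = spark.read.raster('/dbfs/tmp/rasterframes/raster_problem/it_template_no0.tif')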
Hi all, I am attempting to send a STAC query via spark.read.stacapi on v0.10.1, and the filters that I am sending do not seem to be applied to the resulting dataframe (the rows show bounding boxes that do not overlap with my supplied bounding box). Do you have any suggestions for applying the filters in a different way?
Here is what I tried:
from pyrasterframes.utils import create_rf_spark_session
spark = create_rf_spark_session()
bbox = [-92.2646, 46.6930, -92.0276, 46.9739]
uri = 'https://earth-search.aws.element84.com/v0'
query_params = {
    'collections': ['sentinel-s2-l2a-cogs'],
    'datetime': '2021-06-01/2021-06-30',
    'bbox': bbox,
}
df = spark.read.stacapi(uri, filters=query_params)
df.select(df.id, df.bbox).limit(5)
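For reference, this is how I'm inspecting what comes back (the bboxes shown do not overlap the bbox in query_params):
# Show a few returned items and their bounding boxes, plus the total count.
df.select(df.id, df.bbox).limit(5).show(truncate=False)
print(df.count())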
When I run the same query using pystac_client, I get 24 images back:
import pystac_client
catalog = pystac_client.Client.open(uri)
all_items = catalog.search(**query_params).get_all_items()
len(all_items)
I found the stacapi read method on GitHub but am not sure of the next place to look.
import pystac_client
uri = 'https://earth-search.aws.element84.com/v0'
query_params = {
    'collections': ['sentinel-s2-l2a-cogs'],
    'datetime': '2021-06-01/2021-06-30',
    'bbox': [-92.2646, 46.6930, -92.0276, 46.9739],
}
catalog = pystac_client.Client.open(uri)
all_items = catalog.search(**query_params).get_all_items()
# to check on the results
for item in all_items:
    print(item)
{
"collections": ["sentinel-s2-l2a-cogs"],
"datetime": "2021-06-01T19:09:23.735395Z/2021-06-30T19:09:23.735395Z",
"bbox": [-92.2646, 46.6930, -92.0276, 46.9739],
"limit": 30 // increasing limit to avoid pagination
}
java.lang.ArrayIndexOutOfBoundsException: 28499
at com.thoughtworks.paranamer.BytecodeReadingParanamer$ClassReader.accept(BytecodeReadingParanamer.java:532)
at com.thoughtworks.paranamer.BytecodeReadingParanamer$ClassReader.access$200(BytecodeReadingParanamer.java:315)
at com.thoughtworks.paranamer.BytecodeReadingParanamer.lookupParameterNames(BytecodeReadingParanamer.java:102)
at com.thoughtworks.paranamer.CachingParanamer.lookupParameterNames(CachingParanamer.java:76)
at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.getCtorParams(BeanIntrospector.scala:45)
at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.$anonfun$apply$1(BeanIntrospector.scala:59)
at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.$anonfun$apply$1$adapted(BeanIntrospector.scala:59)
at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:293)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
at scala.collection.IterableLike.foreach(IterableLike.scala:74)
at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
at scala.collection.TraversableLike.flatMap(TraversableLike.scala:293)
at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:290)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.findConstructorParam$1(BeanIntrospector.scala:59)
at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.$anonfun$apply$19(BeanIntrospector.scala:181)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
at scala.collection.TraversableLike.map(TraversableLike.scala:286)
at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:198)
at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.$anonfun$apply$14(BeanIntrospector.scala:175)
at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.$anonfun$apply$14$adapted(BeanIntrospector.scala:174)
at scala.collection.immutable.List.flatMap(List.scala:366)
at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.apply(BeanIntrospector.scala:174)
.......
.......
at org.apache.spark.rdd.RDDOperationScope.toJson(RDDOperationScope.scala:52)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:142)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:132)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:131)
at org.apache.spark.sql.Dataset.rdd$lzycompute(Dataset.scala:3241)
at org.apache.spark.sql.Dataset.rdd(Dataset.scala:3239)
at org.locationtech.rasterframes.datasource.stac.api.StacApiDataSourceTest.$anonfun$new$4(StacApiDataSourceTest.scala:67)