Grigory
@pomadchin
That is the parallelism per executor; it is used in a pool that handles reading keys from S3, and it is set here
S3LayerReader fetches keys from S3 using a dedicated blocking thread pool (with the size configured through the configuration option)
so it would make sense to increase its size if you experience IO problems
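A sketch of bumping that configuration option (the `geotrellis.blocking-thread-pool.threads` key mentioned later in this conversation) in an application's Typesafe config; the value 20 is illustrative, not a recommendation:

```hocon
# Illustrative application.conf override: size of the blocking IO pool
# used by the S3 layer readers/writers, allocated once per executor.
geotrellis.blocking-thread-pool.threads = 20
```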
StrongSteve
@StrongSteve
will give it a try, thx!
Grigory
@pomadchin
:+1:
StrongSteve
@StrongSteve

so blocking-thread-pool.threads seems to help transfer stuff from S3 faster
if i understood it correctly it is used inside BlockingThreadPool which is used in a variety of S3Layer readers and writers

so simple example
having an executor with 4 cores and setting geotrellis.blocking-thread-pool.threads to 20 means
reading/writing to/from S3 is done with 20 threads in parallel but mapping steps are done with 3 parallel threads
that's why i see the 16 geotrellis threads as waiting in a map step

did i get it right?

Grigory
@pomadchin
yea that is right

and the reason why it makes sense in your case to allocate more threads is because you have lots of unused CPUs, you have a nice bandwidth, but the internet connection is pretty limited

so you can slowly fetch more data in parallel

StrongSteve
@StrongSteve
and to clarify the last grey spot ;) - having more threads during read means needing more RAM, right?
how is this handled in spark - just curious, because (to stick with the example above) i still see 4 running tasks, but making a thread dump of the executor i see 20 geotrellis-io threads running
Grigory
@pomadchin
@StrongSteve not really, imagine you have a partition of length 100 that contains keys
it doesn't really matter much in terms of RAM whether you load all keys into 256x256 tiles at once or sequentially
each task can process 1 partition at a time
within a single task we can make a parallel partition unfolding (that's what we use the FixedThreadPool for)
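The "parallel partition unfolding" idea above can be sketched with plain JDK concurrency; `fetch`, the integer key type, and the pool size of 20 are hypothetical stand-ins, not the actual GeoTrellis internals:

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object PartitionUnfold {
  // One fixed pool per executor; IO thread count may exceed core count.
  val pool = Executors.newFixedThreadPool(20)
  implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)

  // Hypothetical stand-in for "read the value for one key from S3"
  def fetch(key: Int): Array[Byte] = Array.fill(4)(key.toByte)

  // Unfold a single partition's keys in parallel on the blocking pool
  def readPartition(keys: Seq[Int]): Seq[Array[Byte]] = {
    val futures = keys.map(k => Future(fetch(k)))
    Await.result(Future.sequence(futures), 30.seconds)
  }
}
```

This is why a thread dump shows more geotrellis-io threads than running tasks: each Spark task still occupies one core, but the pool threads overlap their S3 waits.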
StrongSteve
@StrongSteve
and a partition consists of n tiles, and having more threads to work on this task makes it faster (in my case, the download)
Grigory
@pomadchin
yep

the thread pool is allocated in such a way that there is a single thread pool per executor, so it's not that expensive and you don't need to worry about the number of thread pools

that is not a regular practice in spark, usually parallelism is achieved more traditionally
by having smaller partitions and by having more thin executors

but empirically the approach of parallelizing within a single partition works nicely for IO ops + allows us to handle exceptions and retries much better
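Since the parallelism lives inside the task, per-key retries become straightforward; a retry wrapper could look like this hypothetical sketch (not the actual GeoTrellis code):

```scala
import scala.annotation.tailrec
import scala.util.{Failure, Success, Try}

object Retry {
  // Retry a flaky IO action up to `attempts` times; because each key is
  // fetched independently on the pool, only the failed key is re-read.
  @tailrec
  def withRetries[A](attempts: Int)(action: => A): A =
    Try(action) match {
      case Success(a)                 => a
      case Failure(_) if attempts > 1 => withRetries(attempts - 1)(action)
      case Failure(e)                 => throw e
    }
}
```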
StrongSteve
@StrongSteve
of course, but the traditional approach would mean having more executors with more cores to achieve the same effect
Grigory
@pomadchin
yep
StrongSteve
@StrongSteve
and most of the cores would be idle (in my case), as there is nothing CPU intensive going on, just network-based IO
(at least in that step of the processing)
thx for the clarifications - going in the right direction ;)
Grigory
@pomadchin
:+1: you’re welcome
Grigory
@pomadchin

Hey guys, some GT news here! We successfully dropped Scala 2.11 support and established Scala 2.13 cross compilation :tada:

The GT 2.13 version depends on the Spark 3.2.0 SNAPSHOT so be careful using 2.13 sparky deps. Check out the new artifacts in the Eclipse snapshots repo here https://repo.eclipse.org/content/repositories/geotrellis-snapshots/org/locationtech/geotrellis/

We also upgraded the current master up to Spark 3.0.1 (to be compatible with EMR) and upgraded our Hadoop dependency (to match the EMR version) up to 3.

These changes allowed us to add GT JDK 11 build support (with the help of this tiny PR locationtech/geotrellis#3383)

Simeon H.K. Fitch
@metasim
Congratulations @pomadchin and team!!! :clap: :tada:
That's major progress.
James Oliver
@FireByTrial
Hey, just looking to confirm my suspicion, but there is no GML support in GT, right? From what I'm seeing I would have to read it with something like GeoTools and convert to a JTS geom, from which I could then create a GT MultiPolygon, right? Just want to make sure I'm not missing an easier path
Grigory
@pomadchin

Hey @FireByTrial, that's correct. Use GeoTools to parse GML geometries and then convert them into JTS Geometries

There is no “GT Geometry”, we use JTS types

There is also the geotrellis-geotools package that contains functions to convert easily from GeoTools simple features to GeoTrellis Features or JTS Geometries
James Oliver
@FireByTrial
:+1: I'll take a look at that, another one I saw was https://github.com/ogc-schemas/ogc-tools-gml-jts as well but the one you mentioned will be my next step to check out
Grigory
@pomadchin

@FireByTrial yea I don't think you need any 3rd party deps; the idea is to read the file via GeoTools, GeoTools would represent it as a SimpleFeature, and you can extract the geometry from the GeoTools simple feature (via https://github.com/geotools/geotools/blob/main/modules/library/opengis/src/main/java/org/opengis/feature/simple/SimpleFeature.java#L273) and this geometry is a JTS geometry

^ all steps above are implemented and have convenient methods in the geotrellis-geotools package
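A hedged sketch of that flow using plain GeoTools (the file name is hypothetical, and depending on the document the parser may return a feature collection rather than a single SimpleFeature; `org.geotools.xsd.Parser` is the package path in recent GeoTools releases):

```scala
import java.io.FileInputStream
import org.geotools.gml3.GMLConfiguration
import org.geotools.xsd.Parser
import org.locationtech.jts.geom.Geometry
import org.opengis.feature.simple.SimpleFeature

// Parse a GML 3 document with GeoTools...
val parser = new Parser(new GMLConfiguration())
val parsed = parser.parse(new FileInputStream("features.gml")) // hypothetical file

// ...and pull the JTS geometry out of the SimpleFeature.
val geom: Geometry = parsed match {
  case sf: SimpleFeature => sf.getDefaultGeometry.asInstanceOf[Geometry]
  case g: Geometry       => g
  case other             => sys.error(s"unexpected GML payload: $other")
}
```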

Chamin Nalinda
@0xchamin
hi devs, what is the mechanism in GeoTrellis to store images that are 200+ MB in size? For example, if I am to use GeoTiff images where each image is around 200-500 MB, what mechanism does GeoTrellis adopt to store my images? Also, from a general perspective, what would be the best way to store and manage GeoTiff images that are around 500 MB? thanks!
Grigory
@pomadchin
hey @0xchamin what do you mean by a ‘storing mechanism’?
GT can ‘store’ TIFFs as GeoTrellis layers (these would be avro encoded, deflate compressed chips) or it can work directly with TIFFs :shrug:
but if these are tiffs you can store them on any distributed FS, e.g. HDFS, S3, GCP, etc
sorry the question is a bit unclear to me.
Chamin Nalinda
@0xchamin
hi @pomadchin , thanks for the reply. I was thinking, as my GeoTiff images are around 500 MB (one image), I'd have to split them into small tiles and store as tiles. From your reply I can see that I should be able to store geoTiffs directly on HDFS/ S3 etc. Also, I'll take a closer look at GeoTrellis layers and try to understand the concept there. Many thanks.
Max
@Max-AR

If i have a MultibandTile or a Tile that was constructed with .withNoData(Some(NODATA)), is there any way to check if there is NODATA contained within any cell of the raster? I was thinking something like:

val bandContainsNoData = myTile.band(1).contains(NODATA)

I think isNoDataTile might do what I am after
Max
@Max-AR
FWIW my hacky solution is this: for (band <- tile.bands) yield band.toArray().contains(NODATA)
Grigory
@pomadchin
hey @Max-AR isNoDataTile is what you want
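For checking whether any single cell (rather than the whole tile) is NoData, a hedged sketch assuming the `isNoData` helper from `geotrellis.raster` and integer cell types; for floating-point tiles `toArrayDouble()` would be the analogue, and unlike `contains(NODATA)` this also handles NaN NoData:

```scala
import geotrellis.raster._

// Returns true if any cell of any band is NoData.
// `exists` short-circuits on the first NoData cell it finds.
def anyNoData(tile: MultibandTile): Boolean =
  tile.bands.exists(band => band.toArray().exists(cell => isNoData(cell)))
```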
Kevin Hinson
@kevinmhinson
hello... I'm using geotrellis.spark.clip.ClipToGrid (geotrellis v2.3.1 still) and I get a bunch of org.locationtech.jts.geom.TopologyExceptions on Polygon data where I don't own the source... side location conflicts and non-noded linestring intersections... from general googling, it seems like it's often a rounding problem w/ high-precision xy data... is there some generalized approach to handling this kind of thing, or should I get busy writing data cleaning methods to modify the data?
Kevin Hinson
@kevinmhinson
for now I'm just filtering on _.geom.isValid but I'd prefer to try and repair instead of discard
Grigory
@pomadchin
hey @kevinmhinson do you need smth like buffer(0)?
(I don't remember really, but there is smth like that in JTS, or just literally .buffer(0))
James Hughes
@jnh5y
buffer(0) does do some things which fix up geometries. In general, repairing a bad geometry is not a straightforward thing; there are choices.
Depending on what you need to do, asking in the JTS Gitter may be useful as well. Martin Davis just implemented a GeometryFixer. It does things a little differently from PostGIS's "st_makeValid"
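A small JTS sketch of the two repair routes mentioned, assuming JTS 1.18+ on the classpath (where `GeometryFixer` lives):

```scala
import org.locationtech.jts.geom.Geometry
import org.locationtech.jts.geom.util.GeometryFixer
import org.locationtech.jts.io.WKTReader

// A self-intersecting "bowtie" polygon, invalid by construction.
val bowtie: Geometry =
  new WKTReader().read("POLYGON ((0 0, 2 2, 2 0, 0 2, 0 0))")
assert(!bowtie.isValid)

// Two common repair strategies:
val buffered = bowtie.buffer(0)          // classic trick; may drop pieces
val fixed    = GeometryFixer.fix(bowtie) // JTS >= 1.18; closer to st_makeValid

assert(buffered.isValid && fixed.isValid)
```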
Kevin Hinson
@kevinmhinson
I'm embarrassed to say I didn't even know about buffer(0)... that mopped up about 40% of the problems in my test data, so that helps a bunch. Thank you for the help and advice @pomadchin & @jnh5y !! I appreciate it!! I'll check in with JTS Gitter too.
Grigory
@pomadchin
@jnh5y good point, yep totally; JTS additionally contains lots of cool stuff :rocket:
biangbiang66
@biangbiang66
Hello, everyone! I need to generate buffers for a large number of line features. One thought is to apply the same value to the neighbouring pixels of the line pixels in every tile. So are there any methods in GeoTrellis that would be helpful? Any advice will be appreciated!