    James Hughes
    @jnh5y
    @echeipesh thanks!
    Eugene Cheipesh
    @echeipesh
    @jnh5y @elahrvivaz So recalling the context for zdivide as I understood it: you have some z-range (probably it intersects the decomposed ranges of your bbox) and you drop a point somewhere in the middle. You have a couple of cases: the point can be in the middle of your bounding box, basically a noop; or the point can be outside of your bounding box, in which case it can compute bigmin, the starting point of the next range up the z-curve, and litmax, the ending point of the previous range down the z-curve, allowing you to drop all points in between those two points.
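To make the bigmin/litmax description above concrete, here is a minimal Python sketch. It is not the sfcurve `zdivide` implementation and not the bit-level algorithm from the paper: it finds both values by enumerating every cell of a tiny 2D box. The names `interleave` and `brute_force_zdivide`, and the choice of putting x in the even bits, are assumptions made for illustration.

```python
def interleave(x, y, bits=4):
    """Z-order (Morton) value with x in the even bit positions, y in the odd ones."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def brute_force_zdivide(xd, lo, hi, bits=4):
    """Return (litmax, bigmin) for a z-value xd and box corners lo, hi.

    lo and hi are (x, y) tuples. The box is enumerated cell by cell, so this
    is only suitable for tiny examples; it illustrates the definitions only.
    """
    in_box = sorted(
        interleave(x, y, bits)
        for x in range(lo[0], hi[0] + 1)
        for y in range(lo[1], hi[1] + 1)
    )
    below = [z for z in in_box if z < xd]
    above = [z for z in in_box if z > xd]
    litmax = below[-1] if below else None  # last in-box z-value before xd
    bigmin = above[0] if above else None   # first in-box z-value after xd
    return litmax, bigmin


# Box x in [3, 4], y in [0, 1] covers the z-values 5, 7, 16 and 18.
# xd = 10 decodes to (0, 3), which lies outside the box.
litmax, bigmin = brute_force_zdivide(10, lo=(3, 0), hi=(4, 1))
print(litmax, bigmin)  # 7 16
```

In this example the z-range [5, 18] splits into [5, 7] and [16, 18], and everything between litmax and bigmin can be skipped.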
    Emilio
    @elahrvivaz
    @echeipesh thanks - we were sometimes calling zdivide with a point inside the bbox, which is GIGO i think
    Eugene Cheipesh
    @echeipesh
    Really shouldn’t be GIGO, would make the whole thing kind of useless, right? How could you know if xd is going to be inside query space or not?
    Emilio
    @elahrvivaz
    yeah, that's the kicker... but i think in these databases they scan through every point, then want to skip ahead once they're outside the bbox
    you can check pretty easily to see if it's in the bbox, but then i'm not sure of an easy way to get one outside the bbox and still within the z-values
    anyway, i think we have a different way of accomplishing the same thing, so we should be good
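The "scan, then check whether the point is in the bbox" step mentioned above can be sketched as follows. This is a hypothetical Python illustration, not code from sfcurve or from any of the databases being discussed; `deinterleave` and `in_bbox` are made-up names, and the bit layout (x in the even bits) is an assumption.

```python
def deinterleave(z, bits=4):
    """Split a 2D Z-order value back into (x, y); x is read from the even bits."""
    x = y = 0
    for i in range(bits):
        x |= ((z >> (2 * i)) & 1) << i
        y |= ((z >> (2 * i + 1)) & 1) << i
    return x, y


def in_bbox(z, lo, hi, bits=4):
    """True if the cell addressed by z lies inside the box with corners lo, hi."""
    x, y = deinterleave(z, bits)
    return lo[0] <= x <= hi[0] and lo[1] <= y <= hi[1]


# Scan the whole z-range of the box x in [3, 4], y in [0, 1] and keep the hits.
lo, hi = (3, 0), (4, 1)
hits = [z for z in range(5, 19) if in_bbox(z, lo, hi)]
print(hits)  # [5, 7, 16, 18]
```

The scan visits 14 z-values to find 4 hits, which is the cost that a skip-ahead step is meant to avoid.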
    Eugene Cheipesh
    @echeipesh
    were you looking to use the zdivide method in hopes it would be more efficient?
    Emilio
    @elahrvivaz
    not really, i was just using it because it seemed like it easily split a zrange into 2
    being efficient is always nice though :)
    so we were doing something like: zdivide(min.mid(max), min, max)
    which mostly worked, so it was confusing
    but it left out some values
    in the resulting ranges
    actually it was (min.z + max.z) / 2 - mid would always be in the query box i believe
    Eugene Cheipesh
    @echeipesh
    I’ll have to hack on it to double check. Just looking over the paper, the case of “xd is in bbox” is mentioned but not expounded upon. I suspect your reasoning is spot on: they’re doing a table scan until they get to a point outside the range, then they need to skip. I recall that I was able to do a decomposition with z-divide. AFAIR there is a case for bigmin and litmax calculation that you can safely check for, but maybe it was me misinterpreting the paper.
    Emilio
    @elahrvivaz
    no worries - i think everything's actually correct, and we're good now. thanks again
    Eugene Cheipesh
    @echeipesh
    :shipit:
    Rob Emanuele
    @lossyrob
    So I’ve been thinking, I’d like to get GeoTrellis on sfcurve after our next release, which should be in a couple months.
    Thinking ahead…I’m kind of regretting the choice to keep Longs and not use an array of bytes to represent the index
    it seems like the lack of precision constraints is also good
    but it also would allow for the ideas of periodicity and sharding to be introduced
    which is something I’m interested in adding to GeoTrellis, which GeoMesa and GeoWave already have, and I’d like to be able to code our version of it in a way that’s in the common lib instead of just another copy of that type of logic in GeoTrellis
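As a rough illustration of the Long versus byte-array trade-off raised above, the Python sketch below builds a 2D Z-order index at 40 bits per dimension (80 bits total, more than a 64-bit Long can hold) and encodes it as a fixed-width big-endian byte string. The helper names and the 40-bit precision are assumptions chosen for the example; nothing here reflects the actual sfcurve, GeoMesa, or GeoWave encodings.

```python
def interleave(x, y, bits):
    """Z-order (Morton) value with x in the even bit positions, y in the odd ones."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def z_to_bytes(z, total_bits):
    """Fixed-width, big-endian encoding of an index value.

    With a fixed width and big-endian order, comparing the byte strings as
    unsigned bytes gives the same ordering as comparing the integers.
    """
    width = (total_bits + 7) // 8
    return z.to_bytes(width, byteorder="big")


BITS = 40  # per dimension; 2 * 40 = 80 bits does not fit in a 64-bit Long
z = interleave(2**BITS - 1, 2**BITS - 1, BITS)
key = z_to_bytes(z, 2 * BITS)
print(len(key))  # 10
```

One thing to keep in mind on the JVM is that `byte` is signed, so a byte-array index needs an unsigned lexicographic comparison to keep this ordering.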
    Rob Emanuele
    @lossyrob
    so I guess I’m proposing two major refactors/features:
    Thoughts?
    rfecher
    @rfecher
    GeoWave made the choice to use byte arrays and allow for arbitrarily high precision curves, and I'm happy with the decision...and regarding partitioning and periodicity, we have a higher level concept of a "numeric index strategy" that contains one or more sfcs, rather than rolling that into the definition of an sfc which I'd prefer, albeit that higher level strategy can still fit in sfcurve
    a couple questions, would sfcurve like to include GeoWave, and probably the more difficult question would sfcurve consider including java?
    Rob Emanuele
    @lossyrob
    My thoughts are that we could keep the core dev to Scala, and have a Java API that made everything callable from Java in a Java-idiomatic way. Would that fit your idea of including java?
    I think the ideal goal is that all 3 projects eventually depend on sfcurve
    Rob Emanuele
    @lossyrob
    I think calling “periodicity” or “partitioning” an SFC in a numeric index strategy doesn’t make sense, but if there was some way to compose SFCs with those other indexing strategies in a straightforward way, that would make sense
    I feel like in GeoMesa those things are option-based when they could be composition-based; GeoWave has the Composite index, which tbh was a somewhat tricky API to navigate, though we figured it out: https://github.com/lossyrob/geowave-geomesa-comparative-analysis/blob/rob/gdelt/empirical-data/geowave/src/main/scala/com/azavea/ingest/geowave/Ingest.scala#L183
    I guess the confusing part was with the Index vs IndexStrategy vs IndexModel
    Rob Emanuele
    @lossyrob
    but I agree that’s the basic model that would be useful - composition of indexes like you have w/ the compound stuff
    Anthony Fox
    @anthonyccri
    @lossyrob a long time ago we used to have a compositional approach to index construction
    however, as GeoMesa started to emphasize being a run-time only dependency, we needed to tunnel index options through the geotools API (admittedly quite stringly-typed)
    there's no reason we couldn't have both in sfcurve though - something a la composition if a user wants to use the programmatic api and maybe a typesafe config based constructor if a user wanted to set index options through a config file
    as far as composed SFCs, i think @cne1x and @jnh5y have lots of ideas
    rfecher
    @rfecher
    part of the reason for including java was to include what we're doing already - we have an sfc interface and the index strategy interface that we'd have to satisfy in sfcurve barring any major refactors
    I totally understand your answer, but we couldn't really port code if we were trying to keep java out of the core
    rfecher
    @rfecher
    oh and regarding Index v Index Strategy v Index Model - quick explanation might help, the only one of those that is in the geowave core index module is the index strategy...perhaps confusing naming, but the "store" module uses the Index Model as a way to provide common readers and writers across the index as basically the only reasonable way to comingle data of multiple types, formats, etc. and query across them all with the ability to still run the filtering on the server - and Index just wraps the index strategy and the index model together...overloaded "index" perhaps, but really the only pertinent component to sfcurve is GeoWave's index strategy, the others introduce commonalities in storage of types that seem completely out of scope of sfcurve
    rfecher
    @rfecher
    so it is the index strategy that can be composed with another index strategy, where an index strategy can be purely a wrapper around an SFC - partitioning or periodicity isn't an SFC but just contribute to an index strategy...anyways I actually think we're saying the exact same thing, maybe in a different way
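One way to picture "an index strategy composed with another index strategy" is as small key-producing functions that are concatenated. The Python sketch below is purely hypothetical: the names `shard_strategy`, `period_strategy`, `z2_strategy`, and `compose`, the record layout, and the key format are all invented for illustration and do not correspond to the GeoWave, GeoMesa, or sfcurve APIs. Only the last strategy wraps an SFC; sharding and periodicity contribute key prefixes.

```python
import zlib


def interleave(x, y, bits):
    """Z-order (Morton) value with x in the even bit positions, y in the odd ones."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def shard_strategy(num_shards):
    """Partitioning: one prefix byte derived from the record id (num_shards <= 256)."""
    def key(rec):
        return bytes([zlib.crc32(rec["id"].encode("utf-8")) % num_shards])
    return key


def period_strategy(period_seconds):
    """Periodicity: a 4-byte bin number, here the epoch time divided by the period."""
    def key(rec):
        return (rec["t"] // period_seconds).to_bytes(4, "big")
    return key


def z2_strategy(bits=16):
    """A strategy that is purely a wrapper around an SFC (2D Z-order here)."""
    def key(rec):
        return interleave(rec["x"], rec["y"], bits).to_bytes((2 * bits + 7) // 8, "big")
    return key


def compose(*strategies):
    """Concatenate the partial keys of each strategy, in order."""
    def key(rec):
        return b"".join(s(rec) for s in strategies)
    return key


WEEK = 7 * 24 * 60 * 60
index = compose(shard_strategy(4), period_strategy(WEEK), z2_strategy(16))
row_key = index({"id": "feature-1", "t": 1_000_000, "x": 3, "y": 5})
print(len(row_key))  # 9 bytes: 1 shard + 4 period + 4 z-value
```

Under this assumed layout the shard byte spreads writes, the period bin groups records by week, and the z-value orders records spatially within each bin.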
    James Hughes
    @jnh5y
    hi all, from my point of view, I'd like SFCurve to be a library which makes sense on the JVM (and potentially other languages)
    As we work out the ideas around composition, etc, I think it would be good to have one project for that
    @lossyrob, I think using byte arrays makes some sense. It might help provide more general precision.
    that said, I'd want to understand any performance changes; it might make sense to keep Longs around for some specific uses
    Rob Emanuele
    @lossyrob
    right. would be good to have benchmarks to show if it slows down things to do byte arrays, and if there’s any ways around that
    I think the ideas of composition and the abstraction in geowave are something we should def draw on.
    But we prob wouldn’t take it verbatim, or just import the geowave code into sfcurve…when we make the switch to sfcurve we’re going to have to adapt a bunch of our code, and I can imagine that’s true across all the projects that use it
    rfecher
    @rfecher
    the math outside of java primitives is definitely costly, our attempt to minimize the impact is using an interface with primitive and biginteger/decimal math implementations and then choose the optimal implementation based on required precision (interestingly enough, query decomposition and get ID have different metrics for fitting within java primitives so you may end up using a mix of implementations)