rfecher
@rfecher
although you have a custom adapter that may not follow our RasterDataAdapter ... generally it treats overlap differently than other adapters
it uses IndexDependentDataAdapter to convert the incoming arbitrarily sized image into tiles that match the grid of the index
Grigory
@pomadchin

Yep! I have a custom adapter, and my question is more about how to generate the dataId properly. I looked into IndexDependentDataAdapter to use the index to generate the partition key manually (something similar to what is done in the RasterDataAdapter); is that a correct approach? My idea was to get the partition key + sort key from the index and use that as a dataId; or is that a bad idea?

I saw that in other adapters you either use the featureId or create a string based on the data's unique parameters; is that something I should aim for?

rfecher
@rfecher
and it also implements RowMergingDataAdapter to inject custom merge strategy logic (defaulted to "NoDataMergeStrategy", where it tracks "no data" in the form of footprint boundaries and reserved no-data values; the last one written wins for "data", but it doesn't blanket overwrite tiles in the case of no data)
Grigory
@pomadchin
hm, at least for now I'm definitely following an easy path :D I need something like FeatureAdapter (I don't need to merge entries and probably won't need to, but I do need to index them (3-5 dims) and to query by these dims)
rfecher
@rfecher
well, overlapping data IDs in the raster case are intentional, so that merging happens
Grigory
@pomadchin
Ahhhhh
rfecher
@rfecher
hmm, are you creating the index programmatically?
Grigory
@pomadchin
Yep; something like
new CustomNameIndex(
  XZHierarchicalIndexFactory.createFullIncrementalTieredStrategy(
    dimensions, // 4 dims
    Array[Int](
      options.getBias.getSpatialPrecision,
      options.getBias.getSpatialPrecision,
      options.getBias.getTemporalPrecision,
      options.getBias.getSpatialPrecision // just an example of a 4th dim precision
    ),
    SFCType.HILBERT,
    options.getMaxDuplicates
  ),
  indexModel,
  combinedId
)
rfecher
@rfecher
so to the best of my understanding you can really treat it like vector data rather than what we're doing with tiling the data within the natural gridding of the index and merging overlapping raster tiles...so make the data ID something unique per row
I don't believe you need to care about IndexDependent... or RowMerging...
I don't think using the index sort/partition keys as the data ID would be a good idea (they're not guaranteed unique, plus they're already in the key; one thing the data ID is there for is to absolutely guarantee uniqueness of a key)
with 3-5 dimensions you get into extreme unlikelihood of overlapping keys anyway
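A minimal sketch of the "unique per row" idea in Scala; the entry type and field names here are hypothetical illustrations, not GeoWave API:

// hypothetical entry type; field names are illustrative only
case class SensorReading(sensorId: String, time: Long, value: Double)

// derive a data ID that is unique per row and independent of the index keys;
// append a UUID (or similar) if the natural attributes alone cannot guarantee uniqueness
def dataId(e: SensorReading): Array[Byte] =
  s"${e.sensorId}:${e.time}".getBytes("UTF-8")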
Grigory
@pomadchin
Thanks @rfecher, makes sense; so I will try to derive some unique string based on the input entry (: thanks!
I also thought of deriving it based on some information in the entry and on the index :o
~ get the partition key from the index by passing all dims in + some kind of identifying information from the entry
rfecher
@rfecher
and in answer to another question you had: to just see what keys would be generated for a row in your index, you can call index.getIndexStrategy().getInsertionIds(<BasicNumericDataSet>)
Grigory
@pomadchin
:+1: nice
rfecher
@rfecher
BasicNumericDataSet just wraps NumericData (which can be a range or single value) per dimension in the same order as the dimensions defined in your index
basically, what your NumericDimensionField in the CommonIndexModel produces in its getNumericData() method gets passed to the index strategy's getInsertionIds() method, and the result ultimately gets written as the partition and sort keys in the data store
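A rough sketch of that call in Scala, using the CustomNameIndex built earlier; BasicNumericDataset, NumericRange, and NumericValue are GeoWave classes, but the package path below is an assumption and may differ between versions:

// package path is an assumption - adjust for your GeoWave version
import org.locationtech.geowave.core.index.sfc.data.{BasicNumericDataset, NumericData, NumericRange, NumericValue}

// one NumericData per dimension, in the same order as the dimensions defined in the index
val data: Array[NumericData] = Array(
  new NumericRange(-82.0, -60.0),  // lon extent
  new NumericRange(25.0, 34.0),    // lat extent
  new NumericValue(1.4019264e12),  // single time instant
  new NumericValue(21.0)           // custom 4th dimension
)

// the resulting insertion ids are what ultimately become the partition and sort keys
val keys = index.getIndexStrategy
  .getInsertionIds(new BasicNumericDataset(data))
  .getFirstPartitionAndSortKeyPair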
Grigory
@pomadchin
@rfecher hmmm, you know, it looks like I'm getting the same insertion IDs even though the dims are different:
// for instance I have these dims:
// pseudocode here
val bounds = List(NumericRange [min=-82.0, max=-60.0], NumericRange [min=25.0, max=34.0], NumericRange [min=1.4019264E12, max=1.4019264E12], NumericRange [min=21.0, max=21.0])

val keys = index.getIndexStrategy().getInsertionIds(bounds).getFirstPartitionAndSortKeyPair
//> keys.getLeft: List(4, 50, 48, 49, 52)
//> keys.getRight: List(122, -55)

// but I get the same result for 
val bounds2 = List(NumericRange [min=-82.0, max=-60.0], NumericRange [min=25.0, max=34.0], NumericRange [min=1.4019264E12, max=1.4019264E12], NumericRange [min=22.0, max=22.0])

val keys = index.getIndexStrategy().getInsertionIds(bounds).getFirstPartitionAndSortKeyPair
//> keys.getLeft: List(4, 50, 48, 49, 52)
//> keys.getRight: List(122, -55)
is something wrong in my index configuration (i.e. the 4th dimension set incorrectly)?
rfecher
@rfecher
hmm, what's the min/max on that 4th dimension?
Grigory
@pomadchin
the first two are spatial, the 3rd is temporal, and the 4th is just a custom BasicDimensionDefinition(minValue, maxValue)
rfecher
@rfecher
i.e. how does it normalize the value 21 or 22?
Grigory
@pomadchin
I think right now it is 21 / 100
i.e. 0.21 would be the normalized value for 21
Yep, double-checked - I limited this dimension definition to the 0 to 100 range
rfecher
@rfecher
hmm, I don't know the math on it; try calling XZOrderSFC.getId(<array of 4 doubles, normalized values in each dimension for the example above>)
oh, I mean 8 doubles
pairwise min and max for each dimension (normalized)
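For instance, the normalized 8-double argument could be assembled roughly like this; only the 0-100 range of the 4th dimension is known from this conversation, the other ranges are placeholders:

// normalize a raw value into [0, 1] given the full range of its dimension definition
def normalize(v: Double, lo: Double, hi: Double): Double = (v - lo) / (hi - lo)

normalize(21.0, 0.0, 100.0)  // 0.21
normalize(22.0, 0.0, 100.0)  // 0.22

// the argument to getId is then the pairwise (min, max) normalized values in dimension order,
// e.g. Array(lonMin, lonMax, latMin, latMax, timeMin, timeMax, 0.21, 0.21)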
Grigory
@pomadchin
Hmmm will try it in an hour; thanks
Grigory
@pomadchin
ah I was quicker than I thought:
XZHierarchicalIndexStrategy::mins: [-82.0, 25.0, 1.3392E10, 21.0]
XZHierarchicalIndexStrategy::maxes: [-60.0, 34.0, 1.3392E10, 21.0]
XZHierarchicalIndexStrategy::xzId: [0, 0, 2, -124, -18, -18, -18, -15]

XZHierarchicalIndexStrategy::mins: [-82.0, 25.0, 1.3392E10, 22.0]
XZHierarchicalIndexStrategy::maxes: [-60.0, 34.0, 1.3392E10, 22.0]
XZHierarchicalIndexStrategy::xzId: [0, 0, 2, -124, -18, -18, -18, -15]
rfecher
@rfecher
and I really suspect that what you're seeing here is actually the math, but it would be best to double-check .... what we've found in our benchmarking is that XZ is great in that it guarantees a single key given extents, but it really loses specificity when the extents in each dimension are highly irregular (so it's great for polygons in lat/lon, but when you add, for example, an insertion time range which has no strong relation to lat/lon, i.e. a highly oblong hyper-rectangle, it really loses specificity in indexing)
we measured it and put out some of these numbers to give an idea here: http://scholarworks.umass.edu/cgi/viewcontent.cgi?article=1027&context=foss4g
Grigory
@pomadchin
ouch
so what would you recommend here?
another curve?
rfecher
@rfecher
yeah, so it looks like the keys are expected ... well, it's really hard to say holistically, there are a lot of tradeoffs going on - I think I'd recommend a tiered index with max duplication set to something you find reasonable given your storage constraints
but the XZ-order index's loss of specificity may not be a huge problem either, if your query constraints are sufficiently constricting in each dimension
remember there are 3 other dimensions that would help constrain query results in addition to that 4th dimension where you see the loss in specificity
Grigory
@pomadchin
Hm, yeah; it looks like it works tbh; if I add a dataId it will filter everything correctly (I think)
So in this case it will lose information about the 4th dim and will scan through all selected rows?
rfecher
@rfecher
of course it depends on data distribution, but given reasonable constraints in all 4 dimensions you should be fine - and understand that we do "fine-grained" intersection in addition to the SFC key space, so you're not going to get false positives back all the way to your client; they will be filtered out within GeoWave
Grigory
@pomadchin
gotcha
that is cool
Okay, I'd like to go the honest way now: so you recommend trying TieredSFCIndexFactory?
rfecher
@rfecher
but if you're likely to have extremely loose constraints in the other dimensions and expect tight constraints in the 4th dimension to filter out all your keys, you'd probably want to look at the tiered approach
do you only have ranges/extents in your spatial dimensions on insertion?
i.e. no time ranges going in, and no range on that 4th dimension on insertion?
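If the tiered route is taken, a hedged sketch of the swap, mirroring the earlier XZHierarchicalIndexFactory snippet and assuming TieredSFCIndexFactory exposes a parallel createFullIncrementalTieredStrategy signature (verify against your GeoWave version):

new CustomNameIndex(
  // signature assumed to mirror the XZHierarchicalIndexFactory call above
  TieredSFCIndexFactory.createFullIncrementalTieredStrategy(
    dimensions, // same 4 dims as before
    Array[Int](
      options.getBias.getSpatialPrecision,
      options.getBias.getSpatialPrecision,
      options.getBias.getTemporalPrecision,
      options.getBias.getSpatialPrecision
    ),
    SFCType.HILBERT,
    options.getMaxDuplicates // the "max duplication" knob mentioned above
  ),
  indexModel,
  combinedId
)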