rfecher
@rfecher
i.e. how does it normalize the value 21 or 22?
Grigory
@pomadchin
I think right now it is 21 / 100
i.e. 0.21 would be the normalized value for 21
Yep, double-checked - I limited this dimension definition to the 0 to 100 range
rfecher
@rfecher
hmm, I don't know the math on it, try calling XZOrderSFC.getId(<array of 4 doubles, normalized values in each dimension for the example above>)
oh, I mean 8 doubles
pairwise min and max for each dimension (normalized)
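For what it's worth, a minimal sketch of the normalization being described (the `normalize` helper here is hypothetical, not GeoWave's API; it just maps a raw value into [0, 1] given the dimension's configured range, here 0 to 100):

```java
// Hypothetical helper, not the GeoWave API: map a raw value into [0, 1]
// given the dimension's configured min/max (here the 0..100 range).
public class Normalize {
    static double normalize(double value, double min, double max) {
        return (value - min) / (max - min);
    }

    public static void main(String[] args) {
        // a point value uses the same normalized number for its min and max,
        // so 4 dimensions become 8 doubles (pairwise min/max per dimension)
        System.out.println(normalize(21.0, 0.0, 100.0)); // 0.21
        System.out.println(normalize(22.0, 0.0, 100.0)); // 0.22
    }
}
```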
Grigory
@pomadchin
Hmmm will try it in an hour; thanks
Grigory
@pomadchin
ah I was quicker than I thought:
XZHierarchicalIndexStrategy::mins: [-82.0, 25.0, 1.3392E10, 21.0]
XZHierarchicalIndexStrategy::maxes: [-60.0, 34.0, 1.3392E10, 21.0]
XZHierarchicalIndexStrategy::xzId: [0, 0, 2, -124, -18, -18, -18, -15]

XZHierarchicalIndexStrategy::mins: [-82.0, 25.0, 1.3392E10, 22.0]
XZHierarchicalIndexStrategy::maxes: [-60.0, 34.0, 1.3392E10, 22.0]
XZHierarchicalIndexStrategy::xzId: [0, 0, 2, -124, -18, -18, -18, -15]
rfecher
@rfecher
and I really suspect that what you're seeing here is actually just the math, but it would be best to double-check ... what we've found in our benchmarking is that while XZ is great in that it guarantees a single key given extents, it really loses specificity when the dimensions are highly irregular (so it's great for polygons in lat/lon, but when you add, for example, an insertion time range that has no strong relation to lat/lon, i.e. a highly oblong hyper-rectangle, it really loses specificity in indexing)
we measured it and put out some of these numbers to give an idea here: http://scholarworks.umass.edu/cgi/viewcontent.cgi?article=1027&context=foss4g
Grigory
@pomadchin
ouch
so what would you recommend here?
another curve?
rfecher
@rfecher
yeah, so it looks like the keys are expected ... well, it's really hard to say holistically, there are a lot of trade-offs going on - I think I'd recommend a tiered index with a max duplication set to something you find reasonable given your storage constraints
but the loss of index specificity in XZ-order may not be a huge problem either if your query constraints are sufficiently constricting in each dimension
remember there are 3 other dimensions that would additionally help constrain query results in addition to that 4th dimension where you see the loss in specificity
Grigory
@pomadchin
Hm, yeah; it looks like it works tbh; if I add a dataId, it will filter everything correctly (I think)
So in this case it will lose information about the 4th dim and do a scan through all selected rows?
rfecher
@rfecher
of course it depends on data distribution but given reasonable constraints in all 4 dimensions you should be fine - and understand that we do "fine-grained" intersection in addition to the SFC key space so you're not going to get back false positives all the way to your client, it will be filtered out within geowave
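As a rough illustration of that coarse-then-fine pattern (purely a sketch, not GeoWave internals; the `Row` type and `query` helper are made up): the SFC key-range scan can over-select, and an exact per-row predicate removes the false positives before anything reaches the client:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Illustrative sketch only: a coarse SFC key scan may return false
// positives; a fine-grained per-row filter drops them server-side.
public class CoarseThenFine {
    record Row(String dataId, double dim4) {}

    static List<Row> query(List<Row> coarseScanHits, Predicate<Row> exactFilter) {
        List<Row> out = new ArrayList<>();
        for (Row r : coarseScanHits) {
            if (exactFilter.test(r)) out.add(r); // fine-grained intersection
        }
        return out;
    }

    public static void main(String[] args) {
        // two rows share the same coarse key range, but only one matches
        // the exact constraint on the 4th dimension
        List<Row> hits = List.of(new Row("a", 21.0), new Row("b", 22.0));
        System.out.println(query(hits, r -> r.dim4() == 21.0).size()); // prints 1
    }
}
```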
Grigory
@pomadchin
gotcha
that is cool
Okay, let me be honest: so you'd recommend trying TieredSFCIndexFactory?
rfecher
@rfecher
but if you're likely to have extremely loose constraints in the other dimensions and expecting tight constraints in the 4th dimension to filter out all your keys you'd probably want to look at the tiered approach
do you only have ranges/extents in your spatial dimensions on insertion?
i.e. no time ranges going in, and no range on that 4th dimension on insertion
Grigory
@pomadchin
eh, I have a time dimension but we can bound it (maybe :D)
rfecher
@rfecher
to be clear though, I am talking about the insertion value, not query
Grigory
@pomadchin
The key right now looks like this: (extent, timestamp, mydim)
rfecher
@rfecher
what I saw in your example is a single time for the entry, i.e. the image was collected at a certain time, as opposed to, for example, a track you want to index that represents a start time and an end time
Grigory
@pomadchin
Ah, yes; I have a single time value per entry (at least for now)
rfecher
@rfecher
at least if you want to index the track properly
Grigory
@pomadchin
hmm
rfecher
@rfecher
yeah, so I'd definitely lean towards tiered if duplicating up to 4 times is OK (you have extents in 2 dimensions) - generally I think it's going to be faster for this use case, but you could benchmark too if you'd like
Grigory
@pomadchin
How would it compute keys in this case? Would they be unique?
(In the case of duplicates) or would there just be entries with the same partition key?
rfecher
@rfecher
yeah, same caveat that data ID enforces uniqueness
Grigory
@pomadchin
ha gotcha
okay thank you so much! it looks like I understand how it works better now
rfecher
@rfecher
that's the reason the index strategy's getInsertionIds() is plural - one row can generate multiple keys; on query we de-dupe results so you'll only see the one row even if the query overlaps multiple of those keys
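A toy sketch of that de-dupe step (illustrative only; the `dedupe` helper is made up, not GeoWave's implementation): if one row was written under several insertion keys and the query's key ranges hit more than one of them, collapsing the scan hits by data ID yields the row once:

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch: collapse scan hits to unique data IDs,
// preserving first-seen order, so a duplicated row is returned once.
public class DedupeSketch {
    static Set<String> dedupe(List<String> hitDataIds) {
        return new LinkedHashSet<>(hitDataIds);
    }

    public static void main(String[] args) {
        // "row-1" was duplicated under three keys and the query overlapped all three
        System.out.println(dedupe(List.of("row-1", "row-1", "row-1", "row-2")));
        // prints [row-1, row-2]
    }
}
```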
Grigory
@pomadchin
ha, nice
I think I could see that in the RasterDataAdapter implementation
rfecher
@rfecher
yep, although raster data adapter has the added complication of row merging on overlap so following that completely could be a deep dark rabbit hole if you don't want or care about merging overlapping rows
Grigory
@pomadchin
already tried and came to a conclusion that I don’t need it now :d
ram-98
@ram-98
@rfecher hi
ram-98
@ram-98
I'm trying to do DBSCAN in GeoWave. Here is what I did:
  1. Ingested a shapefile using the command below, after which the following tables are created in Cassandra:
     "adapter_geowave_metadata, aim_geowave_metadata, index_geowave_metadata, internal_adapter_geowave_metadata, stats_geowave_metadata, testindex5"
     geowave ingest localtogw /mnt/ne_50m_admin_0_countries teststore5 testindex5 -f geotools-vector
     After this, when I do "geowave store listtypes teststore5", it gives the following output: "Available types: ne_50m_admin_0_countries"
  2. But when I do DBSCAN on top of this using the command
     "geowave analytic dbscan -cmi 5 -cms 10 -emn 2 -emx 6 -pmd 1000 -orc 4 -hdfs localhost:9870 -jobtracker localhost:8088 -hdfsbase /test_dir teststore5 --query.typeNames ne_50m_admin_0_countries"
     I get the error
     "Error: java.lang.IllegalArgumentException: Can not set [Ljava.lang.String; field org.locationtech.geowave.core.cli.parsed.cli_6816fd68_a6d5_46ed_ae49_9547821da5c3_42.field_37 to java.lang.String"
Can anyone help me solve this? Thank you.
rfecher
@rfecher
hi @ram-98 - maybe you can post a more complete stack trace in a gist or something? I think it may help to have more context
Haocheng Wang
@HaochengNn
@rfecher Hi, I'm confused about deploying GeoWave on my HBase cluster. Should I download the source and build it on my master node, then deploy the HBase plugin to all of my nodes? Is that right?
rfecher
@rfecher
@HaochengNn you can just install it from RPM rather than building from source ... instructions for that are here; you'll want to install the "HBase Components" (you can install them on the master only - it's going to put the libraries on HDFS for the other nodes to pick up), and you'll also likely want the command-line tools (and perhaps others, depending on what you plan to do with it)
Haocheng Wang
@HaochengNn
I find that I can't get any result using the "yum --enablerepo=geowave install geowave-1.1.0-SNAPSHOT-apache-*" command, but I can get some when the $VERSION is "1.0.0" or "0.9.3". Is it still not possible to install the 1.1.0 snapshot from RPM?