Grigory
@pomadchin
Hmmm will try it in an hour; thanks
Grigory
@pomadchin
ah I was quicker than I thought:
XZHierarchicalIndexStrategy::mins: [-82.0, 25.0, 1.3392E10, 21.0]
XZHierarchicalIndexStrategy::maxes: [-60.0, 34.0, 1.3392E10, 21.0]
XZHierarchicalIndexStrategy::xzId: [0, 0, 2, -124, -18, -18, -18, -15]

XZHierarchicalIndexStrategy::mins: [-82.0, 25.0, 1.3392E10, 22.0]
XZHierarchicalIndexStrategy::maxes: [-60.0, 34.0, 1.3392E10, 22.0]
XZHierarchicalIndexStrategy::xzId: [0, 0, 2, -124, -18, -18, -18, -15]
rfecher
@rfecher
and I really suspect that what you're seeing here is actually just the math, but it would be best to double-check .... what we've found in our benchmarking is that while XZ is great in that it guarantees a single key for given extents, it really loses specificity when the extents are highly irregular across dimensions (so it's great for polygons in lat/lon, but when you add, for example, an insertion time range that has no strong relation to lat/lon, i.e. a highly oblong hyper-rectangle, it really loses specificity in indexing)
we measured it and put out some of these numbers to give an idea here: http://scholarworks.umass.edu/cgi/viewcontent.cgi?article=1027&context=foss4g
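A minimal Python sketch of why the two entries above can end up with the same key. It is a simplification of XZ-style ordering, not GeoWave's XZHierarchicalIndexStrategy: the dimension bounds, `MAX_LEVEL`, and the function name `xz_like_key` are all assumptions made for illustration. The point it shows is that the widest normalized extent (here lon/lat) picks the cell size for every dimension, so a difference of 21 vs 22 in the 4th dimension falls inside one coarse cell.

```python
import math

# Hypothetical bounds for (lon, lat, time, 4th dim); not GeoWave's actual definitions.
BOUNDS = [(-180.0, 180.0), (-90.0, 90.0), (0.0, 1.0e11), (0.0, 100.0)]
MAX_LEVEL = 12  # finest subdivision level used by this sketch


def xz_like_key(mins, maxes):
    """Return (level, cell index per dimension) for a box, XZ-style."""
    nmins = [(v - lo) / (hi - lo) for v, (lo, hi) in zip(mins, BOUNDS)]
    nmaxes = [(v - lo) / (hi - lo) for v, (lo, hi) in zip(maxes, BOUNDS)]
    # The widest normalized extent alone decides how coarse the cell is.
    widest = max(b - a for a, b in zip(nmins, nmaxes))
    if widest <= 0.0:
        level = MAX_LEVEL
    else:
        level = max(0, min(MAX_LEVEL, math.floor(math.log(widest, 0.5))))
    n = 2 ** level
    # Cell that contains the lower corner of the box at that level.
    cells = tuple(min(int(a * n), n - 1) for a in nmins)
    return level, cells


a = xz_like_key([-82.0, 25.0, 1.3392e10, 21.0], [-60.0, 34.0, 1.3392e10, 21.0])
b = xz_like_key([-82.0, 25.0, 1.3392e10, 22.0], [-60.0, 34.0, 1.3392e10, 22.0])
print(a == b)  # the lon/lat extent forces level 4, where 21 and 22 share a cell
```

With point-like entries (mins equal to maxes) the same function drops to the finest level and the two 4th-dimension values land in different cells, which is the loss of specificity being described.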
Grigory
@pomadchin
aouch
so what would you recommend here?
another curve?
rfecher
@rfecher
yeah, so it looks like the keys are expected ... well, it's really hard to say holistically, there are a lot of tradeoffs going on - I think I'd recommend a tiered index with a max duplication set to something you find reasonable given your storage constraints
but the index specificity of XZ-order may not be a huge problem either if your query constraints are sufficiently tight in each dimension
remember there are 3 other dimensions that help constrain query results, in addition to that 4th dimension where you see the loss in specificity
Grigory
@pomadchin
Hm, yeah; it looks like it works tbh; if I add a dataId, it will filter everything correctly (I think)
So in this case it will lose information about the 4th dim and will do a scan through all selected rows?
rfecher
@rfecher
of course it depends on data distribution, but given reasonable constraints in all 4 dimensions you should be fine - and understand that we do "fine-grained" intersection in addition to the SFC key space, so you're not going to get false positives all the way back to your client; they will be filtered out within geowave
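A rough Python sketch of the two-stage idea described here: the key scan returns candidates (possibly with false positives), and an exact per-dimension intersection test removes them before results are returned. The function names and the candidate layout are hypothetical; this is not GeoWave's filter code.

```python
def intersects(entry, query):
    """True when two boxes, each given as (mins, maxes), overlap in every dimension."""
    e_min, e_max = entry
    q_min, q_max = query
    return all(
        e_lo <= q_hi and q_lo <= e_hi
        for e_lo, e_hi, q_lo, q_hi in zip(e_min, e_max, q_min, q_max)
    )


def fine_filter(candidates, query):
    """Keep only the candidate data IDs whose box really intersects the query."""
    return [data_id for data_id, box in candidates if intersects(box, query)]


# Both entries share a coarse key, so a key scan would return both as candidates.
candidates = [
    ("entry-21", ([-82.0, 25.0, 1.3392e10, 21.0], [-60.0, 34.0, 1.3392e10, 21.0])),
    ("entry-22", ([-82.0, 25.0, 1.3392e10, 22.0], [-60.0, 34.0, 1.3392e10, 22.0])),
]
query = ([-90.0, 20.0, 1.0e10, 22.0], [-50.0, 40.0, 2.0e10, 22.0])
print(fine_filter(candidates, query))
```

Only the entry whose 4th-dimension value matches the query survives the second stage, even though both rows were scanned.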
Grigory
@pomadchin
gotcha
that is cool
Okay, I would like to do it the proper way now: so you recommend trying TieredSFCIndexFactory?
rfecher
@rfecher
but if you're likely to have extremely loose constraints in the other dimensions and expecting tight constraints in the 4th dimension to filter out all your keys you'd probably want to look at the tiered approach
do you only have ranges/extents in your spatial dimensions on insertion?
ie. no time ranges going in, and no range on that 4th dimension on insertion
Grigory
@pomadchin
eh I have a time dimension but we can bound it (maybe :D)
rfecher
@rfecher
to be clear though, I am talking about the insertion value, not query
Grigory
@pomadchin
The key right now looks like this: (extent, timestamp, mydim)
rfecher
@rfecher
what I saw in your example is a single time for the entry, i.e. the image was collected at a certain time, as opposed to, for example, a track you want to index, which would have a start time and an end time
Grigory
@pomadchin
Ah, yes; I have a single time value per entry (at least for now)
rfecher
@rfecher
at least if you want to index the track properly
Grigory
@pomadchin
hmm
rfecher
@rfecher
yeah, so I'd definitely lean towards tiered if duplicating up to 4 times is ok (you have extents in 2 dimensions) - generally I think it's going to be faster for this use case, but you could benchmark too if you'd like
Grigory
@pomadchin
How would it compute keys in this case? They would be unique?
(In case of duplicates) or there would be just entries with the same partition key?
rfecher
@rfecher
yeah, same caveat that data ID enforces uniqueness
Grigory
@pomadchin
ha gotcha
okay thank you so much! it looks like I understand how it works better now
rfecher
@rfecher
that's the reason the index strategy's getInsertionIds() is plural: one row can generate multiple keys, and on query we de-dupe results so you'll only see the one row even if the query overlaps several of those keys
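A minimal Python sketch of the tiered idea discussed above, under stated assumptions: a 2-D unit square, a simple grid per tier, and a maximum duplication of 4. The names `insertion_ids`, `ingest`, and `query` are hypothetical and only loosely mirror the concept; this is not how GeoWave's tiered strategy is implemented.

```python
from itertools import product


def cells_covering(lo, hi, level):
    """Indices of the cells at `level` (2**level cells across [0, 1]) overlapping [lo, hi]."""
    n = 2 ** level
    first = min(int(lo * n), n - 1)
    last = min(int(hi * n), n - 1)
    return range(first, last + 1)


def insertion_ids(box, max_level=8, max_duplication=4):
    """Pick the finest tier where the box needs at most `max_duplication` keys."""
    (x0, x1), (y0, y1) = box
    for level in range(max_level, -1, -1):
        xs = cells_covering(x0, x1, level)
        ys = cells_covering(y0, y1, level)
        if len(xs) * len(ys) <= max_duplication:
            return [(level, x, y) for x, y in product(xs, ys)]
    return [(0, 0, 0)]  # fallback: the single cell of the coarsest tier


def ingest(table, data_id, box):
    """Write the same row under every insertion key."""
    for key in insertion_ids(box):
        table.setdefault(key, []).append(data_id)


def query(table, box, max_level=8):
    """Scan matching cells in every tier and de-duplicate by data ID."""
    (x0, x1), (y0, y1) = box
    seen, results = set(), []
    for level in range(max_level + 1):
        xs = cells_covering(x0, x1, level)
        ys = cells_covering(y0, y1, level)
        for x, y in product(xs, ys):
            for data_id in table.get((level, x, y), []):
                if data_id not in seen:
                    seen.add(data_id)
                    results.append(data_id)
    return results


table = {}
ingest(table, "image-1", ((0.30, 0.36), (0.60, 0.66)))
print(len(table))                                        # number of keys written
print(query(table, ((0.25, 0.40), (0.55, 0.70))))        # one row, despite several keys
```

The row is stored under four keys at the same tier, and the query, which overlaps all four, still reports it once because results are keyed by data ID.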
Grigory
@pomadchin
ha, nice
I think I could see that in the RasterDataAdapter implementation
rfecher
@rfecher
yep, although raster data adapter has the added complication of row merging on overlap so following that completely could be a deep dark rabbit hole if you don't want or care about merging overlapping rows
Grigory
@pomadchin
already tried and came to a conclusion that I don’t need it now :d
ram-98
@ram-98
@rfecher hi
ram-98
@ram-98
I'm trying to run DBSCAN in GeoWave. Here is what I did:
  1. Ingested a shapefile using the command
    "geowave ingest localtogw /mnt/ne_50m_admin_0_countries teststore5 testindex5 -f geotools-vector"
    and the following tables were created in Cassandra:
    "adapter_geowave_metadata, aim_geowave_metadata, index_geowave_metadata, internal_adapter_geowave_metadata, stats_geowave_metadata, testindex5"
    After this, when I run "geowave store listtypes teststore5", it gives the following output:
    "Available types: ne_50m_admin_0_countries"
  2. But when I run DBSCAN on top of this using the command
    "geowave analytic dbscan -cmi 5 -cms 10 -emn 2 -emx 6 -pmd 1000 -orc 4 -hdfs localhost:9870 -jobtracker localhost:8088 -hdfsbase /test_dir teststore5 --query.typeNames ne_50m_admin_0_countries "
    I get the error
    "Error: java.lang.IllegalArgumentException: Can not set [Ljava.lang.String; field org.locationtech.geowave.core.cli.parsed.cli_6816fd68_a6d5_46ed_ae49_9547821da5c3_42.field_37 to java.lang.String"
    Can anyone help me solve this? Thank you.
rfecher
@rfecher
hi @ram-98 - maybe you can post a more complete stack trace in a gist or something? I think it may help to have more context
Haocheng Wang
@HaochengNn
@rfecher Hi, I'm confused about deploying GeoWave on my HBase cluster. Should I download the source and build it on my master node, and then deploy the HBase plugin to all of my nodes? Is this right?
rfecher
@rfecher
@HaochengNn you can just install it from RPM rather than build from source ... instructions for that are here and you'll want to install the "HBase Components" (you can install it on master only, it's going to put the libraries on HDFS for the other nodes to pick up) and you'll also likely want the commandline tools (and perhaps others depending on what you plan to do with it)
Haocheng Wang
@HaochengNn
I find that I can't get any result using the "yum --enablerepo=geowave install geowave-1.1.0-SNAPSHOT-apache-*" command, but I do get some when the $VERSION is "1.0.0" or "0.9.3". Is 1.1.0-SNAPSHOT still unavailable to install from RPM?
pluresideas
@pluresideas

Hi, I am new to geowave. In one of the geowave examples there are these commands:

  • geowave store add gdelt -t redis --gwNamespace geowave.gdelt --address redis://127.0.0.1:6379
  • geowave index add -t spatial gdelt gdelt-spatial
  • geowave ingest localtogw /mnt/gdelt gdelt gdelt-spatial -f gdelt --gdelt.cql "INTERSECTS(geometry,$GERMANY)"

I do not see any instructions there telling geowave how it should construct the index. I assume that gdelt and gdelt-spatial are just names. Or are these names recognized by geowave and used to construct the right index? How does geowave know how to construct the index? Can you shed some light on this? Thanks!

pluresideas
@pluresideas
Is it the -f gdelt option that specifies the data format and index used?
rfecher
@rfecher
@pluresideas good question ... the commandline tools are extensible and most of the options are discovered at runtime. gdelt is just a named placeholder for the store which got added in the first command. It's essentially the connection info plus perhaps a few more advanced options.

gdelt-spatial is also a placeholder for the index whose options are provided by java SPI implementations of DimensionalityTypeProviderSpi. The types available are defined by each implementation's "type name" such as this and the other options are defined by the JCommander annotations on this object. Then the actual index is programmatically built based on the commandline options here. There are also always available options such as num_partitions (typing --help on any of the commands will give you feedback on what you can do, and at times defining the type with -t and then --help will give you additional options).

In addition to the "spatial" type used here, we also have a -t spatial_temporal type option and a -t temporal type option that we provide out of the box. Additionally, by just dropping in implementations of the SPI interface with the appropriate META-INF/services within any directory on the classpath (we should automatically create a /plugins directory on the classpath with any of our installers) you can define your own index types. And because the ingest tooling is very flexible with several extension points such as index types, you can print out all available plugins currently on the classpath with the geowave ingest listplugins command.
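A small Python analogy of the lookup described above, for readers unfamiliar with plugin discovery: providers register under a type name, `-t` selects one, and the index name is only a label. In GeoWave the real mechanism is Java SPI (implementations listed under META-INF/services); the classes, option names, and returned dictionaries below are entirely hypothetical and only illustrate the shape of the lookup.

```python
class SpatialProvider:
    """Hypothetical stand-in for a provider registered under the type name 'spatial'."""
    type_name = "spatial"

    def create_index(self, index_name, options):
        return {"name": index_name, "dimensions": ["lon", "lat"], **options}


class SpatialTemporalProvider:
    """Hypothetical stand-in for a provider registered under 'spatial_temporal'."""
    type_name = "spatial_temporal"

    def create_index(self, index_name, options):
        return {"name": index_name, "dimensions": ["lon", "lat", "time"], **options}


def discover(providers):
    """Build a registry keyed by type name, as a plugin loader would at startup."""
    return {provider.type_name: provider for provider in providers}


def add_index(registry, type_name, index_name, **options):
    """Analogous to `index add -t <type> <store> <name>`: the type picks the builder."""
    if type_name not in registry:
        raise ValueError(f"unknown index type: {type_name}")
    return registry[type_name].create_index(index_name, options)


registry = discover([SpatialProvider(), SpatialTemporalProvider()])
index = add_index(registry, "spatial", "gdelt-spatial", num_partitions=4)
print(index)
```

The name `gdelt-spatial` plays no role in choosing dimensions here; only the type name does, which matches the explanation that the names are placeholders.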
pluresideas
@pluresideas
Thank you for the write up, it is very helpful!
Brad Hards
@bradh
In the geowave STANAG 4676 stuff (or GPX, or anything that looks like a moving dot, basically), did you ever come across an open source track generator? Something that could output a realistic-looking track?
rfecher
@rfecher
@bradh nothing great in the open source world that I know of for simulating tracks ... there are various projects that can do routing, and then it's potentially a matter of coming up with a reasonable distribution of start and end points to simulate traffic patterns. However, you can take large sets of publicly available real data, and if you need to generate more, use temporal and/or spatial offsets. The largest set that I know of is OSM GPS, which as I recall is a few billion track points. Otherwise, Microsoft Research has published a couple of interesting trajectory datasets: T-Drive and GeoLife ... hopefully these can help?
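A short Python sketch of the "temporal and/or spatial offsets" suggestion: copy a real track and shift every point by a fixed offset to get an additional synthetic one. The track layout (lon, lat, timestamp) and the function name `offset_track` are assumptions for illustration, and a constant degree offset is a crude shift that ignores projection effects.

```python
from datetime import datetime, timedelta


def offset_track(track, dlon=0.0, dlat=0.0, dt=timedelta(0)):
    """Return a copy of the track shifted in space (degrees) and time."""
    return [(lon + dlon, lat + dlat, t + dt) for lon, lat, t in track]


# A tiny made-up track: (lon, lat, timestamp) per point.
real_track = [
    (-77.00, 38.90, datetime(2019, 1, 1, 12, 0, 0)),
    (-77.01, 38.91, datetime(2019, 1, 1, 12, 0, 30)),
    (-77.02, 38.92, datetime(2019, 1, 1, 12, 1, 0)),
]

# Same shape, moved 0.5 degrees east and replayed one day later.
synthetic = offset_track(real_track, dlon=0.5, dt=timedelta(days=1))
print(synthetic[0])
```

Applying several different offsets to each source track multiplies the dataset while keeping realistic point spacing and timing within each track.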
Brad Hards
@bradh
I was hoping for a 4676 generator. The actual need is a bit outside of the geowave area - I'm trying to do a "ground truth" track, then generate the observables for that. If I can make enough time, maybe a mix-n-match style approach so you can take those big trajectory sets, completely synthetic sets (maybe off OSM routing outputs for something a bit more real), maybe real tracks from ADS-B or AIS; and then tie those to various sensors (with error distribution), and then out to different kinds of formats.
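A rough Python sketch of the "ground truth, then observables" idea: take a ground-truth track and produce what a sensor with a given error distribution might report. The Gaussian error model, the `sigma_deg` parameter, and the function name `observe` are assumptions chosen for illustration; a real sensor model would likely be more involved.

```python
import random
from datetime import datetime


def observe(track, sigma_deg, seed=0):
    """Add independent Gaussian position noise (in degrees) to each point; keep timestamps."""
    rng = random.Random(seed)  # seeded so a run can be reproduced
    return [
        (lon + rng.gauss(0.0, sigma_deg), lat + rng.gauss(0.0, sigma_deg), t)
        for lon, lat, t in track
    ]


ground_truth = [
    (-77.00, 38.90, datetime(2019, 1, 1, 12, 0, 0)),
    (-77.01, 38.91, datetime(2019, 1, 1, 12, 0, 30)),
]

# Two hypothetical sensors observing the same truth with different accuracy.
precise = observe(ground_truth, sigma_deg=0.0001, seed=1)
coarse = observe(ground_truth, sigma_deg=0.01, seed=2)
print(len(precise), len(coarse))
```

Each set of observations could then be written out in whichever track format is needed, with the ground truth kept alongside for scoring.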