Grigory
@pomadchin
hmm
rfecher
@rfecher
yeah, so I'd definitely lean towards tiered if duplicating up to 4 times is ok, you have extents in 2 dimensions - generally I think it's going to be faster for this use case, but you could benchmark too if you'd like
Grigory
@pomadchin
How would it compute keys in this case? They would be unique?
(In case of duplicates) or there would be just entries with the same partition key?
rfecher
@rfecher
yeah, same caveat that data ID enforces uniqueness
Grigory
@pomadchin
ha gotcha
okay thank you so much! it looks like I understand how it works better now
rfecher
@rfecher
the reason the index strategy getInsertionIds() is plural is that one row can generate multiple keys; on query we de-dupe results, so you'll only see the one row even if the query overlaps multiple of those keys
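The de-duplication behavior described above can be sketched in plain Java. The classes here are hypothetical stand-ins, not GeoWave's actual API: one row written under several insertion keys still carries a single data ID, and the query path keeps only one result per data ID.

```java
import java.util.*;

// Illustrative sketch (hypothetical classes, not GeoWave's actual API):
// one row inserted under several keys still carries a single data ID,
// and query results are de-duplicated by that ID.
public class DedupSketch {
    record Entry(String insertionKey, String dataId) {}

    // Simulate scanning an index where getInsertionIds() produced several
    // keys for the same row (e.g. an extent crossing tier boundaries).
    static List<Entry> scan(List<String> matchedKeys, Map<String, String> index) {
        List<Entry> hits = new ArrayList<>();
        for (String key : matchedKeys)
            if (index.containsKey(key))
                hits.add(new Entry(key, index.get(key)));
        return hits;
    }

    // De-dupe by data ID so a query overlapping several keys yields one row.
    static Set<String> dedupe(List<Entry> hits) {
        Set<String> seen = new LinkedHashSet<>();
        for (Entry e : hits) seen.add(e.dataId());
        return seen;
    }

    public static void main(String[] args) {
        // The same row "row-1" was written under three insertion keys.
        Map<String, String> index = Map.of(
            "tier2/key-a", "row-1",
            "tier2/key-b", "row-1",
            "tier3/key-c", "row-1");
        List<Entry> hits = scan(List.of("tier2/key-a", "tier2/key-b", "tier3/key-c"), index);
        System.out.println(hits.size() + " raw hits, " + dedupe(hits).size() + " row after de-dup");
    }
}
```

A tiered index that duplicates an extent up to 4 times relies on exactly this: duplicates are cheap on write because the read path collapses them back to one row.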
Grigory
@pomadchin
ha, nice
I think I could see that in the RasterDataAdapter implementation
rfecher
@rfecher
yep, although the raster data adapter has the added complication of row merging on overlap, so following that completely could be a deep dark rabbit hole if you don't want or care about merging overlapping rows
Grigory
@pomadchin
already tried that and came to the conclusion that I don't need it now :d
ram-98
@ram-98
@rfecher hi
ram-98
@ram-98
I'm trying to do DBSCAN in geowave. Here is what I did:
  1. Ingested a shapefile using the command "geowave ingest localtogw /mnt/ne_50m_admin_0_countries teststore5 testindex5 -f geotools-vector", and the following tables were created in Cassandra: "adapter_geowave_metadata, aim_geowave_metadata, index_geowave_metadata, internal_adapter_geowave_metadata, stats_geowave_metadata, testindex5". After this, when I do "geowave store listtypes teststore5", it gives the following output: "Available types: ne_50m_admin_0_countries".
  2. But when I do DBSCAN on top of this using the command
     "geowave analytic dbscan -cmi 5 -cms 10 -emn 2 -emx 6 -pmd 1000 -orc 4 -hdfs localhost:9870 -jobtracker localhost:8088 -hdfsbase /test_dir teststore5 --query.typeNames ne_50m_admin_0_countries"
     I get the error
     "Error: java.lang.IllegalArgumentException: Can not set [Ljava.lang.String; field org.locationtech.geowave.core.cli.parsed.cli_6816fd68_a6d5_46ed_ae49_9547821da5c3_42.field_37 to java.lang.String"
     Can anyone help me solve this? Thank you.
rfecher
@rfecher
hi @ram-98 - maybe you can post a more complete stack trace in a gist or something? I think it may help to have more context
Haocheng Wang
@HaochengNn
@rfecher Hi, I'm confused about deploying GeoWave on my HBase cluster. Should I download the source, build it on my master node, and then deploy the HBase plugin to all of my nodes? Is this right?
rfecher
@rfecher
@HaochengNn you can just install it from RPM rather than building from source ... instructions for that are here, and you'll want to install the "HBase Components" (you can install it on the master only; it's going to put the libraries on HDFS for the other nodes to pick up), and you'll also likely want the command-line tools (and perhaps others depending on what you plan to do with it)
Haocheng Wang
@HaochengNn
I find that I can't get any results using the "yum --enablerepo=geowave install geowave-1.1.0-SNAPSHOT-apache-*" command, but I can get some when the $VERSION is "1.0.0" or "0.9.3". Is it still unavailable to install the 1.1.0 snapshot from RPM now?
pluresideas
@pluresideas

Hi, I am new to geowave. In one of the geowave examples there are these commands:

  • geowave store add gdelt -t redis --gwNamespace geowave.gdelt --address redis://127.0.0.1:6379
  • geowave index add -t spatial gdelt gdelt-spatial
  • geowave ingest localtogw /mnt/gdelt gdelt gdelt-spatial -f gdelt --gdelt.cql "INTERSECTS(geometry,$GERMANY)"

I do not see instructions there telling geowave how it should construct the index. I assume that gdelt and gdelt-spatial are just names. Or are these names recognized by geowave and used to construct the right index? How does geowave know how to construct the index? Can you shed some light on this? Thanks!

pluresideas
@pluresideas
Is it -f gdelt option that specifies the data format and index used?
rfecher
@rfecher
@pluresideas good question ... the command-line tools are extensible and most of the options are discovered at runtime. gdelt is just a named placeholder for the store, which got added in the first command; it's essentially the connection info plus perhaps a few more advanced options. gdelt-spatial is also a placeholder for the index, whose options are provided by Java SPI implementations of DimensionalityTypeProviderSpi. The types available are defined by each implementation's "type name" such as this, and the other options are defined by the JCommander annotations on this object. Then the actual index is programmatically built based on the command-line options here. There are also always-available options such as num_partitions (typing --help on any of the commands will give you feedback on what you can do, and at times defining the type with -t and then --help will give you additional options). In addition to the "spatial" type used here, we also provide -t spatial_temporal and -t temporal type options out of the box. And by just dropping in implementations of the SPI interface with the appropriate META-INF/services within any directory on the classpath (we should automatically create a /plugins directory on the classpath with any of our installers), you can define your own index types. Because the ingest tooling is very flexible with several extension points such as index types, you can print out all available plugins currently on the classpath with the geowave ingest listplugins command.
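The runtime plugin discovery described above is standard Java SPI. A minimal, self-contained sketch using java.util.ServiceLoader follows; "IndexTypeProvider" is a stand-in interface for illustration, not GeoWave's actual DimensionalityTypeProviderSpi.

```java
import java.util.ServiceLoader;

// Conceptual sketch of the Java SPI mechanism used for plugin discovery.
// "IndexTypeProvider" is a stand-in interface, not GeoWave's actual
// DimensionalityTypeProviderSpi.
public class SpiSketch {
    public interface IndexTypeProvider {
        String typeName(); // e.g. "spatial", "spatial_temporal", "temporal"
    }

    // ServiceLoader scans META-INF/services/<interface-name> entries on the
    // classpath; dropping a jar with such an entry into a plugins directory
    // is enough to register a new provider -- no code changes needed.
    public static int countProviders() {
        int count = 0;
        for (IndexTypeProvider p : ServiceLoader.load(IndexTypeProvider.class)) {
            System.out.println("found provider: " + p.typeName());
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // With no META-INF/services registrations on the classpath,
        // nothing is discovered.
        System.out.println(countProviders() + " providers found");
    }
}
```

This is why a jar dropped into a plugins directory shows up in `geowave ingest listplugins`: discovery happens at runtime by scanning the classpath, not at compile time.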
pluresideas
@pluresideas
Thank you for the write up, it is very helpful!
Brad Hards
@bradh
In the geowave STANAG 4676 stuff (or GPX, or anything that looks like a moving dot, basically), did you ever come across an open source track generator? Something that could output a realistic-looking track?
rfecher
@rfecher
@bradh nothing great in the open source world that I know of for simulating tracks ... there are various projects that can do routing, and then it's potentially a matter of coming up with a reasonable distribution of start and end points to simulate traffic patterns. However, you can take large sets of publicly available real data and, if you need to generate more, use temporal and/or spatial offsets. The largest set that I know of is OSM GPS, which as I recall is a few billion track points. Otherwise, Microsoft Research has published a couple of interesting trajectory datasets: T-Drive and GeoLife ... hopefully these can help?
Brad Hards
@bradh
I was hoping for a 4676 generator. The actual need is a bit outside of the geowave area - I'm trying to do a "ground truth" track, then generate the observables for that. If I can make enough time, maybe a mix-n-match style approach so you can take those big trajectory sets, completely synthetic sets (maybe off OSM routing outputs for something a bit more real), maybe real tracks from ADS-B or AIS; and then tie those to various sensors (with error distribution), and then out to different kinds of formats.
Brad Hards
@bradh
BTW: If you want AIS data sets, https://www.operations.amsa.gov.au/Spatial/DataServices/DigitalData will give you AIS tracks from around Australia. One month at a time, thinned to 60 minutes (or 15 minutes for subset areas), years of data over the same coordinates. A typical month is 1.5m points. CC-BY-NC.
Grigory
@pomadchin
hey guys! In GeoWave, would it always be a full table scan in case I query by only some of the dimensions of the index? (for instance, I have a table with a spatial-temporal index, but I would like to perform a purely spatial query)
rfecher
@rfecher
@pomadchin it should be much better than a full table scan
rfecher
@rfecher
that type of use case is basically the moral of the story in that FOSS4G academic paper ... the 5- and 6-dimensional indexing as a whole performs much better than lower-dimensional indexing. Even though only 3 of those dimensions are well-constrained on any given query, it gives you the flexibility to query by combinations of those dimensions without resorting to a full table scan (something like 7+ hours on that dataset), although it's not going to be quite as performant as having exactly the dimensions indexed that you are querying (trading off flexibility to support a variety of types of queries without needing to duplicate data using multiple indices)
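The idea that a partially-constrained query on a multi-dimensional index can still avoid a full scan can be illustrated with a toy Z-order (Morton) curve. This is only a conceptual sketch, not GeoWave's actual space-filling-curve code: constraining one dimension leaves the matching cells clustered into a limited number of contiguous key ranges.

```java
import java.util.*;

// Toy illustration (not GeoWave's actual code) of why constraining only
// some dimensions of a multi-dimensional index still narrows the scan:
// with a Z-order (Morton) curve, cells matching the constrained dimension
// fall into a limited number of contiguous key ranges.
public class ZOrderSketch {
    // Interleave the low 8 bits of x and y into a Morton code.
    static int morton(int x, int y) {
        int z = 0;
        for (int i = 0; i < 8; i++) {
            z |= ((x >> i) & 1) << (2 * i);
            z |= ((y >> i) & 1) << (2 * i + 1);
        }
        return z;
    }

    // Morton codes of all cells with x in [x0, x1] (y unconstrained) on a
    // 16x16 grid, merged into contiguous scan ranges.
    static List<int[]> scanRanges(int x0, int x1) {
        TreeSet<Integer> codes = new TreeSet<>();
        for (int x = x0; x <= x1; x++)
            for (int y = 0; y < 16; y++)
                codes.add(morton(x, y));
        List<int[]> ranges = new ArrayList<>();
        int start = -2, prev = -2;
        for (int c : codes) {
            if (c != prev + 1) {           // gap: close the previous range
                if (start >= 0) ranges.add(new int[]{start, prev});
                start = c;
            }
            prev = c;
        }
        if (start >= 0) ranges.add(new int[]{start, prev});
        return ranges;
    }

    public static void main(String[] args) {
        // Constraining only x to [0, 3] on a 16x16 grid yields 4 ranges
        // covering 64 of 256 cells -- well short of a full scan.
        System.out.println(scanRanges(0, 3).size() + " ranges");
    }
}
```

The trade-off in the message above shows up here too: the partially-constrained query decomposes into more (and wider) ranges than a fully-constrained one would, but far fewer rows than scanning the whole table.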
Grigory
@pomadchin

@rfecher Hm… I profiled Cassandra queries (basically just by enabling Cassandra tracing).
In all cases where I had a 3-dim index (geometry, time) and my query had only geometry, it looks like it was just performing the select * from indexTable; query. Am I reading it a bit incorrectly?

In cases when my query contained all the index dimensions, it performed a set of range queries (smth like 17k for ~400 entries, though I may be misremembering the numbers):

 SELECT * FROM indexTable
    WHERE partition=:partition_val
    AND adapter_id IN :adapter_id_val
    AND sort>=:sort_min AND sort<:sort_max;

Mb the difference is in the index type? I used XZHierarchicalIndexFactory for the tests

Grigory
@pomadchin

Ah just to clarify my words a little bit (in terms of language):

  1. the table was indexed with a spatial-temporal index

  2. All ExplicitSpatialQueries were

    select * from indexTable;
  3. All ExplicitSpatialTemporalQueries were

    SELECT * FROM indexTable
     WHERE partition=:partition_val
     AND adapter_id IN :adapter_id_val
     AND sort>=:sort_min AND sort<:sort_max;
so the question is: is this expected behavior / can I do smth to improve performance? Or was I just in some local case where the full table scan was triggered because of the index type / data specifics?
rfecher
@rfecher
no, you're right... I just followed along to verify - the code falls into this block, which converts to a full table scan ... sorry about that. For our benchmarks we set the constraints in unconstrained dimensions to be the extent of the data from stats, and I think that's what we do for geoserver/geotools queries, but unfortunately not internally ... I know at some point we were thinking the geowave datastore API is more explicit while geotools/geoserver could infer things to make reasonable choices, but I think that philosophy isn't too applicable here anyway; it seems it should always try to avoid full table scans if it can ... you may want to add this as an issue if you don't mind? I think it's worthwhile to backlog. In the interim, is it easy enough to set the bounds of any unconstrained dimensions to be the extent of the data? We store all extents of numeric/time data in the stats, so I think something like this will get you your full time range: datastore.aggregateStatistics(VectorStatisticsQueryBuilder.newBuilder().factory().timeRange().build());
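The suggested workaround (fill any unconstrained dimension with the data extent from stats before querying) might be sketched like this; the Range type and dimension names are hypothetical illustrations, not GeoWave's API:

```java
import java.util.*;

// Sketch of the workaround discussed above: before querying, substitute the
// full data extent (as a statistic would report it) for any unconstrained
// dimension, so the planner never falls back to a full table scan.
// Dimension names and the Range type are hypothetical, not GeoWave's API.
public class FillConstraints {
    record Range(double min, double max) {}

    static Map<String, Range> fill(Map<String, Range> queryConstraints,
                                   Map<String, Range> statsExtents) {
        Map<String, Range> full = new LinkedHashMap<>();
        for (Map.Entry<String, Range> dim : statsExtents.entrySet())
            // Use the query's constraint where present, else the data extent.
            full.put(dim.getKey(),
                     queryConstraints.getOrDefault(dim.getKey(), dim.getValue()));
        return full;
    }

    public static void main(String[] args) {
        Map<String, Range> stats = new LinkedHashMap<>();
        stats.put("lon", new Range(-180, 180));
        stats.put("lat", new Range(-90, 90));
        stats.put("time", new Range(1.5e12, 1.6e12)); // e.g. from a time-range statistic

        // A purely spatial query: no time constraint supplied by the caller.
        Map<String, Range> query = Map.of(
            "lon", new Range(10, 20),
            "lat", new Range(45, 55));

        Map<String, Range> planned = fill(query, stats);
        // The time dimension is now bounded by the data extent from stats.
        System.out.println(planned.get("time"));
    }
}
```

With every dimension bounded, the query can always be decomposed into range scans instead of triggering the full-table-scan fallback.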
Grigory
@pomadchin

gotcha! thanks @rfecher, will add it ASAP; and thanks for the quick answers as usual ;)

): eh, I'm using my own adapters and it is not vector data, it contains my own random binary format... but I'll double check that API

rfecher
@rfecher
if you need to maintain a statistic on your own adapter you can implement StatisticsProvider; I guess this example from the unit tests is as good as any. The stats types have a query builder that you can use by invoking newBuilder() for the datastore statistics queries or aggregations
Grigory
@pomadchin
Wow, thanks! this may work
Grigory
@pomadchin
@rfecher is it possible to update statistics on write, or are stats immutable? for instance, on each new write we extend the extent / time
rfecher
@rfecher
each statistic is required to implement IngestCallback, so it's automatically updated on write (that's the primary purpose of statistics, really) ... a statistic can also implement DeleteCallback if it can be inverted properly; some, like extents, would have to be recomputed, but others, like counts, are easy to just decrement
all stats are merged together, so there's no concern about maintaining the full dataset's statistics within any one ingest client
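The statistics lifecycle described here (update on ingest, invert on delete where possible, merge across clients) can be sketched with a hypothetical count statistic; the class below is illustrative only, not GeoWave's StatisticsProvider API:

```java
// Sketch of the statistics lifecycle described above, using a hypothetical
// count statistic (not GeoWave's StatisticsProvider API): it updates on
// ingest, inverts on delete, and partial stats from separate ingest
// clients merge into one dataset-wide value.
public class CountStatistic {
    private long count;

    public void entryIngested() { count++; }   // IngestCallback analogue
    public void entryDeleted()  { count--; }   // DeleteCallback analogue: counts invert cheaply
    public void merge(CountStatistic other) { count += other.count; } // cross-client merge
    public long value() { return count; }

    public static void main(String[] args) {
        CountStatistic clientA = new CountStatistic();
        CountStatistic clientB = new CountStatistic();
        for (int i = 0; i < 3; i++) clientA.entryIngested(); // client A writes 3 rows
        for (int i = 0; i < 2; i++) clientB.entryIngested(); // client B writes 2 rows
        clientB.entryDeleted();                              // ...and deletes 1
        clientA.merge(clientB);                              // merged dataset-wide count
        System.out.println(clientA.value()); // 3 + (2 - 1) = 4
    }
}
```

An extent statistic, by contrast, cannot be inverted the same way: deleting a boundary point would require recomputing the extent, which is why the message above singles out counts as easy to decrement.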
Grigory
@pomadchin
that is very cool; so every time I want to use statistics, I need to perform an extra query via datastore.aggregateStatistics to collect information about the actual ranges of all the dimensions?
rfecher
@rfecher
hmm, well, to some extent yes, but I'd consider ways to work around that in certain use cases - for example, if you have a lot of queries but a large corpus whose extents are unlikely to change dramatically, I'd try to do some form of caching, maybe even extend what is coming from stats a little just to be confident it covers the full data set
for example, for time it seems like it can be special-cased a bit: if you know you're not getting data in the future, you could just use the current time as the end of the extent
and if the dimension is bounded already, like lat/lon, you don't really need to go the route of stats; you could also just use the bounds on the dimension
Grigory
@pomadchin
makes a lot of sense; thanks!
such a flexible API, this is pretty sweet :D
hipotato
@hipotato
hello everyone, I'm new to geowave. Recently I tried to run example code downloaded from GitHub (geowave 1.0.0), but an error appeared: cannot find class "org.locationtech.geowave.core.ingest.avro.AvroWholeFile". Can anyone tell me what I can do?
Brad Hards
@bradh
@hipotato How did you install?
Exactly which versions, and what example code are you referring to?
rfecher
@rfecher
Did you run mvn install? More in-depth instructions are here
hipotato
@hipotato
problem solved after I ran mvn install, thanks very much @rfecher @bradh