Brad Hards
@bradh
In the geowave STANAG 4676 stuff (or GPX, or anything that looks like a moving dot, basically), did you ever come across an open source track generator? Something that could output a realistic-looking track?
rfecher
@rfecher
@bradh nothing great in the open source world that I know of for simulating tracks... there are various projects that can do routing, and then it's potentially a matter of coming up with a reasonable distribution of start and end points to simulate traffic patterns. However, you can take large sets of publicly available real data, and if you need to generate more, use temporal and/or spatial offsets. The largest set that I know of is OSM GPS, which as I recall is a few billion track points. Otherwise, Microsoft Research has published a couple of interesting trajectory datasets: T-Drive and GeoLife ... hopefully these can help?
Brad Hards
@bradh
I was hoping for a 4676 generator. The actual need is a bit outside of the geowave area - I'm trying to do a "ground truth" track, then generate the observables for that. If I can make enough time, maybe a mix-n-match style approach so you can take those big trajectory sets, completely synthetic sets (maybe off OSM routing outputs for something a bit more real), maybe real tracks from ADS-B or AIS; and then tie those to various sensors (with error distribution), and then out to different kinds of formats.
Brad Hards
@bradh
BTW: If you want AIS data sets, https://www.operations.amsa.gov.au/Spatial/DataServices/DigitalData will give you AIS tracks from around Australia. One month at a time, thinned to 60 minutes (or 15 minutes for subset areas), years of data over the same coordinates. A typical month is 1.5m points. CC-BY-NC.
Grigory
@pomadchin
hey guys! In GeoWave, would it always be a full table scan if I query by only some of the dimensions of the index? (for instance, I have a table with a spatial-temporal index, but I would like to perform a spatial-only query)
rfecher
@rfecher
@pomadchin it should be much better than a full table scan
rfecher
@rfecher
that type of use case is basically the moral of the story in that FOSS4G academic paper ... the 5- and 6-dimensional indexing as a whole performs much better than lower-dimensional indexing. Even though only 3 of those dimensions are well-constrained on any given query, it gives you the flexibility to query by combinations of those dimensions without resorting to a full table scan (something like 7+ hours on that dataset). It's not going to be quite as performant as having exactly the dimensions indexed that you are querying, but that's the tradeoff: flexibility to support a variety of query types without needing to duplicate data across multiple indices
Grigory
@pomadchin

@rfecher Hm… I profiled the Cassandra queries (basically just by enabling Cassandra tracing).
In all cases where I had a 3-dimensional index (geometry, time) but my query constrained only the geometry, it looks like it was just performing a select * from indexTable; query. Am I reading that incorrectly?

In cases where my query constrained all the index dimensions, it performed a set of range queries (something like 17k for ~400 entries, though those numbers are from memory):

 SELECT * FROM indexTable
    WHERE partition=:partition_val
    AND adapter_id IN :adapter_id_val
    AND sort>=:sort_min AND sort<:sort_max;

Maybe the difference is in the index type? I used XZHierarchicalIndexFactory for the tests

Grigory
@pomadchin

Ah, just to clarify my wording a little bit:

  1. the table was indexed with a spatial-temporal index

  2. all ExplicitSpatialQueries were

    select * from indexTable;
  3. all ExplicitSpatialTemporalQuery instances were

    SELECT * FROM indexTable
     WHERE partition=:partition_val
     AND adapter_id IN :adapter_id_val
     AND sort>=:sort_min AND sort<:sort_max;
so the question is: is this expected behavior, and can I do something to improve performance? Or did I just hit a particular case where the full table scan was triggered because of the index type / data specifics?
rfecher
@rfecher
no, you're right... I just followed along to verify - the code falls into this block, which converts to a full table scan ... sorry about that. For our benchmarks we set the constraints in unconstrained dimensions to be the extent of the data from stats, and I think that's what we do for geoserver/geotools queries, but unfortunately not internally ... I know at some point we were thinking the geowave datastore API is more explicit while geotools/geoserver could infer things to make reasonable choices, but I think that philosophy isn't too applicable here anyway; it seems it should always try to avoid full table scans if it can ... you may want to add this as an issue if you don't mind? I think it's worthwhile to backlog. In the interim, is it easy enough to set the bounds of any unconstrained dimensions to be the extent of the data? We store the extents of all numeric/time data in the stats, so I think something like this will get you your full time range: datastore.aggregateStatistics(VectorStatisticsQueryBuilder.newBuilder().factory().timeRange().build());
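The interim workaround above, filling the unconstrained dimensions of a query with the extents recorded in statistics so that every indexed dimension is bounded, can be sketched independently of GeoWave. Everything below (the PadQuery class, the NaN-means-unconstrained convention) is hypothetical scaffolding to show the pattern, not GeoWave API:

```java
// Toy sketch: pad the unconstrained dimensions of a query with the
// dataset extents obtained from statistics, so a spatial-only query
// still bounds every dimension of a spatial-temporal index.
// All names and conventions here are illustrative, not GeoWave API.
import java.util.Arrays;

class PadQuery {
    // each row is [min, max] for one dimension; NaN marks "unconstrained"
    static double[][] padWithStats(double[][] query, double[][] statsExtents) {
        double[][] padded = new double[query.length][2];
        for (int d = 0; d < query.length; d++) {
            boolean unconstrained = Double.isNaN(query[d][0]);
            // fall back to the extent recorded in the dataset statistics
            padded[d][0] = unconstrained ? statsExtents[d][0] : query[d][0];
            padded[d][1] = unconstrained ? statsExtents[d][1] : query[d][1];
        }
        return padded;
    }

    public static void main(String[] args) {
        // lon/lat constrained by the query, time left unconstrained
        double[][] query = { {10, 20}, {40, 50}, {Double.NaN, Double.NaN} };
        double[][] stats = { {-180, 180}, {-90, 90}, {0, 1.6e12} };
        System.out.println(Arrays.deepToString(padWithStats(query, stats)));
    }
}
```

The padded query is fully constrained in all three dimensions, so (per the discussion above) the range decomposition can do its job instead of degenerating to a full table scan.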
Grigory
@pomadchin

gotcha! thanks @rfecher, will add it ASAP; and thanks for the quick answers as usual ;)

Eh, I'm using my own adapters and it's not vector data; it contains my own binary format... but I'll double-check that API

rfecher
@rfecher
if you need to maintain a statistic on your own adapter you can implement StatisticsProvider. I guess this example from the unit tests is as good as any; the stats types have a query builder that you can use by invoking newBuilder() for the datastore statistics queries or aggregations
Grigory
@pomadchin
Wow, thanks! this may work
Grigory
@pomadchin
@rfecher is it possible to update statistics on write, or are stats immutable? For instance, on each new write we could extend the spatial extent / time range
rfecher
@rfecher
each statistic is required to implement IngestCallback, so it's automatically updated on write (that's the primary purpose of statistics, really) ... a statistic can also implement DeleteCallback if it can be inverted properly; some, like extents, would have to be recomputed, but others, like counts, are easy to just decrement
all stats are merged together, so there's no concern about maintaining the full dataset's statistics within any one ingest client
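The ingest/delete/merge semantics just described can be mimicked in a few lines. The interfaces below are simplified stand-ins named after the ones mentioned (IngestCallback, DeleteCallback); the real GeoWave interfaces differ:

```java
// Toy illustration of the callback/merge semantics described above:
// a count statistic is updated on every write, can invert deletes by
// decrementing, and partial stats from separate ingest clients merge
// into the full dataset's statistic. Simplified stand-ins, not GeoWave API.
interface IngestCallback<T> { void entryIngested(T entry); }
interface DeleteCallback<T> { void entryDeleted(T entry); }

class CountStatistic<T> implements IngestCallback<T>, DeleteCallback<T> {
    long count = 0;
    @Override public void entryIngested(T entry) { count++; } // updated on write
    @Override public void entryDeleted(T entry) { count--; }  // counts are invertible

    // stats from different ingest clients get merged, so no single
    // client needs to see the whole dataset
    CountStatistic<T> merge(CountStatistic<T> other) {
        CountStatistic<T> merged = new CountStatistic<>();
        merged.count = this.count + other.count;
        return merged;
    }

    public static void main(String[] args) {
        CountStatistic<String> clientA = new CountStatistic<>();
        CountStatistic<String> clientB = new CountStatistic<>();
        clientA.entryIngested("track1");
        clientA.entryIngested("track2");
        clientB.entryIngested("track3");
        clientB.entryDeleted("track3"); // delete inverts the count
        System.out.println(clientA.merge(clientB).count); // prints 2
    }
}
```

An extent statistic would implement the same ingest path (grow min/max on each entry) but, as noted above, could not cleanly invert a delete and would need recomputation.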
Grigory
@pomadchin
that is very cool; so every time I want to use statistics I need to perform an extra query via datastore.aggregateStatistics to collect information about the actual ranges of all dimensions?
rfecher
@rfecher
hmm, well, to some extent yes, but I'd consider ways to work around that in certain use cases - for example, if you have a lot of queries but a large corpus whose extents are unlikely to change dramatically, I'd try some form of caching, maybe even extend what is coming from stats a little just to be confident it covers the full data set
for example, time seems like it can be special-cased a bit: if you know you're not getting data in the future, you could just use the current time as the end of the extent
and if the dimension is already bounded, like lat/lon, you don't really need to go the route of stats; you could just use the bounds of the dimension
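The caching/special-casing idea for time can be sketched as follows, under the stated assumption that no data arrives with future timestamps (class and method names are made up for illustration):

```java
// Sketch of the suggestion above: query statistics once, cache the start
// of the time extent, and use "now" as the open end rather than hitting
// stats before every spatial query. Assumes no future-dated data.
// Illustrative names only, not GeoWave API.
class CachedTimeExtent {
    private final long cachedStartMillis; // from a one-time stats query

    CachedTimeExtent(long statsStartMillis) {
        this.cachedStartMillis = statsStartMillis;
    }

    // effective time range used to pad a spatial-only query
    long[] effectiveRange() {
        return new long[] { cachedStartMillis, System.currentTimeMillis() };
    }

    public static void main(String[] args) {
        CachedTimeExtent extent = new CachedTimeExtent(0L);
        long[] range = extent.effectiveRange();
        System.out.println(range[0] + " .. " + range[1]);
    }
}
```

For a bounded dimension like lat/lon, the equivalent shortcut is to hard-code the dimension's own bounds instead of caching anything.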
Grigory
@pomadchin
makes a lot of sense; thanks!
such a flexible API, this is pretty sweet :D
hipotato
@hipotato
hello everyone, I'm new to GeoWave. Recently I tried to run example code downloaded from GitHub (geowave 1.0.0), but an error appeared: cannot find class "org.locationtech.geowave.core.ingest.avro.AvroWholeFile". Can anyone tell me what I can do?
Brad Hards
@bradh
@hipotato How did you install?
Exactly which versions, and what example code are you referring to?
rfecher
@rfecher
Did you run mvn install? More in-depth instructions are here
hipotato
@hipotato
problem solved after I ran mvn install, thanks very much @rfecher @bradh
hipotato
@hipotato
@rfecher hi rfecher, I want to run CLI commands in my IDE with the code downloaded from GitHub. When I run index add -t spatial raster2 raster_local, I get an error: Error: Expected a command, got index. It seems this command is not supported, so where can I configure it?
Brad Hards
@bradh
@hipotato you need to be more descriptive. Please remember we can't see your screen, so you need to provide background that allows us to understand what you have done before this, what your environment looks like (e.g. the IDE - does the basic geowave command work, have you adjusted PATH or similar) and what you expected to happen instead.
hipotato
@hipotato
I have ingested a raster (GeoTIFF) successfully on the standalone installation on Linux. Now I want to debug the GeoWave source code in IntelliJ IDEA on Windows 7, to step through the add store, add index, and ingest raster functions. The main class is org.locationtech.geowave.core.cli.GeoWaveMain, and the program arguments are store add raster --gwNamespace geowave.raster -t hbase --zookeeper node24,node25,node26:2181. When run, the console output is
 Usage: geowave [options]
  Options:
    -cf, --config-file
       Override configuration file (default is
       <home>/.geowave/config.properties)
    --debug
       Verbose output
    --version
       Output Geowave build version information

  Commands:
    config
      Commands that affect local configuration only

    explain
      See what arguments are missing and what values will be used for GeoWave commands

    help
      Get descriptions of arguments for any GeoWave command

    util, utility
      GeoWave utility commands

Error: Expected a command, got store
hipotato
@hipotato
my question is: why is the command store not valid? If I want to use commands like store/index/ingest, what should I do?
Johnathan Garrett
@jdgarrett
@hipotato It looks like only the core CLI is on your classpath. GeoWave builds up the CLI with each project adding commands here and there. In order to get all of the commands, execute the main class from a classpath that has the artifacts for all of the GeoWave projects. One easy way to do it is to debug from the geowave-test project, which includes most of the other GeoWave projects as dependencies.
hipotato
@hipotato
@jdgarrett Got it, thanks
Davis Silverman
@sinistersnare

Is secondary indexing supposed to be as easy as

options.setSecondaryIndexing(true);
...
dataStore.addType(adapter, primaryIdx, secIdx);

or is there some other way to say that secIdx is the secondary index of the first? Currently I am doing the above and hitting a strange NPE in the GW codebase

rfecher
@rfecher
the data store option for secondary indexing writes out a data ID index where all the actual values are stored with the key being the data ID; the other indices (in your case primaryIdx and secIdx) store just the data IDs as the values and, if necessary, will refer back to the data ID index for the full data. This should all happen transparently to the API (i.e. the data ID index does not need to be referenced anywhere). Does that make more sense given what you're experiencing?
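A minimal model of that layout, with two maps standing in for the data ID index and a secondary index, may make the transparent join-back clearer (this is conceptual only, not the actual GeoWave storage format):

```java
// Minimal model of the layout described above: with secondary indexing
// enabled, one "data ID index" holds the full values keyed by data ID,
// while each index stores only data IDs and resolves back to the data
// ID index when the full data is needed. Conceptual sketch only.
import java.util.*;

class SecondaryIndexModel {
    Map<String, String> dataIdIndex = new HashMap<>();        // dataId -> full value
    Map<String, List<String>> spatialIndex = new HashMap<>(); // indexKey -> dataIds

    void write(String indexKey, String dataId, String value) {
        dataIdIndex.put(dataId, value);
        spatialIndex.computeIfAbsent(indexKey, k -> new ArrayList<>()).add(dataId);
    }

    // a query scans the index for data IDs, then joins back to the
    // data ID index for the full values
    List<String> query(String indexKey) {
        List<String> results = new ArrayList<>();
        for (String id : spatialIndex.getOrDefault(indexKey, List.of())) {
            results.add(dataIdIndex.get(id)); // second lookup for full value
        }
        return results;
    }

    public static void main(String[] args) {
        SecondaryIndexModel store = new SecondaryIndexModel();
        store.write("cellA", "id1", "full-feature-1");
        store.write("cellA", "id2", "full-feature-2");
        System.out.println(store.query("cellA")); // prints [full-feature-1, full-feature-2]
    }
}
```

In this picture, a writer and a query both touch only the indices; the extra hop through the data ID index is an implementation detail, which matches the point that the API should behave the same with secondary indexing on or off.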
Davis Silverman
@sinistersnare
ok good, so that's what I was hoping it was doing. I guess my NPE is unrelated then. I haven't figured out a good way to explain it, so I'll hold off on that
Davis Silverman
@sinistersnare
so it looks like the underlying DataStore isn't saving the IndexStrategy and IndexModel for secIdx when I do .addType(). So when the writer calls index.getIndexStrategy().getPredefinedSplits(), the getPredefinedSplits call NPEs.
Davis Silverman
@sinistersnare
Okay, so if I disable the underlying HBase tables before running my ingest, Spark creates >=2 partitions. The first partition correctly creates the indices and the writer has the correct indices in its instance vars. But the second partition will not create the writer correctly, leaving the indexStrategy and indexModel null for the writer's secondary index.
Davis Silverman
@sinistersnare
Scratch that: when I swap secIdx and primaryIdx it's still returning a null indexStrategy and indexModel for secIdx, even with it in the first position. I guess it has something to do with my CustomNameIndex, because primaryIdx is an Index provided via GeoWave APIs. Any idea why a CustomNameIndex would not be retrieved correctly from GeoWave?
Davis Silverman
@sinistersnare

this is how I'm creating the Index:

default Index makeFuseIdIndex(AttributeDescriptor descriptor) {
    NumericIndexStrategy indexStrategy = new MyIndexStrategy();
    FeatureAttributeDimensionField[] dimms = {new FeatureAttributeDimensionField(descriptor)};
    CommonIndexModel indexModel = new BasicIndexModel(dimms) {
        @Override
        public boolean useInSecondaryIndex() {
            // we want this to be a secondary index. TODO: Is this necessary?
            return true;
        }
    };
    return new CustomNameIndex(indexStrategy, indexModel, this.indexName() + "_ID");
}

Seems fairly benign

Davis Silverman
@sinistersnare
nvm, I didn't have the IndexStrategy in the PersistableRegistrySPI. That doesn't explain why the BasicIndexModel didn't get persisted, but the error doesn't occur anymore.
rfecher
@rfecher
sorry, didn't see this until now... hmm, I'd guess it really bombed out reading the CustomNameIndex because the index strategy wasn't persisted properly, which could have led to that BasicIndexModel NPE you experienced. (And by the way, you don't need to override that useInSecondaryIndex method; if you do, you'd want it to be part of an index model class that is registered as persistable.) The meaning of that flag is: if true, the dimensions/fields defined in your BasicIndexModel will be serialized within each secondary index along with the data ID, so that queries can be fully resolved without referring to the data ID index to bring back all the data (but if downstream processing needs all the data, it will eventually need to hit the data ID index). For example, a spatial index would put the geometry in the BasicIndexModel by default, and a spatial query would be able to fully run geometry intersection without referring back to the data ID index; of course, if you wanted to get SimpleFeatures with all the data, the geometries that pass the filter would eventually need to pull data from the data ID index (if, on the other hand, you were just doing some form of aggregation that didn't need all the fields, like a count or a heatmap, you wouldn't need to go to the data ID index) ... but it comes at the cost of storing the BasicIndexModel data in each secondary index, so generally speaking, unless you know what you're doing with that useInSecondaryIndex method and have a real use case that it supports pretty well, I'd suggest just leaving it as the default
hipotato
@hipotato
Hello everyone, I just want to ingest a GeoTIFF file into GeoWave via the REST API v0/ingest/localToGW. The type of the param "indexList" is string, and the code annotation says "Array of Strings". When I pass the param as '[\"abc\"]' or '{\"abc\"}' or any other string, the service always throws an exception like "java.lang.String cannot be cast to org.json.JSONArray". Can anyone tell me what format the string should be?
Johnathan Garrett
@jdgarrett
I believe the parameter should be a comma-delimited list such as 'index1,index2'
So in your case just using 'abc' should be sufficient
Johnathan Garrett
@jdgarrett
Hello everyone, I have cut and tagged the GeoWave 1.1.0 release on GitHub. GeoWave 1.1.0 includes a complete overhaul of the user and developer documentation in order to make GeoWave easier to understand and use. In addition, it includes the following improvements:
  • New GeoWave GitHub pages site for improved user experience
  • New GeoWave vector query language to simplify queries and aggregations through the CLI
  • Improved custom index capabilities through the programmatic API
  • Additional standalone data stores that can be used for testing via the CLI
  • New quality of life commands to list and describe data stores and types
  • Various bug fixes and improvements
    https://locationtech.github.io/geowave/
Denis Rykov
@drnextgis
Hello. I installed GeoWave on Linux according to the Installation Guide. But when I run the geowave help command I get these warnings:
$ geowave help
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by javassist.ClassPool (file:/home/denis/geowave/lib/core/javassist-3.20.0-GA.jar) to method java.lang.ClassLoader.defineClass(java.lang.String,byte[],int,int)
WARNING: Please consider reporting this to the maintainers of javassist.ClassPool
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Denis Rykov
@drnextgis
seems like it's something related to Java 11; with Java 8 I don't see it anymore