Kolya Yanchiy
@gokareless_twitter
Hi! Is there by any chance newer versions of docker images for https://github.com/geodocker/geodocker-geomesa/tree/master/geodocker-accumulo-geomesa
5 replies
zhang.yun
@Zhang-Yun
Hi, all,
I am new to GeoMesa.
Currently, I am working on a GIS project which requires storing and processing 3D vector geometry (X, Y, Z).
I am wondering whether GeoMesa can fully support storing and processing 3D vectors, since it seems it complies with the OGC Simple Feature spec in full (2D vector geometry).
Thanks
James Hughes
@jnh5y
Regarding support for 3D vector geometries, there is some support. GeoMesa will store geometries with Z (and maybe M) dimensions.
The largest challenge is that GeoMesa uses the JTS and GeoTools libraries for geometry and geography processing. So some operations may depend on how those libraries support things.
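For illustration, a minimal sketch of what a feature with a Z ordinate looks like, assuming standard JTS/GeoTools usage (the type name and attributes here are made up):

    import org.geotools.feature.simple.SimpleFeatureBuilder
    import org.locationtech.geomesa.utils.interop.SimpleFeatureTypes
    import org.locationtech.jts.geom.{Coordinate, GeometryFactory}

    // a JTS Coordinate can carry a Z ordinate, so a "3D" point is simply
    // a point built from (x, y, z); GeoMesa stores the geometry as given
    val sft = SimpleFeatureTypes.createType("example", "name:String,*geom:Point:srid=4326")
    val gf = new GeometryFactory()
    val point3d = gf.createPoint(new Coordinate(-122.33, 47.61, 120.5)) // lon, lat, z

    val builder = new SimpleFeatureBuilder(sft)
    builder.set("name", "tower")
    builder.set("geom", point3d)
    val feature = builder.buildFeature("fid-1")
    println(feature.getDefaultGeometry) // the Z ordinate rides along on the JTS geometry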
Tim Spijkerman
@timspijkerman_twitter
Has anyone here ever been successful in running a Geomesa (production) environment on AWS EMR with S3? I am trying to accomplish that but, although I succeed in getting parts of it running for a while, it always fails at some point. I am wondering if maybe Geomesa on EMR/S3 is just not such a good idea for a production environment?
James Hughes
@jnh5y
@timspijkerman_twitter Ayup. We use it for a number of customers. PM me if you are interested in an enterprise doing exactly that.
Generally, there are a number of issues to work out so that HBase on EMR+S3 is suitable, and that information changes depending on the EMR version. I can put you in touch with our team that handles all those details.
gerard300
@gerard300
Hi, I'm a noob with GeoMesa. I'm trying to deploy my Spark job on an HDP cluster, but there's something wrong because of the following error:
Oops! The error:
Py4JJavaError: An error occurred while calling o144.collectToPython.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, pd-w2####, executor 1): java.lang.NoClassDefFoundError: Could not initialize class org.locationtech.geomesa.index.iterators.IteratorCache$
    at org.locationtech.geomesa.index.api.QueryPlan$IndexResultsToFeatures.init(QueryPlan.scala:260)
    at org.locationtech.geomesa.index.api.package$SerializableState$.deserialize(package.scala:390)
    at org.locationtech.geomesa.index.api.QueryPlan$ResultsToFeatures$.deserialize(QueryPlan.scala:142)
    at org.locationtech.geomesa.jobs.GeoMesaConfigurator$.getResultsToFeatures(GeoMesaConfigurator.scala:68)
    at org.locationtech.geomesa.jobs.mapreduce.GeoMesaAccumuloInputFormat.createRecordReader(GeoMesaAccumuloInputFormat.scala:101)
2 replies
Haocheng Wang
@haochengw
When we use the ID index via the CQL "IN (1,5,7,9,6,4)", will GeoMesa automatically merge some IDs together to do a range query, like (1), (4,5,6,7), (9)? Or just run the queries independently?
2 replies
JB-data
@JB-data
I was wondering about the GeoMesa indexes...
When I use CQL, I see in the query plan that I use z2 or z3, depending on whether there is a timestamp involved.
How about the v4 index?
This contains the ID.
Is that used during a CQL query?
The docs mention that it is used when querying by ID (which I don't see myself doing) or for "certain" attributes.
I also see this one does not follow the number of splits defined in the SFT.
James Hughes
@jnh5y
Splitting an index on IDs is tough.
Everyone picks something different for an ID... ;)
If you are not going to use the id index, you can turn it off in the SFT configuration
the shard is computed as modulo of something (the details are slipping my mind)
Basically, for non-id indices, we assume a uniform distribution and that lets GeoMesa pre-split tables.
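A rough sketch of what turning the ID index off in the SFT configuration can look like, assuming the geomesa.indices.enabled user-data key (the type name and attributes are illustrative):

    import org.locationtech.geomesa.utils.interop.SimpleFeatureTypes

    // only the indices listed here will be created when the schema is
    // written; the ID index is omitted, so it is disabled
    val sft = SimpleFeatureTypes.createType("example", "dtg:Date,*geom:Point:srid=4326")
    sft.getUserData.put("geomesa.indices.enabled", "z3,z2")
    // ds.createSchema(sft) // 'ds' stands in for your GeoMesa DataStore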
Emilio
@elahrvivaz
you can define the splits for the id index - by default it will assume hex chars and create 4 splits
i don't see the reference you mention about 'certain attributes' but i believe that would only apply if you're using accumulo join indices: https://www.geomesa.org/documentation/stable/user/accumulo/index_config.html#join-indices
4 replies
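A sketch of those table splitter options, set through SFT user data; the id.pattern option follows the index-config docs linked above, and the exact pattern value is just an example:

    import org.locationtech.geomesa.utils.interop.SimpleFeatureTypes

    // pre-split the ID index on the first two hex characters of the feature
    // ID (256 splits) instead of relying on the 4 default splits
    val sft = SimpleFeatureTypes.createType("example", "dtg:Date,*geom:Point:srid=4326")
    sft.getUserData.put("table.splitter.options", "id.pattern:[0-9a-f][0-9a-f]")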
James Hughes
@jnh5y
Ah! I had forgotten about the table splitter configurations
JB-data
@JB-data
thanks, guys!
It is not clear to me what my ID is... but it is there taking up space :-)
Each row of the data that I write to GeoMesa contains a geometry (a point), then some number like a timestamp, and 4-5 other numbers that have a meaning. None of them are called ID.
I define the number of splits in the SFT, and I see that number used for the z2 and z3 indexes, but not for the ID.
James Hughes
@jnh5y
GeoMesa uses SimpleFeatures as the data model for the stored records. All SimpleFeatures have a feature ID
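A small illustration of that model: the feature ID lives alongside the attributes rather than being one of them (names here are made up):

    import org.geotools.feature.simple.SimpleFeatureBuilder
    import org.locationtech.geomesa.utils.interop.SimpleFeatureTypes

    // the ID is supplied when the feature is built, not declared in the SFT spec
    val sft = SimpleFeatureTypes.createType("example", "dtg:Date,*geom:Point:srid=4326")
    val builder = new SimpleFeatureBuilder(sft)
    val feature = builder.buildFeature("my-id")
    println(feature.getID) // "my-id"; passing null instead makes GeoTools generate an ID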
Emilio
@elahrvivaz
if you don't specify the feature ID, one will be generated
by splits you mean geomesa.z.shards?
JB-data
@JB-data
I see in the SFT file I use that it is idfield = md5(string2bytes($0)).
By splits I mean geomesa.z.splits, as defined in the SFT's user-data section.
I am just wondering what would happen if I deleted all the data for the ID.
If I show data in GeoServer to generate my maps, will it possibly still work?
I can try.
When I do my processing, writing this ID takes quite some time, and it is not clear to me whether I am even using it.
Emilio
@elahrvivaz
you should be able to use the updateSchema method to remove it
or, you can just edit the metadata table row and remove the reference to it there
it's stored in the feature type under geomesa.indices
in the user data
the id index doesn't use shards because there's no reason to parallelize lookups
but you can pre-split your table using the table splitter options mentioned above
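A hedged sketch of that updateSchema route; the serialized form of the geomesa.indices user-data entry varies by version, so the string handling below is illustrative, and the connection parameters are placeholders:

    import org.geotools.data.DataStoreFinder
    import scala.collection.JavaConverters._

    val ds = DataStoreFinder.getDataStore(Map("hbase.catalog" -> "mycatalog").asJava)
    val sft = ds.getSchema("example")

    // drop the ID index entry from the stored index metadata, then write
    // the modified schema back
    val indices = sft.getUserData.get("geomesa.indices").toString
    val withoutId = indices.split(",").filterNot(_.startsWith("id")).mkString(",")
    sft.getUserData.put("geomesa.indices", withoutId)
    ds.updateSchema("example", sft)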
JB-data
@JB-data
ok, will try !
[
Just to give the background:
I am incrementally loading data into the same table in batches; sometimes this load fails due to the cluster (cluster/memory stuff).
Then my table has partial data.
So I prefer to load into a staging table, and then use the HBase copytable command to copy into my real table (it uses MapReduce and is less likely to have memory issues).
I do this for all tables (z2, z3, id, etc.).
Since my ID table did not have as many splits, copying the ID table is a lot slower than the others (z2 and z3, for which I increased the splits).
]
Emilio
@elahrvivaz
you can also manually split the table in the hbase shell
be aware increasing the splits is not the same as increasing the shards, those are two separate things with different ramifications
each shard requires a separate query whenever you do a scan, while a split does not
generally it's better to use the table splitter options to pre-split your table than to increase shards
JB-data
@JB-data
OK, I should make sure to understand this better.
Thanks for pointing that out.
To benefit as much as possible from the hbase.coprocessor.threads feature in GeoServer and make my reads faster, which one matters?
Splits or shards?
Emilio
@elahrvivaz
each shard will create a split, so increasing shards will increase splits
the coprocessor threading model is complicated and i don't recall the exact details, but i think it's based solely on the number of splits, since regions can be scanned in parallel
having more shards will result in more scan ranges though, so that might affect things
although you can also configure the target scan ranges separately
the default behavior is to group scan ranges according to region and use an HBase MultiRowRangeFilter
bcakir
@bcakir:matrix.org
Can we send queries to GeoMesa's Cassandra backend through something other than the GeoServer plugin?
1 reply
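For context, the Cassandra backend is exposed as a GeoTools DataStore, so any GeoTools client can query it without GeoServer. A sketch; the parameter keys should be verified against the geomesa-cassandra docs:

    import org.geotools.data.{DataStoreFinder, Query, Transaction}
    import org.geotools.filter.text.ecql.ECQL
    import scala.collection.JavaConverters._

    // connection parameter keys are assumptions, not verbatim from the docs
    val params = Map(
      "cassandra.contact.point" -> "localhost:9042",
      "cassandra.keyspace"      -> "geomesa",
      "cassandra.catalog"       -> "mycatalog"
    ).asJava

    val ds = DataStoreFinder.getDataStore(params)
    val query = new Query("example", ECQL.toFilter("BBOX(geom, -80, 35, -75, 40)"))
    val reader = ds.getFeatureReader(query, Transaction.AUTO_COMMIT)
    try {
      while (reader.hasNext) { println(reader.next.getID) }
    } finally {
      reader.close()
    }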
JB-data
@JB-data
So far I was using the geomesa.z.splits feature only and did not worry about shards.
I see that if I increase this from 4 to a higher number (20), the number of regions for my z2/z3 indexes becomes this higher number (20).
I clearly see that my performance is much better using more coprocessor threads, around 20 (but it maxes out, with no difference at higher numbers), and I was thinking this was thanks to the fact that I had a higher number of splits.
But I also saw that taking a much higher number of splits (like 60) and a much higher number of coprocessor threads (like 60) did not make it any better.
Emilio
@elahrvivaz
depending on the version of geomesa you're using, you might try upgrading to a newer one. we've made some improvements to the coprocessor threading model in (i think) the 3.0 release
4 replies
bcakir
@bcakir:matrix.org
Thanks Emilio
1 reply
efvaldez1
@efvaldez1
Hello everyone. I am trying to install geomesa_pyspark in my Python project in Google Colab. I have not been able to install it correctly after a whole afternoon of trying. Can someone please guide me? Thank you for your time.
1 reply
Kolya Yanchiy
@gokareless_twitter

Hi all!
I'm trying to access GeoMesa with the Accumulo data store using the client (version 2.4.2).
In particular, I'm using the API for schema creation. I'm getting:

Exception in thread "main" java.lang.RuntimeException: Could not acquire distributed lock at '/org.locationtech.geomesa/ds/local' within 2 minutes
    at org.locationtech.geomesa.index.geotools.MetadataBackedDataStore.$anonfun$acquireCatalogLock$2(MetadataBackedDataStore.scala:399)
    at scala.Option.getOrElse(Option.scala:189)

Any idea what might have caused that? Or please give me some pointers on how to troubleshoot it.

27 replies
Kolya Yanchiy
@gokareless_twitter
image.png
5 replies