James Hughes
@jnh5y
  1. The quickstart is missing some information from the hbase-site.xml to tie together where to look for the coprocessors. (I ended up tossing the hbase-site.xml into the quickstart jar using jar -uf)
  2. The quickstart was originally designed to be used against a local HBase cluster (rather than one in AWS using S3). As such, the quickstart does not have any of the AWS S3 libraries on the path.
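For reference, the client-side hbase-site.xml entry that ties this together looks roughly like the sketch below. The jar path is illustrative, and the key name should be double-checked against the GeoMesa docs for your version:

```xml
<!-- Sketch: client-side hbase-site.xml. The value is illustrative; point it at
     wherever the geomesa-hbase-distributed-runtime jar actually lives. -->
<property>
  <name>geomesa.hbase.coprocessor.path</name>
  <value>s3://my-bucket/lib/geomesa-hbase-distributed-runtime_2.11-3.0.0.jar</value>
</property>
```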
Chad Thompson
@chadothompson
(To finish my thought - the run command does not upload data.)
James Hughes
@jnh5y
That means when HBase tries to see if s3://&lt;path&gt;/file is valid, it needs the S3 jars (even though it isn't going to read the jar :()
So, second, one would have to add the 'right' S3 jar to the quickstart classpath. That took some fiddling.
Those two things are achievable but, admittedly, a pain to deal with.
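The classpath fiddling can be sketched like so. The /usr/lib/hadoop location and jar names are assumptions based on a typical EMR layout; adjust to your cluster:

```shell
# Sketch: pick up the AWS jars Hadoop already ships on an EMR node (the
# HADOOP_LIB default is an assumption) and build a classpath suffix for the
# quickstart. Nothing is launched here; the java command is shown as a comment.
HADOOP_LIB=${HADOOP_LIB:-/usr/lib/hadoop}
AWS_JARS=$(find "$HADOOP_LIB" -name 'hadoop-aws-*.jar' -o -name 'aws-java-sdk-*.jar' 2>/dev/null | tr '\n' ':')
echo "classpath suffix: ${AWS_JARS%:}"
# then, roughly:
# java -cp geomesa-hbase-quickstart.jar:${AWS_JARS%:} <quickstart main class>
```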
Chad Thompson
@chadothompson
Ah, okay. Thank you @jnh5y - that is extraordinarily helpful.
James Hughes
@jnh5y
An alternative to the GeoMesa quickstart is to use the geomesa command line tools
happy to help.
like I said, I basically hit this exact issue on Friday :)
I'd recommend trying out the geomesa-hbase command line tools. Those ship with 'example' data which can be used very, very easily.
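For the record, ingesting that bundled example data looks something like this sketch. The catalog name is made up, the install path and the 'example-csv' converter/schema names are assumptions based on the tools docs, and the command is echoed rather than run since it needs a live cluster:

```shell
# Sketch only: assemble the example ingest command shipped with the tools dist.
# GEOMESA_HBASE_HOME default and converter/schema names are assumptions.
GEOMESA_HBASE_HOME=${GEOMESA_HBASE_HOME:-/opt/geomesa-hbase_2.11-3.0.0}
CMD="$GEOMESA_HBASE_HOME/bin/geomesa-hbase ingest -c example_catalog -s example-csv -C example-csv $GEOMESA_HBASE_HOME/examples/ingest/csv/example.csv"
echo "$CMD"
```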
Chad Thompson
@chadothompson
That was going to be my next question - is the better approach to use the 'gdelt' load from the prior version for quick testing? (It sounds like it.)
James Hughes
@jnh5y
There are some scripts to download and ingest 'real' data. Although, sometimes the way to gather that data changes out from under the project :(
Chad Thompson
@chadothompson
Plus, it's nice to know that I'm on something of the right path.
James Hughes
@jnh5y
The third thing I'll throw out there is that EMR 6.0 is brand new! Which means this is the first time that EMR has used HBase 2.x!
Chad Thompson
@chadothompson
Yeah - I did find that with the prior tutorial that loaded GDELT data - the S3 bucket seems to be configured differently. It now returns 403 errors, but using an HTTP / wget method is just fine.
James Hughes
@jnh5y
ah, cool
@elahrvivaz and I tested GeoMesa 3.0.0 with HBase 1.4 and 2.x, but EMR 6.0 wasn't out at that point.
We have used EMR 5.x operationally with HBase 1.4.x. We have not used EMR 6.0 or HBase 2.x yet.
So yeah, if you hit issues, definitely let us know here and/or on the GeoMesa JIRA.
Chad Thompson
@chadothompson
Nice! So far it seems to work (I thought I'd try the latest versions with GeoMesa 3.0.0), assuming I can load data.
Will certainly do so - thank you!
James Hughes
@jnh5y
And if you succeed (even a little bit), it'd be great to hear about that as well
Chad Thompson
@chadothompson
I'll let you know how it goes here. Thanks again!
James Hughes
@jnh5y
('Cause I only get to see CCRi use part of GeoMesa here and there; there are plenty of options for how to deploy things, etc.)
Awesome!
Chad Thompson
@chadothompson
FWIW, preliminary load of data using the command line tools (following a modified version of this: https://www.geomesa.org/documentation/2.4.1/tutorials/geomesa-hbase-s3-on-aws.html?highlight=gdelt%20hbase )
Appears to work (loaded data for August):
[root@ip-10-96-88-34 gdelt]# geomesa-hbase ingest -c geomesa.gdelt -C gdelt -f gdelt -s gdelt \*.CSV
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/geomesa-hbase_2.11-3.0.0/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
INFO  Schema 'gdelt' exists
INFO  Running ingestion in local mode
INFO  Ingesting 11 files with 1 thread
[============================================================] 100% complete 1123833 ingested 162017 failed in 00:02:18
INFO  Local ingestion complete in 00:02:18
INFO  Ingested 1123833 features and failed to ingest 162017 features
(Ran a 'local ingest' as a first test.)
James Hughes
@jnh5y
nice!
Joel
@jafolkerts_twitter
Cool; good luck. I tend to watch the region server logs when things start looking dicey
Thanks @jnh5y - the problem turned out to be a bad coprocessor deployment.
James Hughes
@jnh5y
@jafolkerts_twitter whew.... anything you learned that's worth sharing?
Joel
@jafolkerts_twitter
Yep - if you run into any type of stats issue, first ensure the HBase coprocessor is properly deployed (HDFS in my case) and referenced in the geomesa-site.xml (mine was not)
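In config form, the fix Joel describes looks roughly like the sketch below. The key name and HDFS path are assumptions; verify both against the GeoMesa docs for your version:

```xml
<!-- Sketch of the geomesa-site.xml entry described above; the HDFS path and
     jar name are illustrative. -->
<property>
  <name>geomesa.hbase.coprocessor.path</name>
  <value>hdfs://NAME_NODE:8020/hbase/lib/geomesa-hbase-distributed-runtime_2.11-3.0.0.jar</value>
</property>
```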
James Hughes
@jnh5y
Nice
Joel
@jafolkerts_twitter
Running into another issue (similar to this: https://bit.ly/3ivxuCU) that is preventing me from ingesting from HDFS on a kerberized cluster:
geomesa-hbase ingest -c geomesa -f global --converter foo.conf --spec foo.conf hdfs://NAME_NODE:8020/geomesa_ingest/file.gz
INFO  Schema 'global' exists
INFO  Running ingestion in distributed mode
INFO  Submitting job - please wait...
ERROR Can't get Master Kerberos principal for use as renewer
java.io.IOException: Can't get Master Kerberos principal for use as renewer
        at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:132)
        at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100)
        at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:170)
        at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1570)
        at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1567)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1567)
        at org.locationtech.geomesa.tools.ingest.AbstractIngestJob.run(AbstractIngestJob.scala:53)
        at org.locationtech.geomesa.tools.ingest.DistributedConverterIngest.runIngest(DistributedConverterIngest.scala:48)
        at org.locationtech.geomesa.tools.ingest.AbstractConverterIngest.run(AbstractConverterIngest.scala:40)
        at org.locationtech.geomesa.tools.ingest.IngestCommand$$anonfun$execute$2.apply(IngestCommand.scala:106)
        at org.locationtech.geomesa.tools.ingest.IngestCommand$$anonfun$execute$2.apply(IngestCommand.scala:105)
        at scala.Option.foreach(Option.scala:257)
        at org.locationtech.geomesa.tools.ingest.IngestCommand$class.execute(IngestCommand.scala:105)
        at org.locationtech.geomesa.hbase.tools.HBaseRunner$$anon$2.execute(HBaseRunner.scala:32)
        at org.locationtech.geomesa.tools.Runner$class.main(Runner.scala:28)
        at org.locationtech.geomesa.hbase.tools.HBaseRunner$.main(HBaseRunner.scala:17)
        at org.locationtech.geomesa.hbase.tools.HBaseRunner.main(HBaseRunner.scala)
I can ingest locally w/o issue
I have tried kinit'ing with the HDFS and HBase principals and ensured that the HDFS perms are good
James Hughes
@jnh5y
oh man....
Joel
@jafolkerts_twitter
lol, that's what I was afraid of ;)
James Hughes
@jnh5y
Kerberos is always a little tough to sort out (provided it isn't working perfectly)
that said, here's a quick note.... since you gave a remote path, GeoMesa is trying to do a distributed ingest
if you can download the file locally, you may end up with a slightly different code path (which may work better)
at the very least, you won't be fighting HBase + Hadoop + Kerberos + MapReduce
I found this option helpful when debugging Kerberos: export JAVA_OPTS="-Dsun.security.krb5.debug=true"
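That tip, in shell form, with the follow-up steps shown as comments (the keytab path, principal, and ingest arguments are illustrative):

```shell
# Turn on JVM-level Kerberos tracing for the GeoMesa tools (per the tip above).
export JAVA_OPTS="-Dsun.security.krb5.debug=true"
echo "JAVA_OPTS=$JAVA_OPTS"
# Illustrative follow-up: re-acquire a ticket, then re-run the failing ingest
# and watch the krb5 trace output.
# kinit -kt /etc/security/keytabs/geomesa.keytab geomesa@EXAMPLE.COM
# geomesa-hbase ingest -c geomesa -f global --converter foo.conf --spec foo.conf hdfs://NAME_NODE:8020/geomesa_ingest/file.gz
```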
Joel
@jafolkerts_twitter
All good tips. I'll give those a shot. Thank you!!
James Hughes
@jnh5y
You've probably already found the documentation here: https://www.geomesa.org/documentation/stable/user/hbase/kerberos.html
Joel
@jafolkerts_twitter
Yep - but I'll review it to make sure we're doing everything correctly
James Hughes
@jnh5y
which mentions the two keys GeoMesa is looking for: "hbase.geomesa.keytab" and "hbase.geomesa.principal"
after that comes attaching a debugger and making a nice cup of tea (in some order)
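Those two keys go in the client-side hbase-site.xml; a sketch with illustrative values (principal and keytab path are made up):

```xml
<!-- Sketch: the key names come from the GeoMesa Kerberos docs; the values
     here are illustrative. -->
<property>
  <name>hbase.geomesa.principal</name>
  <value>geomesa@EXAMPLE.COM</value>
</property>
<property>
  <name>hbase.geomesa.keytab</name>
  <value>/etc/security/keytabs/geomesa.keytab</value>
</property>
```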