These are chat archives for locationtech/geomesa

20th
Apr 2017
geoHeil
@geoHeil
Apr 20 2017 05:22

@elahrvivaz thanks, I was missing this small update. I have an additional question regarding the sql UDF:

How can I access GeoMesa's UDFs in the Spark Scala DataFrame (not textual) API? I.e. how do I convert

spark.sql("select st_asText(st_bufferPoint(geom,10)) from chicago where case_number = 1")

to

df.select(st_asText(st_bufferPoint('geom, 10))).filter('case_number === 1)

In other words, how can I register GeoMesa's UDFs so that they are available outside the SQL text mode as well? SQLTypes.init(spark.sqlContext) seems to register them only for SQL text.

Yang
@yang074
Apr 20 2017 09:11
Does anyone know why this happens when I build geomesa? What might cause this test to fail? There are test failures when building geomesa-accumulo-datastore_2.11, like BinConversionProcessTest.BinConversionProcess test
Yang
@yang074
Apr 20 2017 09:23
brief result is: BinConversionProcess... should encode ...is missing:... must not contain:..
Emilio
@elahrvivaz
Apr 20 2017 13:08
@yang074 some of our tests fail occasionally... we've been trying to track them down and squash them all
feel free to use -DskipTests, or to just resume the build and it will likely pass again
@geoHeil I'm not sure about that, maybe @jnh5y or @anthonyccri would know
geoHeil
@geoHeil
Apr 20 2017 14:57
The second one seems to work with callUDF, but it is not really nice.
Maybe it would be possible to keep the references around, as suggested by the SF poster?
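
A minimal sketch of the callUDF route mentioned above, for readers with the same question. It assumes the GeoMesa UDFs were already registered by name on the session (which is what SQLTypes.init(spark.sqlContext) is described as doing) and that st_bufferPoint accepts a geometry column and a numeric distance, as in the SQL query. The object name and the two wrapper functions are hypothetical conveniences, not part of GeoMesa's API; callUDF, expr, col and lit come from org.apache.spark.sql.functions in Spark 2.x.

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.{callUDF, col, expr, lit}

// Hypothetical helper object: thin Column-based wrappers around UDFs that
// are assumed to be registered by name already (e.g. via SQLTypes.init).
object GeoMesaColumnFunctions {

  // Looks up the registered UDF "st_bufferPoint" by name.
  def st_bufferPoint(geom: Column, distance: Double): Column =
    callUDF("st_bufferPoint", geom, lit(distance))

  // Looks up the registered UDF "st_asText" by name.
  def st_asText(geom: Column): Column =
    callUDF("st_asText", geom)

  // DataFrame-API version of:
  //   select st_asText(st_bufferPoint(geom, 10)) from chicago where case_number = 1
  // The filter runs before the projection so case_number is still in scope.
  def bufferedWkt(chicago: DataFrame): DataFrame =
    chicago
      .filter(col("case_number") === 1)
      .select(st_asText(st_bufferPoint(col("geom"), 10.0)))

  // Alternative: keep the SQL fragment as a string but stay in the DataFrame API.
  def bufferedWktViaExpr(chicago: DataFrame): DataFrame =
    chicago
      .filter(col("case_number") === 1)
      .select(expr("st_asText(st_bufferPoint(geom, 10))"))
}
```

The wrappers only hide the string lookup. The UDFs are still resolved by name when Spark analyzes the query, so a misspelled name is not caught at compile time.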
James Hughes
@jnh5y
Apr 20 2017 15:00
(not sure, I'm in an all day meeting, so I won't be able to look at it for a bit:()
David Lewis
@blyncsy-david-lewis
Apr 20 2017 16:50
@anthonyccri I just realized that my tests were incorrect, let me run them again right now to see if your changes worked
Anthony Fox
@anthonyccri
Apr 20 2017 17:03
Are you on hbase or bigtable?
I've verified bigtable but not hbase yet.
David Lewis
@blyncsy-david-lewis
Apr 20 2017 17:06
I'm on HBase
I had a bug in my scripts, so my test yesterday was still running the old code
I fixed that bug and am running your code as of last night, and I still see only about half of the records
I use bigtable in production, but HBase locally for testing
(because the bigtable emulator wouldn't work for me when I was setting it up last year)
Anthony Fox
@anthonyccri
Apr 20 2017 17:10
ok, the geomesa query paths are different
bc on hbase we can push down filters and computation
David Lewis
@blyncsy-david-lewis
Apr 20 2017 17:12
I see, is this BigtableSparkRDDProvider what you've added in your branch?
Anthony Fox
@anthonyccri
Apr 20 2017 17:12
yes, partially
it's also the use of BigtableExtendedScan vs. HBase's MultiRangeRowFilter
David Lewis
@blyncsy-david-lewis
Apr 20 2017 17:16
i see
James Hughes
@jnh5y
Apr 20 2017 17:17
Ah, if you aren't using Bigtable, then the code I was pointing to might be important to sort through
(Sadly, I'm out of the office or I'd try and knock something out)
David Lewis
@blyncsy-david-lewis
Apr 20 2017 17:21
No worries, thanks for the help!
Anthony Fox
@anthonyccri
Apr 20 2017 18:11
@jnh5y can you give me more details about what you're suggesting?
David Lewis
@blyncsy-david-lewis
Apr 20 2017 18:12
I looked at that code @jnh5y, and it's being used by both the Bigtable path and the HBase path, so if the bigtable is working then I don't think that's the problem
Anthony Fox
@anthonyccri
Apr 20 2017 18:12
yeah, @blyncsy-david-lewis i just did the same thing
Emilio
@elahrvivaz
Apr 20 2017 18:12
aren't both bigtable and hbase showing the same half feature issue now though?
Anthony Fox
@anthonyccri
Apr 20 2017 18:12
no, just hbase
David Lewis
@blyncsy-david-lewis
Apr 20 2017 18:13
(assuming my testing methodology is correct)
Emilio
@elahrvivaz
Apr 20 2017 18:13
oh, so half-way fixed :)
Anthony Fox
@anthonyccri
Apr 20 2017 18:13
@blyncsy-david-lewis i just verified the issue still exists in hbase
David Lewis
@blyncsy-david-lewis
Apr 20 2017 18:14
Alright, thanks! good to hear... sort of...
@anthonyccri in the HBaseSpatialRDDProvider, why do we force loose bbox to be false?
Anthony Fox
@anthonyccri
Apr 20 2017 18:17
@blyncsy-david-lewis we figure that analytic queries want to be precise
loose bbox is an optimization typically relevant for visualization
David Lewis
@blyncsy-david-lewis
Apr 20 2017 18:17
i see
Anthony Fox
@anthonyccri
Apr 20 2017 18:17
ok, here's what i did to fix bigtable
basically, i exhaust the current Result by calling advance
so, in hbase, that's not being done
so, either there's a bug in hbase or we are somehow misconfiguring our scans
to return more than one value in a Result
seems unlikely that there is a bug in hbase
David Lewis
@blyncsy-david-lewis
Apr 20 2017 18:22
agreed
Emilio
@elahrvivaz
Apr 20 2017 18:24
oh, maybe more than 1 cell is getting returned in a Result
Anthony Fox
@anthonyccri
Apr 20 2017 18:24
yeah, but why would that happen
Emilio
@elahrvivaz
Apr 20 2017 18:25
just as an optimization or something?
each cell has its own row
so technically you could have more than one row in a Result
i think...
Anthony Fox
@anthonyccri
Apr 20 2017 18:25
yeah, but then why would their implementation of tablerecordreader not handle that?
David Lewis
@blyncsy-david-lewis
Apr 20 2017 18:25
that's the nature of Result, it can hold more than 1 cell
though I think only 1 row
Emilio
@elahrvivaz
Apr 20 2017 18:26
oh, well it does say Single row result of a {@link Get} or {@link Scan} query
Anthony Fox
@anthonyccri
Apr 20 2017 18:26
hmm, so we only have one record per row
Emilio
@elahrvivaz
Apr 20 2017 18:26
A Result is backed by an array of {@link Cell} objects, each representing an HBase cell defined by the row, family, qualifier, timestamp, and value.
Emilio
@elahrvivaz
Apr 20 2017 18:32
hmm, dunno looking through the source code it does seem like it should return a single row
David Lewis
@blyncsy-david-lewis
Apr 20 2017 18:32
the handling of secondary indexes is slightly different between hbase and bigtable
hbase sets the filter using qp.filter.secondary (optionally)
and bigtable sets the filter using qp.filter
Emilio
@elahrvivaz
Apr 20 2017 18:34
hmm, that shouldn't affect the end result, although we should probably fix it
Anthony Fox
@anthonyccri
Apr 20 2017 18:34
well, that's because bigtable can't push down filters
whereas we can push them down in hbase
Emilio
@elahrvivaz
Apr 20 2017 18:34
ah, right
David Lewis
@blyncsy-david-lewis
Apr 20 2017 18:37
i see
Emilio
@elahrvivaz
Apr 20 2017 18:41
actually it should still use qp.filter.secondary unless it's a z-index query and not loose bbox
Anthony Fox
@anthonyccri
Apr 20 2017 18:42
@elahrvivaz please remind me about that a little later
Emilio
@elahrvivaz
Apr 20 2017 18:42
k
Anthony Fox
@anthonyccri
Apr 20 2017 18:42
don't have the mental space to grok it yet
Emilio
@elahrvivaz
Apr 20 2017 18:42
haha, np
i'll try to remember for the final pr
David Lewis
@blyncsy-david-lewis
Apr 20 2017 18:51
hey, have a look at GeoMesaHBaseInputFormat... line 130
that while loop
should we switch the ordering of the and?
while (reader.nextKeyValue() && staged == null) ... to while (staged == null && reader.nextKeyValue()) ...
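
A small self-contained sketch of why the operand order matters. It is not the GeoMesa code: FakeReader is a hypothetical stand-in for a record reader whose nextKeyValue() advances a cursor as a side effect, and readAll is an illustrative driver. With the advancing call first, the loop condition is evaluated once more after a record has been staged, which consumes a record that is never returned. With the null check first, short-circuit evaluation skips the advancing call.

```scala
object ShortCircuitOrder {

  // Hypothetical stand-in for a record reader: nextKeyValue() moves the
  // cursor forward every time it is called, whether or not the caller
  // goes on to read the current value.
  final class FakeReader(records: Vector[String]) {
    private var pos = -1
    def nextKeyValue(): Boolean = { pos += 1; pos < records.length }
    def getCurrentValue: String = records(pos)
  }

  // Drains the reader using one of the two loop orderings from the chat.
  def readAll(records: Vector[String], advanceFirst: Boolean): Vector[String] = {
    val reader = new FakeReader(records)
    var staged: String = null

    def nextFeature(): Boolean = {
      staged = null
      if (advanceFirst) {
        // Original ordering: the reader advances before staged is checked,
        // so the exit check itself skips over one record.
        while (reader.nextKeyValue() && staged == null) {
          staged = reader.getCurrentValue
        }
      } else {
        // Swapped ordering: once staged is set, && short-circuits and
        // nextKeyValue() is not called again.
        while (staged == null && reader.nextKeyValue()) {
          staged = reader.getCurrentValue
        }
      }
      staged != null
    }

    val out = Vector.newBuilder[String]
    while (nextFeature()) out += staged
    out.result()
  }

  def main(args: Array[String]): Unit = {
    val records = Vector("a", "b", "c", "d")
    println(readAll(records, advanceFirst = true))  // Vector(a, c)
    println(readAll(records, advanceFirst = false)) // Vector(a, b, c, d)
  }
}
```

In this toy model the original ordering returns every other record, which is consistent with the "only about half of the records" symptom reported earlier.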
Anthony Fox
@anthonyccri
Apr 20 2017 18:53
hmm, i think that would do it
testing now
David Lewis
@blyncsy-david-lewis
Apr 20 2017 18:54
cool, same
Anthony Fox
@anthonyccri
Apr 20 2017 18:56
yes!!!
got it
@blyncsy-david-lewis good catch!
and @jnh5y
side effects!
David Lewis
@blyncsy-david-lewis
Apr 20 2017 18:57
sweet!
Anthony Fox
@anthonyccri
Apr 20 2017 18:57
@blyncsy-david-lewis i pushed to that branch
David Lewis
@blyncsy-david-lewis
Apr 20 2017 18:59
cool, thanks, I'll merge that in
Emilio
@elahrvivaz
Apr 20 2017 19:20
nice!
David Lewis
@blyncsy-david-lewis
Apr 20 2017 19:55
a little late, but so you know all my tests pass too!
Anthony Fox
@anthonyccri
Apr 20 2017 19:55
excellent, another data point
James Hughes
@jnh5y
Apr 20 2017 22:16
ah... I was 'close'... I hadn't identified that nextFeature had a side-effect quite as clearly as @blyncsy-david-lewis did...
so yeah, most all the points should go to him...