jrs53
@jrs53

also, you wanted `like` to work without any wildcards, right? that should be fixed in 2.4.0: locationtech/geomesa#2351

:-)

Emilio
@elahrvivaz
we have a new geojson method now, so there are 2 of them; you might have the wrong one in scope
that's the new one
jrs53
@jrs53
@elahrvivaz that's the one I am using
Emilio
@elahrvivaz
oh wait, that's the old one
are you on a snapshot?
jrs53
@jrs53
no, 2.3.1
Emilio
@elahrvivaz
oh, maybe you need to remove the ()
jrs53
@jrs53
I do indeed - thanks!
Emilio
@elahrvivaz
:+1:
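For reference, the fix is to call the accessor without parentheses in 2.3.1. A minimal sketch, given a DataFrame `df` of features; the import path is my assumption, so check the geomesa-spark-sql API for your version:

```scala
// sketch: convert a DataFrame of features to GeoJSON strings in 2.3.1;
// the import path below is an assumption, verify against your version's API
import org.locationtech.geomesa.spark.sql.GeoJSONExtensions._

val geojson = df.toGeoJSON // parameterless in 2.3.1: no ()
geojson.show(5, truncate = false)
```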
jrs53
@jrs53
seem to have pretty much everything working in jupyter - apart from st_distanceSpheroid :-(
despite SQLTypes.init(spark.sqlContext)
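A typical notebook bootstrap for that, assuming geomesa-spark-sql is on the classpath (GeoMesa registers its SQL support under the org.apache.spark.sql package):

```scala
// register GeoMesa's geometry UDTs and the st_* UDFs with this session
import org.apache.spark.sql.{SQLTypes, SparkSession}

val spark = SparkSession.builder().appName("geomesa-notebook").getOrCreate()
SQLTypes.init(spark.sqlContext)
```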
Emilio
@elahrvivaz
hmm, that's odd
other st_ functions are working?
jrs53
@jrs53
yup
weird, st_distanceSphere works but st_distanceSpheroid doesn't
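A quick sanity check for the two functions; both take two geometries and return meters, so both lines below should print roughly 157 km:

```scala
// great-circle vs spheroidal distance between (0, 0) and (1, 1), in meters
spark.sql("SELECT st_distanceSphere(st_point(0, 0), st_point(1, 1))").show()
spark.sql("SELECT st_distanceSpheroid(st_point(0, 0), st_point(1, 1))").show()
```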
Emilio
@elahrvivaz
do you have gt-referencing on the classpath?
actually both of them use the same function...
so you probably do have it on the classpath
jrs53
@jrs53
it looks like the udf isn't registered
Emilio
@elahrvivaz
hmm, weird, the SQLTypes.init should do that
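One way to confirm whether the UDF made it into the session, using the standard Spark catalog API:

```scala
// list every st_* function registered in the current session's catalog
spark.catalog.listFunctions().filter(_.name.startsWith("st_")).show(100, truncate = false)
```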
jrs53
@jrs53
running in jupyter with toree so may be doing weird stuff
with my sql context
Emilio
@elahrvivaz
ah, yeah i vaguely recall something about that...
jrs53
@jrs53
With geomesa-fs, any particular reason a geospatial join (st_within($"ships.geom", st_bufferPoint($"interesting.geom", 500))) would be sooooo much slower than converting the points to geohashes and doing equality comparison?
I really ought to seriously look at in-memory indexing (by which I may mean @jg895512 ought to...)
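For concreteness, the two formulations being compared might look like the sketch below; `ships` and `interesting` are the DataFrames from the discussion, and the 35-bit geohash precision is an illustrative choice. Note the geohash version is only approximate, since pairs that straddle a cell boundary get missed:

```scala
import org.apache.spark.sql.functions.expr

// 1. spatial-predicate join: evaluates a geometric test per candidate pair
val spatialJoin = ships.as("ships").join(interesting.as("interesting"),
  expr("st_within(ships.geom, st_bufferPoint(interesting.geom, 500))"))

// 2. geohash-equality join: bin both sides to a shared key so spark can use
// an ordinary hash join instead of a geometric predicate
val byHash = ships.withColumn("gh", expr("st_geoHash(geom, 35)"))
  .join(interesting.withColumn("gh", expr("st_geoHash(geom, 35)")), Seq("gh"))
```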
Emilio
@elahrvivaz
the main way we have to speed up spatial joins is to spatially partition both sides, then join each corresponding pair of partitions
but if you're buffering, then you might miss things that are near a partition edge
jrs53
@jrs53
I'm only looking at 1295 x 269899 records
Emilio
@elahrvivaz
I imagine that spark is doing something smart with the equality comparison
jrs53
@jrs53
and the small one is broadcast
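The broadcast can also be forced explicitly with Spark's standard hint, so only the large side is scanned in place:

```scala
import org.apache.spark.sql.functions.{broadcast, expr}

// ship the small side to every executor; only the large side stays partitioned
val joined = ships.as("ships").join(broadcast(interesting.as("interesting")),
  expr("st_within(ships.geom, st_bufferPoint(interesting.geom, 500))"))
```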
Emilio
@elahrvivaz
hmm, well that shouldn't be so bad
did you try the spark explain?
it could just be that querying for 1300 small polygons is slow in FSDS
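Continuing the sketch above, the plans for the two formulations can be compared directly:

```scala
// prints the parsed, analyzed, optimized and physical plans
spatialJoin.explain(true)
byHash.explain(true)
```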
jrs53
@jrs53
counting the small polys is pretty quick
Emilio
@elahrvivaz
i mean querying the larger dataset with an intersects based on the 1300 smaller ones
jrs53
@jrs53
I have 76M AIS records; I filter for all records within 50 km of a point
for those records, I filter for a specific ship name
and then I look to see which ships (in the 50 km radius circle) have been near any of the points of the ship of interest
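A sketch of that workflow as described; the column names (`geom`, `ship_name`), the anchor point, and the 500 m proximity threshold are assumptions for illustration:

```scala
import org.apache.spark.sql.functions.expr

// all records within 50 km of a point of interest (anchor point is made up)
val nearby = ais.where(expr("st_distanceSpheroid(geom, st_point(-1.0, 51.0)) < 50000"))

// the track of one specific ship inside that circle
val track = nearby.where(expr("ship_name = 'SHIP_OF_INTEREST'"))

// ships in the circle that came near any point on that track
val contacts = nearby.as("a").join(track.as("t"),
  expr("st_within(a.geom, st_bufferPoint(t.geom, 500))"))
```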
Emilio
@elahrvivaz
yeah, i think it's probably not going to handle that polygon query on the last few lines very well
in a real db, that would break down much nicer i think
jrs53
@jrs53
I can try it on Accumulo later
Emilio
@elahrvivaz
you can probably turn up the geomesa fs logging, and you might see more
it's possible that the slowness is from our query planning code, trying to figure out intersecting partitions
and/or simplifying filters based on the partition
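From a notebook, one way to turn that logging up is the log4j 1.x API; the logger name here is just the geomesa fs package prefix:

```scala
import org.apache.log4j.{Level, Logger}

// surface the fs query-planning chatter (partition intersection, filter simplification)
Logger.getLogger("org.locationtech.geomesa.fs").setLevel(Level.DEBUG)
```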
jrs53
@jrs53
a lot of my data is very spatially concentrated, so it ends up in only a single partition in fs
Emilio
@elahrvivaz
ah, you might want to increase the number of bits in the spatial partition scheme then
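A sketch of how that might be requested when creating the schema, given a SimpleFeatureType `sft` and a FileSystem data store `ds`; the `geomesa.fs.scheme` user-data key follows the FSDS docs, but the exact scheme name and bit options should be verified for your version:

```scala
// ask for a finer-grained z2 partition scheme before creating the schema;
// "z2-8bit" is an assumed scheme name, check the FSDS docs for your version
sft.getUserData.put("geomesa.fs.scheme", "z2-8bit")
ds.createSchema(sft)
```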