Emilio
@elahrvivaz
oh, maybe you need to remove the ()
jrs53
@jrs53
I do indeed - thanks!
Emilio
@elahrvivaz
:+1:
jrs53
@jrs53
seem to have pretty much everything working in jupyter - apart from st_distanceSpheroid :-(
despite SQLTypes.init(spark.sqlContext)
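(For context, a minimal sketch of that setup, assuming the geomesa-spark-sql module is on the classpath; GeoMesa's SQLTypes lives under the org.apache.spark.sql package:)

```scala
import org.apache.spark.sql.{SQLTypes, SparkSession}

val spark = SparkSession.builder().appName("geomesa-notebook").getOrCreate()

// registers the JTS geometry user-defined types and the st_* UDFs
// on this session's SQLContext
SQLTypes.init(spark.sqlContext)

// should now resolve, assuming registration succeeded
spark.sql(
  "SELECT st_distanceSpheroid(st_geomFromWKT('POINT(0 0)'), st_geomFromWKT('POINT(1 1)'))"
).show()
```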
Emilio
@elahrvivaz
hmm, that's odd
other st_ functions are working?
jrs53
@jrs53
yup
weird, st_distanceSphere works but st_distanceSpheroid doesn't
Emilio
@elahrvivaz
do you have gt-referencing on the classpath?
actually both of them use the same function...
so you probably do have it on the classpath
jrs53
@jrs53
it looks like the udf isn't registered
Emilio
@elahrvivaz
hmm, weird, the SQLTypes.init should do that
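(One way to check whether it did, using the standard Spark Catalog API - a sketch:)

```scala
import spark.implicits._

// was the UDF actually registered on this session?
println(spark.catalog.functionExists("st_distanceSpheroid"))

// list everything that did get registered, to compare
spark.catalog.listFunctions()
  .filter($"name".startsWith("st_"))
  .show(100, truncate = false)
```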
jrs53
@jrs53
running in jupyter with toree so may be doing weird stuff
with my sql context
Emilio
@elahrvivaz
ah, yeah i vaguely recall something about that...
jrs53
@jrs53
With geomesa-fs, any particular reason a geospatial join (st_within($"ships.geom", st_bufferPoint($"interesting.geom", 500))) would be sooooo much slower than converting the points to geohashes and doing an equality comparison?
I really ought to seriously look at in-memory indexing (by which I may mean @jg895512 ought to...)
Emilio
@elahrvivaz
the main way we have to speed up spatial joins is to spatially partition both sides, then join each corresponding pair of partitions
but if you're buffering, then you might miss things that are near a partition edge
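(Roughly, that strategy could be sketched like this; gridCell is a hypothetical UDF assigning each geometry a coarse cell id, not a GeoMesa function:)

```scala
import spark.implicits._
import org.apache.spark.sql.functions.expr

// key both sides by a coarse spatial cell so Spark can hash-partition
// an equality join, then apply the exact predicate per matched pair
val shipsKeyed = ships
  .withColumnRenamed("geom", "ship_geom")
  .withColumn("cell", gridCell($"ship_geom"))
val ptsKeyed = interesting
  .withColumnRenamed("geom", "pt_geom")
  .withColumn("cell", gridCell($"pt_geom"))

// caveat from above: a buffer near a cell edge would need to emit a row
// for every cell it touches, or matches across the edge are missed
val joined = shipsKeyed.join(ptsKeyed, "cell")
  .where(expr("st_within(ship_geom, st_bufferPoint(pt_geom, 500))"))
```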
jrs53
@jrs53
I'm only looking at 1295 x 269899 records
Emilio
@elahrvivaz
I imagine that spark is doing something smart with the equality comparison
jrs53
@jrs53
and the small one is broadcast
Emilio
@elahrvivaz
hmm, well that shouldn't be so bad
did you try the spark explain?
it could just be that querying for 1300 small polygons is slow in FSDS
jrs53
@jrs53
counting the small polys is pretty quick
Emilio
@elahrvivaz
i mean querying the larger dataset with an intersects based on the 1300 smaller ones
jrs53
@jrs53
I have 76M AIS records, I filter for all records within 50km of a point
for those records, I filter for a specific ship name
and then I look to see which ships (in the 50km radius circle) have been near any of the points of the ship of interest
Emilio
@elahrvivaz
yeah, i think it's probably not going to handle that polygon query on the last few lines very well
in a real db, that would break down much nicer i think
jrs53
@jrs53
I can try it on Accumulo later
Emilio
@elahrvivaz
you can probably turn up the geomesa fs logging and might see more
it's possible that the slowness is from our query planning code, trying to figure out intersecting partitions
and/or simplifying filters based on the partition
jrs53
@jrs53
a lot of my data is very spatially concentrated, so only single partition in fs
Emilio
@elahrvivaz
ah, you might want to increase the number of bits in the spatial partition scheme then
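(For reference, a hypothetical sketch of creating a schema with a finer-grained partition scheme via the GeoMesa FS CLI; the scheme name and flags below are assumptions to verify against the docs - more bits means more, smaller spatial partitions:)

```shell
# sketch: recreate the schema with a finer Z2 partition scheme
# (flags and scheme name are assumptions; check geomesa-fs help)
geomesa-fs create-schema \
  --path 's3a://mybucket/geomesa' \
  --feature-name ships \
  --spec 'name:String,dtg:Date,*geom:Point:srid=4326' \
  --partition-scheme z2-8bits \
  --encoding parquet
```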
Lee Robb
@lrobb
Would I be correct with my assumption that I could add an st_intersection function wrapping jts intersects in SpatialRelationFunctions.scala?
... to make it available to pyspark...
Emilio
@elahrvivaz
you want intersection not intersects right?
you should be able to add it there
Lee Robb
@lrobb
d'oh... yes, intersection
James Hughes
@jnh5y
@lrobb yes! GeoMesa is wrapping JTS functions for most things
there are a few geodetic calculations from Spatial4J and maybe one or two things from GeoTools
If there's something more that should be happening there, I'm involved with JTS and we could work through getting something into the right projects, etc
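(A minimal sketch of such a wrapper, assuming the JTS UDTs are already registered via SQLTypes.init; depending on GeoMesa version the JTS package may be org.locationtech.jts or com.vividsolutions.jts:)

```scala
import org.apache.spark.sql.functions.udf
import org.locationtech.jts.geom.Geometry

// wrap JTS Geometry.intersection as a Spark UDF
val stIntersection = udf((a: Geometry, b: Geometry) =>
  if (a == null || b == null) null else a.intersection(b))

spark.udf.register("st_intersection", stIntersection)

spark.sql(
  """SELECT st_intersection(
    |  st_geomFromWKT('POLYGON((0 0, 2 0, 2 2, 0 2, 0 0))'),
    |  st_geomFromWKT('POLYGON((1 1, 3 1, 3 3, 1 3, 1 1))'))""".stripMargin
).show(truncate = false)
```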
Kaiqing Wang
@WangHarvey

Hi @elahrvivaz, I have some questions about statistical queries:
Q1: when my geometry type is LineString and I use the Count() method to count the queried data, the count of 10 seems right, but why is there a suffix of "POINT (0 0)"? This also happens in other stat queries.

query.getHints().put(QueryHints.STATS_STRING(), "Count()");
// result
stat={"count":10}|POINT (0 0)

Q2: using MinMax(geom) to calculate the min/max for the points below, the result seems to return the min (minLon minLat) / max (maxLon maxLat) corners of the bounding box rather than actual input points. Does this method work for geometry-type data, or is that a usage limitation of MinMax(attr)?

// Point
POINT (132.1773 20.1885)
POINT (131.1773 21.1885)
POINT (130.1773 22.1885)
POINT (129.1773 23.1885)
POINT (128.1773 24.1885)
POINT (127.1773 25.1885)
POINT (126.1773 26.1885)
POINT (125.1773 27.1885)
POINT (124.1773 28.1885)
POINT (123.1773 29.1885)

//query
query.getHints().put(QueryHints.STATS_STRING(), "MinMax(geom)");
//result
stat={"min":"POINT (123.1773 20.1885)","max":"POINT (132.1773 29.1885)","cardinality":10}|POINT (0 0)

Q3: the range of the integer attribute 'age' is 1~100. Using TopK(age) we expected the result to be 91~100, but the stat query returned the following, which is neither ordered nor the top K values. Also, can we set this K?

//query
query.getHints().put(QueryHints.STATS_STRING(), "TopK(age)");
//result
stat={"0":{"value":73,"count":1},"1":{"value":100,"count":1},"2":{"value":24,"count":1},"3":{"value":90,"count":1},"4":{"value":1,"count":1},"5":{"value":79,"count":1},"6":{"value":2,"count":1},"7":{"value":11,"count":1},"8":{"value":8,"count":1},"9":{"value":76,"count":1}}|POINT (0 0)

Q4: a TopK question for geometry values - does this method support Geometry values? Testing with point data, we cannot work out the ordering rule from the result.

query.getHints().put(QueryHints.STATS_STRING(), "TopK(geom)");

Q5: for Frequency("attribute","dtg",<time period>,<precision>), it seems 'time period' and 'precision' are not necessary for this method. Is there any detailed documentation for stat queries? The user manual seems too simple.

Best Regards. Thanks for your time.