Jia Yu
@jiayuasu
The new one is here
James Kyle
@jameskyle

Heya, I was going through the examples here -> https://sedona.apache.org/tutorial/rdd/. But I'm noticing they're a bit out of date with the current source release.

Are there some current tutorials or example notebooks anyone is aware of?

James Kyle
@jameskyle
Ok, I think I'm close to getting a working spatial join on two datasets. But when the Spark job executes I get
 java.lang.IllegalArgumentException: This method does not support GeometryCollection arguments
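A common workaround (a sketch in Scala, not taken from this thread; package paths assume Sedona 1.0, and on older GeoSpark the classes live under org.datasyslab.geospark) is to drop GeometryCollection geometries before the join, since the RDD join path rejects them:

import org.apache.sedona.core.spatialRDD.SpatialRDD
import org.locationtech.jts.geom.{Geometry, GeometryCollection}

// Remove GeometryCollection rows in place; JoinQuery raises the
// IllegalArgumentException above when it encounters one.
def dropGeometryCollections(rdd: SpatialRDD[Geometry]): Unit = {
  rdd.rawSpatialRDD = rdd.rawSpatialRDD.rdd
    .filter(g => !g.isInstanceOf[GeometryCollection])
    .toJavaRDD()
}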
Darshil Desai
@darshdee
@jameskyle maybe chat in the other channel, but are you working with Databricks?
James Kyle
@jameskyle
I am in Databricks
@darshdee oops, am I in the "deprecated" channel? :P
Ashwini Kumar Padhy
@Akpadhy

Any suggestions on how to run this in Spark SQL?

SELECT osm_id
FROM geom
WHERE
st_intersects(ST_Transform(ST_GeomFromText(polygon, 4326), 3857), way::geometry)
AND
building is not NULL
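Note that Spark 3.0's SQL parser does not accept the PostGIS-style :: cast, so way::geometry will not parse. A hedged sketch of one possible rewrite, assuming the way column holds WKB (use ST_GeomFromText if it is WKT) and GeoSpark's epsg-string form of ST_Transform:

// spark is an existing SparkSession with the GeoSpark SQL functions registered
val result = spark.sql("""
  SELECT osm_id
  FROM geom
  WHERE ST_Intersects(
          ST_Transform(ST_GeomFromText(polygon), 'epsg:4326', 'epsg:3857'),
          ST_GeomFromWKB(way))
    AND building IS NOT NULL
""")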

akirti
@akirti

I have GeoJSON polygon data. I get this data from an ArcMap layer and assign it to a variable, from which I create an RDD. From this RDD I am trying to create a SpatialRDD, but I am getting the following issue:
java.lang.ClassCastException: [B cannot be cast to java.lang.String
at org.datasyslab.geospark.formatMapper.FormatMapper.call(FormatMapper.java:387)

import json

from pyspark.sql import SparkSession, SQLContext
from geospark.core.formatMapper import GeoJsonReader

def create_service_data_rdd(self, spark: SparkSession, site,
                            layer_suffix, service_query: LayerQuery,
                            token, payload, useragent,
                            referer, ftype, **kwargs):
    self.set_Logging(spark)
    ac = ArcServer()
    j_content = ac.fetch_layer_data_full(site, layer_suffix,
                                         token, payload, useragent,
                                         referer, ftype, service_query, **kwargs)

    if j_content is not None and 'features' in j_content.keys():
        features = j_content['features']
        if features is not None and len(features) > 0 and spark is not None:
            try:
                sc = self.create_session_context(spark)
                sql_context = SQLContext(sc)
                sql_context.setConf("spark.sql.parquet.binaryAsString", "true")

                # Serialize each GeoJSON feature dict to a string and parallelize
                json_strings = [json.dumps(x) for x in features]
                json_rdd = sc.parallelize(json_strings)

                # Tolerate invalid geometries; drop the two flags if the
                # installed GeoJsonReader wrapper lacks this overload
                allow_invalid_geometries = True
                skip_syntactically_invalid_geometries = True
                spatial_rdd = GeoJsonReader.readToGeometryRDD(
                    json_rdd,
                    allow_invalid_geometries,
                    skip_syntactically_invalid_geometries)
                spatial_rdd.analyze()
                print(spatial_rdd.stats())
            except Exception as e:
                print(e)
            finally:
                spark.stop()
akirti
@akirti

The error I am getting:

java.lang.ClassCastException: [B cannot be cast to java.lang.String
at org.datasyslab.geospark.formatMapper.FormatMapper.call(FormatMapper.java:387)

The JSON has a few null values. I tried assigning some default float values to them, but I still get the same issue.
mst94
@mst94
Hello, I have a problem using GeoSpark with the 1.3.2-SNAPSHOT version and Spark 3.0.1. During submission it throws a ClassNotFoundException, although all dependencies are in the uber jar (changing the Gradle dependencies for Spark to "compileOnly" did not have an effect). I've already created a topic about my problem on Stack Overflow, but so far nobody has been able to help me. If it's ok with you, I'll just refer to my Stack Overflow question: https://stackoverflow.com/questions/65703387/apache-sedona-geospark-sql-with-java-classnotfoundexception-during-sql-statem I'm really hoping some of you can help me here. Using Spark 2.3 and the corresponding GeoSpark version seems to work, but it is important for me to use Spark 3.0.1. Thanks in advance!
Jia Yu
@jiayuasu
@/all Dear all, I am happy to announce that Apache Sedona 1.0.0-incubating has been released. It supports Spark 2.3 - 3.0, Scala 2.11 - 2.12, Java 1.8, Python 3.7 - 3.9. Quick start: http://sedona.apache.org/download/overview/ Release notes: http://sedona.apache.org/download/GeoSpark-All-Modules-Release-notes/
The new Sedona Gitter chat is here: https://gitter.im/apache/sedona
vasquezk26
@vasquezk26

Hi,

I'm getting a similar error: Caused by: java.lang.ClassCastException: org.apache.spark.unsafe.types.UTF8String cannot be cast to org.apache.spark.sql.catalyst.util.ArrayData

na_rdd = ShapefileReader.readToGeometryRDD(sc, "/prod/data/workingset/cms/files/radius_of_operation/new_shapefiles/north_america_all")

Adapter.toDf(na_rdd, spark).createOrReplaceTempView("na_df")

vasquezk26
@vasquezk26
spark.sql("""WITH gps_traces AS(
    SELECT 
    gtrips.trip_id
    , to_date(gtrips.trip_date) as trip_date
    , gtrips.fleet_id
    , vin.vehicle_vin
    , gtrips.driver_id
    , gtrips.trip_distance_travelled
    , gtrips.trip_duration
    , to_timestamp(gdata.trip_timestamp, "yyyy-MM-dd'T'HH:mm:ss") as gps_timestamp
    , rank() over 
        (partition by gtrips.trip_id 
            order by to_timestamp(gdata.trip_timestamp, "yyyy-MM-dd'T'HH:mm:ss") asc) 
        as timestamp_rank
    , gdata.latitude
    , gdata.longitude
    , gdata.postcode
    -- Need to make the lat/long into a decimals so that we can make them into points for reverse geo-coding in spark
    , CAST(ST_Point(CAST(gdata.longitude AS DECIMAL(16,12)), CAST(gdata.latitude AS DECIMAL(16,12))) AS STRING) AS geometry
    FROM 
        cms.gps_trips gtrips
    INNER JOIN
        cms.gps_data gdata
        ON gtrips.trip_id = gdata.trip_id
    INNER JOIN 
    -- Tying in the vehicle for a given trip.
        (
            SELECT
                DISTINCT --why are there duplicates?
                    devices.vehicle_id
                    , devices.vehicle_vin
                    , devices.data_effective_timestamp
            FROM
                cms.devices devices
            INNER JOIN
                 (
                    SELECT
                        vehicle_id
                        , max(data_effective_timestamp) as data_effective_timestamp
                    FROM
                        cms.devices
                    GROUP BY
                        vehicle_id
                ) max_data_effective
                ON devices.vehicle_id = max_data_effective.vehicle_id
                AND devices.data_effective_timestamp = max_data_effective.data_effective_timestamp
        ) vin
        ON gtrips.vehicle_id = vin.vehicle_id
    WHERE 
    --Rolling 7 days
        to_date(gtrips.trip_date) between date_sub(current_date, 8) and date_sub(current_date, 2)
    )
-- Spark table incorporated---    
, na_geom as (
    SELECT
        ST_GeomFromWKT(geometry) as geometry
        , NAME_LONG
    FROM
        na_df 
    )
, gps_data_adj as(
    SELECT 
    gps.trip_id
    , gps.trip_date
    , gps.fleet_id
    , gps.gps_timestamp
    , gps.latitude
    , gps.longitude
    , gps.postcode
    , gps.trip_distance_travelled
    , gps.trip_duration
    , gps.geometry
        , ACOS(
            SIN(RADIANS(gps.latitude))*SIN(RADIANS(gps1.latitude)) + 
            COS(RADIANS(gps.latitude))*COS(RADIANS(gps1.latitude))*COS(RADIANS(gps1.longitude) - RADIANS(gps.longitude))
        )*3958.76 AS COSINES_DISTANCE
        , ASIN(
            SQRT(
                POWER(SIN((RADIANS(gps.latitude) - RADIANS(gps1.latitude))/2), 2) +
                COS(RADIANS(gps.latitude))*COS(RADIANS(gps1.latitude))*
                POWER(SIN((RADIANS(gps.longitude) - RADIANS(gps1.longitude))/2), 2)
            )
        )*3958.76*2 AS HAVERSINE_DISTANCE 
        , (UNIX_TIMESTAMP(gps1.gps_timestamp) - UNIX_TIMESTAMP(gps.gps_timestamp)) AS GPS_INTERVAL
    FROM
        gps_traces gps
    LEFT JOIN
        gps_traces gps1
            ON gps.trip_id = gps1.trip_id
            AND gps.timestamp_rank = (gps1.timestamp_rank - 1)
    )
, gps_data_wona as (
    select
        gps_data_adj.trip_id
        , gps_data_adj.trip_date
        , gps_data_adj.fleet_id
        , gps_data_adj.gps_timestamp
        , gps_data_adj.latitude
        , gps_data_adj.longitude
        , gps_data_adj.postcode
        , gps_data_adj.trip_distance_travelled
        , gps_data_adj.trip_duration
        , gps_data_adj.geometry
        , trip_summary.TRIP_HAVERSINE_DISTANCE
        , trip_summary.TRIP_GPS_DURATION
        , gps_data_adj.HAVERSINE_DISTANCE
        , gps_data_adj.GPS_INTERVAL
        , gps_data_adj.HAVERSINE_DISTANCE/trip_summary.TRIP_HAVERSINE_DISTANCE AS HAVERSINE_DISTANCE_FRACTION
        , gps_data_adj.GPS_INTERVAL/trip_summary.TRIP_GPS_DURATION AS GPS_INTERVAL_FRACTION
        , (gps_data_adj.HAVERSINE_DISTANCE/trip_summary.TRIP_HAVERSINE_DISTANCE)*gps_data_adj.trip_distance_travelled AS HAVERSINE_DISTANCE_ADJ
        , (gps_data_adj.GPS_INTERVAL/trip_summary.TRIP_GPS_DURATION)*gps_data_adj.trip_duration AS GPS_INTERVAL_ADJ
    FROM
        gps_data_adj
    INNER JOIN
        (
            SELECT
                trip_id 
                , sum(COSINES_DISTANCE) as TRIP_COSINES_DISTANCE
                , sum(HAVERSINE_DISTANCE) as TRIP_HAVERSINE_DISTANCE
                , sum(GPS_INTERVAL) AS TRIP_GPS_DURATION
            FROM
                gps_data_adj
            GROUP BY
                trip_id
        ) trip_summary
on gps_data_adj.trip_id = trip_summary.trip_id
    )
select 
 STRING(gps_data_wona.trip_id)
, STRING(gps_data_wona.trip_date)
, STRING(gps_data_wona.gps_timestamp)
, STRING(gps_data_wona.latitude)
, STRING(gps_data_wona.longitude)
, STRING(gps_data_wona.postcode)
, STRING(gps_data_wona.trip_distance_travelled)
, STRING(gps_data_wona.trip_duration)
, STRING(gps_data_wona.TRIP_HAVERSINE_DISTANCE)
, STRING(gps_data_wona.TRIP_GPS_DURATION)
, STRING(gps_data_wona.HAVERSINE_DISTANCE)
, STRING(gps_data_wona.GPS_INTERVAL)
, STRING(gps_data_wona.HAVERSINE_DISTANCE_FRACTION)
, STRING(gps_data_wona.GPS_INTERVAL_FRACTION)
, STRING(gps_data_wona.HAVERSINE_DISTANCE_ADJ)
, STRING(gps_data_wona.GPS_INTERVAL_ADJ)
, CASE
            WHEN gps_data_wona.postcode RLIKE "[A-Z]{1}[0-9]{1}[A-Z]{1}"
                THEN "Canada"
            ELSE
                CASE
                    WHEN gps_data_wona.latitude >= 33.62116425145008 OR gps_data_wona.longitude <= -119.9522148603999 OR gps_data_wona.longitude >= -85.11718096959993
                    THEN "United States"
                    ELSE na_geom.NAME_LONG
                END
        END AS COUNTRY
    FROM
        gps_data_wona
        LEFT JOIN
        na_geom
            on 
         ST_Intersects(gps_data_wona.geometry, na_geom.geometry)
        --Filters
        AND gps_data_wona.postcode NOT RLIKE "[A-Z]{1}[0-9]{1}[A-Z]{1}"
        AND gps_data_wona.longitude > -119.9522148603999
        AND gps_data_wona.longitude < -85.11718096959993
        AND gps_data_wona.latitude < 33.62116425145008
                """).write.format('parquet').mode('append').insertInto('gps_data_supplement')
Jia Yu
@jiayuasu
@vasquezk26 Apparently, you cast a Point geometry to a string: CAST(ST_Point(CAST(gdata.longitude AS DECIMAL(16,12)), CAST(gdata.latitude AS DECIMAL(16,12))) AS STRING)
vasquezk26
@vasquezk26
Error Message:
Caused by: java.lang.ClassCastException: org.apache.spark.unsafe.types.UTF8String cannot be cast to org.apache.spark.sql.catalyst.util.ArrayData
Jia Yu
@jiayuasu
This for sure will lead to the exception
vasquezk26
@vasquezk26
@jiayuasu when I cast it as a decimal and run my code I get the following error:
cannot resolve 'CAST(st_point(CAST(gdata.`longitude` AS DECIMAL(16,12)), CAST(gdata.`latitude` AS DECIMAL(16,12))) AS DECIMAL(10,0))' due to data type mismatch: cannot cast array<tinyint> to decimal(10,0
What should this be?
Jia Yu
@jiayuasu
You should not cast a Sedona geometry column.
It is a geometry type and cannot be cast to other types.
vasquezk26
@vasquezk26
@jiayuasu what line are you referring to? If you mean CAST(ST_Point(CAST(gdata.longitude AS DECIMAL(16,12)), CAST(gdata.latitude AS DECIMAL(16,12))) AS STRING) AS geometry: the lat/long are double columns in one of my tables, and I need to be able to map them to a given shapefile. How do I correct this mistake? I'm not sure how to get around it.
vasquezk26
@vasquezk26
I switched to the following line: ST_Point(CAST(gdata.longitude AS DECIMAL(16,12)), CAST(gdata.latitude AS DECIMAL(16,12))) AS geometry, but it still threw the same error
vasquezk26
@vasquezk26
Once I refreshed my session it was good to go. Thanks @jiayuasu
Jia Yu
@jiayuasu
CAST(XXX AS STRING) means you want to cast a column to a type.
XXX AS STRING or XXX AS geometry (without CAST) in SQL syntax means you want to rename the column, not cast it.
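To illustrate the difference on a hypothetical table t with double columns lon and lat:

// "AS geometry" only names the output column; the value keeps its geometry type.
val ok = spark.sql(
  "SELECT ST_Point(CAST(lon AS Decimal(16,12)), CAST(lat AS Decimal(16,12))) AS geometry FROM t")

// CAST(... AS STRING) converts the value itself, so downstream geometry
// functions such as ST_Intersects fail on the resulting string column.
val broken = spark.sql(
  "SELECT CAST(ST_Point(CAST(lon AS Decimal(16,12)), CAST(lat AS Decimal(16,12))) AS STRING) AS geometry FROM t")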
myz525
@myz525
Hey, guys. What is the website for GeoSpark?
I just Googled it, and it's not GeoSpark as I thought it would be
Jia Yu
@jiayuasu
Hi we are no longer called GeoSpark. The name is Apache Sedona.
This Gitter channel has been deprecated
https://gitter.im/apache/sedona
The new one is here
myz525
@myz525
Thank you very much!
rimmmmm
@rimmmmm
I'm using Sedona with Java. I would like to execute:
JavaPairRDD joinResultPairRDD = JoinQuery.SpatialJoinQuery(point_RDD, polygon_RDD, true, true);
Dataset<Row> joinResultDf = Adapter.toDf(joinResultPairRDD, columns, Spark_Session);
The error is:
method Adapter.<T#1>toDf(SpatialRDD<T#1>,Seq<String>,SparkSession) is not applicable
  (cannot infer type-variable(s) T#1
  (argument mismatch; JavaPairRDD cannot be converted to SpatialRDD<T#1>))
method Adapter.<T#2>toDf(SpatialRDD<T#2>,SparkSession) is not applicable
  (cannot infer type-variable(s) T#2
  (actual and formal argument lists differ in length))
where T#1,T#2 are type-variables:
  T#1 extends Geometry declared in method <T#1>toDf(SpatialRDD<T#1>,Seq<String>,SparkSession)
  T#2 extends Geometry declared in method <T#2>toDf(SpatialRDD<T#2>,SparkSession)
Without the columns argument, Adapter.toDf gives:
+--------------------+--------------------+
|        leftgeometry|       rightgeometry|
+--------------------+--------------------+
|POLYGON ((13.5830...|POINT (13.664 50.86)|
which doesn't include the other columns from the spatial RDD.
Jia Yu
@jiayuasu
@rimmmmm You just need to add one more line before the adapter: http://sedona.apache.org/tutorial/sql/#spatialpairrdd-to-dataframe
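For reference, a Scala sketch of what that tutorial section describes (assuming Sedona 1.0; the flat join returns a Geometry/Geometry pair RDD that the four-argument toDf overload accepts, and the attribute names here are placeholders):

import org.apache.sedona.core.spatialOperator.JoinQuery
import org.apache.sedona.sql.utils.Adapter

// SpatialJoinQueryFlat yields JavaPairRDD[Geometry, Geometry], which
// Adapter.toDf can turn into a DataFrame when given the attribute
// names carried on each side.
val joinResultPairRDD = JoinQuery.SpatialJoinQueryFlat(point_RDD, polygon_RDD, true, true)

val joinResultDf = Adapter.toDf(
  joinResultPairRDD,
  Seq("a"),           // attributes carried by the left side
  Seq("b", "c", "d"), // attributes carried by the right side
  sparkSession)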
rimmmmm
@rimmmmm
import java.util.Arrays;
import java.util.List;
import scala.collection.JavaConverters;
import scala.collection.Seq;

public static Seq<String> left_columns_seq;
public static Seq<String> right_columns_seq;

public static Seq<String> convertListToSeq(List<String> inputList) {
    return JavaConverters.asScalaIteratorConverter(inputList.iterator()).asScala().toSeq();
}

List<String> left_columns = Arrays.asList(new String[]{"a"});
List<String> right_columns = Arrays.asList(new String[]{"b", "c", "d"});
left_columns_seq = Utils.convertListToSeq(left_columns);
right_columns_seq = Utils.convertListToSeq(right_columns);

@jiayuasu thanks :) I posted the Java code above
RudyEvers
@RudyEvers
Hi, I have an urgent question. Is it possible to convert a linestring to a geometry datatype in Databricks? In SQL you can use something like ST_GeomFromText. Is there a way to do that with GeoSpark?
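A minimal sketch of the SQL route, assuming the GeoSpark SQL functions have been registered on the session:

import org.datasyslab.geosparksql.utils.GeoSparkSQLRegistrator

GeoSparkSQLRegistrator.registerAll(spark) // spark: an existing SparkSession

// ST_GeomFromText parses WKT (including LINESTRING) into a geometry value;
// the same call works in a Databricks notebook once the jars are attached.
val df = spark.sql("SELECT ST_GeomFromText('LINESTRING (0 0, 1 1, 2 2)') AS geom")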
LucioMelito
@LucioMelito
Hey everyone! I have found multiple times that inner joins are much more performant than left ones, to the point that it's better to do a spatial inner join followed by a normal left join in Spark. Why is that?
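The workaround described reads roughly like this (a sketch with hypothetical tables left_tbl and right_tbl keyed by id):

// Run the expensive spatial predicate as an INNER join first...
val matched = spark.sql("""
  SELECT l.id, r.region
  FROM left_tbl l JOIN right_tbl r ON ST_Intersects(l.geom, r.geom)
""")
matched.createOrReplaceTempView("matched")

// ...then a cheap equi LEFT JOIN brings back the unmatched left rows.
val result = spark.sql("""
  SELECT l.id, m.region
  FROM left_tbl l LEFT JOIN matched m ON l.id = m.id
""")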
Jia Yu
@jiayuasu
This Gitter channel has been deprecated
https://gitter.im/apache/sedona
The new one is here
ArunaVeluru
@ArunaVeluru
Hi, I am using a GeoSpark spatial join on a skewed data set (polygons in GB and points in KB/MB) and it's taking a very long time on a cluster of 4 r5a.8xlarge core nodes. Could someone help with how to handle this volume?
ArunaVeluru
@ArunaVeluru
My code is:
poly_spatial_rdd.spatialPartitioning(GridType.KDBTREE, 500)
point_spatial_rdd.spatialPartitioning(poly_spatial_rdd.getPartitioner)

val buildOnSpatialPartitionedRDD = true // set to true only when running a join query
val usingIndex = true
val considerBoundaryIntersection = true

poly_spatial_rdd.buildIndex(IndexType.QUADTREE, buildOnSpatialPartitionedRDD)
point_spatial_rdd.buildIndex(IndexType.RTREE, buildOnSpatialPartitionedRDD)

val locations_defects = JoinQuery.SpatialJoinQueryFlat(
  point_spatial_rdd, poly_spatial_rdd, usingIndex, considerBoundaryIntersection)