Where communities thrive

  • Join over 1.5M+ people
  • Join over 100K+ communities
  • Free without limits
  • Create your own community
Repo info
    Sanjeev Singh
    but can u please solve my issue...
    as soon as i able to read data from Cassandra table i lost my connection from there
    hello... I'm using spark 2.2.0, spark-cassandra-connector 2.0.7 and cassandra 3.*. on rDD.joinWithCassandraTable I'm continuously getting busypoolexception and the job is not progressing further. Can someone please help me with this.?
    I'm new to spark and cassandra and clueless. any help or direction is highly appreciated.
    Brian Cantoni
    @pkkarri we have moved the Spark Cassandra Connector discussions over to Slack: https://academy.datastax.com/slack, once you've signed up, please check out the #spark-connector and #dse-analytics rooms
    Hari Prasanth
    Hi Guys, Good Day, This is Hari and I have quick question.
    I understand 'commit log' is the combination of multiple memtables so when will the commit logs be removed?
    Brian Cantoni
    @HariLogana_twitter we have moved the Spark Cassandra Connector discussions over to Slack: https://academy.datastax.com/slack, once you've signed up, please check out the #spark-connector and #dse-analytics rooms
    Winnie Wu
    hi, I've met an issue. I downloaded b2.0 spark-cassandra-connector, and then try to compile
    but I got this error:

    D:\spark_learning\connectors\cassandra\spark-cassandra-connector>sbt -Dscala-2.11=true
    Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=256m; support was removed in 8.0
    [info] Loading project definition from D:\spark_learning\connectors\cassandra\spark-cassandra-connector\project\project
    [info] Loading project definition from D:\spark_learning\connectors\cassandra\spark-cassandra-connector\project
    Using releases: https://oss.sonatype.org/service/local/staging/deploy/maven2 for releases
    Using snapshots: https://oss.sonatype.org/content/repositories/snapshots for snapshots

    Scala: 2.11.8
    Scala Binary: 2.11
    Java: target=1.7 user=1.8.0_144
    Cassandra version for testing: 3.6 [can be overridden by specifying '-Dtest.cassandra.version=<version>']

    [info] Set current project to root (in build file:/D:/spark_learning/connectors/cassandra/spark-cassandra-connector/)
    [error] Not a valid command: true
    [error] Not a valid key: true (similar: run, runner, target)
    [error] true
    [error] ^

    Winnie Wu
    I've signed up in https://academy.datastax.com/slack, but I don't know where to post my question
    Bhumil Sarvaiya
    Can somebody please tell me whether if I do nodetool snapshot, the snapshots of all tables for each keyspace will be taken at exact same time?
    Brian Cantoni
    @bhumilsarvaiya we have moved the Spark Cassandra Connector discussions over to Slack: https://academy.datastax.com/slack, once you've signed up and launched the Slack app, please check out the #general which will be a good place for your question
    Bhumil Sarvaiya
    @bcantoni okay. Thank you for the info
    Hello, I encounter a Spark bug that makes me unable to write to cassandra, partly because of the connector input filters.
    The bug raises an exception during the predicatePushDown function of the CassandraSourceRelation class.
    Is there any way to disable these filters (even if it's unsafe) ?
    How to achieve best performance for writing huge number of records (for example 2000000) in Cassandra ? I am using Scala, Datastax driver and phantom in my project. How can I insert these many records in database in a performant way?

    Hi guys! quick question on spark structured streaming and option values in scala (DSE v6.0.4). I have a table where one column, say name can be optional,
    so I use ScalaOption[String] to model it (as I understand it helps to avoid tombstone problem when set to Unset):

        //case class corresponding to a table 
        case class MyCaseClass(id: String, name: ScalaOption[String])
        //dataset of case classes
        val dataset: Dataset[MyCaseClass] = ???
          .cassandraFormat(table, keyspace)

    at runtime I get this exception:

    <none> is not a term
    scala.ScalaReflectionException: <none> is not a term
        at scala.reflect.api.Symbols$SymbolApi$class.asTerm(Symbols.scala:199)

    Does anyone know how to fix it?
    Also, if I just use scala Option, will that cause tombstone problem (mark for deletion instead of unset)?

    Luis Angel Vicente Sanchez
    Hi! Is there anyway I can avoid something like this on my build.sbt?
      assemblyShadeRules in assembly := Seq(
        //need to shade these dependencies because they are embedded in the cassandra connector
          .rename("com.kenai.**" -> "shade.com.kenai.@1",
                  "jnr.**" -> "shade.jnr.@1",
                  "org.objectweb.asm.**" -> "shade.org.objectweb.asm.@1")
          .inLibrary("com.github.jnr" % "jffi"           % "1.2.15",
                     "com.github.jnr" % "jnr-constants"  % "0.9.8",
                     "com.github.jnr" % "jnr-enxio"      % "0.16",
                     "com.github.jnr" % "jnr-ffi"        % "2.1.4",
                     "com.github.jnr" % "jnr-posix"      % "3.0.35",
                     "com.github.jnr" % "jnr-unixsocket" % "0.18",
                     "com.github.jnr" % "jnr-x86asm"     % "1.0.2",
                     "org.ow2.asm"    % "asm"            % "5.0.3",
                     "org.ow2.asm"    % "asm-analysis"   % "5.0.3",
                     "org.ow2.asm"    % "asm-commons"    % "5.0.3",
                     "org.ow2.asm"    % "asm-tree"       % "5.0.3",
                     "org.ow2.asm"    % "asm-util"       % "5.0.3")
    the statsd client I'm using uses jnr and asm as dependencies and I can't be sure if the version of classes embedded in the connector are the right ones

    Hi all
    I'm trying to make SCC work with scala 2.12
    When I run it:test I get the following error:

    [trace] Stack trace suppressed: run last cassandra-server/:coursierResolutions for the full output.
    [error] (cassandra-server/
    :coursierResolutions) coursier.ResolutionException: Encountered 1 error(s) in dependency resolution:
    [error] org.apache.spark:spark-core_2.12:1.4.0:
    [error] not found:
    [error] /var/root/.ivy2/local/org.apache.spark/spark-core_2.12/1.4.0/ivys/ivy.xml
    [error] https://repo1.maven.org/maven2/org/apache/spark/spark-core_2.12/1.4.0/spark-core_2.12-1.4.0.pom
    [error] https://dl.bintray.com/spark-packages/maven/org/apache/spark/spark-core_2.12/1.4.0/spark-core_2.12-1.4.0.pom
    [error] /var/root/.m2/repository/org/apache/spark/spark-core_2.12/1.4.0/spark-core_2.12-1.4.0.pom
    [error] https://repository.apache.org/content/groups/staging/org/apache/spark/spark-core_2.12/1.4.0/spark-core_2.12-1.4.0.pom
    [error] https://repository.apache.org/content/groups/snapshots/org/apache/spark/spark-core_2.12/1.4.0/spark-core_2.12-1.4.0.pom

    Can someone explain what is the reason? Where does it use spark 1.4.0?

    Hi I am trying to use spark-cassandra connector with Java.
    I am trying to covert my scala code to java code. when i covert i get ..SparkSession does not have .setCassandraConf... what should i do ? Any specific dependency i need to add in my pom file?
    Brian Cantoni
    Periodic reminder that we have moved the Spark Cassandra Connector discussions over to Slack: https://academy.datastax.com/slack, once you've signed up, please check out the #spark-connector and #dse-analytics rooms.
    thanks a lot
    what is this one ?
    Brian Cantoni
    @shatestest this Gitter page was used for a short time, but then the team decided to focus all conversations over in Slack.
    @bcantoni thanks Brain... but Slack is not easy to use as Gitter... it asks a lot of stuff...
    Eugene Golovan
    hi guys, on this page https://docs.datastax.com/en/dse/6.7/dse-admin/datastax_enterprise/spark/sparkPredicatePushdown.html what this means "The asterisk indicates that push down filter will be handled only at the datasource level." ?
    why I am asking. I have indexed column created_at (not partitioned or clustering), it's date and when I use created_at=... I get asterisk and query is very quick. For created_at>= I still have pushdown in explain but w/o asterisk and query is slow.
    Why it's so? And what "will be handled only at the datasource level" means? On what level it can be if not on datasource?
    Hello, did anyone faced this issue before ?

    Hi, Does anyone here know how to read all records in a Cassandra table ( I believe this is called a Full Table Scan )?
    E.g. My table schema looks like this:
    User Settings ( Table )

    1. userid ( UUID ) - partition key
    2. date ( Text ) - clustering key ( e.g. 2019-07-24 )

    There could be multiple records per user, but I need to basically get all the records and process them based on some condition(s)

    I've read about table scans using paging via tokenid but also want to know if/how a full table scan works with spark? Since this is an expensive operation, I'm trying to find the proper way of doing this in Spark.


    Hello. I was wondering if there were any updates on datastax/spark-cassandra-connector#1216 or anyway I could help on that
    Thats the PR for 2.12 support
    Giannis Polyzos
    hi everyone, i recently got started with the spark-cassandra-connector and i would like some help
    i see here that there are metrics exposed by the connector, but i can't figure out where those metrics are exposed so i can aggregate them. Can someone help me out?
    Thank you for the info..
    Kidane Weldemariam
    What is the easiest way to write to cassandra uuid type from java.util.UUID I tried to crate a def stringToUUIDUdf: UserDefinedFunction = udf((uuid: String) => UUID.fromString(uuid)) and use it in .withColumn("orgId", stringToUUIDUdf1($"orgId")), but I keep getting java.lang.UnsupportedOperationException: Schema for type java.util.UUID is not supported
    Any help is appriciated

    Hi everyone, I am trying to collect the count from 130 GB, but I am getting this error:
    20/01/16 00:23:44 INFO CassandraConnector: Connected to Cassandra cluster: Lookup Cluster
    ------------some more logs----------------
    20/01/16 00:23:54 INFO CassandraConnector: Disconnected from Cassandra cluster: Lookup Cluster
    When I am trying to collect the count from 10GB table or 25GB table, I am able to collect the count but for 130GB connection is disconnected.
    scala -version: 2.12.10
    java -version: openjdk version "1.8.0_232"
    cassandra -version: 3.0.8
    Spark: 2.4.2
    Driver: spark-cassandra-connector_2.11
    I am using these properties:
    val spark = SparkSession
    .appName("Cassandra Spark App")
    .config("spark.cassandra.connection.host", "ip")
    .config("spark.cassandra.auth.username", "user")
    .config("spark.cassandra.auth.password", "password")
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .config("spark.executor.memory", "48G")
    .config("spark.cassandra.input.split.size_in_mb", "1")
    .config("spark.eventLog.enabled", "true")
    .config("spark.cassandra.connection.timeout_ms", 15000)
    .config("spark.master", "spark://ip:7077")

    Can anyone please suggest a way to come out? I am struggling to understand why on smaller amount of data, spark can able to collect the count but for larger amount (130GB) of data, spark can't able to collect the count?

    @RussellSpitzer any suggestions on this above requested question in which 130gb data count not able to collect due to disconnection from cluster but upto 10gb of data count can be able to collect?
    Asanka Madushan Perera

    hi everyone,
    I'm trying to connect Cassandra 3 nodes cluster with the spark. Right now i have one master spark server and 3 workers installed in each Cassandra node.
    Cassandra 3.11.6
    Spark 2.4.5

    When i try to load a table it gives an error saying :
    java.io.IOException: Failed to open native connection to Cassandra at {,,}:9042
    at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:168)

    Any ideas?

    Ravi Madireddy
    @here : we run cassandra in kuberenetes, and use spark-cassandra connector , the problem we face is for kubernetes POD restarts ,cassandra IP changes
    in worker logs , we see old ip as DOWN and we can see New replaced IP as newly added node, and connections are being made to new pod with correct POD_IP , but the problem is we also see connections still going to old pod_ip
    and try is still happening to old dead pod_ip
    com.datastax.driver.core.Host.STATES: Defuncting Connection[/, inFlight=0, closed=false] because: [/] Cannot connect
    com.datastax.driver.core.exceptions.TransportException: [/] Cannot connect
    any spark cassandra connector retry attempt i can set to limit no.of retry attempts to a dead node
    this warning i see only in spark-worker log
    any reconnection policy similar to the one in java driver we can configure in spark-cassandra-connector ?
    Russell Spitzer
    https://community.datastax.com/index.html - Hey everyone, please ask your questions here now << It's a QA site for all C* related questions
    Giannis Polyzos

    @RussellSpitzer I used spark cassandra connector version 2.4.3 where i noticed i got this
    INFO CassandraConnector: Disconnected from Cassandra cluster:
    Upgrading to version 2.5.0 seems to fix this issue, but even though everything runs ok locally when i upload the jar on my databricks cluster with 2.5.0 version i get the following exception

    Caused by: java.lang.NoClassDefFoundError: Could not initialize class com.datastax.oss.driver.internal.core.config.typesafe.TypesafeDriverConfig

    Can you please help out? i need to fix this asap

    Russell Spitzer
    Please ask on the mailing list or the QA site above
    @polyzos you aren't including all the dependencies, with Databricks add the the connector using the "Maven artifact" option, don't just include a single jar
    7 replies
    Sha BdEngineer
    is this group still active ?