    Jozef
    @jozefhajnala
    Hello, is there any chance that someone could share the planned date of CRAN release for sparklyr 1.2.0 ? Thank you.
    Yitao Li
    @yl790
    @jozefhajnala Hey we plan to release it before 20th of April
    Jozef
    @jozefhajnala
    Fantastic, thank you for the info!
    Jeff P
    @HikaGenji
    @yl790 I figured from_avro would not work with the Confluent Schema Registry. That is because Confluent adds a schema id to the payload, so even though you give the schema to from_avro it can't decode it. I am writing a sparklyr extension package with a Scala UDF to implement the Confluent Avro deserializer and leverage the Schema Registry.
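    For reference, the Confluent wire format prepends a 1-byte magic value (0) and a 4-byte big-endian schema id to every Avro payload, which is why handing the writer schema to from_avro alone is not enough. A minimal, illustrative sketch of parsing that header in plain R (the helper name is made up, and the bytes would normally come from the Kafka value column):
    # Illustrative only: split a Confluent-framed message into schema id and Avro body
    decode_confluent_header <- function(raw_bytes) {
      stopifnot(raw_bytes[1] == as.raw(0))                      # Confluent magic byte
      schema_id <- sum(as.integer(raw_bytes[2:5]) * 256^(3:0))  # 4-byte big-endian schema id
      list(schema_id = schema_id, avro_payload = raw_bytes[-(1:5)])
    }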
    Yitao Li
    @yl790
    @HikaGenji wow good to know : )
    Yitao Li
    @yl790
    @jozefhajnala Hey, just saw your pull request (sparklyr/sparklyr#2250) and realized the skip_on_arrow() function is not working as expected. I'm trying to fix that issue now. So, let's wait for all checks on sparklyr/sparklyr#2426 to complete, merge it, and then try to rebase and merge your change soon.
    Yitao Li
    @yl790
    @jozefhajnala alright sparklyr/sparklyr#2426 is merged now
    Jeff P
    @HikaGenji
    @yl790 Just to close the topic, I have used the following library to integrate sparklyr with the Confluent Schema Registry: https://github.com/AbsaOSS/ABRiS . I have written a sparklyr extension package to read Avro data encoded with the Confluent Schema Registry: https://github.com/HikaGenji/sparklyr.confluent.avro
    Javier Luraschi
    @javierluraschi
    That’s awesome @HikaGenji! If you are interested, you could transfer the repo to https://github.com/r-spark and we will make you an admin. We found out that putting all Spark and R extensions in the same place helps users find them and helps us track extensions we should be helping with. Otherwise, we’ll just clone the extension in this repo to at least show a fork to your repo.
    Jeff P
    @HikaGenji
    Sure, I would be happy to do so. Let me know how to proceed. My code is far from polished/complete, but it took me so much time to get there that I think it may help someone else facing the same issue.
    Javier Luraschi
    @javierluraschi
    You would have to go to your GitHub repo URL, then Settings, then Transfer ownership, confirm the repo name and enter r-spark as the new owner. We can then accept on our side and make you the owner of the transferred repo.
    Jeff P
    @HikaGenji
    got this message when doing it 'You don’t have the permission to create public repositories on r-spark'
    Javier Luraschi
    @javierluraschi
    I sent you an invite to r-spark. We might have to first accept that one, and I might need to give you permissions first.
    Jeff P
    @HikaGenji
    accepted!
    Javier Luraschi
    @javierluraschi
    Awesome! Can you retry the transfer?
    It worked! I also made you an admin of the repo so you can do anything with it.
    Jeff P
    @HikaGenji
    fantastic, very happy to be joining the ride
    Javier Luraschi
    @javierluraschi
    Great to have you! If anything, it should help prevent someone from writing a similar extension and just reuse yours and hopefully also help
    Jeff P
    @HikaGenji
    That is cool. It was a tricky one: Confluent is kind of in competition with Spark with its KSQL, and Confluent Avro is not plain Avro, so there is a gap. Thankfully a bridge existed (ABRiS), so I could leverage it to bring it within the sparklyr galaxy.
    Dave Kincaid
    @dkincaid

    So I'm back working on a few things in the sparknlp package. One thing I've not been able to get working has me stumped. There is a method called annotateJava() on the LightPipeline object. This method is supposed to return Map[String, List[String]], where the List[String] is a java.util.List. But when I get the object back into R, it looks like this:

    $document
    <jobj[44]>
      scala.collection.convert.Wrappers$SeqWrapper
      [French author who helped pioneer the science-fiction genre. Verne wrote about space, air, and underwater travel before navigable aircraft and practical submarines were invented, and before any means of space travel had been devised.]

    it looks like the List objects are somehow coming back into R as scala.collection.convert.Wrappers$SeqWrapper and I haven't found any way to turn them into a list or array in R. Any ideas?
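    One possible R-side angle, since the SeqWrapper implements java.util.List, is to pull its elements out one at a time with sparklyr::invoke(); a rough, untested sketch (the helper name is made up):
    # Untested sketch: copy a java.util.List jobj element by element into an R list
    as_r_list <- function(jlist) {
      n <- sparklyr::invoke(jlist, "size")
      lapply(seq_len(n), function(i) sparklyr::invoke(jlist, "get", as.integer(i - 1)))
    }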

    Javier Luraschi
    @javierluraschi
    There are usually two ways to fix similar issues...
    1) Is to send us a PR to convert SeqWrappers to the proper R type, see https://github.com/sparklyr/sparklyr/blob/master/java/spark-1.5.2/serializer.scala#L238-L350
    2) Is to manually map the return value with a map(e => <some conversion>) in Scala
    It would help us to help you fix this one if you could open a GitHub issue in the sparklyr repo with steps to reproduce this issue locally. Thanks Dave!
    Dave Kincaid
    @dkincaid
    I've been trying #2 all day without any luck. I can't find any conversion that works. I'll take a look at the serializer and see if I can get that to work and possibly send a PR.
    I'll open an issue then. I thought maybe there was some simple solution I was overlooking. Thanks.
    Javier Luraschi
    @javierluraschi
    Thanks for the GitHub issue, should be enough to reproduce on our end and send that PR as well. Let us know how far you get and we will take it from there :)
    Dave Kincaid
    @dkincaid
    I'm not sure I can be much help. I don't really know Scala at all and it looks like that would be needed. I did create the issue #2441. I'll take a look this weekend and see if I can understand what's happening in there, but I'm not optimistic that I'll be able to figure it out.
    Javier Luraschi
    @javierluraschi
    No worries, let us take a look. Would you nag us early next week if you don’t see progress?
    I’ll try to take a look at it tomorrow or ask Yitao for help
    Javier Luraschi
    @javierluraschi
    Ah, yes, a fix was needed in the serializer. Here is the PR @dkincaid: sparklyr/sparklyr#2442 — we will merge within the next few hours after tests finish running. Excited to see how you get all this working in sparknlp!!!
    Dave Kincaid
    @dkincaid
    Wow! That's great! Thank you so much. I'll check it out this weekend
    Javier Luraschi
    @javierluraschi
    NP! Merged now. Try remotes::install_github("sparklyr/sparklyr")
    Jeff P
    @HikaGenji
    Hi, are sliding windows supported in sparklyr? Looking at sparklyr/sparklyr#2231, it seems they are not.
    Rob Linger
    @th3walkingdud3
    Does anyone have any documentation or a set of repos for dealing with json objects from kafka? I can read from and write to kafka using the example code, but I have not had much luck with any type of processing.
    Jeff P
    @HikaGenji
    @th3walkingdud3 You can use the Spark SQL function from_json to parse the JSON string into columns. You need to provide it with the JSON schema; you can find examples here: https://stackoverflow.com/questions/50373104/spark-sql-from-json-documentation
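    A minimal sketch of that idea in sparklyr (the column name, schema string, and toy data are illustrative; assumes Spark 2.3+, where from_json accepts a DDL-style schema string):
    library(sparklyr)
    library(dplyr)

    sc <- spark_connect(master = "local")

    # toy table with a single JSON string column
    json_raw <- sdf_copy_to(
      sc,
      data.frame(value = '{"id": 1, "name": "abc"}', stringsAsFactors = FALSE),
      "json_raw",
      overwrite = TRUE
    )

    # parse the JSON column into real columns with Spark SQL's from_json
    parsed <- sdf_sql(sc, "
      SELECT parsed.*
      FROM (SELECT from_json(value, 'id INT, name STRING') AS parsed FROM json_raw) t
    ")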
    Jeff P
    @HikaGenji
    For how to use SQL in sparklyr, I personally like this resource: https://sparkfromr.com/constructing-sql-and-executing-it-with-spark.html
    Rob Linger
    @th3walkingdud3
    I will check it out, thanks for the quick reply @HikaGenji
    Rob Linger
    @th3walkingdud3
    Screen Shot 2020-05-05 at 9.23.06 AM.png
    Rob Linger
    @th3walkingdud3
    I have not had any luck with deserializing data streaming from Kafka using spark_read_kafka. Does anyone have any resources or code examples for this use case? The provided example on the sparklyr site reads and immediately writes back to Kafka, which works fine, but I need some example code for processing incoming data prior to writing it back out to Kafka.
    Jeff P
    @HikaGenji
    Essentially you can treat the result of stream_read_kafka just like a dataframe with dplyr verbs
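    For example, a minimal sketch (broker address, topic names, and the transformation step are placeholders):
    library(sparklyr)
    library(dplyr)

    sc <- spark_connect(master = "local")

    read_opts  <- list(kafka.bootstrap.servers = "localhost:9092", subscribe = "events-in")
    write_opts <- list(kafka.bootstrap.servers = "localhost:9092", topic = "events-out")

    stream <- stream_read_kafka(sc, options = read_opts) %>%
      transmute(value = as.character(value)) %>%  # Kafka delivers value as binary; cast to string
      filter(value != "") %>%                     # any dplyr-style processing goes here
      stream_write_kafka(options = write_opts)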
    Kumar G
    @abdkumar

    @JakeRuss I'm trying to connect to remote cassandra using host, port, user name and password
    conf <- spark_config()
    conf[["spark.cassandra.connection.ssl.enabled"]] = TRUE
    conf[["spark.cassandra.connection.host"]] = cassandra_host
    conf[["spark.cassandra.connection.port"]] = cassandra_port
    conf[["spark.cassandra.auth.username"]] = cassandra_username
    conf[["spark.cassandra.auth.password"]] = cassandra_password
    config[["sparklyr.defaultPackages"]] <- c("org.apache.hadoop:hadoop-aws:2.7.3", "datastax:spark-cassandra-connector:2.0.0-RC1-s_2.11")

    sc <- spark_connect(master = "local", version = "2.2.0", spark_home = spark_path, config = conf)

    df <- spark_read_source(
      sc,
      name = "emp",
      source = "org.apache.spark.sql.cassandra",
      options = list(keyspace = "temp", table = "category_distribution"),
      memory = FALSE
    )

    But this is not working. Please suggest a solution.

    tarun9450
    @tarun9450
    Hi All, can you please help me with this error- "Failed while connecting to sparklyr to port (8880) for sessionid (26038): Gateway in localhost:8880 did not respond."
    tarun9450
    @tarun9450
    library(sparklyr)
    sc <- spark_connect(master = "local", spark_version = "2.4.5")

    Error in force(code) :
    Failed while connecting to sparklyr to port (8880) for sessionid (52016): Gateway in localhost:8880 did not respond.
    Path: C:\Users\Tarun_Gupta2\AppData\Local\spark\spark-2.4.5-bin-hadoop2.7\bin\spark-submit2.cmd
    Parameters: --class, sparklyr.Shell, "C:\Users\TarunGupta2\Documents\R\win-library\3.6\sparklyr\java\sparklyr-2.4-2.11.jar", 8880, 52016
    Log: C:\Users\TARUN~1\AppData\Local\Temp\Rtmpw9ZV82\filea70487da97_spark.log

    ---- Output Log ----
    /Java/jdk1.8.0_251\bin\java was unexpected at this time.

    ---- Error Log ----

    Yitao Li
    @yl790
    @tarun9450 Do you have a space in your JAVA_HOME environment variable?
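    A common workaround on Windows is to point JAVA_HOME at a path without spaces (for example the 8.3 short form) before connecting; the path below is illustrative:
    # illustrative: short-name form of something like "C:/Program Files/Java/jdk1.8.0_251"
    Sys.setenv(JAVA_HOME = "C:/PROGRA~1/Java/jdk1.8.0_251")
    library(sparklyr)
    sc <- spark_connect(master = "local", version = "2.4.5")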
    Jeff P
    @HikaGenji
    Hi, is anybody available to discuss sparklyr/sparklyr#2534? It is a bit difficult to express in words and requires Kafka to replicate. This is blocking me from going further, so I am curious to see if any of you has insights.