    Regan McDonald
    @fifthpostulate
    Does anyone distribute a versioned R environment to sparklyr session worker nodes rather than install the same version of R on all Spark nodes?
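    One approach I've seen described (untested here; the archive path and config interplay are placeholder assumptions) is to ship a relocatable R build as a YARN archive and point Spark's R launcher at it, rather than installing R on every node:

    library(sparklyr)

    conf <- spark_config()
    # archive containing a relocatable R build (e.g. created with conda-pack);
    # the "#renv" suffix is the directory it gets unpacked to on each node
    conf$spark.yarn.dist.archives <- "hdfs:///envs/r-4.1.tar.gz#renv"
    # tell Spark to launch R workers with that R instead of the system one
    conf$spark.r.command <- "./renv/bin/Rscript"

    sc <- spark_connect(master = "yarn", config = conf)
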
    Ying
    @ying1
    Hello - a question: with the new sparklyr version tag (which is now required for Livy connections), I am noticing that my classes don't load on the remote system anymore; it throws a ClassNotFoundException. Though I've been adding these classes via conf$livy.jars <- c( ... ) as before. What I did notice on the YARN side is that these jar files are no longer listed as part of the launch script. Is there a different way to specify jar files to be loaded as part of the Livy config?
    Ying
    @ying1
    It looks like the conf$livy.jars has been updated to calculate a bunch of other jars... but the code is not working properly. :(
    Ying
    @ying1
    I set conf$sparklyr.livy.sources <- TRUE and continued to use conf$livy.jars, and that seems to push the correct Livy settings to the Livy server. But then there is an issue: Failed to initialize livy connection: Failed to execute Livy statement with error: <console>:24: error: not found: value LivyUtils ... not sure why?
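    For reference, the configuration described above boils down to something like the following sketch (the Livy endpoint, jar path, and version are placeholders):

    library(sparklyr)

    conf <- spark_config()
    conf$livy.jars <- c("hdfs:///user/me/libs/my-classes.jar")  # jars with the custom classes
    conf$sparklyr.livy.sources <- TRUE                          # workaround mentioned above

    sc <- spark_connect(
      master  = "http://livy-server:8998",
      method  = "livy",
      version = "2.4.0",   # the version tag that is now required for Livy connections
      config  = conf
    )
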
    Jake Russ
    @JakeRuss
    I am seeking advice on how to structure sparklyr calls and queries mapped over a list of dates. I posted over at https://community.rstudio.com/t/seeking-better-practice-for-sparklyr-purrr-map-to-iterate-query-over-a-list/85171 and wanted to draw attention to it here in case any of you sparklyr experts could comment there. Many thanks.
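    For context, the pattern in question looks roughly like this sketch (table and column names are made up); a single grouped aggregation collected once is usually cheaper than one Spark job per date:

    library(sparklyr)
    library(dplyr)
    library(purrr)

    dates <- as.character(seq(as.Date("2020-01-01"), as.Date("2020-01-07"), by = "day"))

    # one Spark job per date
    daily_totals <- map_dfr(dates, function(d) {
      orders %>%                                   # `orders` is a tbl_spark
        filter(order_date == !!d) %>%
        summarise(total = sum(amount, na.rm = TRUE)) %>%
        collect() %>%
        mutate(order_date = d)
    })

    # often preferable: one grouped query, collected once
    daily_totals <- orders %>%
      filter(order_date %in% !!dates) %>%
      group_by(order_date) %>%
      summarise(total = sum(amount, na.rm = TRUE)) %>%
      collect()
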
    ®γσ, ξηg Lιαη Ημ
    @englianhu

    sc <- spark_connect(master = 'local')
    Error in start_shell(master = master, spark_home = spark_home, spark_version = version, : Failed to find 'spark-submit2.cmd' under 'C:\Users\Owner\AppData\Local\spark\spark-3.0.0-bin-hadoop2.7', please verify SPARK_HOME.

    I faced an issue and raised it via sparklyr/sparklyr#2769

    Solved!
    Steps:
    1) Download Spark from https://spark.apache.org/downloads.html
    2) Extract the archive to 'C:/Users/scibr/AppData/Local/spark/spark-3.0.1-bin-hadoop3.2'.
    3) Manually point sparklyr at the latest version: spark_home_set('C:/Users/scibr/AppData/Local/spark/spark-3.0.1-bin-hadoop3.2') (see the sketch below).
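    In R, the steps above amount to something like this:

    library(sparklyr)

    # point sparklyr at the manually extracted Spark build from step 2
    spark_home_set("C:/Users/scibr/AppData/Local/spark/spark-3.0.1-bin-hadoop3.2")
    sc <- spark_connect(master = "local")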

    Regan McDonald
    @fifthpostulate
    Has anyone run into "java.lang.SecurityException: class "io.netty.buffer.ArrowBuf"'s signer information does not match signer information of other classes in the same package" when trying to use arrow with sparklyr?
    ajp97
    @ajp97

    Hi everyone. Spark newbie here. Got the following error in class and haven't been able to solve it:
    Error in spark_connect_gateway(gatewayAddress, gatewayPort, sessionId, :
    Gateway in localhost:8880 did not respond.

    Try running options(sparklyr.log.console = TRUE) followed by sc <- spark_connect(...) for more debugging info..

    I saw @javierluraschi's answer here: sparklyr/sparklyr#801, but this fix hasn't proven effective for me. Any kind of help would be deeply appreciated.

    Thanks a lot!

    lidyaann1
    @lidyaann1
    Hi, I am trying to connect to a standalone EMR cluster from RStudio using sparklyr and Livy, but keep getting Error in livy_connection(master, config, app_name, version, hadoop_version, :
    Failed to launch livy session, session status is shutting_down
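    For reference, a minimal Livy connection looks roughly like the sketch below (the endpoint and version are placeholders); since shutting_down is reported by the Livy server itself, the Livy session logs on the EMR side usually say why the session died:

    library(sparklyr)

    sc <- spark_connect(
      master  = "http://emr-master-node:8998",  # the EMR master's Livy endpoint (port 8998 by default)
      method  = "livy",
      version = "2.4.4"                         # match the Spark version on the cluster
    )
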
    Maher Daoud
    @maherdaoud
    Guys, I hope all of you are doing well. I'm stuck with the following error when I try to run a basic GraphFrames example: Error: java.lang.ClassNotFoundException: org.graphframes.GraphFrame
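    That ClassNotFoundException usually means the GraphFrames Spark package was not on the classpath when the session started. A minimal sketch, assuming the graphframes R extension package and a placeholder Spark version:

    library(sparklyr)
    library(graphframes)  # load the extension *before* connecting so the GraphFrames jar is added

    sc <- spark_connect(master = "local", version = "2.4.3")

    v <- sdf_copy_to(sc, data.frame(id = c(1, 2, 3)), "vertices", overwrite = TRUE)
    e <- sdf_copy_to(sc, data.frame(src = c(1, 2), dst = c(2, 3)), "edges", overwrite = TRUE)
    g <- gf_graphframe(vertices = v, edges = e)
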
    Gisela
    @giselamorrone

    Hi everyone! I'm trying to migrate a script to sparklyr, and I can't find the equivalent of spread and gather. My code looks something like:

    ltv_curves %>%
          spread(
            key = !!as.name(column_to_fill),
            value = grosstotal
          ) %>%
          gather(
            key = !!as.name(column_to_fill),
            value = grosstotal,
            -ignore_columns
          )

    Anyone around that can help with this?
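    One possible direction, assuming a reasonably recent sparklyr (1.4+, if I remember right, supports several tidyr verbs directly on Spark data frames; sdf_pivot() is an older alternative for the spread() side). Data and column names below are made up:

    library(sparklyr)
    library(dplyr)
    library(tidyr)

    ltv <- sdf_copy_to(sc, data.frame(id = c(1, 1, 2),
                                      category = c("a", "b", "a"),
                                      grosstotal = c(10, 20, 30)), "ltv", overwrite = TRUE)

    # spread() equivalent
    wide <- ltv %>%
      pivot_wider(names_from = category, values_from = grosstotal)

    # gather() equivalent
    long <- wide %>%
      pivot_longer(cols = -id, names_to = "category", values_to = "grosstotal")
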

    Zachary Barry
    @ZackBarry

    I'm getting an error using spark_apply when connecting to a Kubernetes cluster. I can run sdf_len(sc, 10) just fine but running sdf_len(sc, 10) %>% spark_apply(function(df) I(df)) returns the following error:

    Error: java.io.FileNotFoundException: File file:/var/folders/jf/lqnngxkj0x75cdmv_xjygfq40000gq/T/RtmpuaCX4s/packages/packages.8599.tar does not exist
        at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:611)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:601)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:428)
        at org.apache.spark.SparkContext.addFile(SparkContext.scala:1534)
        at org.apache.spark.SparkContext.addFile(SparkContext.scala:1498)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at sparklyr.Invoke.invoke(invoke.scala:147)
        at sparklyr.StreamHandler.handleMethodCall(stream.scala:136)
        at sparklyr.StreamHandler.read(stream.scala:61)
        at sparklyr.BackendHandler.$anonfun$channelRead0$1(handler.scala:58)
        at scala.util.control.Breaks.breakable(Breaks.scala:42)
        at sparklyr.BackendHandler.channelRead0(handler.scala:39)
        at sparklyr.BackendHandler.channelRead0(handler.scala:14)
        at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
        at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
        at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:321)
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:295)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.lang.Thread.run(Thread.java:748)

    Spark 3.0.1, Scala 2.12, sparklyr 1.5.2
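    One workaround that is sometimes suggested for this FileNotFoundException (the packages tarball is created in a temp directory local to the client machine, which the Spark side cannot see) is to skip shipping the package bundle; this assumes any R packages the closure needs are already present on the worker images:

    library(sparklyr)

    sdf_len(sc, 10) %>%
      spark_apply(function(df) I(df), packages = FALSE)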

    Noman Bukhari
    @nmnbkhr
    Hi, I am running some code in R; when I run spark_apply I get an exception.
    Need help on that.
    rink1135
    @rink1135
    Hello, I am trying to connect to Spark and am getting an error when connecting to the port. Not sure what is happening. I've tried many things to get it working.

    sc <- spark_connect(master = "local", version = "2.3")  # connect to this local cluster
    Error in spark_connect_gateway(gatewayAddress, gatewayPort, sessionId, :
    Gateway in localhost:8880 did not respond.

    Try running options(sparklyr.log.console = TRUE) followed by sc <- spark_connect(...) for more debugging info.

    Yitao Li
    @yitao-li
    Dear contributors of sparklyr,
    I will present sparklyr during the LFAI annual project review on Aug 26th, and would like to take this opportunity to acknowledge all individuals and organizations who have contributed to sparklyr in the past. Can you please send me the official name and logo (in Scalable Vector Graphics format, if possible) of your organization at your earliest convenience? My email is yitao@rstudio.com .
    Thanks in advance!
    Yitao Li
    ZGOLLI
    @zgollli1
    Hi everyone, I'm getting this error when trying to connect to spark: Error in spark_connect_gateway(gatewayAddress, gatewayPort, sessionId, :
    Gateway in localhost:8880 did not respond.
    Yitao Li
    @yitao-li
    @zgollli1 Did you run spark_install() to download some version of Spark to ~/spark? Feel free to file an issue at https://github.com/sparklyr/sparklyr/issues with more details.
    In the past many people have asked me about this type of error, and it usually gets resolved fairly quickly (e.g., the loopback interface is not up, some type of Spark home / Spark version mismatch, Apache Spark not installed correctly, etc.).
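    A quick checklist in code, following the suggestions above (the version number is only an example):

    library(sparklyr)

    spark_available_versions()            # versions that can be installed
    spark_installed_versions()            # versions already present under ~/spark
    spark_install(version = "3.0.1")      # install a local copy if none is present

    options(sparklyr.log.console = TRUE)  # surface gateway/driver logs in the console
    sc <- spark_connect(master = "local", version = "3.0.1")
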
    Michael Mateju
    @mysakbm
    Hi, I know that this room is dedicated to RStudio, but can I ask a question about a sparklyr error in the R plugin for PyCharm?
    abrkic0
    @abrkic0

    Hi, I'm trying to use spark_read_avro but I'm always getting the same error:

    Error in validate_spark_avro_pkg_version(sc) : 
      Avro support must be enabled with `spark_connect(..., version = <version>, packages = c("avro", <other package(s)>), ...)`  or by explicitly including 'org.apache.spark:spark-avro_2.12:3.1.1-SNAPSHOT' for Spark version 3.1.1-SNAPSHOT in list of packages

    I specified my Spark version and added avro to packages in the spark_connect function, and tried both examples from the error message, but neither works.
    Does anybody know how to fix this?
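    For reference, the pattern the error message is asking for looks like this sketch (the version and path are placeholders; the version passed to spark_connect() should match the Spark build actually being run):

    library(sparklyr)

    sc <- spark_connect(master = "local",
                        version = "3.1.1",
                        packages = c("avro"))

    df <- spark_read_avro(sc, name = "mydata", path = "path/to/file.avro")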

    Laurent Berder
    @LaurentBerder
    Hi guys. Does anyone know how to apply a function on a sorted group-by object?
    https://stackoverflow.com/questions/69664210/apply-function-after-groupby-in-sparklyr
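    One pattern that may fit, assuming spark_apply(): its group_by argument runs the function once per group, and the sorting can then happen inside the function on each group's data frame (column names below are made up):

    library(sparklyr)

    result <- sdf %>%
      spark_apply(
        function(df) {
          df <- df[order(df$timestamp), ]  # sort within the group
          head(df, 1)                      # e.g. keep the first row per group
        },
        group_by = "customer_id"
      )
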
    Dave Lee
    @davecalee
    Hi, I'm trying to get the nearest neighbour to each point in sparklyr using ml_approx_nearest_neighbors, but using lapply() to loop through each point to find its nearest neighbour is proving very inefficient. Is there another way to achieve this in Spark, please?
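    One alternative worth trying: fit an LSH model once and do an approximate self-join instead of calling ml_approx_nearest_neighbors() per point. A sketch, assuming points_tbl already has an assembled features vector column, with arbitrary bucket_length and threshold values:

    library(sparklyr)
    library(dplyr)

    lsh_model <- ft_bucketed_random_projection_lsh(
      sc, input_col = "features", output_col = "hashes", bucket_length = 2
    ) %>%
      ml_fit(points_tbl)

    # all pairs within `threshold` of each other, plus their distances;
    # from here, drop self-matches and keep the minimum-distance pair per point
    pairs <- ml_approx_similarity_join(lsh_model, points_tbl, points_tbl, threshold = 5)
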
    JB-data
    @JB-data
    Question about the spark_read_jdbc connector for Oracle: is it possible that memory=FALSE is ever ignored?
    We're reading 20 tables with this connector, and in the Spark UI I see that they are all cached, and many counts are triggered.
    I think the default is memory=TRUE, so I feel like what I have set is somehow being ignored and it falls back to the default.
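    For reference, memory = TRUE is indeed the default for spark_read_jdbc(). A call with caching disabled would look roughly like this sketch (connection details are placeholders):

    library(sparklyr)

    tbl_ref <- spark_read_jdbc(
      sc,
      name    = "my_table",
      options = list(
        url      = "jdbc:oracle:thin:@//db-host:1521/SERVICE",
        dbtable  = "SCHEMA.MY_TABLE",
        user     = "user",
        password = "password",
        driver   = "oracle.jdbc.OracleDriver"
      ),
      memory = FALSE   # register the table lazily, without caching it
    )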