@njesp You can try the following to print the spark-submit log to the console to see what's failing:
library(sparklyr)
options(sparklyr.log.console = TRUE)
sc <- spark_connect(master = "local")
The spark-submit log usually ends up in a text file, but the path to that file is highly system-dependent and can also be influenced by your local config... so rather than spending time figuring out where it might be, it's easier to keep options(sparklyr.log.console = TRUE) set while troubleshooting.
@yl790 Error in spark_connect_gateway(gatewayAddress, gatewayPort, sessionId, :
Gateway in localhost:8880 did not respond.
:: resolution report :: resolve 84419ms :: artifacts dl 0ms
:: modules in use:
---------------------------------------------------------------------
|                  |            modules            ||   artifacts   |
|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
|      default     |   1   |   0   |   0   |   0   ||   0   |   0   |
---------------------------------------------------------------------
:: problems summary ::
:::: WARNINGS
module not found: saurfang#spark-sas7bdat;1.1.5-s_2.11
==== local-m2-cache: tried
file:/C:/Users/njn/.m2/repository/saurfang/spark-sas7bdat/1.1.5-s_2.11/spark-sas7bdat-1.1.5-s_2.11.pom
-- artifact saurfang#spark-sas7bdat;1.1.5-s_2.11!spark-sas7bdat.jar:
file:/C:/Users/njn/.m2/repository/saurfang/spark-sas7bdat/1.1.5-s_2.11/spark-sas7bdat-1.1.5-s_2.11.jar
==== local-ivy-cache: tried
C:\Users\njn\.ivy2\local\saurfang\spark-sas7bdat\1.1.5-s_2.11\ivys\ivy.xml
-- artifact saurfang#spark-sas7bdat;1.1.5-s_2.11!spark-sas7bdat.jar:
C:\Users\njn\.ivy2\local\saurfang\spark-sas7bdat\1.1.5-s_2.11\jars\spark-sas7bdat.jar
==== central: tried
https://repo1.maven.org/maven2/saurfang/spark-sas7bdat/1.1.5-s_2.11/spark-sas7bdat-1.1.5-s_2.11.pom
-- artifact saurfang#spark-sas7bdat;1.1.5-s_2.11!spark-sas7bdat.jar:
https://repo1.maven.org/maven2/saurfang/spark-sas7bdat/1.1.5-s_2.11/spark-sas7bdat-1.1.5-s_2.11.jar
==== spark-packages: tried
https://dl.bintray.com/spark-packages/maven/saurfang/spark-sas7bdat/1.1.5-s_2.11/spark-sas7bdat-1.1.5-s_2.11.pom
-- artifact saurfang#spark-sas7bdat;1.1.5-s_2.11!spark-sas7bdat.jar:
https://dl.bintray.com/spark-packages/maven/saurfang/spark-sas7bdat/1.1.5-s_2.11/spark-sas7bdat-1.1.5-s_2.11.jar
::::::::::::::::::::::::::::::::::::::::::::::
:: UNRESOLVED DEPENDENCIES ::
::::::::::::::::::::::::::::::::::::::::::::::
:: saurfang#spark-sas7bdat;1.1.5-s_2.11: not found
::::::::::::::::::::::::::::::::::::::::::::::
:::: ERRORS
Server access error at url https://repo1.maven.org/maven2/saurfang/spark-sas7bdat/1.1.5-s_2.11/spark-sas7bdat-1.1.5-s_2.11.pom (java.net.ConnectException: Connection timed out: connect)
Server access error at url https://repo1.maven.org/maven2/saurfang/spark-sas7bdat/1.1.5-s_2.11/spark-sas7bdat-1.1.5-s_2.11.jar (java.net.ConnectException: Connection timed out: connect)
Server access error at url https://dl.bintray.com/spark-packages/maven/saurfang/spark-sas7bdat/1.1.5-s_2.11/spark-sas7bdat-1.1.5-s_2.11.pom (java.net.ConnectException: Connection timed out: connect)
Server access error at url https://dl.bintray.com/spark-packages/maven/saurfang/spark-sas7bdat/1.1.5-s_2.11/spark-sas7bdat-1.1.5-s_2.11.jar (java.net.ConnectException: Connection timed out: connect)
:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
Exception in thread "main" java.lang.RuntimeException: [unresolved dependency: saurfang#spark-sas7bdat;1.1.5-s_2.11: not found]
at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1302)
at org.apache.spark.deploy.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:54)
at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:304)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:774)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSub
Hi, I tried to connect via sparklyr to our Spark cluster in yarn-cluster mode, but the connection fails after 30 seconds. When looking at the logs I see the following behaviour. Everything looks quite normal until the application starts:
20/07/24 13:21:01 INFO Client: Submitting application application_1595494790876_0069 to ResourceManager
20/07/24 13:21:01 INFO YarnClientImpl: Submitted application application_1595494790876_0069
20/07/24 13:21:02 INFO Client: Application report for application_1595494790876_0069 (state: ACCEPTED)
20/07/24 13:21:02 INFO Client:
client token: N/A
diagnostics: AM container is launched, waiting for AM container to Register with RM
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1595596856971
final status: UNDEFINED
tracking URL: http://lm:8088/proxy/application_1595494790876_0069/
user: hadoop
20/07/24 13:21:03 INFO Client: Application report for application_1595494790876_0069 (state: ACCEPTED)
20/07/24 13:21:04 INFO Client: Application report for application_1595494790876_0069 (state: ACCEPTED)
The last message then repeats continuously. After a while I get some further messages in the logs.
20/07/24 13:22:03 WARN sparklyr: Gateway (35459) Failed to get network interface of gateway server socketnull
Any idea what could go wrong? I guess a lot of things... especially since we are in a quite restricted network. It was already quite a pain to get this far. The client can only reach the workers' port 9868. I have now also opened port 8880, since I thought the sparklyr gateway on the node might try to communicate with the client and fail, but that didn't change anything.
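A minimal troubleshooting sketch for this kind of setup: keep the spark-submit log on the console and give the gateway more time to respond. The sparklyr.connect.timeout option name and the 120-second value are assumptions, not something confirmed in this thread:
library(sparklyr)

conf <- spark_config()
conf$sparklyr.log.console <- TRUE       # stream the spark-submit log to the R console
conf$sparklyr.connect.timeout <- 120    # assumed option name: allow a longer wait for the gateway

sc <- spark_connect(master = "yarn-cluster", config = conf)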
sc <- spark_connect(
master = "http://192.168.0.6:8998",
version = "2.4.4",
method = "livy", config = livy_config(
driver_memory = "2G",
driver_cores = 2,
executor_memory = "4G",
executor_cores = 2,
num_executors = 4
))
Error in livy_validate_http_response("Failed to create livy session", : Failed to create livy session (Client error: (400) Bad Request):
{"msg":"java.net.URISyntaxException: Illegal character in scheme name at index 1:
c(\"https://github.com/sparklyr/sparklyr/blob/feature/sparklyr-1.3.0/inst/java/sparklyr-2.4-2.11.jar?raw=true\", \"https://github.com/sparklyr/sparklyr/blob/feature/sparklyr-1.3.0/inst/java/sparklyr-2.4-2.12.jar?raw=true\")"}
Traceback:
1. spark_connect(master = "http://192.168.0.6:8998", version = "2.4.4",
. method = "livy", config = livy_config(config, driver_memory = "2G",
. driver_cores = 2, executor_memory = "4G", executor_cores = 2,
. num_executors = 4))
2. livy_connection(master, config, app_name, version, hadoop_version,
. extensions, scala_version = scala_version)
3. livy_create_session(master, config)
4. livy_validate_http_response("Failed to create livy session",
. req)
5. stop(message, " (", httpStatus$message, "): ", httpContent)
I was trying to connect to spark locally using:
conf <- spark_config()
conf$`sparklyr.connect.cores.local` <- 12
conf$`sparklyr.cores.local` <- 4
conf$sparklyr.shell.deploy-mode <- "client"
conf$`sparklyr.shell.driver-memory` <- "32G"
conf$`spark.executor.cores` <- 1
conf$`spark.executor.memory` <- "2G"
conf$`sparklyr.verbose` <- TRUE
conf$`sparklyr.log.console` <- TRUE
conf$`spark.executor.instances` <- 4
conf$spark.sql.shuffle.partitions <- 5
conf$`spark.dynamicAllocation.enabled` <- FALSE
sc <- spark_connect(master = "local",
config = conf,
spark_home = Sys.getenv("SPARK_HOME"),
log = "console", version = "3.0.0")
but then I figured out that master "local" does not create executors, only the driver. So I tried to run on yarn-client; however, I get the following error message:
d:\spark\bin\spark-submit2.cmd --driver-memory 32G --class sparklyr.Shell "C:\Users\B2623385\Documents\R\win-library\3.6\sparklyr\java\sparklyr-3.0-2.12.jar" 8880 23210
Error in spark_connect_gateway(gatewayAddress, gatewayPort, sessionId, :
Gateway in localhost:8880 did not respond.
I am using a shared Windows server at work with 16 cores and 64 GB of RAM.
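As a side note, a minimal sketch of using several local cores: in local mode everything runs in a single driver JVM, so the core count goes into the master string rather than executor settings (the 12 below is just an example):
library(sparklyr)

conf <- spark_config()
conf$`sparklyr.shell.driver-memory` <- "32G"   # the single local JVM holds the driver and all tasks

sc <- spark_connect(
  master  = "local[12]",   # run tasks on 12 threads inside the driver process
  config  = conf,
  version = "3.0.0"
)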
I'm getting a ClassNotFoundException, though I've been adding these classes via conf$livy.jars <- c( ... )
as before. What I did notice on the YARN side is that these jar files are no longer listed as part of the launch script. Is there a different way to specify jar files to be loaded as part of the Livy config?
I set conf$sparklyr.livy.sources <- TRUE
and continued to use conf$livy.jars,
which seems to push the correct Livy settings to the Livy server. But then there is an issue: Failed to initialize livy connection: Failed to execute Livy statement with error: <console>:24: error: not found: value LivyUtils
... not sure why?
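For reference, a hedged sketch of combining those two settings, reusing the master URL from earlier in the thread; the HDFS jar path is a hypothetical placeholder:
library(sparklyr)

conf <- livy_config(driver_memory = "2G", executor_memory = "4G")
conf$livy.jars <- c("hdfs:///user/shared/jars/my-dependency.jar")   # hypothetical extra jar to ship
conf$sparklyr.livy.sources <- TRUE   # per the discussion above, push sparklyr sources rather than jar URLs

sc <- spark_connect(
  master  = "http://192.168.0.6:8998",
  method  = "livy",
  version = "2.4.4",
  config  = conf
)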
sc <- spark_connect(master = 'local')
Error in start_shell(master = master, spark_home = spark_home, spark_version = version, :
Failed to find 'spark-submit2.cmd' under 'C:\Users\Owner\AppData\Local\spark\spark-3.0.0-bin-hadoop2.7', please verify SPARK_HOME.
I faced an issue and raised it via sparklyr/sparklyr#2769
Solved!
Steps:
1) Download Spark from https://spark.apache.org/downloads.html
2) Extract the zipped file to 'C:/Users/scibr/AppData/Local/spark/spark-3.0.1-bin-hadoop3.2'.
3) Manually point to the latest version: spark_home_set('C:/Users/scibr/AppData/Local/spark/spark-3.0.1-bin-hadoop3.2')
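The same steps as a small sketch (the path mirrors the one above; spark_install() is the alternative when downloading from within R is possible):
library(sparklyr)

# Point sparklyr at the manually extracted distribution...
spark_home_set("C:/Users/scibr/AppData/Local/spark/spark-3.0.1-bin-hadoop3.2")

# ...or let sparklyr download and manage a Spark build itself.
# spark_install(version = "3.0.1")

sc <- spark_connect(master = "local")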
Hi everyone. Spark newbie here. Got the following error in class and haven't been able to solve it:
Error in spark_connect_gateway(gatewayAddress, gatewayPort, sessionId, :
Gateway in localhost:8880 did not respond.
Try running options(sparklyr.log.console = TRUE)
followed by sc <- spark_connect(...)
for more debugging info.
I saw @javierluraschi's answer here, sparklyr/sparklyr#801, but this fix hasn't proven effective for me. Any kind of help would be deeply appreciated.
Thanks a lot!
Hi everyone! I'm trying to migrate a script to sparklyr, and I can't find the equivalents of spread and gather. My code looks something like:
ltv_curves %>%
spread(
key = !!as.name(column_to_fill),
value = grosstotal
) %>%
gather(
key = !!as.name(column_to_fill),
value = grosstotal,
-ignore_columns
)
Anyone around that can help with this?
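One possible direction, as a hedged sketch: sparklyr 1.4+ supports tidyr's pivot_wider()/pivot_longer() on Spark data frames, which replace spread()/gather(); ltv_curves_sdf, column_to_fill and ignore_columns stand in for the original objects:
library(sparklyr)
library(dplyr)
library(tidyr)

ltv_curves_sdf %>%
  pivot_wider(
    names_from  = !!as.name(column_to_fill),   # replaces spread(key = ...)
    values_from = grosstotal
  ) %>%
  pivot_longer(
    cols      = -all_of(ignore_columns),       # replaces gather(..., -ignore_columns)
    names_to  = column_to_fill,
    values_to = "grosstotal"
  )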
I'm getting an error using spark_apply
when connecting to a Kubernetes cluster. I can run sdf_len(sc, 10)
just fine but running sdf_len(sc, 10) %>% spark_apply(function(df) I(df))
returns the following error:
Error: java.io.FileNotFoundException: File file:/var/folders/jf/lqnngxkj0x75cdmv_xjygfq40000gq/T/RtmpuaCX4s/packages/packages.8599.tar does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:611)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:601)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:428)
at org.apache.spark.SparkContext.addFile(SparkContext.scala:1534)
at org.apache.spark.SparkContext.addFile(SparkContext.scala:1498)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at sparklyr.Invoke.invoke(invoke.scala:147)
at sparklyr.StreamHandler.handleMethodCall(stream.scala:136)
at sparklyr.StreamHandler.read(stream.scala:61)
at sparklyr.BackendHandler.$anonfun$channelRead0$1(handler.scala:58)
at scala.util.control.Breaks.breakable(Breaks.scala:42)
at sparklyr.BackendHandler.channelRead0(handler.scala:39)
at sparklyr.BackendHandler.channelRead0(handler.scala:14)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:321)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:295)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
Spark 3.0.1, Scala 2.12, sparklyr 1.5.2
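One thing that may be worth trying, as a hedged sketch: the missing packages.*.tar looks like the driver-side R package bundle, so skipping its distribution (which requires the needed R packages to already exist on the worker images) might sidestep the error:
library(sparklyr)
library(dplyr)

sdf_len(sc, 10) %>%
  spark_apply(
    function(df) I(df),
    packages = FALSE   # skip building/shipping the local R package bundle to the executors
  )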
sc <- spark_connect(master = "local", version = "2.3")#connect to this local cluster
Error in spark_connect_gateway(gatewayAddress, gatewayPort, sessionId, :
Gateway in localhost:8880 did not respond.
Try running options(sparklyr.log.console = TRUE)
followed by sc <- spark_connect(...)
for more debugging info.
spark_install()
to download some version of Spark to ~/spark
? Feel free to file an issue at https://github.com/sparklyr/sparklyr/issues with more details.
Hi, I'm trying to use spark_read_avro
but I'm always getting the same error:
Error in validate_spark_avro_pkg_version(sc) :
Avro support must be enabled with `spark_connect(..., version = <version>, packages = c("avro", <other package(s)>), ...)` or by explicitly including 'org.apache.spark:spark-avro_2.12:3.1.1-SNAPSHOT' for Spark version 3.1.1-SNAPSHOT in list of packages
I specified my Spark version and added avro to packages in the spark_connect function, and tried both examples given in the error message, but neither works.
Does anybody know how to fix this?
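For reference, a hedged sketch of the first form the error message suggests; the master, version, and file path below are placeholders and must match your actual cluster and Spark build:
library(sparklyr)

sc <- spark_connect(
  master   = "local",
  version  = "3.1.1",   # should match the Spark version reported in the error
  packages = "avro"     # per the error message; sparklyr resolves the matching spark-avro package
)

avro_tbl <- spark_read_avro(sc, name = "my_table", path = "path/to/data.avro")   # hypothetical path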