    Jeff P
    Hi, is anybody available to discuss sparklyr/sparklyr#2534? It is a bit difficult to express in words and requires Kafka to replicate. This is blocking me from going further, so I am curious whether any of you have insights.
    Jeff P
    @HikaGenji note to self: problem solved, see the ticket for more info

    Hi, someone please help me fix the error below. I have another working setup on Hadoop 2 (EMR 5.x). Now I am testing EMR 6 with a new Spark home, /usr/lib/spark6/. I compared both setups and everything looks good to me. Is there any specific setting I need to check?

    sc <- spark_connect(master = "yarn", spark_home = "/usr/lib/spark6", deploymode = "cluster", enableHiveSupport = TRUE)
    Error in force(code) :
    Failed while connecting to sparklyr to port (8880) for sessionid (32486): Gateway in localhost:8880 did not respond.
    Path: /usr/lib/spark6/bin/spark-submit
    Parameters: --class, sparklyr.Shell, '/opt/R/3.6.0/lib64/R/library/sparklyr/java/sparklyr-2.4-2.11.jar', 8880, 32486
    Log: /tmp/RtmpijZOtA/filee69e18f188dc_spark.log

    ---- Output Log ----
    Exception in thread "main" java.lang.NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)Lscala/collection/mutable/ArrayOps;
    at sparklyr.Shell$.main(shell.scala:9)
    at sparklyr.Shell.main(shell.scala)

    Yitao Li
    @bdharang The error is caused by a Scala version incompatibility -- I still need to look into whether there is a reasonable way to fix it, but for now you can work around it by passing version = "3.0.0-preview" to force sparklyr to load jar files compiled with Scala 2.12.
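    A minimal sketch of that workaround, reusing the Spark home from the error above; the version string is what tells sparklyr to pick its jars built against Scala 2.12 (the connection itself obviously needs a live YARN cluster):

    ```r
    library(sparklyr)

    # Hypothetical connection mirroring the EMR 6 setup above;
    # "3.0.0-preview" makes sparklyr load sparklyr-3.0-2.12.jar
    # (Scala 2.12) instead of the Scala 2.11 build.
    sc <- spark_connect(
      master = "yarn",
      spark_home = "/usr/lib/spark6",
      version = "3.0.0-preview"
    )
    ```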
    Maher Daoud
    Hi guys, I'm working on a project where we need to apply a huge number of mutations to the data. I'm using the sparklyr package with the dplyr mutate function.
    The problem is that when applying a huge number of mutations using mutate, I get the following error:
    "Error: org.apache.spark.sql.AnalysisException: cannot resolve 'GOOGLE_CITY_DESC' given input columns:"
    It seems there is a limitation on using the mutate function.
    Any help, please?
    Yitao Li
    @maherdaoud I'm not aware of any built-in limitation with dplyr.
    ^^ dplyr is just an R interface that translates all your data manipulation verbs to Spark SQL, so it has the same capabilities as Spark SQL itself. I suggest double-checking the schema of your result right before the failing dplyr::mutate to see if the column names are what you expect.
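    One hedged way to do that sanity check, where `sdf` stands in for the intermediate Spark dataframe just before the failing mutate (both the name and the pipeline are placeholders):

    ```r
    library(sparklyr)
    library(dplyr)

    # `sdf` is a hypothetical tbl_spark built up by the preceding mutates.
    sdf %>% colnames()      # column names as dplyr sees them
    sdf %>% sdf_schema()    # name/type pairs reported by Spark itself
    ```

    If a column such as GOOGLE_CITY_DESC is missing from either listing, the problem is upstream of the mutate that errors out.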
    Maher Daoud
    Yitao Li
    @maherdaoud wow that's really interesting. I didn't realize too many mutate calls would fail in this way. My guess is it might be because the resulting SQL query becomes too long. Anyway, I'll create a GitHub issue for it.
    Oh nevermind I saw the one you created already.
    Maher Daoud
    Any workaround suggestion?
    Yitao Li
    I'll need to look into it. I think what happens is that all dplyr::mutate calls are evaluated lazily, so when you accumulate too many of them that are yet to be "materialized" in the backend, you end up with a very long SQL query. But I'm not exactly sure whether the SQL query being too long is the root cause of this problem; that was just a guess off the top of my head.
    Maher Daoud
    I think you are right. It seems each mutate call creates a new subquery with a new name; after too many subqueries it can't generate more names for the last one.
    Yitao Li
    If I read correctly, https://dplyr.tidyverse.org/reference/compute.html essentially says compute() forces the SQL query you have accumulated so far to be evaluated, so that might help.
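    A sketch of how compute() could be interleaved with a long mutate chain, under the assumption that materializing intermediate results keeps each generated SQL query short (`sdf`, the table name, and the column names are all placeholders):

    ```r
    library(dplyr)

    # Materialize into a temporary Spark table every so often so the
    # lazily accumulated SQL query does not grow without bound.
    sdf_stage1 <- sdf %>%
      mutate(x2 = x * 2, x3 = x * 3) %>%   # ...many more mutates here...
      compute("stage1_tmp")                # forces evaluation at this point

    result <- sdf_stage1 %>%
      mutate(x4 = x2 + x3)                 # chain continues from the temp table
    ```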
    Maher Daoud
    Let me try, I'll be back, and thanks for your great support bro :>
    Maher Daoud
    I tried both "compute" and "collapse"; it takes a long time without finishing. I applied it to 60,000 rows of a Spark dataframe.
    Yitao Li
    If it appears to be hanging forever, then I would also do the following:
    <my_spark_dataframe> %>% dplyr::mutate(...) %>% ... %>% dplyr::show_query() and then sanity-check the query is OK and then try running the query directly to see how long that takes
    Maher Daoud
    running the query directly using dbGetQuery?
    Yitao Li
    Yes either that or just launch a spark-sql shell (from $SPARK_HOME/bin/spark-sql), copy-paste the query, and then run
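    For reference, the debugging loop described above might look like this (the path depends on your Spark installation):

    ```shell
    # Launch the interactive SQL shell that ships with Spark...
    $SPARK_HOME/bin/spark-sql
    # ...then paste the query printed by dplyr::show_query() at the prompt
    # and time how long it takes with the R layer out of the picture.
    ```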
    Maher Daoud
    Yes, but in fact this will not help; as you know, it takes too much time after adding the "compute" or "collapse" calls.
    I think I need to wait until this bug is solved or a workaround is found :(
    Yitao Li
    I was suggesting doing it for debugging purposes: if running that query from the spark-sql shell directly returns results fast enough (i.e., bypassing the R layer entirely), then we can conclude the slowness is caused by some bug in one of the R packages.
    Maher Daoud
    If we are talking about spark-sql, it's very fast; I also tried the query directly using dbGetQuery and it was fast too.
    But we shouldn't forget the main issue mentioned above.
    Yitao Li
    @maherdaoud Hey I just commented on sparklyr/sparklyr#2589 with some good news for you. Let me know what you think.
    Thanks for raising this issue for sparklyr BTW. It's a really good catch! Even though in the end it appears not to be a sparklyr problem :D
    Maher Daoud
    What great news! Let me check and I will get back to you with my feedback. Again, many thanks for your great efforts.
    Niels Jespersen
    I just cannot get sparklyr to work on Windows in local mode. No matter what I do, the only error message I see is "Error in spark_connect_gateway(gatewayAddress, gatewayPort, sessionId, : Gateway in localhost:8880 did not respond.". R 3.6.1, R 4.0.2, Spark 2.0.1, Spark 2.4.5, Java 8. No filepaths involved (Java, Spark) contain spaces. Log files are created, but empty. winutils.exe is copied to where Spark/Hadoop wants it (at least the error message saying it misses winutils goes away). A Java process is created. Any hints for further investigation? Has anyone made this work recently?
    Yitao Li

    @njesp You can try the following to print the spark-submit log to the console to see what's failing:

    options(sparklyr.log.console = TRUE)
    sc <- spark_connect(master = "local")

    The spark-submit log usually ends up in a text file, but the path to that file is highly system-dependent and can also be influenced by your local config... so rather than spending time figuring out where it might be, it's easier to just have options(sparklyr.log.console = TRUE) while troubleshooting.

    Niels Jespersen
    @yl790 Thank you for replying. Today it suddenly works, at least on my notebook. Tomorrow I will try again on my workstation at work. options(sparklyr.log.console = TRUE) has an effect when running R in a console, but it seems that RStudio eats the log messages somehow. I will get back if I still have problems on my workstation tomorrow. Once again, thank you for helping.
    Niels Jespersen
    @yl790 Now in the office, working behind a proxy. Logging to the console works when running R from a console. It's actually spark.sas7bdat that causes the trouble, as it depends on Maven's ability to fetch jars over the internet. Here is the log.

    @yl790 Error in spark_connect_gateway(gatewayAddress, gatewayPort, sessionId, :
    Gateway in localhost:8880 did not respond.

    :: resolution report :: resolve 84419ms :: artifacts dl 0ms

        :: modules in use:
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        |      default     |   1   |   0   |   0   |   0   ||   0   |   0   |

    :: problems summary ::
    :::: WARNINGS
    module not found: saurfang#spark-sas7bdat;1.1.5-s_2.11

        ==== local-m2-cache: tried
          -- artifact saurfang#spark-sas7bdat;1.1.5-s_2.11!spark-sas7bdat.jar:
        ==== local-ivy-cache: tried
          -- artifact saurfang#spark-sas7bdat;1.1.5-s_2.11!spark-sas7bdat.jar:
        ==== central: tried
          -- artifact saurfang#spark-sas7bdat;1.1.5-s_2.11!spark-sas7bdat.jar:
        ==== spark-packages: tried
          -- artifact saurfang#spark-sas7bdat;1.1.5-s_2.11!spark-sas7bdat.jar:
                ::          UNRESOLVED DEPENDENCIES         ::
                :: saurfang#spark-sas7bdat;1.1.5-s_2.11: not found

    :::: ERRORS
    Server access error at url https://repo1.maven.org/maven2/saurfang/spark-sas7bdat/1.1.5-s_2.11/spark-sas7bdat-1.1.5-s_2.11.pom (java.net.ConnectException: Connection timed out: connect)

        Server access error at url https://repo1.maven.org/maven2/saurfang/spark-sas7bdat/1.1.5-s_2.11/spark-sas7bdat-1.1.5-s_2.11.jar (java.net.ConnectException: Connection timed out: connect)
        Server access error at url https://dl.bintray.com/spark-packages/maven/saurfang/spark-sas7bdat/1.1.5-s_2.11/spark-sas7bdat-1.1.5-s_2.11.pom (java.net.ConnectException: Connection timed out: connect)
        Server access error at url https://dl.bintray.com/spark-packages/maven/saurfang/spark-sas7bdat/1.1.5-s_2.11/spark-sas7bdat-1.1.5-s_2.11.jar (java.net.ConnectException: Connection timed out: connect)

    Exception in thread "main" java.lang.RuntimeException: [unresolved dependency: saurfang#spark-sas7bdat;1.1.5-s_2.11: not found]
    at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1302)
    at org.apache.spark.deploy.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:54)
    at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:304)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:774)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSub

    Niels Jespersen
    @yl790 Well, I made it work. I let it run on my internet-connected notebook, copied the jars in ~/.ivy2 to a folder on my office PC, and added a spark.jars.ivy entry to spark-defaults.conf pointing to that ivy2 folder. Now it runs and loads the spark.sas7bdat jars. Thank you for your help.
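    For anyone in the same situation, the offline workaround described above amounts to a single line in spark-defaults.conf (the folder path is illustrative):

    ```
    # Point Spark's Ivy resolution at a local folder of pre-downloaded jars
    # copied from an internet-connected machine's ~/.ivy2
    spark.jars.ivy    C:/Users/me/offline-ivy2
    ```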

    Hi, I tried to connect via sparklyr to our Spark cluster in yarn-cluster mode, but the connection fails after 30 seconds. Looking at the logs I see the following behaviour. Everything looks quite normal until the application starts:

    20/07/24 13:21:01 INFO Client: Submitting application application_1595494790876_0069 to ResourceManager
    20/07/24 13:21:01 INFO YarnClientImpl: Submitted application application_1595494790876_0069
    20/07/24 13:21:02 INFO Client: Application report for application_1595494790876_0069 (state: ACCEPTED)
    20/07/24 13:21:02 INFO Client: 
             client token: N/A
             diagnostics: AM container is launched, waiting for AM container to Register with RM
             ApplicationMaster host: N/A
             ApplicationMaster RPC port: -1
             queue: default
             start time: 1595596856971
             final status: UNDEFINED
             tracking URL: http://lm:8088/proxy/application_1595494790876_0069/
             user: hadoop
    20/07/24 13:21:03 INFO Client: Application report for application_1595494790876_0069 (state: ACCEPTED)
    20/07/24 13:21:04 INFO Client: Application report for application_1595494790876_0069 (state: ACCEPTED)

    The last message then repeats continuously. After a while I get this message in the logs.

    20/07/24 13:22:03 WARN sparklyr: Gateway (35459) Failed to get network interface of gateway server socketnull

    Any idea what could go wrong? I guess a lot of things... especially since we are in a quite restricted network. It was already quite a pain to reach this point. The client can only see the workers' port 9868. I have now also opened port 8880, since I thought maybe the sparklyr gateway on the node tries to communicate with the client and fails. But this didn't change anything.

    Yuan Zhao
    Has anyone run into this issue? I've googled a bit, but still no luck:
    sc <- spark_connect(
      master = "",
      version = "2.4.4",
      method = "livy", config = livy_config(
        driver_memory = "2G",
        driver_cores = 2,
        executor_memory = "4G",
        executor_cores = 2,
        num_executors = 4
    ))
    Error in livy_validate_http_response("Failed to create livy session", : Failed to create livy session (Client error: (400) Bad Request):
     {"msg":"java.net.URISyntaxException: Illegal character in scheme name at index 1: 
    c(\"https://github.com/sparklyr/sparklyr/blob/feature/sparklyr-1.3.0/inst/java/sparklyr-2.4-2.11.jar?raw=true\", \"https://github.com/sparklyr/sparklyr/blob/feature/sparklyr-1.3.0/inst/java/sparklyr-2.4-2.12.jar?raw=true\")"}
    1. spark_connect(master = "", version = "2.4.4", 
     .     method = "livy", config = livy_config(config, driver_memory = "2G", 
     .         driver_cores = 2, executor_memory = "4G", executor_cores = 2, 
     .         num_executors = 4))
    2. livy_connection(master, config, app_name, version, hadoop_version, 
     .     extensions, scala_version = scala_version)
    3. livy_create_session(master, config)
    4. livy_validate_http_response("Failed to create livy session", 
     .     req)
    5. stop(message, " (", httpStatus$message, "): ", httpContent)
    Yuan Zhao
    Just filed a bug report: sparklyr/sparklyr#2641
    Yitao Li
    @yuanzhaoYZ Hey thanks for the bug report!! Seems to be something worth looking into. I can't think of anything that would explain that error off the top of my head.
    Also, sparklyr currently only has Livy test coverage for Spark 2.3, and it is possible Spark 2.3 has been the only version intended to work with both sparklyr and Livy so far. Hopefully it shouldn't be difficult to do the same for Spark 2.4 though.
    Yitao Li
    @yuanzhaoYZ sparklyr/sparklyr#2641 is resolved now.
    Yuan Zhao
    @yl790 Awesome, thanks for fixing this one up so fast!
    Jordan Bentley
    is it at all possible to pass an R callback into Scala? So I can call a Scala method from R and have that Scala method call back to R in the middle of its execution?
    I'm fairly confident the answer is 'no' but I'm holding on to a little hope that I'm wrong

    I was trying to connect to spark locally using:

    conf <- spark_config()
    conf$`sparklyr.connect.cores.local` <- 12
    conf$`sparklyr.cores.local` <- 4
    conf$`sparklyr.shell.deploy-mode` <- "client"
    conf$`sparklyr.shell.driver-memory` <- "32G"
    conf$`spark.executor.cores` <- 1
    conf$`spark.executor.memory` <- "2G"
    conf$`sparklyr.verbose` <- TRUE
    conf$`sparklyr.log.console` <- TRUE
    conf$`spark.executor.instances` <- 4
    conf$spark.sql.shuffle.partitions <- 5
    conf$`spark.dynamicAllocation.enabled` <- FALSE
    sc <- spark_connect(master = "local",
                        config = conf,
                        spark_home =  Sys.getenv("SPARK_HOME"),
                        log = "console", version = "3.0.0")

    but then I figured that master "local" does not create executors, only the driver. So I tried to run on yarn-client; however, I get the following error message:

    d:\spark\bin\spark-submit2.cmd --driver-memory 32G --class sparklyr.Shell "C:\Users\B2623385\Documents\R\win-library\3.6\sparklyr\java\sparklyr-3.0-2.12.jar" 8880 23210
    Error in spark_connect_gateway(gatewayAddress, gatewayPort, sessionId,  :
      Gateway in localhost:8880 did not respond.

    I am using a shared Windows server at work that has 16 cores and 64 GB.

    I am trying to get dplyr::summarize_all to work on a Spark dataframe, as described in chapter 3 of "Mastering Spark with R". Is this possible? I posted the question on Stack Overflow, see: https://stackoverflow.com/questions/64032888/how-to-get-dplyrsummarize-all-to-work-on-a-sparkdataframe-using-databricks
    Regan McDonald
    Does anyone distribute a versioned R environment to sparklyr session worker nodes rather than install the same version of R on all Spark nodes?
    Hello - a question... with the new sparklyr version tag (which is now required for Livy connections), I am noticing that my classes don't load on the remote system anymore; it throws a ClassNotFoundException. I've been adding these classes via conf$livy.jars <- c( ... ) as before. What I did notice is that on the YARN side these jar files are no longer listed as part of the launch script. Is there a different way to specify jar files to be loaded as part of the Livy config?