sj123050037
@sj123050037
@valan4ik Yes, we did upload the binary for the job successfully.
Valentina
@valan4ik
@sj123050037 I have never seen the error before, but as it’s coming from slick and the DAO, it seems like something in the DB. Are you able to create a context as well?
Valentina
@valan4ik
Also, do you use a Python context? I checked the logs from the ticket you referenced and it seems like the wrong context type was used there.
sj123050037
@sj123050037
@valan4ik Yes, I am able to create a context. Btw, I am able to get around this issue by increasing spray.can.server.idle-timeout, though I don't understand why that helped.
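A minimal sketch of that workaround in the jobserver config file, using spray's standard key names; the timeout values here are only illustrative:

spray.can.server {
  idle-timeout = 120 s      # raise the connection idle timeout
  request-timeout = 100 s   # spray expects this to stay below idle-timeout
}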
@bsikander: Is there a way to add additional jars to the driver classpath for every new spark-job? Consider that the spark-context is already created and running in Yarn-cluster mode.
pgouda89
@pgouda89

I am getting the following error while creating a spark-context against a secure cluster.
Exception in thread "main" org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_1566408512664_0017 to YARN : Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:nameservice1, Ident: (token for <USER>: HDFS_DELEGATION_TOKEN owner=<USER>@<REALM>, renewer=<USER>, realUser=, issueDate=1566425923400, maxDate=1567030723400, sequenceNumber=41626652, masterKeyId=1674)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:308)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:185)
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1149)
at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1541)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

SJS version: 0.9
Cluster: CDH 6.1 Kerberos+SSL enabled
I have added the following properties to spark-defaults.conf:
spark.yarn.principal=<USER>@<REALM>
spark.yarn.keytab=<keytab location on SJS machine>/<USER>.keytab
spark.yarn.proxy-user=<USER>

Any leads would be a great help.

sj123050037
@sj123050037
Hi Gurus, I see that the YARN application for a spark-context is always started with the name "spark.jobserver.JobManager", even though the context name used at the time of context creation is different. How can I change the application name as it shows in the YARN WebUI?
sj123050037
@sj123050037
Hello @bsikander, I am seeing this interesting SJS behavior. Can you please explain how this is happening?
I have 2 instances of SJS running on the same host. I have configured the two instances to run in Yarn-cluster mode and talk to the SAME Hadoop cluster.
I observe that a spark-context created using one SJS instance is seen in the webUI of both SJS instances. I am wondering how this is happening. Since I have configured the 2 SJS instances to write to different databases, I do not expect this to happen. Can we not have 2 SJS instances running on the same host communicating with the same Hadoop cluster? Is this a known limitation, or is it by design?
Valentina
@valan4ik
@sj123050037 they are 2 independent Jobservers; are you sure that they use different H2/PostgreSQL/... DBs for the Jobserver metadata?
sj123050037
@sj123050037
@valentina, Yes, they are pointing to 2 different directories.
Valentina
@valan4ik
Weird. From my knowledge, everything the jobserver knows comes from its own DB.
If I were you, I would probably check that when you create a context with the different instances, different DBs are updated.
sj123050037
@sj123050037
@valan4ik: let me take a closer look at how the DBs are getting updated.
Valentina
@valan4ik
@sj123050037 one super simple check is to upload a binary and then list the binaries on the other jobserver. If you see the uploaded binary, they definitely share a DB.
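A rough sketch of that check using the standard binary endpoints; the hosts, ports, binary name, and jar path below are placeholders:

# Upload a test binary to the first instance
curl -X POST 'http://sjs-instance-1:8090/binaries/shared-db-check' \
  -H 'Content-Type: application/java-archive' \
  --data-binary @/tmp/any-test.jar

# List binaries on the second instance; if shared-db-check appears here,
# both instances are reading the same metadata DB
curl 'http://sjs-instance-2:8091/binaries'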
sj123050037
@sj123050037
@valan4ik: Thanks Valentina. It was the DB that caused this. The directories for the H2 DB were different, but the ports were the same.
Valentina
@valan4ik
@sj123050037 glad it helped :)
tosen1990
@tosen1990
Has anyone ever run into this problem when creating a context?
[2019-08-30 19:29:07,557] INFO  AkkaClusterSupervisorActor [] [] - Failed to initialize context Actor[akka.tcp://JobServer@127.0.0.1:38116/user/jobManager-b29454ba-2ab6-4415-af45-1660ed00dc9c#-1325230401]
java.lang.NoClassDefFoundError: com/sun/jersey/api/client/config/ClientConfig
    at org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:55)
    at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.createTimelineClient(YarnClientImpl.java:181)
    at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:168)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:151)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:164)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:500)
    at spark.jobserver.context.DefaultSparkContextFactory$$anon$1.<init>(SparkContextFactory.scala:144)
    at spark.jobserver.context.DefaultSparkContextFactory.makeContext(SparkContextFactory.scala:144)
    at spark.jobserver.context.DefaultSparkContextFactory.makeContext(SparkContextFactory.scala:139)
    at spark.jobserver.context.SparkContextFactory$class.makeContext(SparkContextFactory.scala:64)
    at spark.jobserver.context.DefaultSparkContextFactory.makeContext(SparkContextFactory.scala:139)
    at spark.jobserver.JobManagerActor$$anonfun$wrappedReceive$1.applyOrElse(JobManagerActor.scala:352)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
    at spark.jobserver.JobManagerActor$$anonfun$5.applyOrElse(JobManagerActor.scala:277)
    at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
    at spark.jobserver.common.akka.ActorStack$$anonfun$receive$1.applyOrElse(ActorStack.scala:33)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
    at spark.jobserver.common.akka.Slf4jLogging$$anonfun$receive$1$$anonfun$applyOrElse$1.apply$mcV$sp(Slf4jLogging.scala:25)
    at spark.jobserver.common.akka.Slf4jLogging$class.spark$jobserver$common$akka$Slf4jLogging$$withAkkaSourceLogging(Slf4jLogging.scala:34)
    at spark.jobserver.common.akka.Slf4jLogging$$anonfun$receive$1.applyOrElse(Slf4jLogging.scala:24)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
    at spark.jobserver.common.akka.ActorMetrics$$anonfun$receive$1.applyOrElse(ActorMetrics.scala:24)
    at akka.actor.Actor$class.aroundReceive(Actor.scala:484)
    at spark.jobserver.common.akka.InstrumentedActor.aroundReceive(InstrumentedActor.scala:8)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
    at akka.actor.ActorCell.invoke(ActorCell.scala:495)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
    at akka.dispatch.Mailbox.run(Mailbox.scala:224)
    at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.ClassNotFoundException: com.sun.jersey.api.client.config.ClientConfig
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 35 more
Valentina
@valan4ik
@tosen1990 can you share your POST request?
tosen1990
@tosen1990
I'm using SJS 0.9.1 and Ambari with Spark 2.2.0. Not sure if it's a version compatibility problem.
@valan4ik curl -d "" "192.168.101.23:8090/contexts/test-contextt?num-cpu-cores=16&memory-per-node=64g"
@valan4ik I've been trying SJS in yarn-client mode using Docker for many days, but still can't make it work.
Valentina
@valan4ik
I don’t use yarn, but I agree with you that it seems like a version compatibility problem. I googled briefly and found some advice to set spark.hadoop.yarn.timeline-service.enabled to false.
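One way to apply that suggestion, assuming Spark properties are managed through spark-defaults.conf as earlier in this log:

spark.hadoop.yarn.timeline-service.enabled=false

This should keep YarnClientImpl from creating the timeline client that triggers the missing Jersey class in the stack trace above.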
tosen1990
@tosen1990
@valan4ik Many thanks.
tosen1990
@tosen1990
@valan4ik hey, Val. I'm stuck on this issue I opened yesterday.
It's still something about yarn. Hope you can give me some advice, please.
Valentina
@valan4ik
@tosen1990 Sorry, I really have almost no clue about yarn :) But I took a look at your logs: you submit a context, it’s successfully created, and then the context gets a java.nio.channels.ClosedChannelException. That is the one you need to go after. There are quite a few posts on Stack Overflow; I guess you just need to try the advice there. Jobserver seems to be working fine here, so you probably need to tune Spark :)
tosen1990
@tosen1990
@valan4ik Thanks for your reply. Tbh, I've been spending a lot of time on it.
I'll keep digging into it. Thanks as always!
pgouda89
@pgouda89
Hi All,
Is it possible to configure 1 executor per spark task using spark-jobserver? I have noticed that executors are shared while running spark tasks.
sj123050037
@sj123050037
Hi @valan4ik: Can you help with this? I want to configure the executor and driver resources in Yarn-cluster mode. I specified the following in the SJS script (local.sh.template): MANAGER_EXTRA_SPARK_CONFS="spark.executor.memory=6G|spark.driver.memory=4G|spark.executor.cores=2|spark.yarn.submit.waitAppCompletion=false". I can see that this property is picked up when submitting the spark-context creation request (confirmed by checking the spark_jobserver.out log, which shows the correct resource values), but when I look at the application/spark-context environment through the Spark job-history server URL, I see "spark.executor.memory=4G". I am not sure why my configuration is not being picked up.
pgouda89
@pgouda89
Hi @valan4ik, is it possible to configure multiple spark-jobservers within a machine by using different server ports and JMX ports?
Is it possible to configure the H2 DB port?
Valentina
@valan4ik
Hi @pgouda89, please check local.conf.template in the conf folder. Theoretically you should create your own configuration file with the parameters needed for the deployment. There you can define the port for the Jobserver (spark.jobserver.port) and all the settings for H2 (as far as I know the Jobserver doesn’t have any prerequisites for your setup; you are free to use any URL/password you wish).
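A minimal sketch of such a per-instance configuration, assuming a file-based H2 URL; the file name, port, and paths are placeholders:

# second-instance.conf
spark.jobserver {
  port = 8091                              # REST port for this instance
  sqldao {
    rootdir = /var/lib/sjs2/sqldao/data    # keep metadata separate per instance
    jdbc.url = "jdbc:h2:file:/var/lib/sjs2/h2-db;AUTO_RECONNECT=TRUE"
  }
}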
pgouda89
@pgouda89

Hi @valan4ik, thanks a lot for the quick response. I was using SJS 0.9.0 and noticed that spark-jobserver will start the H2 DB if spark.jobserver.startH2Server is enabled in the env.conf.
Actual code in JobServer.scala
// start embedded H2 server
if (config.getBoolean("spark.jobserver.startH2Server")) {
  val rootDir = config.getString("spark.jobserver.sqldao.rootdir")
  val h2 = org.h2.tools.Server.createTcpServer("-tcpAllowOthers", "-baseDir", rootDir).start();
  logger.info("Embeded H2 server started with base dir {} and URL {}", rootDir, h2.getURL: Any)
}

The above code starts the H2 database on port 9092, and the spark-jobserver comes up if we use spark.jobserver.sqldao.jdbc.url="jdbc:h2:tcp://<serverHost>:9092/h2-db;AUTO_RECONNECT=TRUE". The real issue is when port 9092 is busy: the H2 database then comes up on a free port and we won't be able to set the right spark.jobserver.sqldao.jdbc.url in the env.conf.

I have used the following code to overcome this issue. Let me know if you know any better solution.
// start embedded H2 server
if (config.getBoolean("spark.jobserver.startH2Server")) {
  val rootDir = config.getString("spark.jobserver.sqldao.rootdir")
  // Typesafe Config throws on missing keys instead of returning null, so check
  // for the optional port explicitly and fall back to H2's default port
  val h2ServerPort =
    if (config.hasPath("spark.jobserver.h2ServerPort")) config.getString("spark.jobserver.h2ServerPort")
    else "9092"
  val h2 = org.h2.tools.Server.createTcpServer(
    "-tcpPort", h2ServerPort, "-tcpAllowOthers", "-baseDir", rootDir).start();
  logger.info("Embeded H2 server started with base dir {} and URL {}", rootDir, h2.getURL: Any)
}

And env.conf contains the following properties:
spark.jobserver.h2ServerPort=7272
spark.jobserver.sqldao.jdbc.url="jdbc:h2:tcp://<ServeHost>:7272/h2-db;AUTO_RECONNECT=TRUE"

Peter Farah
@pfarah65
What's the best way to deploy a jobserver for Python?
Peter Farah
@pfarah65
Whenever I submit my job after making a context, it always fails right away. This is my error:
{
  "duration": "1.202 secs",
  "classPath": "job.WordCountSparkJob",
  "startTime": "2019-10-11T13:38:52.777-04:00",
  "context": "py-context",
  "result": {
    "message": "Context (py-context) for this job was terminated",
    "errorClass": "",
    "stack": ""
  },
  "status": "ERROR",
  "jobId": "1e34ca22-b50c-4f74-b889-4e1979d1bfeb",
  "contextId": ""
}
Peter Farah
@pfarah65
Never mind, I had to downgrade to Spark 2.3.2.
pgouda89
@pgouda89

Hi All,

What is the best way to get the failure reason for a spark task if the jobs are submitted asynchronously? I am getting the following response for the ERROR case.
[{
  "duration": "5.948 secs",
  "classPath": "<ApplicationClass>",
  "startTime": "2019-10-15T11:19:30.803+05:30",
  "context": "INFA_DATA_PREVIEW_APP_DIS_SJS",
  "status": "ERROR",
  "jobId": "0e79d7b3-7e31-4232-b354-b1fb01b0928a",
  "contextId": "08ee61f8-7425-45f2-a4ce-c4b456fd24b8"
}]

Do we need to perform anything extra in the ApplicationClass in order to get the error stacktrace in the job response body?

pgouda89
@pgouda89

Hi @valan4ik ,

Is it possible to change the spark context name using some parameter/env? Currently, it is set to "spark.jobserver.JobManager".

[image attached: image.png]
Valentina
@valan4ik
Hi @pgouda89, is it the YARN UI? Are you using the latest Jobserver version?
There was a PR a while ago: spark-jobserver/spark-jobserver#1156
I suppose it was addressing your issue
Valentina
@valan4ik
Also, regarding getting the job error in the GET /jobs request: it's not so easy, you would need to change the jobserver code :) Usually people make additional GET /jobs/$jobId requests for the jobs in the error state.
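A sketch of that follow-up call; the host and port are placeholders, and the job id is the one from the ERROR entry above:

curl 'http://localhost:8090/jobs/0e79d7b3-7e31-4232-b354-b1fb01b0928a'

For a job in ERROR state the response includes a "result" object with "message", "errorClass", and "stack", like the py-context example earlier in this log.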
pgouda89
@pgouda89
@valan4ik Thanks a lot :) I was missing spark-jobserver/spark-jobserver#1156 as I was using spark jobserver 0.9.0
Peter Farah
@pfarah65
Anyone having trouble building a Docker image? When build.sbt executes ./dev/make-distribution I keep getting the failure "Execution scala-compile-first of goal net.alchim31.maven:scala-maven-plugin:3.2.2:compile failed".
itsmesrds
@itsmesrds
Hi @valan4ik, I'm running jobserver 0.8.0 on EMR YARN in production. Because of YARN node blacklisting issues, the spark context is getting killed and all the data stored in the jobserver's cached object gets cleared.
Is there any way to check the heartbeat of the SparkContext? If such a thing happens in production, then at least based on the heartbeat I can rerun the job that caches the data again.
sj123050037
@sj123050037
Hi @itsmesrds,
You can do this by making the REST request "contexts/<context-name>".
This will give you a response that contains the context state, and based on the state value you can identify whether it is running or in some other final state.
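A sketch of that check with a placeholder host, port, and context name:

curl 'http://localhost:8090/contexts/my-context'

The response carries the context state (e.g. RUNNING or a final state), so a simple monitor could trigger the cache-warming job again whenever the state is no longer RUNNING.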
itsmesrds
@itsmesrds
Thanks @sj123050037 .
Hi team,
Is there any way to set the number of executors and the executor memory for every job in a SparkContext?
As far as I know, we set those parameters when creating the context. But will the same work for every job request?
itsmesrds
@itsmesrds
Basically, Spark parameters for every session in a context.