tosen1990
@tosen1990
Actually, I've been stuck on this issue for several months. Tbh, I've tried whatever I can, but still can't fix it.
Behroz Sikander
@bsikander
umm, could you post the driver logs in the issue? That will give a full picture
tosen1990
@tosen1990
And when I use the sjs without Docker, it works fine.
Behroz Sikander
@bsikander
ahan
the exception in the driver logs should give you clues about what is happening.
tosen1990
@tosen1990
Actually, I can't find any driver logs tbh. Just found the logs from YARN.
Behroz Sikander
@bsikander
well, from the UI, you should be able to download the driver logs. If you can't find the logs, then it could point to the fact that jobserver fails before the driver is started.
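If the UI doesn't expose them, a standard way to pull container logs for a YARN application is the yarn CLI (the application id below is a placeholder for your run):

    yarn logs -applicationId application_1234567890123_0001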
tosen1990
@tosen1990
[two screenshots attached: image.png, image.png]
I'll dig into it more deeply later. Thanks for your advice.
Behroz Sikander
@bsikander

it seems that the driver container is failing to launch.

Please do 1 thing: directly use spark-submit to submit a WordCount/Pi job to your YARN cluster and see if it runs through.

If it does, then the problem could be on the jobserver side (which I doubt); otherwise the problem is with your YARN setup, and you should fix it and then use jobserver again.
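A sketch of that check, assuming Spark's bundled SparkPi example (jar path and version are placeholders for your install):

    spark-submit \
      --class org.apache.spark.examples.SparkPi \
      --master yarn \
      --deploy-mode cluster \
      $SPARK_HOME/examples/jars/spark-examples_2.11-2.2.0.jar 100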

tosen1990
@tosen1990
@bsikander Submitted a Pi job to my cluster and it works well. I did exactly what the docs say and can't figure out what the error is.
Behroz Sikander
@bsikander
Interesting, then look a bit deeper to see if you can find something interesting. Please post all the updates in the GitHub issue.
tosen1990
@tosen1990
I'll keep looking into it. Thanks a lot.
itsmesrds
@itsmesrds

Hi @bsikander,

I'm currently using sjs:0.8.0 and Spark 2.2.1. Is the issue below fixed? Things work fine when sending one request, but this happens whenever I send 2 requests:

[2019-11-28 04:36:46,915] ERROR .jobserver.JobManagerActor [] [] - About to restart actor due to exception:
java.util.concurrent.TimeoutException: Futures timed out after [3 seconds]
        at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
        at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
        at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
        at akka.dispatch.MonitorableThreadFactory$AkkaForkJoinWorkerThread$$anon$3.block(ThreadPoolBuilder.scala:167)
        at scala.concurrent.forkjoin.ForkJoinPool.managedBlock(ForkJoinPool.java:3640)
        at akka.dispatch.MonitorableThreadFactory$AkkaForkJoinWorkerThread.blockOn(ThreadPoolBuilder.scala:165)
        at scala.concurrent.Await$.result(package.scala:190)
        at spark.jobserver.JobManagerActor.startJobInternal(JobManagerActor.scala:282)
        at spark.jobserver.JobManagerActor$$anonfun$wrappedReceive$1.applyOrElse(JobManagerActor.scala:192)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
        at spark.jobserver.common.akka.ActorStack$$anonfun$receive$1.applyOrElse(ActorStack.scala:33)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
        at spark.jobserver.common.akka.Slf4jLogging$$anonfun$receive$1$$anonfun$applyOrElse$1.apply$mcV$sp(Slf4jLogging.scala:25)
        at spark.jobserver.common.akka.Slf4jLogging$class.spark$jobserver$common$akka$Slf4jLogging$$withAkkaSourceLogging(Slf4jLogging.scala:34)
        at spark.jobserver.common.akka.Slf4jLogging$$anonfun$receive$1.applyOrElse(Slf4jLogging.scala:24)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
        at spark.jobserver.common.akka.ActorMetrics$$anonfun$receive$1.applyOrElse(ActorMetrics.scala:23)
        at akka.actor.Actor$class.aroundReceive(Actor.scala:484)
        at spark.jobserver.common.akka.InstrumentedActor.aroundReceive(InstrumentedActor.scala:8)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
        at akka.actor.ActorCell.invoke(ActorCell.scala:495)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
        at akka.dispatch.Mailbox.run(Mailbox.scala:224)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
[2019-11-28 04:36:46,916] ERROR ka.actor.OneForOneStrategy [] [akka://JobServer/user/jobManager-6a-9111-2434bb36090d] - Futures timed out after [3 seconds]
java.util.concurrent.TimeoutException: Futures timed out after [3 seconds]
        at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
        at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
        at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
        at akka.dispatch.MonitorableThreadFactory$AkkaForkJoinWorkerThread$$anon$3.block(ThreadPoolBuilder.scala:167)
        at scala.concurrent.forkjoin.ForkJoinPool.managedBlock(ForkJoinPool.java:3640)
        at akka.dispatch.MonitorableThreadFactory$AkkaForkJoinWorkerThread.blockOn(ThreadPoolBuilder.scala:165)
        at scala.concurrent.Await$.result(package.scala:190)
        at spark.jobserver.JobManagerActor.startJobInternal(JobManagerActor.scala:282)
        at spark.jobserver.JobManagerActor$$anonfun$wrappedReceive$1.applyOrElse(JobManagerActor.scala:192)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
        at spark.jobserver.common.akka.ActorStack$$anonfun$receive$1.applyOrElse(ActorStack.scala:33)
Is there any resolution to the above issue?
itsmesrds
@itsmesrds
@valan4ik Please let us know what the resolution for the above issue will be
Behroz Sikander
@bsikander
@itsmesrds you are sending 2 requests (POST /jobs) in parallel?
The problem seems to be that fetching the binary information takes too long and the Future times out. This exception causes the actor to restart (this is the default behavior of Akka actors).
umm, btw, which DAO are you using? SQL/file/C*?
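For context, the failing pattern in the trace above is a blocking Await on an actor ask; a minimal standalone sketch of that mechanism (actor and message names are hypothetical, not jobserver's actual code):

    import akka.actor.{Actor, ActorSystem, Props}
    import akka.pattern.ask
    import akka.util.Timeout
    import scala.concurrent.Await
    import scala.concurrent.duration._

    // Hypothetical DAO actor that stalls and never replies.
    class SlowDao extends Actor {
      def receive = { case _ => () } // simulate a slow binary lookup
    }

    object AskTimeoutDemo extends App {
      val system = ActorSystem("demo")
      val dao = system.actorOf(Props[SlowDao])
      implicit val daoAskTimeout: Timeout = Timeout(3.seconds)
      try {
        // Same blocking pattern as in the stack trace: Await.result on an ask
        // throws java.util.concurrent.TimeoutException after 3 seconds.
        Await.result(dao ? "getBinaryInfo", daoAskTimeout.duration)
      } finally {
        system.terminate()
      }
    }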
itsmesrds
@itsmesrds

@bsikander, I have tried using file and the H2 database; both give the same results. I fixed this by changing

val daoAskTimeout = Timeout(60 seconds)

in the file

 job-server/src/main/scala/spark/jobserver/JobManagerActor.scala

Now that problem is fixed (meaning it no longer throws the exception), but the context is still getting killed.

Behroz Sikander
@bsikander
ok, check the driver logs to find the exception.
Valentina
@valan4ik
@itsmesrds sorry, but just to clear things up: do you send 2 requests to create contexts?
The first context is successfully created (and is visible in the UI), and the second request works only after increasing the timeout (but the context dies after a short period of time)?
itsmesrds
@itsmesrds
No @valan4ik, the context is already created with a cached RDD in it. I'm sending two REST calls to get results out of the cached data. When I send the second request while the first request is executing, it throws that exception and the context gets killed.
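A sketch of that reproduction against jobserver's REST API (host, app, and class names are placeholders):

    # Two concurrent POST /jobs against the same pre-created context:
    curl -d "" "http://localhost:8090/jobs?appName=myApp&classPath=com.example.MyJob&context=my-context&sync=true" &
    curl -d "" "http://localhost:8090/jobs?appName=myApp&classPath=com.example.MyJob&context=my-context&sync=true" &
    wait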
itsmesrds
@itsmesrds

Hi @valan4ik , @bsikander ,

Is there any way to set the number of executors and executor memory for every job in a pre-created SparkContext?
As far as I know, we can set those parameters when creating the context (sketched below). But will the same work for every job request?
Basically, Spark parameters for every session in a context.

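For reference, the documented jobserver pattern sets resources once, at context creation (values illustrative):

    curl -d "" "http://localhost:8090/contexts/my-context?num-cpu-cores=4&memory-per-node=512m"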
Behroz Sikander
@bsikander

well, not really. As soon as the context is created, the executors are spawned and they allocate resources.

Your only option is to use Dynamic Resource Allocation from Spark, and it should work out of the box with jobserver (I hope).
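A sketch of the relevant Spark settings for that (values illustrative; on YARN, dynamic allocation also needs the external shuffle service):

    spark.dynamicAllocation.enabled = true
    spark.shuffle.service.enabled = true
    spark.dynamicAllocation.minExecutors = 1
    spark.dynamicAllocation.maxExecutors = 10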

pgouda89
@pgouda89
Hi @bsikander @valan4ik, we need to set spray.can.server.keystorePW="<keystore password>" to enable SSL on the Spark jobserver. This is not secure, as env.conf will then contain the password in plain text. Do we have a better approach to pass the keystore password? I am using Spark jobserver 0.9.
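One possible mitigation, assuming jobserver reads its config through Typesafe Config (HOCON): pull the password from an environment variable instead of hard-coding it. A sketch, not a jobserver-documented feature:

    spray.can.server {
      keystore = "/path/to/server.jks"
      # ${?VAR} falls back to the environment variable when no config value is set
      keystorePW = ${?KEYSTORE_PW}
    }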
Rajendra
@rajnitsrinagar1_twitter
I submitted a Spark job on an EMR cluster and specified 3 executors with 5 GB of memory each. But it has been in the ACCEPTED state for the last 15-20 mins, and when I checked available resources, I found that the core nodes do not have enough memory available, while the task nodes have more than 50 GB of memory available. Still, my job stays in the ACCEPTED state.
So is it necessary to have the required memory available on the core nodes to start the Spark application, instead of having resources available on the task nodes?
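One way to check what YARN itself thinks each node can offer (standard Hadoop CLI; per-node capacity is governed by yarn.nodemanager.resource.memory-mb in yarn-site.xml):

    # List all NodeManagers; add -showDetails (where your Hadoop version supports it)
    # to see per-node memory/vcore usage:
    yarn node -list -all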
Narasimhaporeddy
@Narasimhaporeddy
Hi, can someone please help me with setting up SSL? I am using CDH 5.15.1 with Spark jobserver version 0.8.0 on Spark 2.2. I have configured environment.conf with spray.can.server { keystore = /opt/cloudera/security/pki/server.jks and truststore = /opt/cloudera/security/pki/ca-certs.jks } with the respective passwords. All my Cloudera services work fine with SSL/TLS enabled using the same settings.
Can someone tell me if there is something wrong that I am doing?
@pgouda89 can you please let me know how you have set up SSL
@bsikander can you please help me if there is something else we need to configure to enable both client and server auth over SSL
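For comparison, a sketch of the spray-can SSL block in jobserver-style HOCON (the truststore key names are an assumption; verify against your jobserver version's docs):

    spray.can.server {
      ssl-encryption = on
      keystore = "/opt/cloudera/security/pki/server.jks"
      keystorePW = "<keystore password>"
      # assumed keys for the client-auth side; check your version:
      truststore = "/opt/cloudera/security/pki/ca-certs.jks"
      truststorePW = "<truststore password>"
    }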
Narasimhaporeddy
@Narasimhaporeddy
Has anyone enabled ssl both server and client with job-server and CDH ?
Narasimhaporeddy
@Narasimhaporeddy
ERROR erver.HttpServerConnection [] [akka.tcp://JobServer@xxxxxxxxxx:42392/user/IO-HTTP/listener-0/8] - Aborting encrypted connection to /XXXXXXXXX:xxxx due to [SSLHandshakeException:null cert chain] -> [SSLHandshakeException:null cert chain] is the error I face
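For what it's worth, "null cert chain" during the handshake usually means the client presented no certificate while the server required client auth. A couple of standard checks (hosts and file names are placeholders):

    # Confirm the CA that signed the client cert is in the server's truststore:
    keytool -list -v -keystore /opt/cloudera/security/pki/ca-certs.jks
    # Drive a handshake with an explicit client certificate:
    openssl s_client -connect <jobserver-host>:8090 -cert client.pem -key client.key -CAfile ca.pem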
Narasimhaporeddy
@Narasimhaporeddy
Hi all,
I am using CombinedDAO with HDFS + Postgres and could not make it work. Can someone please help me with the syntax? My current config:

jobdao = spark.jobserver.io.CombinedDAO

combineddao {
  rootdir = "/tmp/combineddao"
  binarydao {
    class = spark.jobserver.io.HdfsBinaryDAO
    dir = "hdfs path"
  }
  metadatadao {
    class = spark.jobserver.io.MetaDataSqlDAO
  }
}

sqldao {
  # Slick database driver, full classpath
  slick-driver = slick.driver.PostgresDriver

  # JDBC driver, full classpath
  jdbc-driver = org.postgresql.Driver

  url =
  user =
  password =
}

flyway.locations = "db/postgresql/migration"  # ---> in the documentation it is given as db/combineddao/postgresql/migration
I see this error from the Spark jobserver, and it is unable to start:
Caused by: org.postgresql.util.PSQLException: ERROR: relation "BINARIES_CONTENTS" does not exist
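If the note in the config above is right, a plausible cause (an assumption based on the documented value, not confirmed here): with flyway.locations pointing at db/postgresql/migration, Flyway runs the old SQL-only schema migrations, so the tables the CombinedDAO expects are never created. The documented setting would be:

    flyway.locations = "db/combineddao/postgresql/migration"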