tosen1990
@tosen1990
@bsikander I submitted a Pi job to my cluster and it works well. I did exactly what the docs say and can't figure out what the error is.
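(For reference, the docs-style flow being followed looks roughly like this; a hedged sketch based on the 0.8.x README examples, with the jar path and app name as illustrative placeholders:)

    # upload a test jar, then submit a job from it
    curl --data-binary @job-server-tests/target/scala-2.11/job-server-tests_2.11-0.8.0.jar localhost:8090/jars/test
    curl -d "input.string = a b c a b see" "localhost:8090/jobs?appName=test&classPath=spark.jobserver.WordCountExample"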
Behroz Sikander
@bsikander
Interesting. Then dig a bit deeper to see if you can find something interesting. Please post all the updates in the GitHub issue.
tosen1990
@tosen1990
I'll keep looking into it. Thanks a lot.
itsmesrds
@itsmesrds

Hi @bsikander,

I'm currently using SJS 0.8.0 and Spark 2.2.1. Is the issue below fixed? Things work fine when sending one request, but it appears whenever I send 2 requests:

[2019-11-28 04:36:46,915] ERROR .jobserver.JobManagerActor [] [] - About to restart actor due to exception:
java.util.concurrent.TimeoutException: Futures timed out after [3 seconds]
        at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
        at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
        at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
        at akka.dispatch.MonitorableThreadFactory$AkkaForkJoinWorkerThread$$anon$3.block(ThreadPoolBuilder.scala:167)
        at scala.concurrent.forkjoin.ForkJoinPool.managedBlock(ForkJoinPool.java:3640)
        at akka.dispatch.MonitorableThreadFactory$AkkaForkJoinWorkerThread.blockOn(ThreadPoolBuilder.scala:165)
        at scala.concurrent.Await$.result(package.scala:190)
        at spark.jobserver.JobManagerActor.startJobInternal(JobManagerActor.scala:282)
        at spark.jobserver.JobManagerActor$$anonfun$wrappedReceive$1.applyOrElse(JobManagerActor.scala:192)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
        at spark.jobserver.common.akka.ActorStack$$anonfun$receive$1.applyOrElse(ActorStack.scala:33)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
        at spark.jobserver.common.akka.Slf4jLogging$$anonfun$receive$1$$anonfun$applyOrElse$1.apply$mcV$sp(Slf4jLogging.scala:25)
        at spark.jobserver.common.akka.Slf4jLogging$class.spark$jobserver$common$akka$Slf4jLogging$$withAkkaSourceLogging(Slf4jLogging.scala:34)
        at spark.jobserver.common.akka.Slf4jLogging$$anonfun$receive$1.applyOrElse(Slf4jLogging.scala:24)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
        at spark.jobserver.common.akka.ActorMetrics$$anonfun$receive$1.applyOrElse(ActorMetrics.scala:23)
        at akka.actor.Actor$class.aroundReceive(Actor.scala:484)
        at spark.jobserver.common.akka.InstrumentedActor.aroundReceive(InstrumentedActor.scala:8)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
        at akka.actor.ActorCell.invoke(ActorCell.scala:495)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
        at akka.dispatch.Mailbox.run(Mailbox.scala:224)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
[2019-11-28 04:36:46,916] ERROR ka.actor.OneForOneStrategy [] [akka://JobServer/user/jobManager-6a-9111-2434bb36090d] - Futures timed out after [3 seconds]
java.util.concurrent.TimeoutException: Futures timed out after [3 seconds]
        at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
        at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
        at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
        at akka.dispatch.MonitorableThreadFactory$AkkaForkJoinWorkerThread$$anon$3.block(ThreadPoolBuilder.scala:167)
        at scala.concurrent.forkjoin.ForkJoinPool.managedBlock(ForkJoinPool.java:3640)
        at akka.dispatch.MonitorableThreadFactory$AkkaForkJoinWorkerThread.blockOn(ThreadPoolBuilder.scala:165)
        at scala.concurrent.Await$.result(package.scala:190)
        at spark.jobserver.JobManagerActor.startJobInternal(JobManagerActor.scala:282)
        at spark.jobserver.JobManagerActor$$anonfun$wrappedReceive$1.applyOrElse(JobManagerActor.scala:192)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
        at spark.jobserver.common.akka.ActorStack$$anonfun$receive$1.applyOrElse(ActorStack.scala:33)
Is there any resolution to the above issue?
itsmesrds
@itsmesrds
@valan4ik Please let us know what the resolution for the above issue will be.
Behroz Sikander
@bsikander
@itsmesrds you are sending 2 requests (POST /jobs) in parallel?
The problem seems to be that fetching the binary information takes too long and the Future times out. The exception causes the actor to restart (this is the default behavior of Akka actors).
umm btw which DAO are you using? SQL/file/C*?
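(A minimal, self-contained sketch of the failure mode just described, not jobserver's actual code: a blocking Await on an ask with a short timeout throws java.util.concurrent.TimeoutException, and the default Akka OneForOneStrategy then restarts the actor. All names here are illustrative.)

    import akka.actor.{Actor, ActorRef}
    import akka.pattern.ask
    import akka.util.Timeout
    import scala.concurrent.Await
    import scala.concurrent.duration._

    // A DAO actor that is busy with another request and never replies in time.
    class SlowDao extends Actor {
      def receive = { case _ => () }
    }

    class Manager(dao: ActorRef) extends Actor {
      implicit val daoAskTimeout: Timeout = Timeout(3.seconds)
      def receive = {
        case "startJob" =>
          // Blocks the actor's thread; if the DAO does not answer within
          // 3 seconds this throws TimeoutException, and the supervisor's
          // default strategy restarts this actor.
          val binaryInfo = Await.result(dao ? "getBinaryInfo", daoAskTimeout.duration)
          sender() ! binaryInfo
      }
    }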
itsmesrds
@itsmesrds

@bsikander, I have tried using the file and H2 database DAOs; both give the same results. I fixed this by changing

val daoAskTimeout = Timeout(60 seconds)

in the file

job-server/src/main/scala/spark/jobserver/JobManagerActor.scala

Now that problem is fixed (it no longer throws the exception), but the context is still getting killed.

Behroz Sikander
@bsikander
ok, check the driver logs to find the exception.
Valentina
@valan4ik
@itsmesrds sorry, but just to clear things up: do you send 2 requests to create contexts?
The first context is successfully created (and is visible in the UI), and the second request succeeds only after increasing the timeout (but the context dies after a short period of time)?
itsmesrds
@itsmesrds
No @valan4ik, the context is already created with a cached RDD in it. I'm sending two REST calls to get results out of the cached data. When the second request is sent while the first is still executing, it throws that exception and the context gets killed.
itsmesrds
@itsmesrds

Hi @valan4ik, @bsikander,

Is there any way to set the number of executors and the executor memory for every job in a pre-created SparkContext?
As far as I know we can set those parameters when creating the context, but will the same work for every job request?
Basically, Spark parameters for every session in a context.

Behroz Sikander
@bsikander

Well, not really. As soon as the context is created, the executors are spawned and they allocate resources.

Your only option is to use Dynamic Resource Allocation from Spark, and it should work out of the box with jobserver (I hope).
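(If Dynamic Resource Allocation is the route you take, a hedged sketch of what the context configuration could look like. The passthrough block is the jobserver config-template mechanism for forwarding settings to the SparkConf; the dynamic-allocation keys are standard Spark settings, and dynamic allocation also requires the external shuffle service on the workers.)

    spark.context-settings {
      passthrough {
        spark.dynamicAllocation.enabled = true
        spark.dynamicAllocation.minExecutors = 1
        spark.dynamicAllocation.maxExecutors = 10
        # dynamic allocation depends on the external shuffle service
        spark.shuffle.service.enabled = true
      }
    }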

pgouda89
@pgouda89
Hi @bsikander @valan4ik, we need to set spray.can.server.keystorePW="<keystore password>" to enable SSL on the Spark Job Server. That is not secure, as env.conf will then contain the keystore password in plain text. Do we have a better approach to pass the keystore password? I am using Spark Jobserver 0.9.
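(One possible mitigation, relying on standard Typesafe Config/HOCON behavior rather than anything jobserver-specific: an optional ${?...} substitution falls back to an environment variable, so the password can be injected at start time instead of being written into env.conf:)

    spray.can.server {
      # resolved from the KEYSTORE_PW environment variable at startup, if set
      keystorePW = ${?KEYSTORE_PW}
    }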
Rajendra
@rajnitsrinagar1_twitter
I submitted a Spark job on an EMR cluster and specified 3 executors with 5 GB memory each. But it has been in the accepted state for the last 15-20 minutes. When I checked the available resources, I found that the core nodes do not have enough memory available, while the task nodes have more than 50 GB available, yet my job is still in the accepted state.
So is it necessary to have the required memory available on the core nodes to start the Spark application, rather than having the resources available on the task nodes?
Narasimhaporeddy
@Narasimhaporeddy
Hi, can someone please help me in setting up SSL? I am using CDH 5.15.1 with Spark Jobserver 0.8.0 on Spark 2.2. In environment.conf I have configured spray.can.server { keystore = /opt/cloudera/security/pki/server.jks, truststore = /opt/cloudera/security/pki/ca-certs.jks } with the respective passwords. All my Cloudera services work fine with SSL/TLS enabled using the same settings.
Can someone tell me if there is something wrong in what I am doing?
@pgouda89 can you please let me know how you have set up SSL?
@bsikander can you please help me if there is something else we need to configure to enable both client and server auth for SSL?
Narasimhaporeddy
@Narasimhaporeddy
Has anyone enabled ssl both server and client with job-server and CDH ?
Narasimhaporeddy
@Narasimhaporeddy
The error I face is: ERROR erver.HttpServerConnection [] [akka.tcp://JobServer@xxxxxxxxxx:42392/user/IO-HTTP/listener-0/8] - Aborting encrypted connection to /XXXXXXXXX:xxxx due to [SSLHandshakeException:null cert chain] -> [SSLHandshakeException:null cert chain]
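(For context: SSLHandshakeException: null cert chain generally means the server required a client certificate and the client did not present one, so the client-side certificate setup is worth checking. Below is a hedged sketch of the server-side options as they looked around 0.8.x; the truststore keys for mutual auth are an assumption, so verify against the docs for your version.)

    spray.can.server {
      ssl-encryption = on
      keystore = "/opt/cloudera/security/pki/server.jks"
      keystorePW = "<keystore password>"
      # mutual (client) auth -- assumed option names:
      truststore = "/opt/cloudera/security/pki/ca-certs.jks"
      truststorePW = "<truststore password>"
    }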
Narasimhaporeddy
@Narasimhaporeddy
Hi all, I am using CombinedDAO with HDFS + Postgres and could not make it work. Can someone please help me with the syntax?

jobdao = spark.jobserver.io.CombinedDAO

combineddao {
  rootdir = "/tmp/combineddao"
  binarydao {
    class = spark.jobserver.io.HdfsBinaryDAO
    dir = "hdfs path"
  }
  metadatadao {
    class = spark.jobserver.io.MetaDataSqlDAO
  }
}

sqldao {
  # Slick database driver, full classpath
  slick-driver = slick.driver.PostgresDriver

  # JDBC driver, full classpath
  jdbc-driver = org.postgresql.Driver

  url =
  user =
  password =

  # in the documentation it is given as db/combineddao/postgresql/migration
  flyway.locations = "db/postgresql/migration"
}
I see this error from the Spark Job Server, and it is unable to start:

Caused by: org.postgresql.util.PSQLException: ERROR: relation "BINARIES_CONTENTS" does not exist
ERROR internal.command.DbMigrate [] [] - Migration of schema "public" to version 0.7.6 failed! Changes successfully rolled back.

Caused by: org.flywaydb.core.internal.dbsupport.FlywaySqlScriptException:

Migration V0_7_6__add_bin_hash_column.sql failed

SQL State : 42P01
Error Code : 0
Message : ERROR: relation "BINARIES_CONTENTS" does not exist
Location : db/postgresql/migration/V0_7_6/V0_7_6add_bin_hash_column.sql (/data01/spark/spark-jobserver/job-server/file:/data01/spark/spark-jobserver/job-server/spark-job-server.jar!/db/postgresql/migration/V0_7_6/V0_7_6add_bin_hash_column.sql)
Line : 4

Narasimhaporeddy
@Narasimhaporeddy
Statement : ALTER TABLE "BINARIES_CONTENTS" ADD COLUMN "BIN_HASH" BYTEA
@bsikander @noorul can you guys please help me with this?
I am using SJS 0.8.0 with Spark 2.2 on CDH 5.15.1.
Valentina
@valan4ik

@Narasimhaporeddy Hi, I think you may have a documentation/jobserver version mismatch:
if you use jobserver 0.8.0, please check the documentation for that version: https://github.com/spark-jobserver/spark-jobserver/tree/0.8.0

db/combineddao/postgresql/migration was introduced only recently and is not part of the 0.8.0 release.
I think the HDFS + Postgres DAO would only work if you use the master branch.
You may use version 0.10.0 and use the HDFS + H2 DAO (https://github.com/spark-jobserver/spark-jobserver/tree/e3c3d3ce9ba81b63608130d3904161c8246fe064)
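(To make the version mismatch concrete, these are the two migration paths mentioned in this thread; which one applies depends on the jobserver version you run:)

    # 0.8.0-era sqldao documentation:
    flyway.locations = "db/postgresql/migration"
    # current (master-branch) CombinedDAO documentation:
    flyway.locations = "db/combineddao/postgresql/migration"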

Narasimhaporeddy
@Narasimhaporeddy
@valan4ik thanks for pointing that out to me. Indeed, I was referring to the latest documentation. One quick question: does the latest version support Spark 2.2.0?
Valentina
@valan4ik
@Narasimhaporeddy Some dependencies and queries were updated; I am not sure if it is fully backward compatible. You could give it a try by setting the SPARK_VERSION variable in your configuration file :)
I think there is no other way to try the new DAOs.
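(A hedged sketch of where that variable goes; the exact file depends on your deployment setup, e.g. the <env>.sh settings file read by the deploy scripts, so treat the path as illustrative:)

    # illustrative deployment settings file, e.g. config/<env>.sh
    SPARK_VERSION=2.2.0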
Narasimhaporeddy
@Narasimhaporeddy
aww makes sense. @valan4ik thanks for the quick help :)
Narasimhaporeddy
@Narasimhaporeddy
Hey, did anyone enable Kerberos in SJS 0.8.0? Can you please let me know how you configured it?
I did try setting use-as-proxy-user=on in shiro.ini and exporting the keytab as mentioned in some of the KNIME documentation, but it did not work for me.
@valan4ik @bsikander I did not see any doc specifically about enabling Kerberos for Spark Job Server; can you please help me with this?
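(For whoever picks this up: a hedged sketch of the auth flags as I understand the 0.8.x config. Note these appear to be jobserver settings in environment.conf rather than shiro.ini entries; shiro.ini holds the realm setup. The option names here are an assumption, so check the docs matching your version.)

    # environment.conf -- assumed 0.8.x option names, not verified
    shiro {
      authentication = on
      # run jobs as the authenticated user (Spark proxy-user impersonation)
      use-as-proxy-user = on
    }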