pgouda89
@pgouda89
Hi All,
Is it possible to configure one executor per Spark task using Spark Jobserver? I have noticed that executors are shared while running Spark tasks.
sj123050037
@sj123050037
Hi @valan4ik: Can you help with this? I want to configure the executor and driver resources in YARN cluster mode. I specified the following in the SJS script (local.sh.template): MANAGER_EXTRA_SPARK_CONFS="spark.executor.memory=6G|spark.driver.memory=4G|spark.executor.cores=2|spark.yarn.submit.waitAppCompletion=false". I can see that this property is picked up when submitting the spark-context creation request (confirmed by checking the spark_jobserver.out log, which shows the correct resource values), but when I look at the application/spark-context environment in the Spark history server UI, I see "spark.executor.memory=4G". I am not sure why my configuration is not being picked up.
pgouda89
@pgouda89
Hi @valan4ik, is it possible to run multiple spark-jobservers on one machine by using different server ports and JMX ports?
Is it possible to configure the H2 DB port?
Valentina Glagoleva
@valan4ik
Hi @pgouda89, please check local.conf.template in the conf folder. In principle you should create your own configuration file with the parameters needed for your deployment. There you can define the port for Jobserver (jobserver.port) and all settings for H2 (as far as I know Jobserver doesn't have any prerequisites for your setup, so you are free to use any URL/password you wish).
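For reference, a rough sketch of what that could look like for a second instance on one machine. This is only an illustration: the environment name "second", the port values, and the comments below are made-up examples, not required names.

cp config/local.conf.template config/second.conf   # set the Jobserver port here (e.g. 8091)
cp config/local.sh.template   config/second.sh     # per-instance settings for this deployment (JMX port, memory, ...)
# give the second instance its own H2 JDBC URL (same key as used later in this thread), e.g.
#   spark.jobserver.sqldao.jdbc.url = "jdbc:h2:tcp://<serverHost>:7272/h2-db;AUTO_RECONNECT=TRUE"
bin/server_deploy.sh second                         # deploy using that environment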
pgouda89
@pgouda89

Hi @valan4ik, thanks a lot for the quick response. I was using SJS 0.9.0 and noticed that Spark Jobserver will start the H2 DB if spark.jobserver.startH2Server is enabled in the env.conf.
Actual code in JobServer.scala:
// start embedded H2 server
if (config.getBoolean("spark.jobserver.startH2Server")) {
  val rootDir = config.getString("spark.jobserver.sqldao.rootdir")
  val h2 = org.h2.tools.Server.createTcpServer("-tcpAllowOthers", "-baseDir", rootDir).start();
  logger.info("Embeded H2 server started with base dir {} and URL {}", rootDir, h2.getURL: Any)
}

The above code starts the H2 database on port 9092, and Spark Jobserver comes up if we use spark.jobserver.sqldao.jdbc.url="jdbc:h2:tcp://<serverHost>:9092/h2-db;AUTO_RECONNECT=TRUE". The real issue is when port 9092 is busy: H2 then comes up on a free port, and we can't set the right spark.jobserver.sqldao.jdbc.url in the env.conf.

I have used the following code to overcome this issue. Let me know if you know of a better solution.
// start embedded H2 server
if (config.getBoolean("spark.jobserver.startH2Server")) {
  val rootDir = config.getString("spark.jobserver.sqldao.rootdir")
  var h2ServerPort = config.getString("spark.jobserver.h2ServerPort")
  if (h2ServerPort == null || h2ServerPort == "") {
    h2ServerPort = "9092"
  }
  val h2 = org.h2.tools.Server.createTcpServer(
    "-tcpPort", h2ServerPort, "-tcpAllowOthers", "-baseDir", rootDir).start();
  logger.info("Embeded H2 server started with base dir {} and URL {}", rootDir, h2.getURL: Any)
}

And env.conf contains the following properties:
spark.jobserver.h2ServerPort=7272
spark.jobserver.sqldao.jdbc.url="jdbc:h2:tcp://<ServeHost>:7272/h2-db;AUTO_RECONNECT=TRUE"

Peter Farah
@pfarah65
What's the best way to deploy a jobserver for Python?
Peter Farah
@pfarah65
Whenever I submit my job after creating a context, it always fails right away. This is my error:
{
  "duration": "1.202 secs",
  "classPath": "job.WordCountSparkJob",
  "startTime": "2019-10-11T13:38:52.777-04:00",
  "context": "py-context",
  "result": {
    "message": "Context (py-context) for this job was terminated",
    "errorClass": "",
    "stack": ""
  },
  "status": "ERROR",
  "jobId": "1e34ca22-b50c-4f74-b889-4e1979d1bfeb",
  "contextId": ""
}
Peter Farah
@pfarah65
nevermind, I had to downgrade to Spark 2.3.2
pgouda89
@pgouda89

Hi All,

What is the best way to get the failure reason for a Spark task if the jobs are submitted asynchronously? I am getting the following response in the ERROR case:
[{
  "duration": "5.948 secs",
  "classPath": "<ApplicationClass>",
  "startTime": "2019-10-15T11:19:30.803+05:30",
  "context": "INFA_DATA_PREVIEW_APP_DIS_SJS",
  "status": "ERROR",
  "jobId": "0e79d7b3-7e31-4232-b354-b1fb01b0928a",
  "contextId": "08ee61f8-7425-45f2-a4ce-c4b456fd24b8"
}]

Do we need to perform anything extra in the ApplicationClass in order to get the error stacktrace in the job response body?

pgouda89
@pgouda89

Hi @valan4ik ,

Is it possible to change the Spark context name using some parameter/env? Currently it is set to "spark.jobserver.JobManager".

(screenshot attached)
Valentina Glagoleva
@valan4ik
Hi @pgouda89, is that the YARN UI? Are you using the latest Jobserver version?
There was a PR a while ago: spark-jobserver/spark-jobserver#1156
I suppose it was addressing your issue
Valentina Glagoleva
@valan4ik
Also, regarding getting the job error in the GET /jobs request - it's not so easy, you would need to change the jobserver code :) Usually people make an additional GET /jobs/$jobId request for the jobs in error state
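For reference, a minimal sketch of the two calls, assuming a Jobserver on localhost:8090 (host, port and the job id are placeholders; the id below is taken from the ERROR response earlier in this thread):

# list jobs - gives only the summary, as in the ERROR response shown above
curl "localhost:8090/jobs"
# fetch a single job by id to get the full result payload, including the error details
curl "localhost:8090/jobs/0e79d7b3-7e31-4232-b354-b1fb01b0928a"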
pgouda89
@pgouda89
@valan4ik Thanks a lot :) I was missing spark-jobserver/spark-jobserver#1156 as I was using spark jobserver 0.9.0
Peter Farah
@pfarah65
Anyone having trouble building a Docker image? When build.sbt executes ./dev/make-distribution I keep getting the failure "Execution scala-compile-first of goal net.alchim31.maven:scala-maven-plugin:3.2.2:compile failed".
itsmesrds
@itsmesrds
Hi @valan4ik, I'm running jobserver 0.8.0 on EMR YARN in production. Because of YARN node blacklisting issues the Spark context is getting killed and all the data stored in the jobserver's cached object gets cleared.
Is there any way to check the heartbeat of the SparkContext? If such things happen in production, then at least based on the heartbeat I can re-run the job that caches the data again.
sj123050037
@sj123050037
Hi @itsmesrds,
You can do this by running the REST request "contexts/<context-name>".
This gives you a response that contains the context state, and based on the state value you can identify whether it is running or in some other final state.
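A minimal sketch of that check, assuming the Jobserver listens on localhost:8090 and the context is called my-context (both placeholders):

# poll the context; the response includes its state (e.g. RUNNING or a final state)
curl "localhost:8090/contexts/my-context"
# if the context is gone, re-submit the job that rebuilds the cached data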
itsmesrds
@itsmesrds
Thanks @sj123050037 .
Hi team,
Is there any way to set the number of executors and the executor memory for every job in a SparkContext?
As far as I know we set those parameters when creating the context, but will the same settings apply to every job request?
itsmesrds
@itsmesrds
Basically, Spark parameters for every session in a context.
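For context, a hedged sketch of the per-context approach mentioned above: resources are fixed when the context is created and are shared by every job submitted to it. The context name and values are examples, and passing spark.* keys as query parameters is an assumption, not something confirmed in this thread.

curl -X POST "localhost:8090/contexts/shared-ctx?num-cpu-cores=4&memory-per-node=2g&spark.executor.instances=4"
# every job posted with /jobs?context=shared-ctx then runs with these shared resources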
tosen1990
@tosen1990
@valan4ik
Hey, I've tried different ways to override the driver memory, but the job always throws this error, and the Spark context also shuts down at the same time.
Do you have any idea?
{
  "duration": "69.903 secs",
  "classPath": "org.air.ebds.organize.spark.jobserver.GBSegmentJob",
  "startTime": "2019-11-18T02:41:07.524Z",
  "context": "test-context",
  "result": {
    "message": "Job aborted due to stage failure: Task 2 in stage 10.0 failed 1 times, most recent failure: Lost task 2.0 in stage 10.0 (TID 530, localhost, executor driver): java.lang.OutOfMemoryError: Java heap space\n\tat scala.reflect.ManifestFactory$$anon$9.newArray(Manifest.scala:117)\n\tat scala.reflect.ManifestFactory$$anon$9.newArray(Manifest.scala:115)\n\tat scala.Array$.ofDim(Array.scala:218)\n\tat geotrellis.raster.IntArrayTile$.ofDim(IntArrayTile.scala:271)\n\tat geotrellis.raster.ArrayTile$.alloc(ArrayTile.scala:422)\n\tat geotrellis.raster.CroppedTile.mutable(CroppedTile.scala:141)\n\tat geotrellis.raster.CroppedTile.mutable(CroppedTile.scala:133)\n\tat geotrellis.raster.CroppedTile.toArrayTile(CroppedTile.scala:125)\n\tat geotrellis.raster.crop.SinglebandTileCropMethods$class.crop(SinglebandTileCropMethods.scala:45)\n\tat geotrellis.raster.package$withTileMethods.crop(package.scala:55)\n\tat geotrellis.raster.crop.MultibandTileCropMethods$$anonfun$cropBands$1.apply$mcVI$sp(MultibandTileCropMethods.scala:45)\n\tat geotrellis.raster.crop.MultibandTileCropMethods$$anonfun$cropBands$1.apply(MultibandTileCropMethods.scala:44)\n\tat geotrellis.raster.crop.MultibandTileCropMethods$$anonfun$cropBands$1.apply(MultibandTileCropMethods.scala:44)\n\tat scala.collection.immutable.Range.foreach(Range.scala:160)\n\tat geotrellis.raster.crop.MultibandTileCropMethods$class.cropBands(MultibandTileCropMethods.scala:44)\n\tat geotrellis.raster.package$withMultibandTileMethods.cropBands(package.scala:83)\n\tat geotrellis.raster.crop.MultibandTileCropMethods$class.crop(MultibandTileCropMethods.scala:72)\n\tat geotrellis.raster.package$withMultibandTileMethods.crop(package.scala:83)\n\tat geotrellis.raster.package$withMultibandTileMethods.crop(package.scala:83)\n\tat geotrellis.raster.crop.CropMethods$class.crop(CropMethods.scala:66)\n\tat geotrellis.raster.package$withMultibandTileMethods.crop(package.scala:83)\n\tat geotrellis.spark.buffer.BufferTiles$$anonfun$17.geotrellis$spark$buffer$BufferTiles$$anonfun$$genSection$1(BufferTiles.scala:544)\n\tat geotrellis.spark.buffer.BufferTiles$$anonfun$17$$anonfun$apply$17.apply(BufferTiles.scala:549)\n\tat geotrellis.spark.buffer.BufferTiles$$anonfun$17$$anonfun$apply$17.apply(BufferTiles.scala:549)\n\tat scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)\n\tat scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)\n\tat scala.collection.immutable.List.foreach(List.scala:381)\n\tat scala.collection.TraversableLike$class.map(TraversableLike.scala:234)\n\tat scala.collection.immutable.List.map(List.scala:285)\n\tat geotrellis.spark.buffer.BufferTiles$$anonfun$17.apply(BufferTiles.scala:549)\n\tat geotrellis.spark.buffer.BufferTiles$$anonfun$17.apply(BufferTiles.scala:521)\n\tat scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)\n\nDriver stacktrace:"
pgouda89
@pgouda89
Hi @tosen1990 ,
Did you try changing server_start.sh? If not, try using a larger value for --driver-memory in the spark-submit command.
tosen1990
@tosen1990
@pgouda89 I tried and failed.
Behroz Sikander
@bsikander
@tosen1990 which version of jobserver are you using?

You can use the following to increase the driver memory:

curl -X POST "localhost:8090/contexts/test-context?launcher.spark.driver.memory=512m"

Valid values are 512m, 1g, 2g, etc.

tosen1990
@tosen1990
@bsikander hey, I've resolved this problem with launcher.spark.driver.memory=512m.
I found this method by searching the history of this channel. Thanks.
tosen1990
@tosen1990
You are right. I read it from that link.
Behroz Sikander
@bsikander
Cool. Btw, I think that information should also be in the jobserver README. If you want, feel free to open a PR.
tosen1990
@tosen1990
Actually, I've been stuck on this issue for several months. Tbh, I've tried whatever I can, but still can't fix it.
Behroz Sikander
@bsikander
Umm, could you post the driver logs in the issue? That will give the full picture.
tosen1990
@tosen1990
And when I use SJS without Docker, it works fine.
Behroz Sikander
@bsikander
ahan
The exception in the driver logs should give you clues about what is happening.
tosen1990
@tosen1990
Actually, I can't find any driver logs, tbh. I just found the logs from YARN.
Behroz Sikander
@bsikander
well, from the UI, you should be able to download the driver logs. If you can't find the logs then it could point to the fact that jobserver fails before the driver is started.
tosen1990
@tosen1990
(screenshots attached)
I'll dig into it more deeply later. Thanks for your advice.
Behroz Sikander
@bsikander

It seems that the driver container is failing to launch.

Please do one thing: directly use spark-submit to submit a WordCount/Pi job to your YARN cluster and see if it runs through.

If it does, then the problem could be on the jobserver side (which I doubt); otherwise the problem is with your YARN setup, and you should fix it and then use jobserver again.
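A minimal sketch of such a smoke test, assuming a standard Spark installation is available on the machine (SPARK_HOME and the final argument are placeholders):

# run the bundled SparkPi example on YARN in cluster mode
spark-submit --master yarn --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/examples/jars/spark-examples_*.jar 100
# if this fails too, the problem is in the YARN setup rather than in jobserver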

tosen1990
@tosen1990
@bsikander I submitted a Pi job to my cluster and it works well. I did exactly what the docs say and can't figure out what the error is.
Behroz Sikander
@bsikander
Interesting. Then look a bit deeper to see if you can find something interesting. Please post all the updates in the GitHub issue.
tosen1990
@tosen1990
I'll keep looking into it. Thanks a lot.
itsmesrds
@itsmesrds

Hi @bsikander,

I'm currently using SJS 0.8.0 and Spark 2.2.1. Is the issue below fixed? Things are working fine when sending one request, but whenever I send two requests I get:

[2019-11-28 04:36:46,915] ERROR .jobserver.JobManagerActor [] [] - About to restart actor due to exception:
java.util.concurrent.TimeoutException: Futures timed out after [3 seconds]
        at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
        at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
        at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
        at akka.dispatch.MonitorableThreadFactory$AkkaForkJoinWorkerThread$$anon$3.block(ThreadPoolBuilder.scala:167)
        at scala.concurrent.forkjoin.ForkJoinPool.managedBlock(ForkJoinPool.java:3640)
        at akka.dispatch.MonitorableThreadFactory$AkkaForkJoinWorkerThread.blockOn(ThreadPoolBuilder.scala:165)
        at scala.concurrent.Await$.result(package.scala:190)
        at spark.jobserver.JobManagerActor.startJobInternal(JobManagerActor.scala:282)
        at spark.jobserver.JobManagerActor$$anonfun$wrappedReceive$1.applyOrElse(JobManagerActor.scala:192)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
        at spark.jobserver.common.akka.ActorStack$$anonfun$receive$1.applyOrElse(ActorStack.scala:33)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
        at spark.jobserver.common.akka.Slf4jLogging$$anonfun$receive$1$$anonfun$applyOrElse$1.apply$mcV$sp(Slf4jLogging.scala:25)
        at spark.jobserver.common.akka.Slf4jLogging$class.spark$jobserver$common$akka$Slf4jLogging$$withAkkaSourceLogging(Slf4jLogging.scala:34)
        at spark.jobserver.common.akka.Slf4jLogging$$anonfun$receive$1.applyOrElse(Slf4jLogging.scala:24)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
        at spark.jobserver.common.akka.ActorMetrics$$anonfun$receive$1.applyOrElse(ActorMetrics.scala:23)
        at akka.actor.Actor$class.aroundReceive(Actor.scala:484)
        at spark.jobserver.common.akka.InstrumentedActor.aroundReceive(InstrumentedActor.scala:8)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
        at akka.actor.ActorCell.invoke(ActorCell.scala:495)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
        at akka.dispatch.Mailbox.run(Mailbox.scala:224)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
[2019-11-28 04:36:46,916] ERROR ka.actor.OneForOneStrategy [] [akka://JobServer/user/jobManager-6a-9111-2434bb36090d] - Futures timed out after [3 seconds]
java.util.concurrent.TimeoutException: Futures timed out after [3 seconds]
        at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
        at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
        at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
        at akka.dispatch.MonitorableThreadFactory$AkkaForkJoinWorkerThread$$anon$3.block(ThreadPoolBuilder.scala:167)
        at scala.concurrent.forkjoin.ForkJoinPool.managedBlock(ForkJoinPool.java:3640)
        at akka.dispatch.MonitorableThreadFactory$AkkaForkJoinWorkerThread.blockOn(ThreadPoolBuilder.scala:165)
        at scala.concurrent.Await$.result(package.scala:190)
        at spark.jobserver.JobManagerActor.startJobInternal(JobManagerActor.scala:282)
        at spark.jobserver.JobManagerActor$$anonfun$wrappedReceive$1.applyOrElse(JobManagerActor.scala:192)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
        at spark.jobserver.common.akka.ActorStack$$anonfun$receive$1.applyOrElse(ActorStack.scala:33)
Is there any resolution to the above issue?
itsmesrds
@itsmesrds
@valan4ik Please let us know what the resolution for the above issue would be.