[2019-08-30 19:29:07,557] INFO AkkaClusterSupervisorActor [] [] - Failed to initialize context Actor[akka.tcp://JobServer@127.0.0.1:38116/user/jobManager-b29454ba-2ab6-4415-af45-1660ed00dc9c#-1325230401]
java.lang.NoClassDefFoundError: com/sun/jersey/api/client/config/ClientConfig
at org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:55)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.createTimelineClient(YarnClientImpl.java:181)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:168)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:151)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:164)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:500)
at spark.jobserver.context.DefaultSparkContextFactory$$anon$1.<init>(SparkContextFactory.scala:144)
at spark.jobserver.context.DefaultSparkContextFactory.makeContext(SparkContextFactory.scala:144)
at spark.jobserver.context.DefaultSparkContextFactory.makeContext(SparkContextFactory.scala:139)
at spark.jobserver.context.SparkContextFactory$class.makeContext(SparkContextFactory.scala:64)
at spark.jobserver.context.DefaultSparkContextFactory.makeContext(SparkContextFactory.scala:139)
at spark.jobserver.JobManagerActor$$anonfun$wrappedReceive$1.applyOrElse(JobManagerActor.scala:352)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at spark.jobserver.JobManagerActor$$anonfun$5.applyOrElse(JobManagerActor.scala:277)
at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
at spark.jobserver.common.akka.ActorStack$$anonfun$receive$1.applyOrElse(ActorStack.scala:33)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at spark.jobserver.common.akka.Slf4jLogging$$anonfun$receive$1$$anonfun$applyOrElse$1.apply$mcV$sp(Slf4jLogging.scala:25)
at spark.jobserver.common.akka.Slf4jLogging$class.spark$jobserver$common$akka$Slf4jLogging$$withAkkaSourceLogging(Slf4jLogging.scala:34)
at spark.jobserver.common.akka.Slf4jLogging$$anonfun$receive$1.applyOrElse(Slf4jLogging.scala:24)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at spark.jobserver.common.akka.ActorMetrics$$anonfun$receive$1.applyOrElse(ActorMetrics.scala:24)
at akka.actor.Actor$class.aroundReceive(Actor.scala:484)
at spark.jobserver.common.akka.InstrumentedActor.aroundReceive(InstrumentedActor.scala:8)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
at akka.actor.ActorCell.invoke(ActorCell.scala:495)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
at akka.dispatch.Mailbox.run(Mailbox.scala:224)
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.ClassNotFoundException: com.sun.jersey.api.client.config.ClientConfig
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 35 more
java.nio.channels.ClosedChannelException
This one you need to go after. There are quite a few posts about it on Stack Overflow; I guess you just need to try the advice there. Jobserver seems to be working fine here, so you probably need to tune Spark :)
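For what it's worth, the missing class (com.sun.jersey.api.client.config.ClientConfig) is the old Jersey 1.x client that Hadoop's YARN timeline client needs, so the advice you will find usually takes one of two forms: disable the timeline client, or put a jersey-client 1.9 jar on the driver classpath. A sketch of the first option in spark-defaults.conf (the property is the standard YARN one, forwarded through Spark's spark.hadoop.* prefix; verify it against your Hadoop version):

# spark-defaults.conf: stop YarnClientImpl from instantiating the Jersey-based timeline client
spark.hadoop.yarn.timeline-service.enabled false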
Hi @valan4ik, thanks a lot for the quick response. I was using SJS 0.9.0 and noticed that spark-jobserver starts the embedded H2 DB if spark.jobserver.startH2Server is enabled in env.conf.
The actual code in JobServer.scala:
// start embedded H2 server
if (config.getBoolean("spark.jobserver.startH2Server")) {
  val rootDir = config.getString("spark.jobserver.sqldao.rootdir")
  val h2 = org.h2.tools.Server.createTcpServer("-tcpAllowOthers", "-baseDir", rootDir).start();
  logger.info("Embedded H2 server started with base dir {} and URL {}", rootDir, h2.getURL: Any)
}
The above code starts the H2 database on port 9092, and the job server comes up if we use spark.jobserver.sqldao.jdbc.url="jdbc:h2:tcp://<serverHost>:9092/h2-db;AUTO_RECONNECT=TRUE". The real issue is when port 9092 is busy: H2 comes up on a free port, and then we won't be able to set the right spark.jobserver.sqldao.jdbc.url in env.conf.
I have used the following code to overcome this issue. Let me know if you know a better solution.
// start embedded H2 server
if (config.getBoolean("spark.jobserver.startH2Server")) {
  val rootDir = config.getString("spark.jobserver.sqldao.rootdir")
  // Typesafe Config's getString throws on a missing path (it never returns null),
  // so guard with hasPath and fall back to H2's default port 9092.
  val h2ServerPort =
    if (config.hasPath("spark.jobserver.h2ServerPort")) config.getString("spark.jobserver.h2ServerPort")
    else "9092"
  val h2 = org.h2.tools.Server.createTcpServer(
    "-tcpPort", h2ServerPort, "-tcpAllowOthers", "-baseDir", rootDir).start();
  logger.info("Embedded H2 server started with base dir {} and URL {}", rootDir, h2.getURL: Any)
}
And env.conf contains the following properties:
spark.jobserver.h2ServerPort=7272
spark.jobserver.sqldao.jdbc.url="jdbc:h2:tcp://<serverHost>:7272/h2-db;AUTO_RECONNECT=TRUE"
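An alternative to the explicit default check is to merge a defaults config underneath the loaded one. This is plain Typesafe Config rather than anything Jobserver-specific, so treat it as a sketch (config here stands for the already-loaded env.conf):

import com.typesafe.config.{Config, ConfigFactory}

// Defaults that apply only where env.conf does not set a value.
val defaults: Config = ConfigFactory.parseString("spark.jobserver.h2ServerPort = \"9092\"")
// withFallback overlays the loaded config on top of the defaults.
val merged: Config = config.withFallback(defaults)
val h2ServerPort: String = merged.getString("spark.jobserver.h2ServerPort")

This keeps the server start-up code free of null and empty-string handling.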
Hi All,
What is the best way to get the failure reason for a Spark task when jobs are submitted asynchronously? I am getting the following response for the ERROR case:
[{
  "duration": "5.948 secs",
  "classPath": "<ApplicationClass>",
  "startTime": "2019-10-15T11:19:30.803+05:30",
  "context": "INFA_DATA_PREVIEW_APP_DIS_SJS",
  "status": "ERROR",
  "jobId": "0e79d7b3-7e31-4232-b354-b1fb01b0928a",
  "contextId": "08ee61f8-7425-45f2-a4ce-c4b456fd24b8"
}]
Do we need to perform anything extra in the ApplicationClass in order to get the error stacktrace in the job response body?
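One thing worth trying before changing the application code: the per-job endpoint usually carries the error details that the job list omits. Using the jobId from the response above (the endpoint is the standard GET /jobs/<jobId> from the Jobserver API; the exact error fields returned vary by version):

curl "localhost:8090/jobs/0e79d7b3-7e31-4232-b354-b1fb01b0928a"

For a failed job this should include a result object with the failure message, as in the response below.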
{
  "duration": "69.903 secs",
  "classPath": "org.air.ebds.organize.spark.jobserver.GBSegmentJob",
  "startTime": "2019-11-18T02:41:07.524Z",
  "context": "test-context",
  "result": {
    "message": "Job aborted due to stage failure: Task 2 in stage 10.0 failed 1 times, most recent failure: Lost task 2.0 in stage 10.0 (TID 530, localhost, executor driver): java.lang.OutOfMemoryError: Java heap space\n\tat scala.reflect.ManifestFactory$$anon$9.newArray(Manifest.scala:117)\n\tat scala.reflect.ManifestFactory$$anon$9.newArray(Manifest.scala:115)\n\tat scala.Array$.ofDim(Array.scala:218)\n\tat geotrellis.raster.IntArrayTile$.ofDim(IntArrayTile.scala:271)\n\tat geotrellis.raster.ArrayTile$.alloc(ArrayTile.scala:422)\n\tat geotrellis.raster.CroppedTile.mutable(CroppedTile.scala:141)\n\tat geotrellis.raster.CroppedTile.mutable(CroppedTile.scala:133)\n\tat geotrellis.raster.CroppedTile.toArrayTile(CroppedTile.scala:125)\n\tat geotrellis.raster.crop.SinglebandTileCropMethods$class.crop(SinglebandTileCropMethods.scala:45)\n\tat geotrellis.raster.package$withTileMethods.crop(package.scala:55)\n\tat geotrellis.raster.crop.MultibandTileCropMethods$$anonfun$cropBands$1.apply$mcVI$sp(MultibandTileCropMethods.scala:45)\n\tat geotrellis.raster.crop.MultibandTileCropMethods$$anonfun$cropBands$1.apply(MultibandTileCropMethods.scala:44)\n\tat geotrellis.raster.crop.MultibandTileCropMethods$$anonfun$cropBands$1.apply(MultibandTileCropMethods.scala:44)\n\tat scala.collection.immutable.Range.foreach(Range.scala:160)\n\tat geotrellis.raster.crop.MultibandTileCropMethods$class.cropBands(MultibandTileCropMethods.scala:44)\n\tat geotrellis.raster.package$withMultibandTileMethods.cropBands(package.scala:83)\n\tat geotrellis.raster.crop.MultibandTileCropMethods$class.crop(MultibandTileCropMethods.scala:72)\n\tat geotrellis.raster.package$withMultibandTileMethods.crop(package.scala:83)\n\tat geotrellis.raster.package$withMultibandTileMethods.crop(package.scala:83)\n\tat geotrellis.raster.crop.CropMethods$class.crop(CropMethods.scala:66)\n\tat geotrellis.raster.package$withMultibandTileMethods.crop(package.scala:83)\n\tat geotrellis.spark.buffer.BufferTiles$$anonfun$17.geotrellis$spark$buffer$BufferTiles$$anonfun$$genSection$1(BufferTiles.scala:544)\n\tat geotrellis.spark.buffer.BufferTiles$$anonfun$17$$anonfun$apply$17.apply(BufferTiles.scala:549)\n\tat geotrellis.spark.buffer.BufferTiles$$anonfun$17$$anonfun$apply$17.apply(BufferTiles.scala:549)\n\tat scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)\n\tat scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)\n\tat scala.collection.immutable.List.foreach(List.scala:381)\n\tat scala.collection.TraversableLike$class.map(TraversableLike.scala:234)\n\tat scala.collection.immutable.List.map(List.scala:285)\n\tat geotrellis.spark.buffer.BufferTiles$$anonfun$17.apply(BufferTiles.scala:549)\n\tat geotrellis.spark.buffer.BufferTiles$$anonfun$17.apply(BufferTiles.scala:521)\n\tat scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)\n\nDriver stacktrace:"
  }
}
You can use the following to increase the driver memory:
curl -X POST "localhost:8090/contexts/test-context?launcher.spark.driver.memory=512m"
Valid values are 512m, 1g, 2g, etc.
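The same launcher.* mechanism should work for other spark-submit settings as well, but they only take effect when the context is created, so an existing context of the same name has to be deleted first. A sketch (launcher.spark.executor.memory is assumed by analogy with the driver setting above; check the docs for your Jobserver version):

curl -X DELETE "localhost:8090/contexts/test-context"
curl -X POST "localhost:8090/contexts/test-context?launcher.spark.driver.memory=1g&launcher.spark.executor.memory=2g"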