A kernel that enables applications to interact with Apache Spark (http://toree.apache.org/)
// We run on Spark 2.4.6 compiled for Scala 2.11
%AddDeps io.delta delta-core_2.11 0.6.1
%AddDeps org.apache.hadoop hadoop-azure 2.7.3
%AddDeps com.azure azure-storage-blob 12.8.0
%AddDeps com.azure azure-storage-common 12.8.0
Marking io.delta:delta-core_2.11:0.6.1 for download
Obtained 2 files
Marking org.apache.hadoop:hadoop-azure:2.7.3 for download
Obtained 2 files
Marking com.azure:azure-storage-blob:12.8.0 for download
Obtained 2 files
Marking com.azure:azure-storage-common:12.8.0 for download
Obtained 2 files
import org.apache.hadoop.fs.azure.NativeAzureFileSystem // Just to verify the class can be found
val df = spark.read.format("delta").load("wasb://######") // This line fails
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.azure.NativeAzureFileSystem not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2654)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.spark.sql.delta.DeltaTableUtils$.findDeltaTableRoot(DeltaTable.scala:160)
at org.apache.spark.sql.delta.sources.DeltaDataSource$.parsePathIdentifier(DeltaDataSource.scala:252)
at org.apache.spark.sql.delta.sources.DeltaDataSource.createRelation(DeltaDataSource.scala:153)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
... 50 elided
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.azure.NativeAzureFileSystem not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
... 64 more
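One thing that may be worth checking (a guess on my part, not a confirmed fix): "Obtained 2 files" suggests each %AddDeps pulled only the artifact itself. Toree's %AddDeps accepts a --transitive flag that also resolves transitive dependencies, e.g.:
%AddDeps org.apache.hadoop hadoop-azure 2.7.3 --transitive
Even then, jars added after the Spark context exists may not be visible to Hadoop's FileSystem classloader, so supplying the jars at kernel launch (via SPARK_OPTS, discussed below) may be the more reliable route.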
Since I’m not a Spark expert, I can’t tell whether you’re running into issues with Kubernetes and Spark in general or something specific to the way EG works. Here are some other pieces of information you might find helpful.
The image for the “vanilla” Scala and Spark Scala kernelspecs is the same image. As a result, you could try launching the vanilla Scala image and configuring the Spark context yourself to see if that helps, or you can modify the Spark Scala kernelspec and set --RemoteProcessProxy.spark-context-initialization-mode=None so that it doesn’t create a Spark context. The difference between the two is that with the former, we control the kernel-pod configuration, whereas with the latter, Spark controls the pod and you need to specify pod-related settings in SPARK_OPTS (__TOREE_SPARK_OPTS__).
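For the Spark-controlled case, a rough sketch of what those pod-related settings might look like (the master URL, namespace, and image below are placeholders, not values EG ships with):
export SPARK_OPTS="--master k8s://https://kubernetes.default.svc \
  --deploy-mode cluster \
  --conf spark.kubernetes.namespace=my-namespace \
  --conf spark.kubernetes.container.image=my-registry/my-kernel-image:tag"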
Also note that any envs prefixed with KERNEL_ flow from the client to EG and are available to the kernel. As mentioned, the vanilla kernels are associated with a kernel-pod.yaml template file (in their scripts directories) that can be altered for things like mounts, init containers, etc., and the user-specific “names” could be specified via a KERNEL_ env value. (Note that KERNEL_USERNAME is already used by many applications for this.)
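For example (KERNEL_VOLUME_NAME is a hypothetical variable, purely to illustrate the flow; only the KERNEL_ prefix matters):
export KERNEL_USERNAME=alice
export KERNEL_VOLUME_NAME=alice-scratch   # hypothetical; consumed by your modified kernel-pod.yaml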
In EG 3.0, which will support Spark 3.0, Spark on Kubernetes enables “pod templates”, which will allow the two launch approaches to converge (Spark 2.x on k8s controls the pod creation, so we can’t use our templated approach there). PR jupyter/enterprise_gateway#559 has been waiting on this, but we’re not there yet.
Hello. I am trying to use Toree 0.4.0 with JupyterHub 1.1.0. Everything looks great, but I can’t see any exception traces from the code that I’m running in the notebook. What can I try, and how can I debug this issue?
I am facing a similar issue. I can see only a few lines of the whole stack trace. I am using Toree 0.6.0.
As for configuring the Spark context (sc), I think it would be too late to apply magics for this. It looks like Toree’s support for magics is more at the line and cell level, with the notion that the context is already established. Copying @lresende for confirmation/correction.
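A quick way to see this (standard Spark API, nothing Toree-specific): by the time any cell or magic executes, sc already exists and its configuration is fixed:
sc.getConf.toDebugString // shows the settings the context was launched with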
Hi! Another problem, this time using Toree 0.5.0-rc4 with Spark 3.2.0. Please advise on how to resolve it.
Exception in thread "main" scala.reflect.internal.MissingRequirementError: object scala.runtime in compiler mirror not found.
at scala.reflect.internal.MissingRequirementError$.signal(MissingRequirementError.scala:24)
at scala.reflect.internal.MissingRequirementError$.notFound(MissingRequirementError.scala:25)
at scala.reflect.internal.Mirrors$RootsBase.$anonfun$getModuleOrClass$5(Mirrors.scala:61)
at scala.reflect.internal.Mirrors$RootsBase.getPackage(Mirrors.scala:61)
at scala.reflect.internal.Definitions$DefinitionsClass.RuntimePackage$lzycompute(Definitions.scala:198)
at scala.reflect.internal.Definitions$DefinitionsClass.RuntimePackage(Definitions.scala:198)
at scala.reflect.internal.Definitions$DefinitionsClass.RuntimePackageClass$lzycompute(Definitions.scala:199)
at scala.reflect.internal.Definitions$DefinitionsClass.RuntimePackageClass(Definitions.scala:199)
at scala.reflect.internal.Definitions$DefinitionsClass.AnnotationDefaultAttr$lzycompute(Definitions.scala:1251)
at scala.reflect.internal.Definitions$DefinitionsClass.AnnotationDefaultAttr(Definitions.scala:1250)
at scala.reflect.internal.Definitions$DefinitionsClass.syntheticCoreClasses$lzycompute(Definitions.scala:1408)
at scala.reflect.internal.Definitions$DefinitionsClass.syntheticCoreClasses(Definitions.scala:1407)
at scala.reflect.internal.Definitions$DefinitionsClass.symbolsNotPresentInBytecode$lzycompute(Definitions.scala:1450)
at scala.reflect.internal.Definitions$DefinitionsClass.symbolsNotPresentInBytecode(Definitions.scala:1450)
at scala.reflect.internal.Definitions$DefinitionsClass.init(Definitions.scala:1506)
at scala.tools.nsc.Global$Run.<init>(Global.scala:1213)
at scala.tools.nsc.interpreter.IMain.compileSourcesKeepingRun(IMain.scala:432)
at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.compileAndSaveRun(IMain.scala:814)
at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.compile(IMain.scala:772)
at scala.tools.nsc.interpreter.IMain.bind(IMain.scala:637)
at org.apache.toree.kernel.interpreter.scala.ScalaInterpreterSpecific.$anonfun$start$1(ScalaInterpreterSpecific.scala:291)
at scala.tools.nsc.interpreter.IMain.beQuietDuring(IMain.scala:206)
at org.apache.toree.kernel.interpreter.scala.ScalaInterpreterSpecific.start(ScalaInterpreterSpecific.scala:282)
at org.apache.toree.kernel.interpreter.scala.ScalaInterpreterSpecific.start$(ScalaInterpreterSpecific.scala:266)
at org.apache.toree.kernel.interpreter.scala.ScalaInterpreter.start(ScalaInterpreter.scala:43)
at org.apache.toree.kernel.interpreter.scala.ScalaInterpreter.init(ScalaInterpreter.scala:94)
at org.apache.toree.boot.layer.InterpreterManager.$anonfun$initializeInterpreters$1(InterpreterManager.scala:35)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
at scala.collection.MapLike$DefaultValuesIterable.foreach(MapLike.scala:214)
at org.apache.toree.boot.layer.InterpreterManager.initializeInterpreters(InterpreterManager.scala:34)
at org.apache.toree.boot.layer.StandardComponentInitialization.initializeComponents(ComponentInitialization.scala:87)
at org.apache.toree.boot.layer.StandardComponentInitialization.initializeComponents$(ComponentInitialization.scala:69)
at org.apache.toree.Main$$anon$1.initializeComponents(Main.scala:35)
at org.apache.toree.boot.KernelBootstrap.initialize(KernelBootstrap.scala:102)
at org.apache.toree.Main$.delayedEndpoint$org$apache$toree$Main$1(Main.scala:35)
at org.apache.toree.Main$delayedInit$body.apply(Main.scala:24)
at scala.Function0.apply$mcV$sp(Function0.scala:39)
at scala.Function0.apply$mcV$sp$(Function0.scala:39)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)
at scala.App.$anonfun$main$1$adapted(App.scala:80)
at scala.collection.immutable.List.foreach(List.scala:431)
at scala.App.main(App.scala:80)
at scala.App.main$(App.scala:78)
at org.apache.toree.Main$.main(Mai
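Purely as a guess to help triage: this particular error (“object scala.runtime in compiler mirror not found”) is commonly reported when the embedded Scala compiler cannot see scala-library on its classpath, often due to a Scala version mismatch. Listing the Scala jars your Spark distribution ships can confirm which version is in play:
ls $SPARK_HOME/jars | grep scala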
Hi @s3uzz
Yes, you could try an older Python; I am still using 3.6, as mentioned previously.
Below is a small example shell script I use to install:
#!/bin/sh
export VERSION=0.1.0
export SPARK_HOME=/path/to/sparkHome

jars="file:///path/to/jar"
jars="$jars,file:///path/to/jar"

jupyter-toree install \
  --replace \
  --debug \
  --user \
  --kernel_name "project $VERSION" \
  --spark_home=${SPARK_HOME} \
  --spark_opts="--master yarn --jars $jars"
Here’s a link to a related Toree JIRA: https://issues.apache.org/jira/browse/TOREE-522?jql=project%20%3D%20TOREE%20AND%20component%20%3D%20Kernel
In there, @lresende points out that this appears to be a function of the front-end, i.e., whether it is Notebook or Lab. I see the same behavior. In fact, when I run using the classic Notebook front-end, I don’t see the error when it’s syntax-related (e.g., “blah”), but I do get some error result for something like a divide by zero (e.g., “1/0”). However, what is interesting is that if I save that notebook and then open it in Lab, I see the appropriate messages in both cases. So, as @amitzo points out, the error details are persisted in the notebook, but something about the display isn’t working.
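To make the two probe cases concrete, the cells are simply:
blah // unbound identifier: compile-time error
1/0  // runtime ArithmeticException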
I’m curious if you see any error output resulting from divide-by-zero issues?
Here are two screenshots, the first using Notebook and the second using Lab, where, for the second, I merely opened the notebook file produced by the first...