
    I did all of this.

    But what solved my problem was simply switching from Java 11 to Java 8.

    That's it, it's working now.
    Thanks a lot guys.
    Even though it was my first time, building from source was really fun.
    Hope to see Toree v1.0 soon
    Luciano Resende
    Not Java, but Scala…
    Unlike Java, Scala's binary compatibility is not so great, so when you move from one version to another you need to recompile. There were also little tweaks needed in the code, and all of these have been done in master already; that's why building from master works
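    For readers following along, building from master looks roughly like this (a sketch based on the make targets described in the Toree repository; target names and the generated artifact path may differ between versions):

```shell
# Fetch and build Apache Toree from master
git clone https://github.com/apache/incubator-toree.git
cd incubator-toree
make dist    # builds the kernel assembly and the pip package under dist/

# Install the freshly built pip package and register the kernel with Jupyter
pip install dist/toree-pip/toree-*.tar.gz
jupyter toree install --spark_home="$SPARK_HOME"
```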
    Pol Santamaria

    Hi, we are testing Apache Toree with Jupyter Enterprise Gateway on k8s. It seems that when the kernel is launched, the dependencies specified by "--packages" are downloaded in the enterprise-gateway pod and then the driver has no access to them.
    Another solution we tried was using the %AddDeps magic; then they are downloaded into the /tmp of the driver pod, but it fails to use them. I think it is related to the SparkContext already being created, since we are trying to load "hadoop-azure" and similar packages to access Azure blob storage.

    Does anybody have some suggestions on how can I solve this dependencies issue? Many thanks! :)

    Kevin Bates
    Hi @polsm91. I would imagine this could be accomplished via hdfs/http references or extending the TOREE_SPARK_OPTS in the kernel.json with volume information: https://spark.apache.org/docs/2.4.6/running-on-kubernetes.html#dependency-management
    Do either of those approaches fit your scenario/environment?
    Pol Santamaria
    Thanks @kevin-bates for your suggestions. To clarify, what we would like is for Spark to download the packages from Maven. For regular jars (application jars), we are storing them on S3 and calling %AddJar for now, and I don't think it is viable to store all the dependencies there.
    When we add the --packages ${MVN_DEPS} to the __TOREE_SPARK_OPTS__ in kernel.json, Spark downloads the dependencies to the enterprise-gateway pod and they are forgotten there. Since the gateway serves multiple users, we can't put a shared volume on the $HOME/.ivy2 folder of the gateway.
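    One configuration angle worth noting here (a sketch with a hypothetical mount path; spark.jars.ivy is the standard Spark property controlling where the ivy cache and downloaded packages live) is pointing the ivy directory at a mounted volume via the kernelspec:

```json
{
  "env": {
    "__TOREE_SPARK_OPTS__": "--packages ${MVN_DEPS} --conf spark.jars.ivy=/mnt/shared-ivy2"
  }
}
```

    A real kernel.json contains more fields (argv, display_name, etc.); this shows only the env fragment being discussed.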
    Pol Santamaria
    Another option could be downloading the packages with %AddDeps and setting a shared volume where they are downloaded. However, in my case, the Spark driver was not detecting the packages providing another FileSystem for Hadoop (I also tried adding the --transitive flag). For example:
    // We run on Spark 2.4.6 compiled for Scala 2.11
    %AddDeps io.delta delta-core_2.11 0.6.1
    %AddDeps org.apache.hadoop hadoop-azure 2.7.3
    %AddDeps com.azure azure-storage-blob 12.8.0
    %AddDeps com.azure azure-storage-common 12.8.0

    Marking io.delta:delta-core_2.11:0.6.1 for download
    Obtained 2 files
    Marking org.apache.hadoop:hadoop-azure:2.7.3 for download
    Obtained 2 files
    Marking com.azure:azure-storage-blob:12.8.0 for download
    Obtained 2 files
    Marking com.azure:azure-storage-common:12.8.0 for download
    Obtained 2 files

    import org.apache.hadoop.fs.azure.NativeAzureFileSystem // Just to verify it can be found
    val df=spark.read.format("delta").load("wasb://######") // This line fails

    java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.azure.NativeAzureFileSystem not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2654)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
    at org.apache.spark.sql.delta.DeltaTableUtils$.findDeltaTableRoot(DeltaTable.scala:160)
    at org.apache.spark.sql.delta.sources.DeltaDataSource$.parsePathIdentifier(DeltaDataSource.scala:252)
    at org.apache.spark.sql.delta.sources.DeltaDataSource.createRelation(DeltaDataSource.scala:153)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
    ... 50 elided
    Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.azure.NativeAzureFileSystem not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
    ... 64 more

    Pol Santamaria
    I assume the reason is that the SparkContext was already created when the Azure packages were added, and the Hadoop FileSystem submodule needs to be reloaded in some way :/
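    An untested idea in this direction (the keys are standard Hadoop configuration: fs.<scheme>.impl maps a URI scheme to a FileSystem class, and fs.<scheme>.impl.disable.cache avoids reusing a cached FileSystem instance created before the jars were available; whether it helps after %AddDeps depends on which classloader Hadoop resolves classes from):

```scala
// Run in the notebook after %AddDeps has fetched hadoop-azure
val hadoopConf = spark.sparkContext.hadoopConfiguration
hadoopConf.set("fs.wasb.impl", "org.apache.hadoop.fs.azure.NativeAzureFileSystem")
hadoopConf.set("fs.wasb.impl.disable.cache", "true")
```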
    Kevin Bates

    Since I’m not a Spark expert, I can’t tell if you’re running into issues with k8s and spark in general or something with the way EG works. Here are some other pieces of information you might find helpful.

    The image for the ‘vanilla’ Scala and spark Scala kernelspecs is the same image. As a result, you could try launching the vanilla Scala image and configure the spark context yourself to see if that helps, or you can modify the spark scala kernelspec and set --RemoteProcessProxy.spark-context-initialization-mode=None so that it doesn’t create a spark context. The difference between these is that with the former, we control the kernel-pod configuration, whereas Spark controls the pod with the latter and you need to specify pod-related settings in SPARK_OPTS (_TOREE_SPARK_OPTS).

    Also note that any envs prefixed with KERNEL_ flow from the client to EG and are available to the kernel. As mentioned, the vanilla kernels are associated with a kernel-pod.yaml template file (in their scripts directories) that can be altered for things like mounts, init containers, etc. and the user-specific “names” could be specified via a KERNEL_ env value. (Note that KERNEL_USERNAME is already used by many applications for this.)

    In EG 3.0 - which will support Spark 3.0 - Spark on Kubernetes enables “pod templates”, which will allow the convergence of the two launch approaches (Spark 2.x on k8s controls the pod creation, so we can’t use our templated approach for that). PR jupyter/enterprise_gateway#559 has been waiting on this - but we’re not there yet.

    Luciano Resende
    Another simple test that could be done is to check what the Spark behavior is when doing this outside of the kernel environment, e.g. when you submit an app passing the opts, and see how Spark behaves
    Pol Santamaria
    Many thanks @kevin-bates for the extensive reply. I will experiment with your suggestions and also explore how it is going to work in EG 3.0, we run Spark 2.X but we plan on upgrading to 3.X before 2021.
    Todd Tao
    spark 3.0 support?
    Luciano Resende
    is available on the release candidate
    Hello. I am trying to use Toree 0.4.0 with JupyterHub 1.1.0. Everything looks great, but I can't see any exception traces from the code that I'm running in the notebook. What can I try, and how can I debug this issue?
    Oh, I see there is already a similar issue on the bug tracker
    Any workarounds?
    Oh, and my jupyterhub version is 6.1.4
    Devendra Singh
    hi there,
    I am unable to view the complete stack trace in my notebook in case of errors thrown by the Scala kernel. It shows only 3-4 lines of the whole stack trace. Could you please help me configure Toree to show the complete stack trace in such cases?

    Hello. I am trying to use Toree 0.4.0 with JupyterHub 1.1.0. Everything looks great, but I can't see any exception traces from the code that I'm running in the notebook. What can I try, and how can I debug this issue?

    I am facing a similar issue. I can see only a few lines of the whole stack trace. I am using Toree 0.6.0.

    Luciano Resende
    I believe I fixed that on master
    I should be able to try another RC before the end of the week; I was still trying to fix another issue related to magics before I do that
    Stanislav G.
    How do I add autocomplete for Spark? For me only highlighting works :(
    Arthur Stemmer
    Hi! Which version of Toree would be able to support Spark 3.x? Would 0.5.0 suffice or do I need to build from master?
    Luciano Resende
    0.5.0 was not approved yet, it’s just an RC, but that would work, or master…
    I have been using the RC for a little while, but need to find time to wrap up the release
    Arthur Stemmer
    Great, that works. Although it does seem like the kernel is not providing error information in the notebook on say a syntax error (it seems to fail silently), is that expected?
    Luciano Resende
    Are you using classic notebook? Or lab?
    Arthur Stemmer
    I'm using a classic notebook via Jupyter Enterprise Gateway
    Arthur Stemmer
    Is there a list containing past occurrences of security vulnerabilities (CVEs) publicly available?
    Rahul Goyal
    does Toree support magic commands similar to sparkmagic: %config which will let notebook users specify spark configs dynamically?
    Kevin Bates
    Since the toree kernel’s startup creates the spark context (using parameters conveyed via the kernel spec and available via sc), I think it would be too late to apply magics for this. It looks like toree’s support for magics is more at the line and cell level, with the notion that the context is already established. Copying @lresende for confirmation/correction.
    Rahul Goyal
    I agree with you @kevin-bates that the Toree kernel would already have started the Spark context and we cannot apply config-level magics. Based on my limited understanding of the "sparkmagic kernel", this problem is solved by delaying the creation of the Spark context until it is needed, which gives scope to execute magic cells with "%config" that the kernel accumulates. Is it not possible to do the same with Toree?
    Kevin Bates
    I don’t think so - but need @lresende to confirm.
    Luciano Resende
    if you set toree options to not start the context, you can create your own with SparkSession
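    As a sketch of that suggestion (assuming the kernel was configured not to create a context, e.g. via the spark-context-initialization-mode=None setting mentioned earlier; the package coordinates here are illustrative):

```scala
import org.apache.spark.sql.SparkSession

// Build the session yourself, applying the configs that users would
// otherwise supply through %config-style magics
val spark = SparkSession.builder()
  .appName("notebook-session")
  .config("spark.jars.packages", "org.apache.hadoop:hadoop-azure:2.7.3")
  .getOrCreate()
val sc = spark.sparkContext
```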
    Rahul Goyal
    How do I do that @lresende… is there any link for reference?
    Luciano Resende
    Note that I just saw a comment there which implies that there might be a bug in the code … would appreciate help investigating and providing a patch
    David M.
    I have installed Apache Toree on Jupyter:
    pip install toree
    jupyter toree install --spark_home=/usr/local/spark-3.2.0-bin-hadoop3.2/
    but the kernel crashes after start:
    Exception in thread "main" java.lang.NoClassDefFoundError: scala/App$class
    at org.apache.toree.Main$.<init>(Main.scala:24)
    at org.apache.toree.Main$.<clinit>(Main.scala)
    at org.apache.toree.Main.main(Main.scala)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    Caused by: java.lang.ClassNotFoundException: scala.App$class
    at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:589)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
    ... 15 more

    Could you please advise how to fix it?

    Hi @s3uzz, my guess would be that you are using a version like 0.4, which is still on Scala 2.11, while your Spark build for Hadoop 3.2.x uses Scala 2.12.

    Compile a newer version for Scala 2.12 support.

    David M.
    Thank you. My fault, version from pypi.org is 0.4.0 from Aug 2020. Just followed documentation...
    David M.

    Hi! Another problem using 0.5.0-rc4 with Spark 3.2.0. Please advise how to resolve.

    Exception in thread "main" scala.reflect.internal.MissingRequirementError: object scala.runtime in compiler mirror not found.
    at scala.reflect.internal.MissingRequirementError$.signal(MissingRequirementError.scala:24)
    at scala.reflect.internal.MissingRequirementError$.notFound(MissingRequirementError.scala:25)
    at scala.reflect.internal.Mirrors$RootsBase.$anonfun$getModuleOrClass$5(Mirrors.scala:61)
    at scala.reflect.internal.Mirrors$RootsBase.getPackage(Mirrors.scala:61)
    at scala.reflect.internal.Definitions$DefinitionsClass.RuntimePackage$lzycompute(Definitions.scala:198)
    at scala.reflect.internal.Definitions$DefinitionsClass.RuntimePackage(Definitions.scala:198)
    at scala.reflect.internal.Definitions$DefinitionsClass.RuntimePackageClass$lzycompute(Definitions.scala:199)
    at scala.reflect.internal.Definitions$DefinitionsClass.RuntimePackageClass(Definitions.scala:199)
    at scala.reflect.internal.Definitions$DefinitionsClass.AnnotationDefaultAttr$lzycompute(Definitions.scala:1251)
    at scala.reflect.internal.Definitions$DefinitionsClass.AnnotationDefaultAttr(Definitions.scala:1250)
    at scala.reflect.internal.Definitions$DefinitionsClass.syntheticCoreClasses$lzycompute(Definitions.scala:1408)
    at scala.reflect.internal.Definitions$DefinitionsClass.syntheticCoreClasses(Definitions.scala:1407)
    at scala.reflect.internal.Definitions$DefinitionsClass.symbolsNotPresentInBytecode$lzycompute(Definitions.scala:1450)
    at scala.reflect.internal.Definitions$DefinitionsClass.symbolsNotPresentInBytecode(Definitions.scala:1450)
    at scala.reflect.internal.Definitions$DefinitionsClass.init(Definitions.scala:1506)
    at scala.tools.nsc.Global$Run.<init>(Global.scala:1213)
    at scala.tools.nsc.interpreter.IMain.compileSourcesKeepingRun(IMain.scala:432)
    at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.compileAndSaveRun(IMain.scala:814)
    at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.compile(IMain.scala:772)
    at scala.tools.nsc.interpreter.IMain.bind(IMain.scala:637)
    at org.apache.toree.kernel.interpreter.scala.ScalaInterpreterSpecific.$anonfun$start$1(ScalaInterpreterSpecific.scala:291)
    at scala.tools.nsc.interpreter.IMain.beQuietDuring(IMain.scala:206)
    at org.apache.toree.kernel.interpreter.scala.ScalaInterpreterSpecific.start(ScalaInterpreterSpecific.scala:282)
    at org.apache.toree.kernel.interpreter.scala.ScalaInterpreterSpecific.start$(ScalaInterpreterSpecific.scala:266)
    at org.apache.toree.kernel.interpreter.scala.ScalaInterpreter.start(ScalaInterpreter.scala:43)
    at org.apache.toree.kernel.interpreter.scala.ScalaInterpreter.init(ScalaInterpreter.scala:94)
    at org.apache.toree.boot.layer.InterpreterManager.$anonfun$initializeInterpreters$1(InterpreterManager.scala:35)
    at scala.collection.Iterator.foreach(Iterator.scala:943)
    at scala.collection.Iterator.foreach$(Iterator.scala:943)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
    at scala.collection.MapLike$DefaultValuesIterable.foreach(MapLike.scala:214)
    at org.apache.toree.boot.layer.InterpreterManager.initializeInterpreters(InterpreterManager.scala:34)
    at org.apache.toree.boot.layer.StandardComponentInitialization.initializeComponents(ComponentInitialization.scala:87)
    at org.apache.toree.boot.layer.StandardComponentInitialization.initializeComponents$(ComponentInitialization.scala:69)
    at org.apache.toree.Main$$anon$1.initializeComponents(Main.scala:35)
    at org.apache.toree.boot.KernelBootstrap.initialize(KernelBootstrap.scala:102)
    at org.apache.toree.Main$.delayedEndpoint$org$apache$toree$Main$1(Main.scala:35)
    at org.apache.toree.Main$delayedInit$body.apply(Main.scala:24)
    at scala.Function0.apply$mcV$sp(Function0.scala:39)
    at scala.Function0.apply$mcV$sp$(Function0.scala:39)
    at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)
    at scala.App.$anonfun$main$1$adapted(App.scala:80)
    at scala.collection.immutable.List.foreach(List.scala:431)
    at scala.App.main(App.scala:80)
    at scala.App.main$(App.scala:78)
    at org.apache.toree.Main$.main(Mai

    Should I downgrade the Java version? I use OpenJDK 11.0.13 (2021-10-19)
    WARN Main$$anon$1: No external magics provided to PluginManager!
    [init] error: error while loading Object, Missing dependency 'class scala.native in compiler mirror', required by /modules/java.base/java/lang/Object.class
    David M.
    ok, I have to downgrade Java 11 -> 8
    David M.
    I installed Toree with 'jupyter toree install --spark_home=${SPARK_HOME} --interpreters=Scala,PySpark,SQL --python_exec=/opt/conda/bin/python3.8'
    However, in JupyterLab I see only two kernels: Scala and SQL.
    Could you advise how to debug this? Thanks in advance.
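    A first debugging step here (standard Jupyter tooling, not Toree-specific; the kernels directory path varies by installation) is to list the registered kernelspecs and inspect their kernel.json files:

```shell
# Toree kernels are typically registered with names like
# apache_toree_scala, apache_toree_pyspark, apache_toree_sql
jupyter kernelspec list

# Inspect a spec's launch command and environment (illustrative path)
cat /usr/local/share/jupyter/kernels/apache_toree_pyspark/kernel.json
```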