    I have uploaded "JustEnoughScalaForSpark.snb" from the GitHub repository to the notebook server. However, I don't see it added to the notebooks list. I used the "click here" link to upload the notebook. Any help?
    for people having issues seeing the notebook when using Docker, such as @EinserViech_twitter: you might need to use the IP of your docker-machine rather than localhost:9001
    In addition, if you're using Docker, the files will be downloaded inside the Docker container instead of locally on your computer. You can see them by executing bash inside the running container. Do this in a separate terminal: docker exec -it <id of your container> /bin/bash - you'll get a command prompt. Then ls and you should see the data/shakespeare directory.
    Hi - I have a question regarding Scala and Spark; not sure if this is the right forum. If not, please direct me to the correct one. In the Scala class we talked about DataFrames and how to use them with Scala. I would like to know how I can execute Spark SQL queries in parallel in a Spark Streaming application. Should I use Scala Futures to submit each DataFrame aggregation, and will those be executed concurrently?
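    For reference, a minimal, Spark-free sketch of the Futures pattern being asked about. runAggregation is a hypothetical stand-in for a DataFrame action such as df.groupBy(...).count().collect():

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ConcurrentQueries {
  // Hypothetical stand-in for a Spark SQL aggregation; in a real
  // streaming app this would be a blocking DataFrame action.
  def runAggregation(name: String): Long = name.length.toLong

  def main(args: Array[String]): Unit = {
    // Wrap each aggregation in a Future so they are submitted
    // concurrently on the execution context's thread pool.
    val futures = Seq("sales", "clicks", "errors").map { q =>
      Future(q -> runAggregation(q))
    }
    // Wait for all of them and print the results.
    val results = Await.result(Future.sequence(futures), 10.seconds)
    results.foreach { case (q, n) => println(s"$q: $n") }
  }
}
```

    Note this only submits the jobs concurrently; whether they actually run in parallel depends on the Spark scheduler and the cores available to the application.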
    Yu Shen
    I'm having the same problem as @mannit . I'm using Ubuntu 16.04, with java SDK:
    java -version
    java version "1.8.0_131"
    Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
    Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
    no docker
    Following the instructions below:

    The click here is a link. Click it, then navigate to where you downloaded the tutorial GitHub repository. Find and select notebooks/JustEnoughScalaForSpark.snb.

    A new line in the UI is added with "JustEnoughScalaForSpark.snb" and an "Upload" button on the right-hand side, as shown in Figure 1:

    This step produced the expected outcome.
    The next step:

    Figure 1: Before Uploading the Notebook
    I've highlighted the "click here" link that you used and the new line that was added for the tutorial notebook.

    Click the "Upload" button.

    Now the line is moved towards the bottom of the page and the buttons on the right-hand side are different.

    This step failed to make the notebook appear at the bottom of the page.
    I then also tried the alternative:
    Yu Shen
    I found there were error messages:

    Play server process ID is 2435
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/home/yubrshen/programming/scala/spark-notebook-0.7.0-scala-2.11.8-spark-2.1.0-hadoop-2.7.2-with-hive/lib/ch.qos.logback.logback-classic-1.1.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/home/yubrshen/programming/scala/spark-notebook-0.7.0-scala-2.11.8-spark-2.1.0-hadoop-2.7.2-with-hive/lib/org.slf4j.slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [ch.qos.logback.classic.util.ContextSelectorStaticBinder]
    [info] play - Application started (Prod)
    [info] play - Listening for HTTP on /0:0:0:0:0:0:0:0:9001
    [DEBUG] [06/07/2017 22:36:04.809] [New I/O worker #1] [EventStream] StandardOutLogger started
    [DEBUG] [06/07/2017 22:36:04.955] [New I/O worker #1] [EventStream(akka://NotebookServer)] logger log1-Slf4jLogger started
    [DEBUG] [06/07/2017 22:36:04.956] [New I/O worker #1] [EventStream(akka://NotebookServer)] Default Loggers started
    [debug] application - Notebooks directory in the config is referring ./notebooks. Does it exist? false
    [info] application - Notebooks dir is ../notebooks [at /home/yubrshen/programming/scala/spark-notebook-0.7.0-scala-2.11.8-spark-2.1.0-hadoop-2.7.2-with-hive/../notebooks]
    [info] application - Notebook directory is: /home/yubrshen/programming/scala/notebooks
    [debug] application - Profiles file is : ../conf/profiles
    [debug] application - Clusters file is : ../conf/clusters
    [error] a.a.OneForOneStrategy - ../conf/profiles (No such file or directory)
    akka.actor.ActorInitializationException: exception during creation
    at akka.actor.ActorInitializationException$.apply(Actor.scala:166) ~[com.typesafe.akka.akka-actor_2.11-2.3.11.jar:na]
    at akka.actor.ActorCell.create(ActorCell.scala:596) ~[com.typesafe.akka.akka-actor_2.11-2.3.11.jar:na]
    at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:456) ~[com.typesafe.akka.akka-actor_2.11-2.3.11.jar:na]
    at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478) ~[com.typesafe.akka.akka-actor_2.11-2.3.11.jar:na]
    at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263) ~[com.typesafe.akka.akka-actor_2.11-2.3.11.jar:na]
    Caused by: java.io.FileNotFoundException: ../conf/profiles (No such file or directory)
    at java.io.FileInputStream.open0(Native Method) ~[na:1.8.0_131]
    at java.io.FileInputStream.open(FileInputStream.java:195) ~[na:1.8.0_131]
    at java.io.FileInputStream.<init>(FileInputStream.java:138) ~[na:1.8.0_131]
    at scala.io.Source$.fromFile(Source.scala:91) ~[org.scala-lang.scala-library-2.11.8.jar:na]
    at scala.io.Source$.fromFile(Source.scala:76) ~[org.scala-lang.scala-library-2.11.8.jar:na]
    [debug] application - DASH → /
    [error] application -

    ! @749h2j09d - Internal server error, for (GET) [/profiles?_=1496900165975] ->

    play.api.Application$$anon$1: Execution exception[[AskTimeoutException: Recipient[Actor[akka://NotebookServer/user/$a#1733845533]] had already been terminated.]]
    at play.api.Application$class.handleError(Application.scala:296) ~[com.typesafe.play.play_2.11-2.3.10.jar:2.3.10]

    @yubrshen - Can you try restarting your notebook and doing all the above steps "Upload" etc and then going to the local link manually in the browser: http://localhost:9001/notebooks/JustEnoughScalaForSpark.snb#
    Give it a couple of mins.. it should refresh.. it worked for me finally when I manually typed in the url above
    Yu Shen
    I observed that there was not a single notebook shown at the bottom, not even those in the notebooks directory of the Spark Notebook distribution.
    @mannit you meant restart the notebook server, then use the explicit URL to load the notebook after the manual "Upload"?
    kind of hacky, but that worked..
    Yu Shen
    Thanks! It worked by simply clicking the URL, even without restarting or re-doing "Upload".
    So for those running into the issue of not seeing the uploaded notebook, just use the explicit URL to load it.
    Great! :)
    Dean Wampler
    Sorry for the difficulties several of you have had. Spark Notebook is not the best experience. Most of the exceptions you noted @yubrshen don’t actually cause problems, but they shouldn’t occur anyway.
    @mannit, check out the Spark with Scala group, https://gitter.im/spark-scala/Lobby
    I am trying to install Spark Notebook on my Mac and running into problems. My initial issue: when running 'bin/spark-notebook', I received a permission error. My solution was to change the permissions of the file with 'chmod 711'. Upon opening the localhost:9001 link, I received errors and have not found a fix. I am running Java 8 on macOS Sierra.
    Dean Wampler
    Does the terminal output show the URL with port 9001? If it’s 9000 (some versions of Spark Notebook), you have to use that port.
    If that’s not the issue, can you provide more information about the errors you’re seeing?
    User error, my apologies. As it turns out, my Java 8 was not recognized in my command terminal.
    Dean Wampler
    Glad you figured it out.
    I resorted to using homebrew to install it instead. Just got notebook loaded, thanks!
    Dean Wampler
    Good luck!
    @deanwampler I am working on implementing the Lambda Architecture with Spark 2.0, Scala 2.11.8, Cassandra, and Kafka, and wanted to know if there are any recommended links to look into, especially for implementing the reconciliation layer between streaming and batch. Can you please suggest some?
    Dean Wampler
    I don’t have any recommendations, other than the fact you can share code between batch and streaming. You might ask the https://gitter.im/spark-scala/Lobby channel
    I am trying to run wordFileNameOnes.count but it's throwing an exception...
    Caused by: java.util.regex.PatternSyntaxException: Unexpected internal error near index 1
    I am running this in spark-shell launched from Windows. Is something wrong with the regex split("""\W+""") given in the code when running from Windows?
    Dean Wampler
    That shouldn’t happen even on Windows. Can you provide a stack trace? Did you make any modifications that you think might have caused the issue?
    hi Dean.. there was an issue with the separator which was being used to get the file name from the full file path... it's resolved now and I'm getting results as expected...
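    For anyone hitting the same thing, a small sketch of the Windows-separator pitfall (the path here is made up). String.split takes a regex, and a lone backslash is an invalid pattern, which produces exactly the "Unexpected internal error near index 1" message:

```scala
import java.util.regex.{Pattern, PatternSyntaxException}

object PathSplit {
  def main(args: Array[String]): Unit = {
    // Hypothetical Windows-style path, just for illustration:
    val windowsPath = "C:\\data\\shakespeare\\file.txt"
    // "\\" in source is a single backslash, which is not a valid regex:
    try {
      windowsPath.split("\\")
    } catch {
      case _: PatternSyntaxException =>
        println("a lone backslash is not a valid regex")
    }
    // Quote the separator so it is treated literally:
    val parts = windowsPath.split(Pattern.quote("\\"))
    println(parts.last)
  }
}
```

    Using Pattern.quote (or splitting on the char '\\' instead of a String) avoids interpreting the separator as a regex.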
    Fernando Margueirat
    I see there have not been any new messages in almost a year, but I hope someone still checks this from time to time. I am running the Jupyter notebook as per https://github.com/deanwampler/JustEnoughScalaForSpark, but in the cell's output it does not print the types of the variables, and it is sometimes hard to follow the examples without that information.
    i.e. instead of printing
    (Int, Int) = (1,2)
    it prints
    Does anyone know if this is some kind of setting I can change, or if it is due to me having a different version of the Docker container?
    Fernando Margueirat
    I double checked that I ran %showTypes
    Fernando Margueirat
    Never mind, apparently there's a bug https://issues.apache.org/jira/browse/TOREE-467

    Dear everyone,
    I am trying to generate an uber-jar using the sbt compile and sbt package commands, to run my application on our remote server where Spark is installed in standalone mode. I used the deeplearning4j framework to build an LSTM neural network and intend to train the model through Spark. Nevertheless, I ran into an issue when running the spark-submit command:

    spark-submit --class "lstm.SparkLSTM" --master local[*] stock_prediction_scala_2.11-0.1.jar --packages org.deeplearning4j:deeplearning4j-core:0.9.1 "/home/hadoop/ScalaWorkspace/Stock_Prediction_Scala/target/lstm_train/prices-split-adjusted.csv" "WLTW"

    The problem is that spark-submit seemingly did not take effect in my circumstance. It finished right after entering spark-submit, without throwing any error. I have not seen the progress of training in the output.

    [hadoop@gaion34 lstm_train]$ spark-submit --class "lstm.SparkLSTM" --master local[*] stock_prediction_scala_2.11-0.1.jar --packages org.deeplearning4j:deeplearning4j-core:0.9.1 "/home/hadoop/ScalaWorkspace/Stock_Prediction_Scala/target/lstm_train/prices-split-adjusted.csv" "WLTW"
    2018-04-25 17:06:50 WARN  Utils:66 - Your hostname, gaion34 resolves to a loopback address:; using instead (on interface eno1)
    2018-04-25 17:06:50 WARN  Utils:66 - Set SPARK_LOCAL_IP if you need to bind to another address
    2018-04-25 17:06:51 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    2018-04-25 17:06:51 INFO  ShutdownHookManager:54 - Shutdown hook called
    2018-04-25 17:06:51 INFO  ShutdownHookManager:54 - Deleting directory /tmp/spark-c4aee15e-d23b-4c03-95a7-12d9d39f714a
    [hadoop@abc lstm_train]$ spark-submit --class "lstm.SparkLSTM" --master local[*] stock_prediction_scala_2.11-0.1.jar --packages org.deeplearning4j:deeplearning4j-nn:0.9.1 "/home/hadoop/ScalaWorkspace/Stock_Prediction_Scala/target/lstm_train/prices-split-adjusted.csv" "WLTW"
    2018-04-25 17:07:12 WARN  Utils:66 - Your hostname, abc resolves to a loopback address:; using instead (on interface eno1)
    2018-04-25 17:07:12 WARN  Utils:66 - Set SPARK_LOCAL_IP if you need to bind to another address
    2018-04-25 17:07:13 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    2018-04-25 17:07:13 INFO  ShutdownHookManager:54 - Shutdown hook called
    2018-04-25 17:07:13 INFO  ShutdownHookManager:54 - Deleting directory /tmp/spark-82fdaebf-1121-4e31-8c4f-37aea9683922

    my main class:

    object SparkLSTM {
      def main(args: Array[String]) = {
        if (args.length == 2) {
          val filePath = args(0)    //"/Users/kym1992/STUDY/NEU/CSYE7200/Dataset/nyse/prices-split-adjusted.csv"
          val symbolName = args(1)
          val prepared = StockPricePredictionLSTM.prepare(filePath, symbolName, 0.90)
          val result = StockPricePredictionLSTM.predictPriceOneAhead(prepared._1, prepared._2, prepared._3, prepared._4, prepared._5)
          println("predicts, actual")
          (result.predicts, result.actuals).zipped.foreach((x, y) => println(x + ", " + y))
          saveAsCsv(result, symbolName)
          result.predicts.foreach(r => println(r))
        }
      }
    }

    Has anyone experienced this issue before? Please advise. Thanks.

    I have not seen the progress of training in the output.
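    One likely cause worth checking: spark-submit treats everything after the application jar as arguments to the main class, not as spark-submit options. With --packages placed after the jar, SparkLSTM.main would receive four arguments, and the args.length == 2 guard would then exit silently, which matches the immediate shutdown in the log above. A sketch of what main would see in that case:

```scala
object ArgsCheck {
  // What spark-submit would pass to SparkLSTM.main when --packages is
  // placed after the application jar: everything after the jar becomes
  // an application argument.
  val passedArgs = Array(
    "--packages", "org.deeplearning4j:deeplearning4j-core:0.9.1",
    "/home/hadoop/ScalaWorkspace/Stock_Prediction_Scala/target/lstm_train/prices-split-adjusted.csv",
    "WLTW")

  def main(args: Array[String]): Unit = {
    // The guard in SparkLSTM.main then skips all the work without error:
    if (passedArgs.length == 2) println("would train")
    else println(s"silently skipped: got ${passedArgs.length} args")
  }
}
```

    If that is the cause, moving --packages before the jar name should leave only the two expected arguments.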
    Dean Wampler
    Hi, @rickyhai11. I see you asked on the Spark with Scala channel. That's a better place for general questions not specific to the "Just Enough..." tutorial. Good luck!
    How does Spark's CREATE EXTERNAL TABLE work?
    Does it create the external location with the table?
    Please help me with this.
    Dean Wampler
    It should, but I haven't tried it in a while. Try the https://gitter.im/spark-scala/Lobby channel, if you have problems. It has a lot more people participating.
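    For reference, a sketch of the usual pattern. The table name, columns, and location are made-up examples; this just builds the DDL string you would pass to spark.sql in a real session:

```scala
object ExternalTableDDL {
  // With EXTERNAL, Spark registers the table in the metastore but leaves
  // the files at LOCATION; DROP TABLE removes only the metadata, not the
  // data. (Names, columns, and path here are hypothetical.)
  val ddl: String =
    """CREATE EXTERNAL TABLE IF NOT EXISTS events (id INT, name STRING)
      |STORED AS PARQUET
      |LOCATION '/data/events'""".stripMargin

  def main(args: Array[String]): Unit =
    println(ddl)  // in a real session: spark.sql(ddl)
}
```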