    cresny
    @cresny
    good point, let me try my latest test with spark-submit
    cresny
    @cresny
    yeah, works with local[1]
    Ignacio
    @ghoto

    Hi, I have some questions regarding the Docker images with Spark 2.4 and Scala 2.12. I need to use 2.12 because of some other dependencies, and 2.11 is not supported. I tried different versions of the Docker image (polynote:0.4.11-2.12-spark2.4, polynote:0.3.12-2.12-spark2.4, and polynote:0.3.11-2.12-spark2.4) but I can't execute anything. The kernel fails to start and dies with NoSuchMethodError scala.Function1.$init$(Lscala/Function1;)V (which smells like the wrong Scala version).

    I went inside the container to run spark-submit --version, and it seems Spark is compiled against Scala 2.11.

    polly@50d1c6882048:/opt$ spark-submit --version
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /___/ .__/\_,_/_/ /_/\_\   version 2.4.5
          /_/
    
    Using Scala version 2.11.12, OpenJDK 64-Bit Server VM, 1.8.0_282
    Branch HEAD
    Compiled by user centos on 2020-02-02T19:38:06Z
    Revision cee4ecbb16917fa85f02c635925e2687400aa56b
    Url https://gitbox.apache.org/repos/asf/spark.git
    Type --help for more information.
    jonathanindig
    @jonathanindig
    you’re right - looks like there’s something wrong with the way we’re building the images :(
    Ignacio
    @ghoto
    I'm trying to build the image passing the variable, but it never reaches the install_spark.sh script. Refreshing my Docker skills right now
    Ignacio
    @ghoto
    docker build --no-cache --tag custom-polynote --build-arg SCALA_VERSION=2.12 --build-arg POLYNOTE_VERSION=0.4.0 .
    [+] Building 8.8s (7/8)
     => [internal] load build definition from Dockerfile                                                                                                                                                   0.0s
     => => transferring dockerfile: 43B                                                                                                                                                                    0.0s
     => [internal] load .dockerignore                                                                                                                                                                      0.0s
     => => transferring context: 2B                                                                                                                                                                        0.0s
     => [internal] load metadata for docker.io/polynote/polynote:0.4.0-2.12                                                                                                                                0.0s
     => [1/4] FROM docker.io/polynote/polynote:0.4.0-2.12                                                                                                                                                  0.0s
     => [internal] load build context                                                                                                                                                                      0.0s
     => => transferring context: 38B                                                                                                                                                                       0.0s
     => CACHED [2/4] WORKDIR /opt                                                                                                                                                                          0.0s
     => [3/4] COPY install_spark.sh .                                                                                                                                                                      0.0s
     => [4/4] RUN ./install_spark.sh && rm ./install_spark.sh                                                                                                                                              8.6s
     => => # + SPARK_VERSION_DIR=spark-2.4.5
     => => # /opt /opt
     => => # + test '' = 2.12
    The final line of the build output, + test '' = 2.12, shows SCALA_VERSION expanding to an empty string; it refers to these lines in install_spark.sh:
    d441a2624       (Jeremy Smith   2020-02-19 13:38:11 -0800       9)if test "${SCALA_VERSION}" = "2.12"
    d441a2624       (Jeremy Smith   2020-02-19 13:38:11 -0800       10)then
    Ignacio
    @ghoto
    Did some research and it seems this is a Docker issue: ARGs declared before a FROM are reset after it. This appears to be expected behavior; see moby/moby#34129 and docker/cli#333
    jonathanindig
    @jonathanindig
    Wow! thanks for finding this out!
    what a pain :(
    So I guess we need to redeclare the args after the FROM?
    Ignacio
    @ghoto
    I think I solved it. It's just a matter of adding one line to the Dockerfile
    yes
    before and after
    all the images on Docker Hub are broken, BTW
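    (For reference, a minimal sketch of the fix being described; the base image tag and default versions below are taken from the build output above and may differ for other releases:)

    ARG POLYNOTE_VERSION=0.4.0
    ARG SCALA_VERSION=2.12
    FROM polynote/polynote:${POLYNOTE_VERSION}-${SCALA_VERSION}
    # ARGs declared before FROM go out of scope after it, so re-declare SCALA_VERSION
    # to make it visible to the RUN step that calls install_spark.sh
    ARG SCALA_VERSION
    WORKDIR /opt
    COPY install_spark.sh .
    RUN ./install_spark.sh && rm ./install_spark.sh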
    jonathanindig
    @jonathanindig
    Awesome, if you submit a PR I will happily accept!
    Ignacio
    @ghoto
    :thumbsup:
    Ignacio
    @ghoto
    @jonathanindig polynote/polynote#1170
    cresny
    @cresny
    @jonathanindig quick update on running 0.4.0 on a Spark 3.1.1 cluster. I was only able to get client mode to work after a couple of small changes. First, I swapped Scala from 2.12.12 to 2.12.10 to match the Spark distribution's version. That took care of the serialization incompatibility that @bradfrosty mentioned above. After this I was getting classloader ClassCastException issues; for some reason, dropping org.slf4j from the server-assembly fixed that.
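    (A rough sbt sketch of the kind of build changes described above — not Polynote's actual build definition, just one way to express "pin Scala to 2.12.10 and keep org.slf4j out of the assembly":)

    // match the Scala version shipped with the Spark 3.1.1 distribution
    ThisBuild / scalaVersion := "2.12.10"
    // drop org.slf4j artifacts from the dependency graph so they don't end up in the server assembly
    excludeDependencies += ExclusionRule(organization = "org.slf4j")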
    Ignacio
    @ghoto
    Need help debugging this error when the kernel is starting. I'm running Polynote from a Docker container.
    [ERROR]  Kernel closed with error (Logged from KernelPublisher.scala:81)
       |     java.io.IOException: Connection reset by peer
       |     sun.nio.ch.FileDispatcherImpl.read0(Native Method)
       |     sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
       |     sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
       |     sun.nio.ch.IOUtil.read(IOUtil.java:197)
       |     sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
       |     polynote.kernel.remote.SocketTransport$FramedSocket.readBuffer(transport.scala:489)
       |     polynote.kernel.remote.SocketTransport$FramedSocket.$anonfun$read$1(transport.scala:510)
       |     zio.internal.FiberContext.evaluateNow(FiberContext.scala:490)
       |     zio.internal.FiberContext.$anonfun$evaluateLater$1(FiberContext.scala:778)
       |     java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       |     java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       |     java.lang.Thread.run(Thread.java:748)
       |     Fiber failed.
       |     A checked error was not handled.
       |     java.io.IOException: Connection reset by peer
       |         at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
       |         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
       |         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
       |         at sun.nio.ch.IOUtil.read(IOUtil.java:197)
       |         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
       |         at polynote.kernel.remote.SocketTransport$FramedSocket.readBuffer(transport.scala:489)
       |         at polynote.kernel.remote.SocketTransport$FramedSocket.$anonfun$read$1(transport.scala:510)
       |         at zio.internal.FiberContext.evaluateNow(FiberContext.scala:490)
       |         at zio.internal.FiberContext.$anonfun$evaluateLater$1(FiberContext.scala:778)
       |         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       |         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    
    ....
    jonathanindig
    @jonathanindig
    Anything else in the logs? Seems like the kernel died
    Ignacio
    @ghoto
    Which logs should I look at?
    jonathanindig
    @jonathanindig
    The same logs, just earlier. Can you share the entire output?
    Ignacio
    @ghoto
    It's pretty big, let me see if I can paste it somewhere..
    jonathanindig
    @jonathanindig
    Maybe it OOMed?
    It looks like Spark started fine and the kernel just died, which is why I ask
    Ignacio
    @ghoto
    I don't think so. This is at the start of the first cell. It happens when I declare some libraries that work just fine in Scala, but they make Spark crash after I read a small JSON file into a DataFrame.
    I was wondering where else to look, because this stack trace doesn't give me any clue.
    (the libraries that I'm using are from Akka)
    jonathanindig
    @jonathanindig
    does the same code work from spark-shell or spark-submit?
    Ignacio
    @ghoto
    I created an object and ran it from main. It runs without issues
    jonathanindig
    @jonathanindig
    Within Spark?
    Ignacio
    @ghoto
    yes, the code (the Akka code) runs within Spark, but when I try to read from a JSON file (spark.read.json) the kernel dies.
    The error arises even before I run any Akka code: just declaring the dependencies makes the kernel die when I try to read the JSON.
    The JSON file can be read fine if I create a new notebook without the dependencies
    jonathanindig
    @jonathanindig
    Very strange. Can you read the json if you start a spark-shell with those dependencies?
    Ignacio
    @ghoto
    I need to try.
    Ignacio
    @ghoto

    it works just fine in spark-shell

    scala> val df = spark.read.json("polynote/data/04-2021-aws-dns.json")
    df: org.apache.spark.sql.DataFrame = [account_id: string, answers: array<struct<Class:string,Rdata:string,Type:string>> ... 12 more fields]

    with

    spark-shell --packages com.typesafe.akka:akka-actor-typed_2.12:2.6.14,com.typesafe.akka:akka-http_2.12:10.2.4,com.typesafe.akka:akka-stream_2.12:2.6.14,com.typesafe.akka:akka-http-spray-json_2.12:10.2.4
    jonathanindig
    @jonathanindig
    hmm, very strange.
    Jonathan Ward
    @jbward1
    I'm looking at your Contributing doc and am interested in possibly contributing to the project. Do you have a list of priority issues/features you're looking to ship?
    Benjamin Cabalona Jr.
    @benjcabalona1029
    If I install Polynote on a Spark cluster, will it automatically use the worker nodes as executors, or is there extra setup I need to do?
    jeremyrsmith
    @jeremyrsmith
    @benjcabalona1029 For Spark kernels, polynote just uses spark-submit. So if the machine running polynote can spark-submit things and that works as expected, then polynote's kernel should also work as expected. (except for some weirdness around cluster mode vs. client mode – something we're looking at)
    (if you spark-submit with --deploy-mode cluster then Polynote will usually get confused and think the kernel has died. There's an open issue for that. But otherwise it should work fine)
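    (One way to sanity-check that from the host running Polynote is to submit a stock example job — illustrative only; the master, deploy mode, and examples-jar path depend on your Spark installation:)

    spark-submit --master yarn --deploy-mode client \
      --class org.apache.spark.examples.SparkPi \
      "$SPARK_HOME"/examples/jars/spark-examples_*.jar 100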
    Benjamin Cabalona Jr.
    @benjcabalona1029
    Okay thanks! Appreciate it.
    Benjamin Cabalona Jr.
    @benjcabalona1029
    Will start my cluster setup tomorrow. My use case for Polynote in prod is mainly for data validation. I will query the source data with JDBC and compare it to the processed data stored in Hive.
    Polynote will run on the same cluster where we run the ETL job.
    Does anyone have something similar? (I mentioned prod but I meant to say test environment)
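    (For what it's worth, a rough Scala sketch of that kind of validation in a notebook cell — the connection details and table names below are placeholders, not taken from this thread:)

    // read the source rows over JDBC (placeholder URL, table, and user)
    val source = spark.read.format("jdbc")
      .option("url", "jdbc:postgresql://source-host:5432/sourcedb")
      .option("dbtable", "public.events")
      .option("user", "readonly")
      .load()

    // the processed data written to Hive by the ETL job (placeholder table name)
    val processed = spark.table("warehouse.events")

    // rows present on one side but not the other (requires matching schemas)
    val missingFromHive  = source.exceptAll(processed).count()
    val unexpectedInHive = processed.exceptAll(source).count()
    println(s"missing from Hive: $missingFromHive, unexpected in Hive: $unexpectedInHive")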
    larsskaug
    @larsskaug
    I'm unable to get the required Jep install to work on EC2. Any tips for RHEL 7.9? The error I get points to the Python installation, but I've been down a rabbit hole trying to address it: /usr/bin/ld: cannot find -lpython3.6m
    jonathanindig
    @jonathanindig
    @larsskaug Have you looked at the Jep GitHub issues, such as ninia/jep#283 or ninia/jep#220?
    @jbward1 Any help would be appreciated! We don’t have a roadmap as such, feel free to go through our open issues :)