    Noah Pritikin
    @cappaberra
    @guzzijones, are you creating your EMR cluster via the web console, CLI, or some other means?
    AJ
    @guzzijones
    Via Python boto3
    Noah Pritikin
    @cappaberra
    @guzzijones, you want to add the Configurations parameter to your run_job_flow() call.
    https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/emr.html#EMR.Client.run_job_flow
    More on that parameter is here: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-configure-apps.html
    {"classification":"spark-defaults","properties":{"spark.jars.packages":"ml.combust.mleap:mleap-spark_2.11:0.15.0"}}
    :point_up: that's what you might add via the AWS console when configuring a cluster with support for MLeap v0.15.0. I don't have an example that uses boto3's run_job_flow function, but you can adapt that Spark EMR configuration for your Python code.
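    The console JSON above maps onto the Configurations parameter of boto3's run_job_flow(). A minimal sketch, assuming a Spark 2.x EMR release built on Scala 2.11 (to match the mleap-spark_2.11 artifact). The helper names, cluster name and Instances contents are hypothetical; note that boto3 expects the capitalized keys Classification and Properties. The EMR client is passed in so the helper can be exercised without AWS credentials.

```python
# Sketch: pass the spark-defaults classification from the console example
# to boto3's EMR run_job_flow() through its Configurations parameter.


def mleap_spark_configurations(mleap_version="0.15.0", scala_version="2.11"):
    """Return an EMR Configurations list that adds MLeap to spark.jars.packages."""
    package = f"ml.combust.mleap:mleap-spark_{scala_version}:{mleap_version}"
    return [
        {
            "Classification": "spark-defaults",
            "Properties": {"spark.jars.packages": package},
        }
    ]


def launch_mleap_cluster(emr_client, name, release_label, instances):
    """Start an EMR cluster with Spark and MLeap (hypothetical helper).

    emr_client is expected to be a boto3 EMR client, e.g. boto3.client("emr").
    """
    return emr_client.run_job_flow(
        Name=name,
        ReleaseLabel=release_label,
        Applications=[{"Name": "Spark"}],
        Configurations=mleap_spark_configurations(),
        Instances=instances,
        # Default EMR role names; replace them if your account uses custom roles.
        JobFlowRole="EMR_EC2_DefaultRole",
        ServiceRole="EMR_DefaultRole",
    )
```

    With credentials configured, a call might look like launch_mleap_cluster(boto3.client("emr"), "mleap-training", release_label, instances), where release_label is an EMR 5.x label and instances is the usual Instances dict (instance types, instance count, KeepJobFlowAliveWhenNoSteps, and so on).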
    AJ
    @guzzijones
    Good god, you are a saint!
    Please post this in the issue on GitHub.
    Noah Pritikin
    @cappaberra
    Can you link the issue here?
    AJ
    @guzzijones
    On mobile... one sec
    Noah Pritikin
    @cappaberra
    And, you're welcome! :)
    AJ
    @guzzijones
    Noah Pritikin
    @cappaberra
    @guzzijones, your question seems more in line with this issue: combust/mleap#343. The other one is old-ish and not focused on a single question.
    Or, if you create a new GitHub issue with your question, I can post a response to it. Let me know.
    AJ
    @guzzijones
    Will do.
    Noah Pritikin
    @cappaberra
    @guzzijones, just posted. Hopefully it can help more people! Thanks for posting the question!
    AJ
    @guzzijones
    Baller. So many people will be so happy
    @cappaberra we are working on a Django application that trains models in AWS EMR and then downloads them locally to do the classification.
    MLeap solves the download and persistence problems. Now we can store classifiers locally on disk.
    Noah Pritikin
    @cappaberra
    @guzzijones, nice!
    AJ
    @guzzijones
    I posted a new question on the MLeap repo. Thanks for your help with the missing jar files earlier.
    I have a custom transformer that I use in PySpark.
    I am getting a "no method to_java" error.
    AJ
    @guzzijones
    I posted an update to my question. It looks like I am missing the jar files for JavaTransformer?
    I am not sure how to find out which Maven package I need.
    karthik-bs
    @karthik-bs
    Does MLeap support Spark transformations on the raw data before running it through a predictor?
    I have a use case where I compute an average and a lag on a DataFrame before passing it as input to the model.
    AJ
    @guzzijones
    Isn't that just a pipeline?
    With a custom transformer?
    karthik-bs
    @karthik-bs
    I would be reading a large JSON blob of data from a database. So would I essentially convert this into a LeapFrame and pass it to the pipeline?
    I would need to do some aggregation, compute the lag in the data, and then apply a rule-based algorithm that scores yes or no based on the rule.
    This is my present use case; eventually this rule-based method would go away and be replaced by an ML classifier.
    Any pointers on how I could leverage MLeap to implement this functionality?
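    The feature logic described above (an average, a lag, then a yes/no rule) can be prototyped as plain functions before deciding where it lives in a pipeline. A minimal sketch, assuming the JSON blob has already been parsed into a time-ordered list of numbers; the window size, the threshold rule and the helper names are hypothetical placeholders, and none of this is MLeap API.

```python
from statistics import mean


def add_lag_and_average(values, window=3):
    """For each value, attach the previous value (lag 1) and a trailing mean."""
    rows = []
    for i, value in enumerate(values):
        lag = values[i - 1] if i > 0 else None
        trailing = values[max(0, i - window + 1): i + 1]
        rows.append({"value": value, "lag_1": lag, "avg": mean(trailing)})
    return rows


def score(row, threshold=1.5):
    """Hypothetical rule: "yes" when the value exceeds threshold x trailing mean."""
    if row["lag_1"] is None:
        # No history yet for the first record, so never flag it.
        return "no"
    return "yes" if row["value"] > threshold * row["avg"] else "no"
```

    Keeping the rule in its own function means it can later be swapped for a trained classifier while the lag and average features stay the same.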
    Bill Lindsay
    @yazbread
    For model management (loading or unloading a model), are the timeout settings in seconds? Also, the Spring Boot app returns 202 when loading the model, even if it didn't get loaded, so I need to make a GET call to see if it is actually there. I pass in Integer.MAX_VALUE for both the disk and memory timeouts. How can I see whether the model got unloaded due to a timeout?
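    An HTTP 202 (Accepted) response only means the request was accepted for processing, so a client has to confirm the load separately with the GET call, as described above. A minimal polling sketch, assuming a caller-supplied get_status function that wraps that GET call and returns its HTTP status code, with 200 meaning the model is present; that status convention and the helper name wait_for_model are assumptions, not documented MLeap behavior.

```python
import time


def wait_for_model(get_status, model_name, timeout_s=60.0, poll_interval_s=1.0,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll until the model is reported as loaded or the timeout expires.

    get_status(model_name) is a hypothetical caller-supplied function (for
    example, a wrapper around an HTTP GET on the model resource) that returns
    the HTTP status code; 200 is treated as "loaded".
    Returns True if the model was seen as loaded, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if get_status(model_name) == 200:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)
```

    The clock and sleep parameters are injectable so the loop can be tested without real waiting.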
    Felix Gao
    @gaotangfeifei_twitter

    Hi, I am new to MLeap and trying the Airbnb example. I have encountered the following errors:

    ERROR:root:Exception while sending command.
    Traceback (most recent call last):
      File "/usr/local/Cellar/apache-spark/2.4.4/libexec/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1159, in send_command
        raise Py4JNetworkError("Answer from Java side is empty")
    py4j.protocol.Py4JNetworkError: Answer from Java side is empty
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/usr/local/Cellar/apache-spark/2.4.4/libexec/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 985, in send_command
        response = connection.send_command(command)
      File "/usr/local/Cellar/apache-spark/2.4.4/libexec/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1164, in send_command
        "Error while receiving", e, proto.ERROR_ON_RECEIVE)
    py4j.protocol.Py4JNetworkError: Error while receiving

    I am using Spark 2.4.4, and I have installed MLeap via spark-defaults.conf:

    spark.jars.packages  org.apache.spark:spark-avro_2.11:2.4.4,ml.combust.mleap:mleap-spark_2.11:0.15.0

    My terminal is showing an exception with NoClassDefFoundError:

    Exception in thread "Thread-4" java.lang.NoClassDefFoundError: ml/combust/bundle/serializer/SerializationFormat
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at py4j.reflection.CurrentThreadClassLoadingStrategy.classForName(CurrentThreadClassLoadingStrategy.java:40)
        at py4j.reflection.ReflectionUtil.classForName(ReflectionUtil.java:51)
        at py4j.reflection.TypeUtil.forName(TypeUtil.java:243)
        at py4j.commands.ReflectionCommand.getUnknownMember(ReflectionCommand.java:175)
        at py4j.commands.ReflectionCommand.execute(ReflectionCommand.java:87)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)
    Caused by: java.lang.ClassNotFoundException: ml.combust.bundle.serializer.SerializationFormat
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 9 more
    I do think I have installed the dependencies correctly
    Ivy Default Cache set to: /Users/ggao/.ivy2/cache
    The jars for the packages stored in: /Users/ggao/.ivy2/jars
    :: loading settings :: url = jar:file:/usr/local/Cellar/apache-spark/2.4.4/libexec/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
    org.apache.spark#spark-avro_2.11 added as a dependency
    ml.combust.mleap#mleap-spark_2.11 added as a dependency
    :: resolving dependencies :: org.apache.spark#spark-submit-parent-dbeefc3f-8e12-443d-8629-8adf19670d42;1.0
        confs: [default]
        found org.apache.spark#spark-avro_2.11;2.4.4 in central
        found org.spark-project.spark#unused;1.0.0 in local-m2-cache
        found ml.combust.mleap#mleap-spark_2.11;0.15.0 in central
        found ml.combust.mleap#mleap-spark-base_2.11;0.15.0 in central
        found ml.combust.mleap#mleap-runtime_2.11;0.15.0 in central
        found ml.combust.mleap#mleap-core_2.11;0.15.0 in central
        found ml.combust.mleap#mleap-base_2.11;0.15.0 in central
        found ml.combust.mleap#mleap-tensor_2.11;0.15.0 in central
        found io.spray#spray-json_2.11;1.3.2 in central
        found com.github.rwl#jtransforms;2.4.0 in central
        found ml.combust.bundle#bundle-ml_2.11;0.15.0 in central
        found com.google.protobuf#protobuf-java;3.5.1 in central
        found com.thesamet.scalapb#scalapb-runtime_2.11;0.7.1 in local-m2-cache
        found com.thesamet.scalapb#lenses_2.11;0.7.0-test2 in local-m2-cache
        found com.lihaoyi#fastparse_2.11;1.0.0 in local-m2-cache
        found com.lihaoyi#fastparse-utils_2.11;1.0.0 in local-m2-cache
        found com.lihaoyi#sourcecode_2.11;0.1.4 in local-m2-cache
        found com.jsuereth#scala-arm_2.11;2.0 in central
        found com.typesafe#config;1.3.0 in local-m2-cache
        found commons-io#commons-io;2.5 in local-m2-cache
        found org.scala-lang#scala-reflect;2.11.8 in local-m2-cache
        found ml.combust.bundle#bundle-hdfs_2.11;0.15.0 in central
    :: resolution report :: resolve 547ms :: artifacts dl 16ms
        :: modules in use:
        com.github.rwl#jtransforms;2.4.0 from central in [default]
        com.google.protobuf#protobuf-java;3.5.1 from central in [default]
        com.jsuereth#scala-arm_2.11;2.0 from central in [default]
        com.lihaoyi#fastparse-utils_2.11;1.0.0 from local-m2-cache in [default]
        com.lihaoyi#fastparse_2.11;1.0.0 from local-m2-cache in [default]
        com.lihaoyi#sourcecode_2.11;0.1.4 from local-m2-cache in [default]
        com.thesamet.scalapb#lenses_2.11;0.7.0-test2 from local-m2-cache in [default]
        com.thesamet.scalapb#scalapb-runtime_2.11;0.7.1 from local-m2-cache in [default]
        com.typesafe#config;1.3.0 from local-m2-cache in [default]
        commons-io#commons-io;2.5 from local-m2-cache in [default]
        io.spray#spray-json_2.11;1.3.2 from central in [default]
        ml.combust.bundle#bundle-hdfs_2.11;0.15.0 from central in [default]
        ml.combust.bundle#bundle-ml_2.11;0.15.0 from central in [default]
        ml.combust.mleap#mleap-base_2.11;0.15.0 from central in [default]
        ml.combust.mleap#mleap-core_2.11;0.15.0 from central in [default]
        ml.combust.mleap#mleap-runtime_2.11;0.15.0 from central in [default]
        ml.combust.mleap#mleap-spark-base_2.11;0.15.0 from central in [default]
        ml.combust.mleap#mleap-spark_2.11;0.15.0 from central in [default]
        ml.combust.mleap#mleap-tensor_2.11;0.15.0 from central in [default]
        org.apache.spark#spark-avro_2.11;2.4.4 from central in [default]
        org.scala-lang#scala-reflect;2.11.8 from local-m2-cache in [default]
        org.spark-project.spark#unused;1.0.0 from local-m2-cache in [default]
        :: evicted modules:
        com.google.protobuf#protobuf-java;3.5.0 by [com.google.protobuf#protobuf-java;3.5.1] in [default]
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      default     |   23  |   0   |   0   |   1   ||   22  |   0   |
        ---------------------------------------------------------------------
    ...
        confs: [default]
        0 artifacts copied, 22 already retrieved (0kB/15ms)
    Akarsh Gupta
    @akarsh3007
    Hi everyone, has anyone seen this problem with XGBoost serving, where the predictions from Spark and MLeap serving are different? I am using MLeap version 0.11.
    Luca Giovagnoli
    @lucagiovagnoli
    @akarsh3007 what transformers are you using? Is it similar to this combust/mleap#596 ?
    王伟
    @woneway
    I cannot find the file "bundle.json" in my model, which contains one custom transformer. Does anyone know why?
    The files in the zip look like this.
    Ganesh Krishnan
    @ganeshkrishnan1
    Does MLeap support Spark LDA? I can see combust/mleap#144 with LDA support, but neither the documentation nor our code seems to work.
    Luca Giovagnoli
    @lucagiovagnoli
    hi @ancasarb, do you know if MLeap Runtime is thread-safe? I cannot see many ‘synchronized’ functions in the codebase https://github.com/combust/mleap/search?l=Scala&q=synchronized so I assume it’s not. I wonder if there’s been any clear reports of it being non-thread-safe.
    Anca Sarb
    @ancasarb
    Hi @lucagiovagnoli, do you mean things like the FrameReader(s), RowTransformer, Transformer, FrameWriter(s) etc?
    If so, then yes, they’re thread-safe. There’s no need for synchronization; most are stateless.
    We have had all of these wired as singleton beans (if you’re familiar with the Spring framework in Java) without any issues for 3+ years.
    Luca Giovagnoli
    @lucagiovagnoli
    @ancasarb thanks so much for sharing your valued experience. I’m not familiar with beans but I’m going to read up about it now :)
    Transformer and RowTransformer are what we’re using, so that sounds great!
    Daniel Hen
    @Daniel8hen
    Hi all, I wanted to ask a junior question :)
    I have a Spark model (XGBoost4J), already serialized as an MLeap bundle JSON. Now I'd like to deploy it to a service on Docker/Kubernetes and start querying it. My question is: where do I put the parameters that are relevant to each request? If I have, say, 1000 features and only 500 of them are relevant, how should I tackle this use case? Where should I start? The documentation is not very clear about this use case. Thank you!
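    One way to handle "only 500 of 1000 features are relevant" is to build each request from just the input columns the pipeline's first stages read. A minimal sketch, assuming the serving endpoint accepts MLeap's JSON LeapFrame layout (a schema with named, typed fields plus a list of rows) and that every feature is a double; verify the exact layout against the MLeap serving docs for your version. The helper name and the feature names in the usage note are hypothetical.

```python
import json


def build_leap_frame(features, feature_names):
    """Build a one-row LeapFrame-style JSON payload holding only feature_names.

    features maps every available feature name to a numeric value; names not
    listed in feature_names are dropped, and a missing name raises KeyError.
    """
    fields = [{"name": name, "type": "double"} for name in feature_names]
    row = [float(features[name]) for name in feature_names]
    return json.dumps({"schema": {"fields": fields}, "rows": [row]})
```

    For example, build_leap_frame(all_features, relevant_names) with relevant_names listing the 500 columns the bundle's input stages (such as a VectorAssembler, if one was serialized with the model) expect.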
    prafulrana21
    @prafulrana21

    Hi, I am trying to create a single executable jar of the mleap-spring-boot application for development.
    I tried these steps:

    1. sbt compile
    2. sbt mleap-spring-boot/package
    3. scala mleap-spring-boot_2.11-0.16.0-SNAPSHOT.jar
      but I get this error when executing the jar file:
      ```
      java.lang.ClassNotFoundException: scala.App$class
       at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
       at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
       at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
       at ml.combust.mleap.springboot.Starter$.<init>(Starter.scala:3)
       at ml.combust.mleap.springboot.Starter$.<clinit>(Starter.scala)
       at ml.combust.mleap.springboot.Starter.main(Starter.scala)
       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
       at java.lang.reflect.Method.invoke(Method.java:498)
       at scala.reflect.internal.util.ScalaClassLoader.$anonfun$run$2(ScalaClassLoader.scala:105)
       at scala.reflect.internal.util.ScalaClassLoader.asContext(ScalaClassLoader.scala:40)
       at scala.reflect.internal.util.ScalaClassLoader.asContext$(ScalaClassLoader.scala:37)
       at scala.reflect.internal.util.ScalaClassLoader$URLClassLoader.asContext(ScalaClassLoader.scala:130)
       at scala.reflect.internal.util.ScalaClassLoader.run(ScalaClassLoader.scala:105)
       at scala.reflect.internal.util.ScalaClassLoader.run$(ScalaClassLoader.scala:97)
       at scala.reflect.internal.util.ScalaClassLoader$URLClassLoader.run(ScalaClassLoader.scala:130)
       at scala.tools.nsc.CommonRunner.run(ObjectRunner.scala:29)
       at scala.tools.nsc.CommonRunner.run$(ObjectRunner.scala:28)
       at scala.tools.nsc.JarRunner$.run(MainGenericRunner.scala:17)
       at scala.tools.nsc.CommonRunner.runAndCatch(ObjectRunner.scala:35)
       at scala.tools.nsc.CommonRunner.runAndCatch$(ObjectRunner.scala:34)
       at scala.tools.nsc.JarRunner$.runJar(MainGenericRunner.scala:17)
       at scala.tools.nsc.MainGenericRunner.runTarget$1(MainGenericRunner.scala:76)
       at scala.tools.nsc.MainGenericRunner.run$1(MainGenericRunner.scala:91)
       at scala.tools.nsc.MainGenericRunner.process(MainGenericRunner.scala:103)
       at scala.tools.nsc.MainGenericRunner$.main(MainGenericRunner.scala:108)
       at scala.tools.nsc.MainGenericRunner.main(MainGenericRunner.scala)

    ```