    Anca Sarb
    @ancasarb
    :thumbsup:
    Anca Sarb
    @ancasarb
    @tovbinm @hollinwilkins has just released MLeap 0.14.0, which includes support for Spark 2.4 and Scala 2.12
    Noah Pritikin
    @cappaberra
    @ancasarb / @hollinwilkins thank you!!
    Matthew Tovbin
    @tovbinm
    yay!! Thank you!!! @ancasarb
    Noah Pritikin
    @cappaberra
    @ancasarb, I still don't see a 0.14.0 release here: https://github.com/combust/mleap/releases
    Matthew Tovbin
    @tovbinm
    @cappaberra it just wasn't tagged. See here: https://mvnrepository.com/artifact/ml.combust.mleap/mleap-spark
    Anca Sarb
    @ancasarb
    Yes I’ll do the tagging today
    Bill Lindsay
    @yazbread
    Does MLeap support loading a bundle packaged as a JAR, as opposed to a ZIP? I'm trying it out and it's failing.
    Noah Pritikin
    @cappaberra
    @ancasarb, thank you!
    Bill Lindsay
    @yazbread
    Does the implementation of linear regression require constant re-training? That is the opinion of one of our data scientists.
    Femi Anthony
    @femibyte

    Does anyone have code they can share that works around MLeap not supporting SQLTransformer? I've externalized the SQL transformation as suggested here:
    combust/mleap#126

    so that I can serialize my model to MLeap, but I can't seem to get the same results when I compare the model trained with the Spark ML SQLTransformer against applying the SQL transformation to the training data before training the model (and to the input data at evaluation time, using the serialized MLeap version).
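    For concreteness, here's the shape of what I'm doing: the SQL step runs outside the pipeline, so it has to be applied identically at training and at scoring time. The view name, SQL, and helper names below are made up for illustration:

    import org.apache.spark.ml.Pipeline
    import org.apache.spark.sql.{DataFrame, SparkSession}

    // Run the SQL prep that SQLTransformer used to do, before the data
    // reaches the pipeline that gets serialized to an MLeap bundle.
    def applySqlPrep(spark: SparkSession, df: DataFrame): DataFrame = {
      df.createOrReplaceTempView("input")
      spark.sql("SELECT *, price / sqft AS price_per_sqft FROM input")
    }

    // The same prep has to run on both sides of the train/serve boundary.
    def trainWithExternalSql(spark: SparkSession,
                             pipeline: Pipeline,
                             trainingData: DataFrame,
                             evalData: DataFrame) = {
      val model  = pipeline.fit(applySqlPrep(spark, trainingData))  // training time
      val scored = model.transform(applySqlPrep(spark, evalData))   // evaluation time
      (model, scored)
    }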

    Noah Pritikin
    @cappaberra
    FYI for folks.... I wrapped up some performance testing on MLeap v0.14.0's Spring Boot interface, specifically its support for multiple models. tl;dr: it works great (if you can scale the number of cores appropriately).

    The test setup was two EC2 instances: one hosting MLeap, the other running the wrk open-source HTTP load-testing tool. I started by loading the same MLeap pipeline into MLeap 20 times under different model names. Then, using an already constructed leap frame, I had wrk POST against MLeap with 3 concurrent "clients" across 3 threads, generating a lot of transform requests.

    On a c5.4xlarge instance (16 vCPUs), the latency was roughly 10.71ms avg, 43ms 99th percentile, 17ms stddev, 300ms max. That didn't cut it... so I bumped the instance type up to c5.9xlarge (36 vCPUs), and I was able to bring the latency across the entire test down to something more expected: 7.8ms avg, 21ms 99th percentile, 3.5ms stddev, 49ms max. BETTER! :) Java heap usage was unremarkable too: roughly the same throughout all the tests, sawtoothing as expected between 500MB and 2GB.

    One other interesting comparison: I ran the same test against MLeap v0.10.1 about a year ago. With one model on v0.10.1, latency was 4.36ms avg, 18.15ms 99th percentile, 3.36ms stddev, 40.59ms max. On v0.14.0, with one model loaded and the same load test, I got 2.49ms avg, 10.86ms 99th percentile, 2.09ms stddev, 53.71ms max. A slight performance bump with the newest version! w00t!
    Let me know if anyone has any questions
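    In case anyone wants to reproduce something similar without wrk, here's a rough Scala equivalent of the load pattern (3 threads in tight POST loops). The endpoint path, model name, and frame file are placeholders, not my actual setup:

    import java.net.URI
    import java.net.http.{HttpClient, HttpRequest, HttpResponse}
    import java.nio.file.{Files, Paths}
    import java.util.concurrent.Executors

    object MiniLoadTest extends App {
      val client = HttpClient.newHttpClient()
      // A leap frame serialized to JSON ahead of time, as in the wrk test.
      val frame = Files.readString(Paths.get("frame.json"))
      val request = HttpRequest
        .newBuilder(URI.create("http://mleap-host:8080/models/my-model/transform"))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(frame))
        .build()

      // 3 "clients" on 3 threads, mirroring the wrk configuration.
      val pool = Executors.newFixedThreadPool(3)
      (1 to 3).foreach { _ =>
        pool.submit(new Runnable {
          def run(): Unit = (1 to 10000).foreach { _ =>
            val t0 = System.nanoTime()
            client.send(request, HttpResponse.BodyHandlers.ofString())
            println(f"latency: ${(System.nanoTime() - t0) / 1e6}%.2f ms")
          }
        })
      }
      pool.shutdown()
    }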
    Matthew Tovbin
    @tovbinm
    thanks for sharing @cappaberra
    Jun Wan
    @junwan01
    @fsinghoff I saw you were trying to make a bundle of a Spark data-prep pipeline plus a TensorFlow model; wondering if you have solved it. Thanks!
    It seems the MLeap + TF integration only supports the Scala binding; I wonder if it has been extended to Python. Thanks!
    Jun Wan
    @junwan01
    Hi - I am interested in adopting MLeap for our TensorFlow-based training platform. Is there sample code showing how to serialize a TF model to an MLeap bundle in Python, combined with the PySpark data prep (transformation) pipeline preceding the training? I read the doc about using freeze_graph, but there are many arguments to that function and I am not sure which ones MLeap expects. In short, we want to save the transformation pipeline together with the model, so that the MLeap inference runtime (serving REST API) can do both data transformation and inference, and the transformation at serving time matches the data prep at training time to avoid drift. Thanks!
    Jun Wan
    @junwan01
    @hollinwilkins I am trying to get MLeap to work with TensorFlow, and have lots of questions about serializing a TF model into an MLeap bundle (in Python) and whether the MLeap runtime works with such a saved hybrid pipeline. The doc is very light on this topic. I wonder if you know of someone who can provide paid consultancy. Thanks!
    Hollin Wilkins
    @hollinwilkins
    @junwan01 the documentation there is pretty light. There is an example of using an image recognition pipeline in the test folder
    I would base any work off of that
    The other thing to know is that you need to freeze your graph into a single protobuf file
    Jun Wan
    @junwan01
    @hollinwilkins thanks for the reply. Could you please point out where the image recognition pipeline is? I can't find it under https://github.com/combust/mleap/.
    Jun Wan
    @junwan01
    Another question, wrt the 1-1 mapping of MLeap's transformers to Spark's. In the Python code examples, the model training code does not seem to refer to any of the MLeap transformers, only the native spark.ml ones. My question is: are MLeap transformers only used during model serving (the MLeap runtime), and not during training? If so, how do the Spark transformers know how to serialize into an MLeap bundle?
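    For reference, the export snippet in the MLeap docs looks roughly like this, so I assume training stays pure spark.ml and MLeap's serializers only come into play at save time (the bundle path here is illustrative):

    import ml.combust.bundle.BundleFile
    import ml.combust.mleap.spark.SparkSupport._
    import org.apache.spark.ml.PipelineModel
    import org.apache.spark.ml.bundle.SparkBundleContext
    import org.apache.spark.sql.DataFrame
    import resource._

    // pipelineModel comes straight out of a plain spark.ml fit();
    // nothing MLeap-specific happens until this export step.
    def exportToBundle(pipelineModel: PipelineModel, df: DataFrame): Unit = {
      val context = SparkBundleContext().withDataset(pipelineModel.transform(df))
      for (bundle <- managed(BundleFile("jar:file:/tmp/my-pipeline.zip"))) {
        pipelineModel.writeBundle.save(bundle)(context)
      }
    }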
    Jun Wan
    @junwan01
    Hi - if anyone wants a consulting gig doing MLeap development, please let me know.
    Alice Lynch
    @alicelynch

    Hi, new user of mleap here!
    Is there a way to access individual stages of an ml pipeline?
    My use case is this: I train a Spark pipeline with string indexer & model stages, which is then converted to an MLeap pipeline.
    At runtime I make predictions with this MLeap pipeline, which returns a prediction for a given index. However, I need to transform the index back into a human-readable string.
    In spark I could do something like

    val indexToString = pipelineModel.stages(0).asInstanceOf[StringIndexerModel].labels.zipWithIndex.map(_.swap).toMap

    I couldn't find a solution in the docs, can someone point me in the right direction?
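    The closest I've gotten on the MLeap side is something like the following, but I haven't verified the exact runtime field names against my version, so treat it as a guess:

    import ml.combust.bundle.BundleFile
    import ml.combust.mleap.runtime.MleapSupport._
    import ml.combust.mleap.runtime.transformer.Pipeline
    import ml.combust.mleap.runtime.transformer.feature.StringIndexer
    import resource._

    // Load the bundle, then walk the stages to recover the StringIndexer
    // labels, mirroring the Spark snippet above.
    val bundle = (for (file <- managed(BundleFile("jar:file:/tmp/my-pipeline.zip"))) yield {
      file.loadMleapBundle().get
    }).tried.get

    val pipeline = bundle.root.asInstanceOf[Pipeline]
    val indexToString: Map[Int, String] = pipeline.model.transformers
      .collectFirst { case si: StringIndexer => si.model.labels }
      .map(_.zipWithIndex.map { case (label, idx) => idx -> label }.toMap)
      .getOrElse(Map.empty)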

    mtsol
    @mtsol

    Hi all, is there any way of writing custom transformers for PySpark?

    Like, I want to do feature engineering inside a PySpark pipeline; is that possible with MLeap?

    ksyang
    @ksyang
    hi! Basic starter question here: is it possible to use a newer version of MLeap than 0.8.1 in PySpark?
    The issue I'm running into is that ImputerModel doesn't seem to work with MLeap 0.8.1 in PySpark.
    Bill Lindsay
    @yazbread
    I'm looking to run MLeap on Windows using a Windows batch file, not a Docker image. Where can I see the options for mounting a local models directory? I assume once I do that, I use the alias to point to the directory. I'm guessing something like c:/mymodels:/models, and then when I load a model I use file:/models/.... Can you tell me the correct way to set it up?
    Bill Lindsay
    @yazbread
    @yazbread I'm running the Spring Boot App
    Bill Lindsay
    @yazbread
    What's the correct way to map to a URI on Windows? "uri": "file:/models/v0/kpi_bundles/standard_scalar_bundle_unlockreqs.zip"
    Bill Lindsay
    @yazbread
    Never mind, resolved: I have to use a Unix-style path
    Bill Lindsay
    @yazbread
    I'm running the Spring Boot app and changed the port to NNNN, but when I run 2 instances, port 652327 is bound by the gRPC server. How can I set the port for the gRPC server?
    Bill Lindsay
    @yazbread
    I mean 65327
    Bill Lindsay
    @yazbread
    I see it is 65327 and 65328, the HTTP and gRPC ports respectively. Can I change these at runtime, like I can with Spring Boot via -Dserver.port?
    Juho Autio
    @juhoautio
    I'd like to implement this improvement: "StringMap to handle missing keys instead of throwing an exception".
    Could I get a preliminary nod from the repo owners that a PR like that would be welcome? Thanks!
    More at combust/mleap#555
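    The gist of what I have in mind, sketched with hypothetical names (the real parameter naming can be settled on the issue):

    // Hypothetical sketch: StringMap currently throws on a missing key;
    // the idea is to optionally fall back to a default value instead.
    case class StringMapModelSketch(labels: Map[String, Double],
                                    defaultValue: Double) {
      def apply(key: String): Double = labels.getOrElse(key, defaultValue)
    }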
    Anca Sarb
    @ancasarb
    hey @juhoautio, yes, sounds like a good plan! I've added a small comment on the issue too.
    Bill Lindsay
    @yazbread
    @ancasarb Hi Anca, when running the Spring Boot app, I want to change the port numbers for the HTTP and gRPC ports at runtime, in order to run multiple instances of MLeap in the same environment. How can I do that? Thanks
    Anca Sarb
    @ancasarb
    hey @yazbread taking a look, will come back shortly
    Anca Sarb
    @ancasarb
    @yazbread sorry for the delay. You can pass these two environment variables to the docker run command, for example: docker run -e MLEAP_GRPC_PORT=9091 -e server.port=9090 --name=mleap_serving combustml/mleap-serving:0.14.1-SNAPSHOT. We have an MLEAP_HTTP_PORT environment variable, but I realised now that it's ignored by the Spring Boot service; I'll raise a PR to fix that sometime soon. For now, MLEAP_GRPC_PORT and server.port should do the job.
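    So two instances side by side would look something like this (the second instance's ports and names are picked arbitrarily):

    docker run -d -e MLEAP_GRPC_PORT=9091 -e server.port=9090 --name=mleap_a combustml/mleap-serving:0.14.1-SNAPSHOT
    docker run -d -e MLEAP_GRPC_PORT=9093 -e server.port=9092 --name=mleap_b combustml/mleap-serving:0.14.1-SNAPSHOT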
    learndeep
    @learndeep
    Hi everyone, I'm evaluating MLeap for serving my TensorFlow models vs TensorFlow Serving. The most concerning thing I found is: "For right now, Tensorflow integration should be considered experimental as both Tensorflow and MLeap integration with Tensorflow are still stabilizing." (ref - http://mleap-docs.combust.ml/getting-started/tensorflow.html). Is it right that the TensorFlow integration is not stable?
    Anca Sarb
    @ancasarb
    Hey, no, we should remove that
    learndeep
    @learndeep
    OK. Can you help with using MLeap for serving my TensorFlow models vs TensorFlow Serving?
    learndeep
    @learndeep
    And one more thing: are we using the TensorFlow Java/JNI bindings in the backend? (Because those don't guarantee API stability.)
    Juho Autio
    @juhoautio
    @ancasarb I noticed you have even committed to the aws/sagemaker-sparkml-serving-container repo (https://github.com/aws/sagemaker-sparkml-serving-container/commit/8181331a014350c7d868ae073ecdf07ec76b0642#diff-04c6e90faac2675aa89e2176d2eec7d8). I was wondering: do you think there would be a place for SageMaker somewhere in the MLeap docs, or should such info rather live elsewhere? The SageMaker images provided by AWS still ship an old version of MLeap & Spark, but building from that AWS repo seems to work for getting a newer MLeap into use. We actually even built from source to include custom changes.
    Luca Giovagnoli
    @lucagiovagnoli
    [sklearn MLeap] Hi, I wonder what we are supposed to put in mlinit's 'input_features' when serializing sklearn models. If I put a list of features, I get deserialization errors. A Node is supposed to have a single string there, so how can I pass multiple features? I explain this in more detail in this issue: combust/mleap#560
    chastise
    @chastise
    I know this is a long shot; looking for ideas from folks who've got some experience running MLeap in prod. I've got a Java service with a single bundled pipeline that I'm now trying to expand to load several (distinct) bundles and use each of them to score similar data. Under load, the service abruptly falls over at reliable traffic rates due to what appears to be CPU/memory overhead as it asks each model to score data (each request is one thread scoring one "row", and batching isn't an option at the moment). Increasing CPU/memory raises the traffic rate it can accommodate, but scaling that in prod is nontrivial, and the single-bundle version could handle orders of magnitude higher load with no issue.

    Setting aside the possibility of dropping models or training combined models, has anyone encountered something like this and either identified a solution or at least gotten a clearer explanation of what might be stalling it? The GC goes wild right as the service falls over, and the only thing on the MLeap side that might be GC-able is the leap frames after I've extracted scored data from a model.