    James
    @thetuxedo
    thanks a lot!
    James
    @thetuxedo
    @tovbinm regarding the solution for loading a workflowmodel without the original workflow, do you have a rough timeline for when it will be ready?
    James
    @thetuxedo
    and regarding "keep a shared cache (memcached / redis etc) model id -> serialized version": does this apply to the workflow and/or the workflowmodel? thanks!
    Matthew Tovbin
    @tovbinm
    @thetuxedo you can serialize the local ScoreFunction using ObjectOutputStream, and it should work without any need for the workflow or model

    Howdy,

    TransmogrifAI 0.5.0 is out! Featuring support for XGBoost (experimental), Parquet readers, Spark 2.3.2 and numerous other improvements & fixes - https://github.com/salesforce/TransmogrifAI/releases/tag/0.5.0

    Thanks to all the contributors and users!!

    James
    @thetuxedo
    @tovbinm what I mean is: can I serialize a workflow or a workflowmodel into the cache using ObjectOutputStream?
    Matthew Tovbin
    @tovbinm
    @thetuxedo yes, this can work too. it should be easy to verify even in a unit test.
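A minimal sketch of the round trip discussed above, assuming the object being cached is `java.io.Serializable`. `ThresholdScorer` is a hypothetical stand-in for a trained scoring function (it is not a TransmogrifAI class), and the in-memory `TrieMap` stands in for a shared cache such as memcached or redis, keyed by model id:

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import scala.collection.concurrent.TrieMap

object ModelCacheSketch {
  // Hypothetical stand-in for a trained model's scoring function.
  // Case classes are java.io.Serializable, so it can be written as-is.
  final case class ThresholdScorer(field: String, threshold: Double)
      extends (Map[String, Any] => Map[String, Any]) {
    def apply(row: Map[String, Any]): Map[String, Any] = {
      val x = row.get(field).collect { case d: Double => d }.getOrElse(0.0)
      Map("prediction" -> (if (x >= threshold) 1.0 else 0.0))
    }
  }

  // Serialize any Serializable object to bytes with ObjectOutputStream.
  def serialize(obj: AnyRef): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(obj) finally out.close()
    bytes.toByteArray
  }

  // Read the object back; the caller states the expected type.
  def deserialize[T](data: Array[Byte]): T = {
    val in = new ObjectInputStream(new ByteArrayInputStream(data))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  // In-memory stand-in for the shared cache: model id -> serialized version.
  val cache: TrieMap[String, Array[Byte]] = TrieMap.empty

  def put(modelId: String, scorer: AnyRef): Unit =
    cache.update(modelId, serialize(scorer))

  def get(modelId: String): Option[Map[String, Any] => Map[String, Any]] =
    cache.get(modelId).map(bytes => deserialize[Map[String, Any] => Map[String, Any]](bytes))
}
```

A unit test along these lines would put a scorer under some id, read it back with `get`, and check that the restored function produces the same predictions as the original. Whether a real workflow or workflowmodel survives the same round trip is exactly what such a test would need to confirm.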
    James
    @thetuxedo
    @tovbinm I am not sure what the purpose of OPMap is. Is it for predictor features only? Is it meant to map non-real values to real values? Do you have example code showing how to use it?
    Matthew Tovbin
    @tovbinm
    OPMap types are useful if you would like to hold multiple values in a single cell.
    for instance, if you have a custom schema with an unknown number of fields, you can group them into a map
    another example is when you are aggregating your data and would like to keep several values keyed in a map
    for each of the types we have a corresponding map type, i.e. Real -> RealMap, Phone -> PhoneMap etc.
    say you want to write a transformer to detect languages in a text field
    so it will be something like this
    Matthew Tovbin
    @tovbinm
    class LanguageDetector extends UnaryTransformer[Text, RealMap]
    so the input is a Text field and the output would be a RealMap, which would contain detected languages with their confidences:
    RealMap(Map("en" -> 0.991, "fr" -> 0.91, "es" -> 0.702))
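To illustrate the Text-in, RealMap-out shape described above, here is a rough sketch of the transformation logic only. `Text` and `RealMap` below are simplified stand-ins for the TransmogrifAI feature types, and the stop-word scoring is a toy heuristic assumed for illustration; a real detector would call a language-detection library and would live inside the transformer class:

```scala
object LanguageDetectorSketch {
  // Simplified stand-ins for TransmogrifAI's Text and RealMap feature types.
  final case class Text(value: Option[String])
  final case class RealMap(value: Map[String, Double])

  // Toy stop-word lists (an assumption for this sketch, not a real model).
  private val stopWords: Map[String, Set[String]] = Map(
    "en" -> Set("the", "and", "is", "of"),
    "fr" -> Set("le", "la", "et", "est"),
    "es" -> Set("el", "la", "y", "es")
  )

  // Text => RealMap: language code -> fraction of tokens that are
  // stop words of that language. Languages with no hits are omitted.
  def detect(text: Text): RealMap = {
    val tokens = text.value.toSeq
      .flatMap(s => s.toLowerCase.split("[^\\p{L}]+").toSeq)
      .filter(_.nonEmpty)
    if (tokens.isEmpty) RealMap(Map.empty)
    else {
      val scores = stopWords.map { case (lang, words) =>
        lang -> (tokens.count(t => words.contains(t)).toDouble / tokens.size)
      }
      RealMap(scores.filter { case (_, score) => score > 0.0 })
    }
  }
}
```

The point of the map output is that a single cell can carry a variable number of keyed values: one entry per detected language, or an empty map when the text is missing.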
    dhanik
    @dhanik
    @tovbinm Is it possible to use a trained model to predict something repeatedly outside Spark in an "online mode"? Basically I would like to store the model and then be able to load it in a Java or Scala app outside of Spark and do some fast predictions. Thanks!
    Matthew Tovbin
    @tovbinm
    @dhanik yes, absolutely. You would need to use the transmogrifai-local module for it. Once loaded, the model is usable for online scoring without the need for a Spark context - https://github.com/salesforce/TransmogrifAI/tree/master/local#transmogrifai-local
    Ravindra Kumar Meena
    @rmeena840
    Hi,
    My name is Ravindra Kumar Meena. I am an undergraduate student pursuing a B.Tech in Computer Science. My core interests lie in Operating Systems and Computer Networks.
    I will be applying for GSoC 2019. Please let me know where to get started.
    I have set up the development environment by following the steps mentioned at https://docs.transmogrif.ai/en/stable/installation/index.html
    How should I contribute here to showcase my skills?
    Ravindra Kumar Meena
    @rmeena840
    Let me know the further steps for GSoC.
    Thank You
    Matthew Tovbin
    @tovbinm
    @rmeena840 thank you for reaching out. We will get back to you as soon as we start considering candidates.
    Vishal Gupta
    @py-ranoid
    @tovbinm I'd like to work on enabling import of TransmogrifAI models into Python or PySpark enabled environments for GSoC. Meanwhile could you comment on salesforce/TransmogrifAI#87 ?
    Ravindra Kumar Meena
    @rmeena840
    @tovbinm thanks for attending to this query
    Is this the same discussion forum for the following repo:
    ?
    Matthew Tovbin
    @tovbinm
    @rmeena840 no. Please email oss-gsoc@salesforce.com
    Matthew Tovbin
    @tovbinm
    @py-ranoid thank you for your interest. We have received your request and will be sending a formal response soon.
    In the meantime, take your time playing around with TransmogrifAI, then please carefully read https://google.github.io/gsocguides/student/ and see examples on writing a proposal - https://google.github.io/gsocguides/student/proposal-example-1. Cheers.
    Vishal Gupta
    @py-ranoid
    Thanks @tovbinm. I did GSoC last year as a part of Debian, so I'm familiar with the concept of writing proposals.

    Also @tovbinm, can I raise a PR to link this video in the README?
    https://databricks.com/session/implementing-automl-techniques-at-salesforce-scale

    It really helped me understand the need for TransmogrifAI better.

    Matthew Tovbin
    @tovbinm

    Thank you @py-ranoid we have all the recordings accessible from here - https://docs.transmogrif.ai/en/stable/talks/index.html

    Where in the Readme would be a place for it?

    Vishal Gupta
    @py-ranoid
    @tovbinm I think adding it before "Skip to Quick Start and Documentation" would be a good idea.
    Matthew Tovbin
    @tovbinm
    sure. make a PR and let's go from there. I also think that we should link to our blog post (it's mentioned here - https://docs.transmogrif.ai/en/stable/#motivation). It explains clearly why and how TransmogrifAI was built.
    btw, have you read the blog post?! ;) @py-ranoid
    Vishal Gupta
    @py-ranoid
    Sort of. I came across the post a while back but didn't read it completely.
    Vishal Gupta
    @py-ranoid
    @tovbinm @leahmcguire @ajayborra
    Regarding import of TransmogrifAI models into Python or PySpark enabled environments, are we looking at loading models with sklearn or pyspark.mllib? Also, would this involve replicating the feature selection and feature engineering pipelines in Python as well?
    I plan on taking this up as a GSoC project and was wondering if building a Python wrapper for TransmogrifAI is on the cards?
    Matthew Tovbin
    @tovbinm
    @py-ranoid we haven't really thought it through yet. The raw idea we had is to perhaps somehow allow loading a trained TransmogrifAI model from Python and allow exploring model insights and summary, and also producing scores (not sure how the latter is useful though).
    Another direction we're thinking about is to allow using some of the TransmogrifAI methods (tokenize, autoBucketize, sanity checker etc) with PySpark, i.e. so that users would be able to code in Python with the PySpark interface and access all the goodies of feature engineering, feature selection and model selection from Python. I guess you can call it PyTransmogrifAI ;)
    Matthew Tovbin
    @tovbinm
    @/all #TransmogrifAI 0.5.2 is out! A lot of improvements & bug fixes. Including local model runner with MLeap, max cardinality for pivot and @ProjectJupyter examples - https://github.com/salesforce/TransmogrifAI/releases/tag/0.5.2
    Matthew Tovbin
    @tovbinm
    Kudos to all the contributors!
    Matthew Tovbin
    @tovbinm
    TransmogrifAI 0.5.3 is out! Among the changes are data cutter bug fixes for multiclass, metadata fixes, new base test specs - https://github.com/salesforce/TransmogrifAI/releases/tag/0.5.3
    Thanks for contributions! @astrojaunty @warre_n_peace @gerashegalov @leahmmcguire @shubhanabar @chrisrupley
    captify-daleksandrov
    @captify-daleksandrov
    Hi! @tovbinm, what is the proper way of passing weightCol into the model during training? I've prepared the labeled dataset with a classWeight column, but this column seems to get lost in the early stages
    Matthew Tovbin
    @tovbinm
    /all #TransmogrifAI 0.6.1 is out! 🥳🎉
    Among the improvements are a new models combiner stage, mean & standard deviation for numeric features & text lengths, improved scaler metadata and more! - https://github.com/salesforce/TransmogrifAI/releases/tag/0.6.1 kudos to @gerashegalov
    @leahmmcguire and all contributors