    ॥ स्वक्ष॥
    Hi there.
    Is this a private or publicly logged channel? It would be nice if this were mentioned clearly somewhere.
    Rory Finnegan
    Public chat.
    If you want to chat privately you can always hover over someone and start a private chat with them.
    ॥ स्वक्ष॥
    Rory Finnegan
    So far, my thinking is to use the type hierarchy from the README to describe the requirements of different implementation types, but also to provide wrappers of some kind so that people can use their existing implementations without having to subtype. On top of this, I'm thinking of using some kind of boxing abstraction, like layers in Mocha.jl, to name components and declare their input and output types and sizes. Thoughts?
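A minimal sketch of the "hierarchy plus wrappers" idea, written in Python purely for illustration. The names `Model`, `FunctionModel`, `mean_fit`, and `mean_predict` are hypothetical and do not come from the README or from any existing package. The wrapper lets an existing pair of fit/predict functions satisfy the abstract interface without the original code having to subtype anything.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, List, Sequence


class Model(ABC):
    """Hypothetical abstract interface standing in for the README's type hierarchy."""

    @abstractmethod
    def fit(self, X: Sequence[Sequence[float]], y: Sequence[float]) -> "Model":
        ...

    @abstractmethod
    def predict(self, X: Sequence[Sequence[float]]) -> List[float]:
        ...


class FunctionModel(Model):
    """Wrapper that adapts existing fit/predict functions to the Model interface."""

    def __init__(self, fit_fn: Callable[..., Any], predict_fn: Callable[..., List[float]]):
        self._fit_fn = fit_fn
        self._predict_fn = predict_fn
        self._state = None  # whatever the wrapped fit function returns

    def fit(self, X, y):
        self._state = self._fit_fn(X, y)
        return self

    def predict(self, X):
        if self._state is None:
            raise RuntimeError("call fit() before predict()")
        return self._predict_fn(self._state, X)


# An "existing implementation" that knows nothing about Model:
# it predicts the mean of the training responses for every row.
def mean_fit(X, y):
    return sum(y) / len(y)


def mean_predict(state, X):
    return [state for _ in X]


model = FunctionModel(mean_fit, mean_predict).fit([[1.0], [2.0]], [1.0, 3.0])
predictions = model.predict([[5.0], [6.0]])
```

Code that is generic over `Model` can then accept `model` even though `mean_fit` and `mean_predict` were written independently of the hierarchy.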
    Douglas Bates
    One suggestion I saw in the discussion of the GitHub issue (rephrasing here) is to separate the fitting of models that use what statisticians call a "linear predictor" from the creation of the model matrix.
    When I wrote the initial version of GLM.jl and the formula stuff in DataFrames I was consciously emulating R, because that is what I knew well. I think we are at the point where the evaluation of a formula in a DataFrame to produce a ModelMatrix doesn't have to follow the R conventions. In particular I am coming around to the point of view that an implicit intercept should not be given. Those who complain vociferously that one must always have an intercept in a model can use another method of generating a model matrix. When it comes down to it, writing y ~ 1+x instead of y ~ x is not a lot to ask.
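To make the explicit-intercept convention concrete, here is a small sketch in Python. It is not the DataFrames or GLM.jl formula code; `model_matrix` is a hypothetical helper that only understands formulas of the form `response ~ term + term`, where a term is either `1` or a numeric column name. The point it illustrates is that a column of ones appears only when the formula asks for it.

```python
def model_matrix(formula, data):
    """Return (response, rows, column_names) for a formula like 'y ~ 1 + x'.

    No intercept is added implicitly: a column of ones is produced
    only when the term '1' is written in the formula.
    """
    lhs, rhs = (side.strip() for side in formula.split("~", 1))
    terms = [term.strip() for term in rhs.split("+")]
    n = len(data[lhs])

    columns = []
    for term in terms:
        if term == "1":
            columns.append([1.0] * n)  # explicit intercept column
        else:
            columns.append([float(v) for v in data[term]])

    rows = [list(row) for row in zip(*columns)]
    response = [float(v) for v in data[lhs]]
    return response, rows, terms


data = {"y": [1, 2, 3], "x": [10, 20, 30]}

_, without_intercept, _ = model_matrix("y ~ x", data)
_, with_intercept, names = model_matrix("y ~ 1 + x", data)
```

Under this convention `y ~ x` yields a single-column matrix, and `y ~ 1 + x` yields the two-column matrix that R would produce for `y ~ x`.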
    Rory Finnegan
    Alright, I think I get your point, but should that be handled at this level? I'm just thinking that not all ML implementations need to be represented as a model matrix and for a lot of simple use cases the end user might not care how the model is represented as long as they can use it with other models. I could be completely missing what you're getting at.
    John Myles White
    I think Doug's point is that a major part of how R makes model fitting generic is a canonical mapping from DataFrames to Matrix{Float64} that all new functions can take advantage of when defining their API. Although there are models for which you need more than just a Matrix{Float64}, you can handle a lot of regression problems using that data representation. If you know that DataFrames will always map in the same way to Matrix{Float64}, then you can build most ML functionality without ever having to think about DataFrames again. So dealing with the ModelMatrix specification means that you've suddenly solved a big part of your API design problem.
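The idea of a canonical mapping can be sketched as follows, in Python and with a plain dict of columns standing in for a DataFrame. The function `to_matrix` and its rules (columns in sorted name order, one indicator column per sorted level of a non-numeric column) are assumptions made for this illustration; they are not the mapping used by R or by DataFrames. What matters is that the mapping is deterministic, so downstream fitting code only ever needs to accept a matrix of floats.

```python
def to_matrix(table):
    """Map a dict of equal-length columns to (column_names, rows of floats).

    Assumed rules for this sketch:
      * columns are visited in sorted name order,
      * numeric columns are converted to float,
      * other columns become one 0/1 indicator column per sorted level.
    """
    names, columns = [], []
    for name in sorted(table):
        values = table[name]
        if all(isinstance(v, (int, float)) for v in values):
            names.append(name)
            columns.append([float(v) for v in values])
        else:
            for level in sorted(set(values)):
                names.append(f"{name}={level}")
                columns.append([1.0 if v == level else 0.0 for v in values])
    return names, [list(row) for row in zip(*columns)]


table = {"x": [1, 2], "g": ["a", "b"]}
names, rows = to_matrix(table)
```

Because the result does not depend on the order in which columns were added to the table, any function written against the matrix form sees the same layout every time.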
    Rory Finnegan
    Yeah, that makes sense. This probably isn't the best solution, but here is a gist of what I've been doing when I need to convert between the two if folks are interested. https://gist.github.com/Rory-Finnegan/231c2478262833ea024f#file-mapping-jl
    Slight tangent: How much thought should be put into the mathematical structure of the API? For example, would the following property be desirable: train (xs ++ ys) = (train xs) * (train ys) for dataframes xs, ys?
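For a concrete instance of the property `train (xs ++ ys) = (train xs) * (train ys)`, here is a sketch in Python using simple least-squares regression on one predictor. The names `Stats` and `train` are hypothetical. The property holds here because the fitted line depends on the data only through sums, and sums over a concatenation are sums of the per-part sums. This is an illustration of one model family where it works; many training procedures (iterative optimizers, for example) do not satisfy it exactly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stats:
    """Additive sufficient statistics for fitting y = a + b * x."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def __mul__(self, other):
        # The combining operation "*" is field-wise addition.
        return Stats(
            self.n + other.n,
            self.sx + other.sx,
            self.sy + other.sy,
            self.sxx + other.sxx,
            self.sxy + other.sxy,
        )

    def coefficients(self):
        """Return (intercept, slope) of the least-squares line."""
        denom = self.n * self.sxx - self.sx ** 2
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return intercept, slope


def train(rows):
    """Fold a list of (x, y) pairs into Stats; train([]) is the identity Stats()."""
    stats = Stats()
    for x, y in rows:
        stats = stats * Stats(1, x, y, x * x, x * y)
    return stats


xs = [(0.0, 1.0), (1.0, 3.0)]
ys = [(2.0, 5.0), (3.0, 7.0)]

whole = train(xs + ys)           # train on the concatenated data
combined = train(xs) * train(ys)  # train separately, then combine
```

When the property holds, training can be split across chunks of data and merged afterwards, which is the practical reason it might be worth asking for in an API.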
    Rinu Boney

    Hi all,

    I'm interested in contributing to the machine learning roadmap as part of JSoC 2015. I was wondering if anybody is willing to mentor me on this project. The deadline is June 1st, which is very soon, so a quick response would be great. If anybody has the time and is willing to mentor me, please contact me (rinuboney@gmail.com) asap. I'm not a Julia expert, but I know machine learning and I believe I can do this.