Is this a private or publicly logged channel? It would be nice if this were mentioned clearly somewhere.

If you want to chat privately you can always hover over someone and start a private chat with them.

So far, my thinking is to use the type hierarchy from the README to describe the requirements of different implementation types, but also provide wrappers of some kind so that people can use their existing implementations without having to subtype. On top of this, I'm thinking of using some kind of boxing abstraction, like layers in Mocha.jl, to name components and declare the input and output type and size. Thoughts?

When I wrote the initial version of GLM.jl and the formula stuff in DataFrames I was consciously emulating R, because that is what I knew well. I think we are at the point where the evaluation of a formula in a DataFrame to produce a ModelMatrix doesn't have to follow the R conventions. In particular I am coming around to the point of view that an implicit intercept should not be given. Those who complain vociferously that one must always have an intercept in a model can use another method of generating a model matrix. When it comes down to it, writing y ~ 1+x instead of y ~ x is not a lot to ask.
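To make the intercept point concrete, here is a minimal sketch in plain Julia (Base only, assuming a recent Julia version) of the two matrices that `y ~ x` and `y ~ 1+x` would correspond to if no intercept were added implicitly. The data and variable names are made up for illustration; this is not how DataFrames or GLM.jl build a ModelMatrix.

```julia
# Hypothetical predictor values, for illustration only.
x = [1.0, 2.0, 3.0]

# `y ~ x` with no implicit intercept: a single column holding x.
X_no_intercept = reshape(x, length(x), 1)

# `y ~ 1+x`: an explicit column of ones, followed by x.
X_with_intercept = hcat(ones(length(x)), x)

println(size(X_no_intercept))
println(size(X_with_intercept))
```

The only difference is the leading column of ones, which is what the explicit `1` in the formula would be asking for.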

Alright, I think I get your point, but should that be handled at this level? I'm just thinking that not all ML implementations need to be represented as a model matrix and for a lot of simple use cases the end user might not care how the model is represented as long as they can use it with other models. I could be completely missing what you're getting at.

I think Doug's point is that a major part of how R makes model fitting generic is a canonical mapping from DataFrames to Matrix{Float64} that all new functions can take advantage of when defining their API. Although there are models for which you need more than just a Matrix{Float64}, you can handle a lot of regression problems using that data representation. If you know that DataFrames will always map in the same way to Matrix{Float64}, then you can build most ML functionality without ever having to think about DataFrames again. So dealing with the ModelMatrix specification means that you've suddenly solved a big part of your API design problem.
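As a rough illustration of what a fixed table-to-matrix mapping could look like, here is a sketch in plain Julia (Base only, assuming a recent Julia version). The `Dict` stands in for a DataFrame, and `to_matrix` is a hypothetical helper, not part of DataFrames or any other package. It handles numeric columns only; a real ModelMatrix also has to deal with categorical variables and missing values.

```julia
# Hypothetical stand-in for a DataFrame: column name => column values.
table = Dict(
    :age    => [23, 35, 41],
    :income => [40.5, 62.0, 58.25],
)

# Hypothetical helper: copy the named columns, in the given order,
# into a Matrix{Float64}. Integer columns are converted on assignment.
function to_matrix(table, cols)
    nrows = length(table[first(cols)])
    X = Matrix{Float64}(undef, nrows, length(cols))
    for (j, name) in enumerate(cols)
        X[:, j] = table[name]
    end
    return X
end

X = to_matrix(table, [:age, :income])
```

Once the column order and element type are fixed like this, a fitting routine can accept `X` directly and never needs to know where the data came from.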

Yeah, that makes sense. This probably isn't the best solution, but here is a gist of what I've been doing when I need to convert between the two if folks are interested. https://gist.github.com/Rory-Finnegan/231c2478262833ea024f#file-mapping-jl

Hi all,

I'm interested in contributing to the machine learning roadmap as part of JSoC 2015, and I was wondering if anybody is willing to mentor me on this project. The deadline is June 1st, which is very soon, so a quick response would be great. If anybody has the time and is willing to mentor, please contact me (rinuboney@gmail.com) as soon as possible. I'm not a Julia expert, but I know machine learning and I believe I can do this.