Adrian Le'Roy Devezin
@dri94
Documentation says that to serialize a model to disk we can do write.xstream(model, file); however, this method is not available. How can I save my trained model to disk?
orion2107
@orion2107
Hello everyone, I'm kind of new to the smile RandomForest model implementation and usage.
I have an issue with an already trained model: when I use it for predictions (calling the model.predict API) I sometimes get a NullPointerException.
This happens only during load tests, where I call this method with 5-10 concurrent users.
Here is part of the log file I get:
```
java.lang.NullPointerException: null
  at smile.data.formula.Formula$2.getDouble(Formula.java:358)
  at smile.base.cart.OrdinalNode.predict(OrdinalNode.java:45)
  at smile.classification.DecisionTree.predict(DecisionTree.java:361)
  at smile.classification.RandomForest.predict(RandomForest.java:514)
```
Can anyone please shed some light on this, or let me know if you have noticed this kind of behavior as well?
Thank you
Haifeng Li
@haifengl
@orion2107 this issue is fixed. You can build the master branch and try it with your code, or you can just load the model separately in each thread as a workaround.
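For the per-thread workaround mentioned above, here is a minimal sketch of the idea in Scala; `loadModel` is a hypothetical stand-in for however you deserialize your trained model:

```scala
// Hypothetical loader: replace with your actual model deserialization.
def loadModel(): AnyRef = new Object()

// Each thread lazily gets and keeps its own private model instance, so no
// instance is ever shared across concurrent predict() calls.
val threadLocalModel: ThreadLocal[AnyRef] =
  ThreadLocal.withInitial(() => loadModel())

// In a request thread, threadLocalModel.get() returns that thread's own copy.
```

This trades memory (one model per thread) for thread safety without locking.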
orion2107
@orion2107
@haifengl Thank you very much for your answer, I truly appreciate it, can I ask in which smile release the fix was made? I'm currently using "com.github.haifengl" %% "smile-scala" % "2.5.2" with Scala 2.13.1.
I'm asking in order to understand if the release is updated with the fix
Thank you very much
Ahmad Ragab
@ASRagab
@haifengl submitted a PR to get the jupyterlab.sh to bootstrap and install the almond kernel haifengl/smile#672
Carsten Behring
@behrica
Are the Smile example datasets and their classes published somewhere on Maven?
João Costa
@jd557:matrix.org [m]
Can someone help me double-check the default gradient implementation in DifferentiableMultivariateFunction? I think there might be a bug here: https://github.com/haifengl/smile/blob/master/math/src/main/java/smile/math/DifferentiableMultivariateFunction.java#L39-L59
shouldn't the xh initialization be inside the loop, as a copy of x? Like:

```java
default double g(double[] x, double[] gradient) {
    double fx = f(x);
    int n = x.length;
    for (int i = 0; i < n; i++) {
        double[] xh = x.clone();
        double xi = x[i];
        double h = EPSILON * Math.abs(xi);
        if (h == 0.0) {
            h = EPSILON;
        }
        xh[i] = xi + h;
        h = xh[i] - xi; // trick to reduce finite-precision error.
        double fh = f(xh);
        xh[i] = xi;
        gradient[i] = (fh - fx) / h;
    }
    return fx;
}
```
I think that, with the current implementation, the perturbations accumulate, so the gradient is computed from f(x1 + h1, x2 + h2, ...) instead of perturbing one coordinate at a time, which seems wrong... or am I missing something?
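To sanity-check the fix, here is a small self-contained Scala sketch of the corrected per-coordinate forward difference (the test function and values are made up for illustration): for f(x, y) = x² + 3y at (2, 5), the analytic gradient is (4, 3).

```scala
// Forward-difference gradient with xh re-cloned from x on every iteration,
// so exactly one coordinate is perturbed at a time.
def grad(f: Array[Double] => Double, x: Array[Double], eps: Double = 1e-8): Array[Double] = {
  val fx = f(x)
  x.indices.map { i =>
    val xh = x.clone()
    val xi = x(i)
    var h = eps * math.abs(xi)
    if (h == 0.0) h = eps
    xh(i) = xi + h
    h = xh(i) - xi // trick to reduce finite-precision error
    (f(xh) - fx) / h
  }.toArray
}

val g = grad(xs => xs(0) * xs(0) + 3.0 * xs(1), Array(2.0, 5.0))
// g(0) ≈ 4.0, g(1) ≈ 3.0
```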
João Costa
@jd557:matrix.org [m]
yup... just tested it, the current implementation seems wrong :(
João Costa
@jd557:matrix.org [m]
now I'm getting some LAPACK errors in L-BFGS-B's subspace minimization... any idea how I can debug this?
```
java.lang.ArithmeticException: LAPACK GETRS error code: -8
  at smile.math.matrix.Matrix$LU.solve(Matrix.java:2219)
  at smile.math.matrix.Matrix$LU.solve(Matrix.java:2189)
  at smile.math.BFGS.subspaceMinimization(BFGS.java:875)
  at smile.math.BFGS.minimize(BFGS.java:647)
```
@haifengl: since you decided to move the copy to inside the loop, I think this line is no longer needed: https://github.com/haifengl/smile/commit/f476bdcfd093829ae10e5fbf1534c36939166f72#diff-d7deba85f47ead61063ed177d30799d7710a4a0a4675363bfc0c99c117da6eb0R55
davidzxc574
@davidzxc574
I was using smile-core 2.4.0 to run X-means clustering. It worked on my own PC with IDEA and Scala 2.11, and the jar I created worked on my PC as well. But when I uploaded it to a CDH cluster and ran it, it gave me an error message that it couldn't find the class smile/clustering/packages$. Any idea?
(image attachment: 图片.png)
ahsanspark
@ahsanspark

Kindly look into this issue regarding Formula.lhs of RandomForest. As my dataset goes through several transformations, I end up with this:

```scala
var xtrain: Array[Array[Double]] = xtrainx
var ytrain: Array[Int] = bc_ytrainSet.value.map(x => scala.math.floor(x).toInt)
var xtest: Array[Array[Double]] = xtestx
var ytest: Array[Int] = bc_ytestSet.value.map(x => scala.math.floor(x).toInt)
//var nn: KNN[Array[Double]] = KNN.fit(xtrain, ytrain, 5)
var rf = RandomForest.fit(Formula.lhs(?), xtrain)
var pred = rf.predict(xtest)
var accu = Accuracy.of(ytest, pred)
```

Actually I want to know what to write inside Formula.lhs(?) in the absence of any header. For KNN it works fine without any header.

Haifeng Li
@haifengl
RandomForest takes a DataFrame. You cannot pass xtrain, which is an array.
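For anyone hitting the same wall: one way to bridge from raw arrays, sketched here under the assumption that the smile 2.x Java API (DataFrame.of, IntVector.of, DataFrame.merge) is available from Scala. The column name "y" and the toy data are made up for illustration.

```scala
import smile.classification.RandomForest
import smile.data.DataFrame
import smile.data.formula.Formula
import smile.data.vector.IntVector

val xtrain: Array[Array[Double]] =
  Array(Array(5.1, 3.5), Array(6.2, 2.9), Array(4.7, 3.2), Array(6.4, 3.1))
val ytrain: Array[Int] = Array(0, 1, 0, 1)

// Wrap the feature arrays in a DataFrame (columns are auto-named V1, V2, ...),
// attach the labels as a column named "y", and point the formula's
// left-hand side at that column.
val train: DataFrame = DataFrame.of(xtrain).merge(IntVector.of("y", ytrain))
val rf = RandomForest.fit(Formula.lhs("y"), train)
```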
ahsanspark
@ahsanspark
thank you for your reply, I am working on it.
ahsanspark
@ahsanspark
  1. I am trying to implement Naive Bayes classifiers, and my training set is of type Array[Array[Double]], formed by applying a tf-idf vectorization. My problem is that the update method of DiscreteNaiveBayes requires either SparseArray[] or int[][], but my training set is Array[Array[Double]]. Is there any solution?

  2. How do I call a custom function on a specific column of a DataFrame, as we do in Python pandas?

```python
def fun(num):
    ...  # some operation

new = df["column"].apply(fun)
```
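On question 1, one possible workaround is to discretize the real-valued matrix into the int[][] form that DiscreteNaiveBayes.update accepts. This is only a sketch with made-up numbers; rounding tf-idf weights loses information, and scaling the matrix before rounding may preserve more of it:

```scala
// Round each real-valued tf-idf weight to the nearest integer so the
// matrix fits where int[][] is required.
val tfidf: Array[Array[Double]] = Array(Array(0.0, 2.7), Array(1.2, 0.4))
val counts: Array[Array[Int]] = tfidf.map(_.map(v => math.round(v).toInt))
// counts: Array(Array(0, 3), Array(1, 0))
```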

Was there a contributors agreement that made it legal to re-license contributors' code?
Stephan Kölle
@stephankoelle
I see, not needed: LGPLv2.1 gives you permission to relicense the code under any version of the GPL since GPLv2.
davidzxc574
@davidzxc574
Is it possible to have an Array/Vector as the data for SMILE regression, gradient boosting for example? I see only DataFrame as the data input. And where on the internet can I find a demo/example of gradient boosting regression? Much appreciated
a Scala demo/example