Murat Koptur
@mrtkp9993
I wrote the following code, but I'm getting a LAPACK GESDD error:

        CLARANS<double[]> clusters = PartitionClustering.run(20, () -> CLARANS.fit(x, new EuclideanDistance(), 6, 10));

        PCA pca = PCA.fit(x);
        pca.setProjection(2);
        double[][] y = pca.project(x);

        Canvas plot = ScatterPlot.of(y, clusters.y, '-').canvas();
x is a double[][].
I need to plot the PCA projection and label each point with a string.
Murat Koptur
@mrtkp9993
var clusters = GMeans.fit(x, 5);
I am getting "Index 0 out of bounds for length 0" from GMeans.
Darren Wilkinson
@darrenjw

Hi all. I'm having trouble getting started with basic linear algebra operations in Smile. I wonder if someone could help? In particular, I don't think I understand how symmetric matrices work.

val m2 = matrix(c(3.0,1.0),c(1.0,2.0)) // create a matrix, which is symmetric
m2.isSymmetric // returns false
m2.cholesky() // fails

If I create a symmetric matrix, isSymmetric nevertheless returns false, so naturally, cholesky fails. Is there something I need to do to tell Smile that the matrix is symmetric? Thanks,

Haifeng Li
@haifengl
@darrenjw Smile doesn't check whether a matrix is symmetric by comparing element values. That would be too slow, and the result would depend on the choice of epsilon. If you know that your matrix is symmetric, you should use the SymmMatrix class.
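For readers wondering what an element-wise check would involve, here is a minimal sketch in plain Java. It does not use Smile; the class name `SymmetryCheck`, the method `isSymmetric`, and the tolerance `eps` are all hypothetical and only illustrate why the answer depends on the epsilon you pick.

```java
final class SymmetryCheck {
    // Hypothetical helper, not part of Smile: returns true when the matrix is
    // square and every pair (i, j), (j, i) differs by at most eps.
    static boolean isSymmetric(double[][] a, double eps) {
        int n = a.length;
        for (int i = 0; i < n; i++) {
            if (a[i].length != n) {
                return false; // not square
            }
            for (int j = 0; j < i; j++) {
                if (Math.abs(a[i][j] - a[j][i]) > eps) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        double[][] m2 = {{3.0, 1.0}, {1.0, 2.0}};
        System.out.println(isSymmetric(m2, 1e-12));
    }
}
```

The scan costs O(n^2) comparisons, which is presumably the cost the reply above refers to.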
Darren Wilkinson
@darrenjw
OK, thanks. My next question relates to QR decomposition. I may be doing something wrong, but the results are not what I would expect. For example, given something like:
val mat = matrix(c(3.0,3.5),c(2.0,2.0),c(0.0,1.0))
mat.qr().Q
returns a matrix with columns that are not orthonormal.
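One way to test a claim like this independently of Smile is to copy the entries of Q into a plain array and measure how far QᵀQ is from the identity. The sketch below is plain Java; `OrthonormalCheck` and `maxDeviation` are hypothetical names, not Smile API.

```java
final class OrthonormalCheck {
    // Largest absolute entry of (Q^T Q - I). Assumes q is a non-empty
    // rectangular array whose columns are the vectors under test.
    static double maxDeviation(double[][] q) {
        int rows = q.length;
        int cols = q[0].length;
        double worst = 0.0;
        for (int a = 0; a < cols; a++) {
            for (int b = 0; b < cols; b++) {
                double dot = 0.0;
                for (int i = 0; i < rows; i++) {
                    dot += q[i][a] * q[i][b];
                }
                double expected = (a == b) ? 1.0 : 0.0;
                worst = Math.max(worst, Math.abs(dot - expected));
            }
        }
        return worst;
    }

    public static void main(String[] args) {
        // A 2x2 rotation matrix, whose columns are orthonormal.
        double[][] rotation = {{0.6, -0.8}, {0.8, 0.6}};
        System.out.println(maxDeviation(rotation));
    }
}
```

A result near machine precision means the columns are orthonormal; anything much larger confirms the problem.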
András Dippold
@adippold
Hi all. While upgrading my project to the latest version of Smile, I noticed that the SVM (one-vs-one) no longer returns the probabilities associated with the predicted labels. I use the probability value to accept decisions from the trained SVM only if it is reasonably sure of its decision. Looking at the sources, I found that within KernelMachine, a 'score' is computed and used for assigning the appropriate labels. My question is: how can I compute the probability value from the score?
András Dippold
@adippold
Found the answer - PlattScaling got moved out from the SVM class and is now available as a separate class.
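For context, Platt scaling in the binary case maps a raw decision score f to a probability with a sigmoid, P(y = 1 | f) = 1 / (1 + exp(A·f + B)), where A and B are fitted on held-out scores. The sketch below is plain Java and is not Smile's PlattScaling class; the names `PlattSketch`, `probability`, and `accept`, and the parameter values used in `main`, are hypothetical.

```java
final class PlattSketch {
    // Sigmoid used by Platt scaling. a and b would normally be fitted on
    // held-out (score, label) pairs; here they are supplied by the caller.
    static double probability(double score, double a, double b) {
        return 1.0 / (1.0 + Math.exp(a * score + b));
    }

    // Accept a prediction only when the calibrated probability is high enough.
    static boolean accept(double score, double a, double b, double threshold) {
        return probability(score, a, b) >= threshold;
    }

    public static void main(String[] args) {
        double a = -2.0; // hypothetical fitted slope
        double b = 0.0;  // hypothetical fitted intercept
        System.out.println(probability(2.0, a, b));
        System.out.println(accept(0.1, a, b, 0.9));
    }
}
```

With a negative A, larger scores give probabilities closer to 1, so thresholding the probability plays the role of the "reasonably sure" filter described above. The one-vs-one multi-class case needs an extra step to combine pairwise probabilities, which this sketch does not cover.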
Pierre Nodet
@pierrenodet
Hey @haifengl, just so you're aware, Apache Spark is moving away from fommil netlib to a newer implementation. You can follow the process in this pull request: apache/spark#32415. It could be interesting for Smile to follow the same path for better integration. Breeze has done the same.
I can open an issue btw if you want to.
Haifeng Li
@haifengl
@pierrenodet we moved away a while back. The netlib (and nd4j) modules are not used and will be removed.
Pierre Nodet
@pierrenodet
ok nice !
Adrian Le'Roy Devezin
@dri94
How do I create a sparse vector for a column in a DataFrame? I have a DataFrame of tweets, and I need all the words in the "headline" column to be converted into a sparse vector (e.g. 0, 1, 0... 1, 0, 0). I haven't been able to find clear documentation for this. I see the vectorize method but am still not quite sure how to piece it all together.
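To make the target representation concrete, here is a minimal bag-of-words sketch in plain Java. It does not use Smile's vectorize method or its sparse types; `BagOfWordsSketch` and all of its method names are hypothetical, and the tokenizer is a deliberately crude assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

final class BagOfWordsSketch {
    // Lowercase and split on anything that is not an ASCII letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Give each distinct token an index in order of first appearance.
    static Map<String, Integer> vocabulary(List<String> docs) {
        Map<String, Integer> vocab = new LinkedHashMap<>();
        for (String doc : docs) {
            for (String t : tokenize(doc)) {
                vocab.putIfAbsent(t, vocab.size());
            }
        }
        return vocab;
    }

    // Sparse form: sorted indices of the vocabulary words present in the text.
    static int[] sparseIndices(String text, Map<String, Integer> vocab) {
        TreeSet<Integer> present = new TreeSet<>();
        for (String t : tokenize(text)) {
            Integer idx = vocab.get(t);
            if (idx != null) {
                present.add(idx);
            }
        }
        return present.stream().mapToInt(Integer::intValue).toArray();
    }

    // Dense 0/1 form of the same information.
    static int[] binaryVector(String text, Map<String, Integer> vocab) {
        int[] v = new int[vocab.size()];
        for (int i : sparseIndices(text, vocab)) {
            v[i] = 1;
        }
        return v;
    }

    public static void main(String[] args) {
        List<String> headlines = List.of("Stocks rally today", "Stocks fall");
        Map<String, Integer> vocab = vocabulary(headlines);
        System.out.println(Arrays.toString(sparseIndices("Stocks fall", vocab)));
        System.out.println(Arrays.toString(binaryVector("Stocks fall", vocab)));
    }
}
```

Storing only the indices of non-zero entries is what keeps the representation small when the vocabulary is large and each headline uses only a few words.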
Haifeng Li
@haifengl
@dri94 DataFrame is for dense tabular data. It is not designed for sparse data.
Adrian Le'Roy Devezin
@dri94
@haifengl do you have a recommendation on what I should do then?
Adrian Le'Roy Devezin
@dri94
nvm. I see there is a sparse dataset
Adrian Le'Roy Devezin
@dri94
The documentation says that to serialize a model to disk we can call write.xstream(model, file), but this method is not available. How can I save my trained model to disk?
orion2107
@orion2107
Hello everyone, I'm fairly new to Smile's RandomForest implementation and usage.
I have an issue with an already trained model: when I use it for predictions (calling model.predict), I sometimes get a NullPointerException.
This happens only during load tests, when I call this method with 5-10 concurrent users.
Here is part of the log file I get:
java.lang.NullPointerException: null
at smile.data.formula.Formula$2.getDouble(Formula.java:358)
at smile.base.cart.OrdinalNode.predict(OrdinalNode.java:45)
at smile.classification.DecisionTree.predict(DecisionTree.java:361)
at smile.classification.RandomForest.predict(RandomForest.java:514)
Can anyone please shed some light on this, if you have noticed this kind of behavior as well?
Thank you
Haifeng Li
@haifengl
@orion2107 this issue is fixed. You can build the master branch and try it with your code. Or you can just load the model separately in each thread as a workaround.
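The per-thread workaround can be sketched with a `ThreadLocal`. This is plain Java under stated assumptions: `PerThreadModel` is a hypothetical wrapper, and the `Supplier` stands in for whatever code deserializes the trained model from disk.

```java
import java.util.function.Supplier;

final class PerThreadModel<M> {
    private final ThreadLocal<M> local;

    // loader is called at most once per thread, the first time that thread
    // asks for the model (for example, by deserializing it from a file).
    PerThreadModel(Supplier<M> loader) {
        this.local = ThreadLocal.withInitial(loader);
    }

    // Returns the calling thread's private copy of the model.
    M get() {
        return local.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // Object::new is a stand-in for a real model loader.
        PerThreadModel<Object> models = new PerThreadModel<>(Object::new);
        Object mine = models.get();
        Thread worker = new Thread(() ->
            System.out.println("same instance on worker: " + (models.get() == mine)));
        worker.start();
        worker.join();
    }
}
```

Each request thread would then call `models.get().predict(...)` on its own copy, so no model state is shared across threads. The cost is one model copy in memory per thread.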
orion2107
@orion2107
@haifengl Thank you very much for your answer, I truly appreciate it. May I ask in which Smile release the fix was made? I'm currently using "com.github.haifengl" %% "smile-scala" % "2.5.2" with Scala 2.13.1.
I'm asking in order to understand whether a release already includes the fix.
Thank you very much
Ahmad Ragab
@ASRagab
@haifengl submitted a PR to get the jupyterlab.sh to bootstrap and install the almond kernel haifengl/smile#672
Carsten Behring
@behrica
Are the Smile example datasets and their classes published somewhere on Maven?
João Costa
@jd557:matrix.org
Can someone help me double check the default gradient implementation in the DifferentiableMultivariateFunction? I think there might be a bug in here: https://github.com/haifengl/smile/blob/master/math/src/main/java/smile/math/DifferentiableMultivariateFunction.java#L39-L59
Shouldn't the xh initialization be inside the loop as a copy of x, like:

        default double g(double[] x, double[] gradient) {
            double fx = f(x);
            int n = x.length;
            for (int i = 0; i < n; i++) {
                double[] xh = x.clone();
                double xi = x[i];
                double h = EPSILON * Math.abs(xi);
                if (h == 0.0) {
                    h = EPSILON;
                }
                xh[i] = xi + h; // trick to reduce finite-precision error.
                h = xh[i] - xi;
                double fh = f(xh);
                xh[i] = xi;
                gradient[i] = (fh - fx) / h;
            }
            return fx;
        }

I think that, with the current implementation, this computes the gradient using f(x1 + h1, 0, 0) - f(x1, x2, x3), f(x1 + h1, x2 + h2, 0) - f(x1, x2, x3), which seems wrong... or am I missing something?
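The property being asked for is that each perturbed point differs from x in exactly one coordinate. A standalone sketch of a forward-difference gradient with that property, checked against a function whose gradient is known in closed form, might look like the following. It is plain Java, not Smile's code; `FiniteDifferenceSketch`, `gradient`, and the step constant `STEP` are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.function.ToDoubleFunction;

final class FiniteDifferenceSketch {
    // Relative step size; an assumed value, not Smile's EPSILON.
    static final double STEP = 1e-7;

    // Forward-difference gradient: perturb one coordinate at a time while
    // every other coordinate keeps its original value from x.
    static double[] gradient(ToDoubleFunction<double[]> f, double[] x) {
        double fx = f.applyAsDouble(x);
        double[] g = new double[x.length];
        double[] xh = x.clone(); // start from a copy of x, not from zeros
        for (int i = 0; i < x.length; i++) {
            double xi = x[i];
            double h = STEP * Math.abs(xi);
            if (h == 0.0) {
                h = STEP;
            }
            xh[i] = xi + h;
            h = xh[i] - xi; // use the step that was actually representable
            g[i] = (f.applyAsDouble(xh) - fx) / h;
            xh[i] = xi; // restore before moving to the next coordinate
        }
        return g;
    }

    public static void main(String[] args) {
        // f(x) = x0^2 + 3*x1*x2 has gradient (2*x0, 3*x2, 3*x1).
        ToDoubleFunction<double[]> f = x -> x[0] * x[0] + 3.0 * x[1] * x[2];
        System.out.println(Arrays.toString(gradient(f, new double[]{1.0, 2.0, 3.0})));
    }
}
```

This variant clones x once and restores the perturbed entry after each step, which is equivalent to cloning inside the loop but allocates less.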
João Costa
@jd557:matrix.org
yup... just tested it, the current implementation seems wrong :(
João Costa
@jd557:matrix.org
now I'm getting some LAPACK errors on L-BFGS-B's subspace minimization... any idea on how I can debug this?
java.lang.ArithmeticException: LAPACK GETRS error code: -8
  at smile.math.matrix.Matrix$LU.solve(Matrix.java:2219)
  at smile.math.matrix.Matrix$LU.solve(Matrix.java:2189)
  at smile.math.BFGS.subspaceMinimization(BFGS.java:875)
  at smile.math.BFGS.minimize(BFGS.java:647)
@haifengl: since you decided to move the copy to inside the loop, I think this line is no longer needed: https://github.com/haifengl/smile/commit/f476bdcfd093829ae10e5fbf1534c36939166f72#diff-d7deba85f47ead61063ed177d30799d7710a4a0a4675363bfc0c99c117da6eb0R55
davidzxc574
@davidzxc574
I was using smile-core 2.4.0 to run X-means clustering. It worked on my own PC with IDEA and Scala 2.11, and the jar I built works on my PC as well. But when I uploaded it to a CDH cluster and ran it, I got an error saying it couldn't find the class smile/clustering/packages$. Any idea?
[attached image: image.png]
ahsanspark
@ahsanspark

Kindly look into this issue regarding "Formula.lhs" for RandomForest. As my dataset goes through several transformations, I end up with this:

var xtrain: Array[Array[Double]] = xtrainx
var ytrain: Array[Int] = bc_ytrainSet.value.map(x=>scala.math.floor(x).toInt)
var xtest: Array[Array[Double]] = xtestx
var ytest: Array[Int] = bc_ytestSet.value.map(x=>scala.math.floor(x).toInt)
//var nn: KNN[Array[Double]] =KNN.fit(xtrain, ytrain, 5)
var rf = RandomForest.fit(Formula.lhs(?), xtrain)
var pred = rf.predict(xtest)
var accu = Accuracy.of(ytest, pred)

Specifically, I want to know what to write inside Formula.lhs(?) in the absence of any header. For KNN it works fine without a header.

Haifeng Li
@haifengl
RandomForest takes a DataFrame. You cannot pass xtrain, which is an array.
ahsanspark
@ahsanspark
thank you for your reply, I am working on it.
ahsanspark
@ahsanspark
1. I am trying to implement a Naive Bayes classifier. My training set is of type Array[Array[Double]], produced by tf-idf vectorization. My problem is that the update method of DiscreteNaiveBayes requires either SparseArray[] or int[][], but my training set is Array[Array[Double]]. Is there any solution?

2. How do I call a custom function on a specific column of a DataFrame, as we do in Python pandas?

def fun(num):
    ...  # some operation

new = df["column"].apply(fun)

Was there a contributor agreement that made it legal to relicense contributors' code?
Stephan Kölle
@stephankoelle
I see, not needed: LGPLv2.1 gives you permission to relicense the code under any version of the GPL since GPLv2.
davidzxc574
@davidzxc574
Is it possible to use an Array/Vector as input data for Smile regression, gradient boosting for example? I see only DataFrame as the data input. And where on the internet can I find a demo/example of gradient boosting regression? Much appreciated.
A Scala demo/example, that is.