Haifeng Li
@haifengl
@lukasbraach yes
@implisci we will leverage gpu
Dylan Kane
@dmkaner
Anyone know why smile.io can't be found when using smile 2.5.3 with Maven in Java?
Also thanks for the response @haifengl
Dylan Kane
@dmkaner
^ The question above is no longer pertinent. If anyone else has the same issue, just add the smile.io code from the GitHub repository to some local classes.
Haifeng Li
@haifengl
@dmkaner smile.io is in its own package (smile-io)
Nino
@weinino
Hi @haifengl
I have a situation:
1.) I've trained a random forest with a DataFrame object (target, feature 1, ... , feature 5).
2.) I would like to use RandomForest::predict with a new sample (production), that I get as double[5] {1,2,3,4,5}.
3.) I generate a Tuple t with schema (feature 1, ..., feature 5) and data [1,2,3,4,5]
4.) When I run predict(t), an array index out of bounds exception occurs. The problem is that it tries to access feature 5 at index 5. That was correct in the schema of the dataset, but not in the schema of the tuple t.
I would have expected predict to access the data for feature 5 from the tuple t at index 4.
Am I doing something fundamentally wrong, or might there be an inconsistency? I couldn't work it out myself, since every example I found in the documentation predicts on a tuple taken from the original DataFrame.
Thank you :-)
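The index mismatch described in this message can be reproduced without Smile. The sketch below is plain Java with hypothetical names (SchemaIndexDemo and the column names are made up for illustration). It is not Smile's internal code, only a model of resolving a column by its position in the training schema versus by its position in the sample's own schema.

```java
import java.util.Arrays;
import java.util.List;

final class SchemaIndexDemo {
    // Hypothetical layouts: the training data has the target in column 0.
    static final List<String> TRAIN_SCHEMA =
        Arrays.asList("target", "f1", "f2", "f3", "f4", "f5");
    static final List<String> SAMPLE_SCHEMA =
        Arrays.asList("f1", "f2", "f3", "f4", "f5");

    // Reuses the column index from the training schema (f5 -> 5).
    static double byTrainingIndex(double[] sample, String column) {
        return sample[TRAIN_SCHEMA.indexOf(column)];
    }

    // Resolves the column index against the sample's own schema (f5 -> 4).
    static double bySampleSchema(double[] sample, String column) {
        return sample[SAMPLE_SCHEMA.indexOf(column)];
    }

    public static void main(String[] args) {
        double[] sample = {1, 2, 3, 4, 5};
        System.out.println(bySampleSchema(sample, "f5"));
        try {
            byTrainingIndex(sample, "f5"); // index 5 on a length-5 array
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("out of bounds: " + e.getMessage());
        }
    }
}
```

Looking the column up by name in the sample's schema is what makes the five-element sample usable. Reusing the training position is what produces the exception.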
Nino
@weinino
Two solutions seem possible in my opinion:
1.) Bind the new schema (feature 1, ... , feature 5). I've tried this, but since "response != null" in the RF's formula, I get an NPE. So, is there any support for predicting on samples without targets?
2.) I artificially change my sample to include a dummy label, so that the target has a value as well and the schema again corresponds to the training case.
Haifeng Li
@haifengl
#2 should work. Option #1 should work too with v2.5.3. Which version are you using?
Nino
@weinino
Ok, thank you!
I'll try #1 first, after updating to v2.5.3. At the moment I'm on v2.3.0
Haifeng Li
@haifengl
On v2.5.3, you don't need to bind the schema manually. Smile handles it automatically.
Nino
@weinino
@haifengl I've updated the version and now all looks fine. Thank you :-)
Lukas Braach
@lukasbraach
Hey @haifengl, my Random Forest model is working (mostly) as expected, thank you! I still have one question: on the first inference with the freshly trained model, Smile logs "The response variable Classification doesn't exist in the schema [...]". Should I pay attention to this message? How do I get rid of it?
Martin Zazvorka
@MZazvor_gitlab
Hi, is it possible to compute OLS without the additional statistics? We use it in an application where the statistics have to be computed separately anyway, so skipping them would be a major performance improvement. I read somewhere, in a comparison with Apache, that the OLS statistics take a significant part of the computation time. Thanks for any hint. Martin
Haifeng Li
@haifengl
@lukasbraach It is a DEBUG level log message. You see it only if your log level is debug or lower.
@MZazvor_gitlab sterr = true
rikima
@rikima_twitter
Yesterday I discovered that SMILE is an awesome ML library implemented in Java/Scala, so I am planning to use it for our system, which has ML-based functionality.
I am studying the framework now. Are there functions to compute ROC/AUC metrics?
Can all classification algorithms output a score as well as the predicted label?
Haifeng Li
@haifengl
@rikima_twitter you can find answers at http://haifengl.github.io/. There are API docs too.
For AUC (and many other metrics), check out the smile.validation.metric package. The classification algorithms report posterior probabilities if they are a SoftClassifier. Some algorithms also have a score() method, which does not necessarily return probabilities though.
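For readers unfamiliar with the metric itself, here is a library-independent sketch of what AUC measures: the fraction of (positive, negative) pairs in which the positive sample gets the higher score, with ties counted as half. The class name AucSketch and the 0/1 label encoding are assumptions for illustration. This is a simple O(P*N) version and says nothing about how Smile's own implementation computes the metric.

```java
final class AucSketch {
    // truth[i] is 1 for a positive sample and 0 for a negative one;
    // score[i] is the classifier's score for sample i (higher = more positive).
    static double auc(int[] truth, double[] score) {
        if (truth.length != score.length) {
            throw new IllegalArgumentException("truth and score differ in length");
        }
        double wins = 0.0;
        long pairs = 0;
        for (int i = 0; i < truth.length; i++) {
            if (truth[i] != 1) continue;          // i ranges over positives
            for (int j = 0; j < truth.length; j++) {
                if (truth[j] != 0) continue;      // j ranges over negatives
                pairs++;
                if (score[i] > score[j]) wins += 1.0;
                else if (score[i] == score[j]) wins += 0.5;
            }
        }
        if (pairs == 0) {
            throw new IllegalArgumentException("need at least one positive and one negative");
        }
        return wins / pairs;
    }

    public static void main(String[] args) {
        int[] truth = {0, 0, 1, 1};
        double[] score = {0.1, 0.4, 0.35, 0.8};
        // 3 of the 4 (positive, negative) pairs are ranked correctly.
        System.out.println(auc(truth, score)); // prints 0.75
    }
}
```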
rikima
@rikima_twitter
@haifengl Thank you so much for your support. I will check the docs and the info you suggested.
Again, SMILE is an awesome ML library! It has a clean dataframe implementation and a classification framework that makes it easy to add other algorithms. So I am thinking of implementing missing algorithms such as factorization machines, field-aware FM, NGBoost, and so on.
Haifeng Li
@haifengl
@rikima_twitter check out CrossValidation, Bootstrap, etc. in smile.validation. They calculate all the metrics automatically.
@rikima_twitter Look forward to your contributions of new algorithms. Thanks a lot in advance!
rikima
@rikima_twitter
Thanks!
I am looking for SparseArray.java now. I need a sparse vector implementation for example-based classifiers like SVM or SparseLogisticRegression. I could not find the smile.util package in the repository.
rikima
@rikima_twitter
In smile-math there is a smile.util package, and SparseArray.java is in there.
Ryan Bennett
@rwbennett

Hi, I have a question about the new(?) OLS.fit() method. If I try to use it to predict housing sale prices, like OLS.fit(Formula.lhs("SalePrice"), X_train_dataframe), it fails with "no response variable".

So it seems I can only use OLS if I pass it a dataframe that includes X and y together. However, if I use it like OLS.fit(Formula.lhs("SalePrice"), training_dataframe), where the dataframe includes both the dependent and independent columns (X and y), then predict requires an array of the same size, including a column for the value I wish to predict.

However, doing that and passing, say, 0 for the y value results in wildly incorrect predictions, and changing the value affects predictions. Is there not a way to use OLS without commingling X and y?

Anoukh Ashley
@anoukh_ashley_twitter
Hi. Is there a version of smile-mkl that can be used with Scala 2.11? I can't seem to get sbt to download the dependency: libraryDependencies += "com.github.haifengl" %% "smile-mkl" % "2.6.0"
Haifeng Li
@haifengl
@rwbennett the training data frame must have both X and y, but for prediction it doesn't require y in the data frame. Make sure to use the latest version.
@anoukh_ashley_twitter smile-mkl is a pure Java library. Use libraryDependencies += "com.github.haifengl" % "smile-mkl" % "2.6.0"
Ryan Bennett
@rwbennett
@haifengl I'm using 2.6.0. My code (https://pastebin.com/FEUsXqpY) uses a dataframe with both X and y (21 columns), but when I try to predict using a 20-column double array, I get "java.lang.IllegalArgumentException: Invalid input vector size: 20, expected: 21". Shouldn't prediction work with a 20-column dataframe, given that I trained the model on a 21-column DF with Formula.lhs("SalePrices") ?
Haifeng Li
@haifengl
@rwbennett first of all, life is much easier if you read the data frame with Smile's API. Why go through tablesaw? For prediction, it is natural to use Tuple/DataFrame. If you have to use double[], make sure to include the bias (aka 1) in your vector.
Ryan Bennett
@rwbennett
@haifengl , I'm using tablesaw because I'm following a somewhat poorly-explained Udemy course that uses tablesaw with smile. Thanks for the suggestion though; I'll look into using smile's dataframes directly instead. As for using the double array with a bias of 1, I assume you mean inserting a 1 at index 0 in the array? I'm not sure if the reason is that the array is assumed to be 1-indexed by smile, but at any rate, doing so gives a result that matches the example I'm following, so thank you very much for your help!
Haifeng Li
@haifengl
@rwbennett Smile is not 1-indexed. A linear model may or may not have a bias term. If you provide a tuple (e.g. dataframe.get(0)), Smile will create the proper vector automatically. This is especially important when the data frame has categorical variables.
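The bias point can be shown with plain arithmetic. The sketch below is not Smile code. It assumes, as in the exchange above, that the intercept is the first weight, so the input vector needs a constant 1 at index 0. The names BiasVectorSketch, withBias and dot, and the weights, are hypothetical.

```java
final class BiasVectorSketch {
    // Prepend the constant 1 so that w[0] acts as the intercept.
    static double[] withBias(double[] x) {
        double[] v = new double[x.length + 1];
        v[0] = 1.0;
        System.arraycopy(x, 0, v, 1, x.length);
        return v;
    }

    // Linear prediction: the dot product of the weights and the input vector.
    static double dot(double[] w, double[] v) {
        if (w.length != v.length) {
            throw new IllegalArgumentException(
                "Invalid input vector size: " + v.length + ", expected: " + w.length);
        }
        double sum = 0.0;
        for (int i = 0; i < w.length; i++) {
            sum += w[i] * v[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] w = {2.0, 0.5, -1.0};   // intercept, then two coefficients
        double[] x = {4.0, 3.0};         // two raw features, no bias
        System.out.println(dot(w, withBias(x))); // 2.0 + 0.5*4.0 - 1.0*3.0 = 1.0
    }
}
```

With two features and an intercept the model has three weights, which is why a raw feature array is one element short. It has nothing to do with 1-based indexing.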
wholfy
@wholfy
@haifengl Hi, I would appreciate your advice.
I have a large data array that needs to be partitioned into clusters, but I can't load it into RAM.
Is it possible to load the source array in parts and use GMeans so that the result is similar to clustering the whole source array?
Haifeng Li
@haifengl
@wholfy no, GMeans needs all the data.
hmf
@hmf
I am trying to cluster a large number of instances (60000). Hierarchical clustering fails because n(n-1)/2 exceeds the length limit of the float array. Which algorithm would be the best to use? Do these avoid the O(N^2)-space distance matrix: CLARANS, DBSCAN, minimum entropy clustering?
Haifeng Li
@haifengl
@hmf these methods all work on large data
hmf
@hmf
@haifengl Thanks. Will try them on a 10k data-set.
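To put numbers on the n(n-1)/2 concern raised in this exchange, here is a small sketch that counts the pairwise distances and the memory they would need as 4-byte floats. The class and method names are hypothetical, and the sketch says nothing about how Smile stores the matrix internally. Whether an array of that size can be allocated depends on the JVM and heap settings.

```java
final class PairwiseSize {
    // Number of distinct unordered pairs among n items.
    // Uses long arithmetic: n * (n - 1) overflows int for n = 60000.
    static long pairCount(long n) {
        return n * (n - 1) / 2;
    }

    // Bytes needed to hold one float per pair.
    static long floatBytes(long n) {
        return pairCount(n) * Float.BYTES;
    }

    public static void main(String[] args) {
        for (long n : new long[] {10_000L, 60_000L}) {
            System.out.println(n + " items: " + pairCount(n) + " pairs, "
                + floatBytes(n) + " bytes");
        }
        // 10000 items: 49995000 pairs, 199980000 bytes
        // 60000 items: 1799970000 pairs, 7199880000 bytes
    }
}
```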
lskowr
@lskowr
@haifengl Is Lanczos in Smile thread-safe (also on the native level)?
Murat Koptur
@mrtkp9993
Hello, I am a newbie to Smile. I need to perform PCA and clustering, but the examples on http://haifengl.github.io/clustering.html didn't work for me.
I wrote the following code, but I'm getting a LAPACK GESDD error:

        CLARANS<double[]> clusters = PartitionClustering.run(20, () -> CLARANS.fit(x, new EuclideanDistance(), 6, 10));

        PCA pca = PCA.fit(x);
        pca.setProjection(2);
        double[][] y = pca.project(x);

        Canvas plot = ScatterPlot.of(y, clusters.y, '-').canvas();
x is a double[][]
I need to graph PCA, and label each point with string
Murat Koptur
@mrtkp9993
var clusters = GMeans.fit(x, 5);
I am getting "Index 0 out of bounds for length 0" for GMeans.
Darren Wilkinson
@darrenjw

Hi all. I'm having trouble getting started with basic linear algebra operations in Smile. I wonder if someone could help? In particular, I don't think I understand how symmetric matrices work.

val m2 = matrix(c(3.0,1.0),c(1.0,2.0)) // create a matrix, which is symmetric
m2.isSymmetric // returns false
m2.cholesky() // fails

If I create a symmetric matrix, isSymmetric nevertheless returns false, so naturally, cholesky fails. Is there something I need to do to tell Smile that the matrix is symmetric? Thanks,

Haifeng Li
@haifengl
@darrenjw smile doesn't check if the matrix is symmetric by comparing element values. It is too slow and also depends on the epsilon. If you know that your matrix is symmetric, you should use SymmMatrix class.
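To illustrate the two costs mentioned here, this is a minimal sketch of what an element-wise symmetry test would look like. It is written in plain Java on double[][] rather than Smile's matrix types, and the name SymmetryCheck is hypothetical. It is not code from Smile. The check visits every off-diagonal pair, so it is O(n^2), and its answer changes with the tolerance passed in.

```java
final class SymmetryCheck {
    // Returns true if a is square and |a[i][j] - a[j][i]| <= eps for all i, j.
    static boolean isSymmetric(double[][] a, double eps) {
        int n = a.length;
        for (int i = 0; i < n; i++) {
            if (a[i].length != n) return false;   // not square
            for (int j = 0; j < i; j++) {
                if (Math.abs(a[i][j] - a[j][i]) > eps) return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        double[][] exact = {{3.0, 1.0}, {1.0, 2.0}};
        double[][] nearly = {{3.0, 1.0}, {1.0 + 1e-9, 2.0}};
        System.out.println(isSymmetric(exact, 0.0));     // true
        System.out.println(isSymmetric(nearly, 1e-12));  // false
        System.out.println(isSymmetric(nearly, 1e-6));   // true
    }
}
```

The second matrix is "symmetric" or not depending only on eps, which is the ambiguity the reply refers to.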
Darren Wilkinson
@darrenjw
OK, thanks. My next question relates to QR decomposition. I may be doing something wrong, but the results are not what I would expect. For example, given something like:
val mat = matrix(c(3.0,3.5),c(2.0,2.0),c(0.0,1.0))
mat.qr().Q
returns a matrix with columns that are not orthonormal.
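One way to test a claim like this is to check that every pair of columns has dot product 1 (same column) or 0 (different columns), within a tolerance. The helper below is a plain-Java sketch on double[][] with hypothetical names. It does not call Smile and makes no claim about what Smile's QR returns.

```java
final class OrthonormalCheck {
    // True if the columns of q are pairwise orthogonal and have unit length,
    // i.e. (Q^T Q)[a][b] is within eps of 1 when a == b and of 0 otherwise.
    static boolean hasOrthonormalColumns(double[][] q, double eps) {
        int rows = q.length;
        int cols = q[0].length;
        for (int a = 0; a < cols; a++) {
            for (int b = a; b < cols; b++) {
                double dot = 0.0;
                for (int i = 0; i < rows; i++) {
                    dot += q[i][a] * q[i][b];
                }
                double expected = (a == b) ? 1.0 : 0.0;
                if (Math.abs(dot - expected) > eps) return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // A 3x2 matrix whose columns are orthonormal by construction.
        double[][] good = {{0.6, -0.8}, {0.8, 0.6}, {0.0, 0.0}};
        // The input matrix from the question, given row by row.
        double[][] mat = {{3.0, 3.5}, {2.0, 2.0}, {0.0, 1.0}};
        System.out.println(hasOrthonormalColumns(good, 1e-12)); // true
        System.out.println(hasOrthonormalColumns(mat, 1e-12));  // false
    }
}
```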