Haifeng Li
@haifengl
@Jingwei-THU can you try javacpp 1.5.4 (and accordingly openblas and arpack-ng)?
Tatsuaki KOBAYASHI
@tatsunidas
@haifengl
Hi,
Can I run my code with SMILE on a GPU?
Haifeng Li
@haifengl
no
ynjacobs
@ynjacobs
Hi @haifengl, thanks again for this library! I'm wondering how I can read the rows of a dataframe, do some computation on each one, and turn the result into a new column of a dataframe. In Python I could do something like:
import math
new_col = lambda each_row: math.sqrt((3*each_row.first_column)**4 + (3*each_row.second_column)**4 + (3*each_row.third_column)**4)
my_dataframe.apply(new_col, axis=1)
Haifeng Li
@haifengl
Use smile.data.formula.Formula
import smile.data.formula.Formula;
import static smile.data.formula.Terms.*;
In Scala, we have a DSL for writing formulas close to their math form.
In Java, you have to assemble the formula with the methods provided in Terms.
ynjacobs
@ynjacobs
Thank you! And how would I get each row of the dataframe to make the computation on? Is that included in Formula? I'm using Java.
Haifeng Li
@haifengl
df.apply(formula) will return a new dataframe
ynjacobs
@ynjacobs
Thank you! But how would I get each of the rows in each column of the original dataframe to apply the formula to? For example, if I want to add 2 to every row of the first column, add 2 to every row of the second column, and then add those together, would I need to use something like df.stream().map(row -> { or similar?
Haifeng Li
@haifengl
no need. df.apply(formula) will do all the magic.
ynjacobs
@ynjacobs
Thank you!
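To make concrete what a column-wise formula computes for every row, here is a minimal plain-Java sketch of the two calculations asked about above. It works on raw arrays and deliberately does not use Smile's Formula or Terms classes; the class and method names (DerivedColumnSketch, rootOfFourthPowers, addTwoThenSum) are hypothetical and exist only for illustration.

```java
import java.util.Arrays;

/** Plain-Java illustration of row-wise derived columns; this is not Smile's API. */
final class DerivedColumnSketch {

    /** For each row i, computes sqrt((3*a[i])^4 + (3*b[i])^4 + (3*c[i])^4). */
    static double[] rootOfFourthPowers(double[] a, double[] b, double[] c) {
        requireSameLength(a, b);
        requireSameLength(a, c);
        double[] out = new double[a.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = Math.sqrt(Math.pow(3 * a[i], 4)
                             + Math.pow(3 * b[i], 4)
                             + Math.pow(3 * c[i], 4));
        }
        return out;
    }

    /** For each row i, computes (a[i] + 2) + (b[i] + 2). */
    static double[] addTwoThenSum(double[] a, double[] b) {
        requireSameLength(a, b);
        double[] out = new double[a.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = (a[i] + 2) + (b[i] + 2);
        }
        return out;
    }

    private static void requireSameLength(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("columns must have the same length");
        }
    }

    public static void main(String[] args) {
        double[] first = {1, 2};
        double[] second = {0, 2};
        double[] third = {0, 2};
        System.out.println(Arrays.toString(rootOfFourthPowers(first, second, third)));
        System.out.println(Arrays.toString(addTwoThenSum(first, second)));
    }
}
```

A formula applied to a dataframe expresses the same per-row arithmetic declaratively, so no explicit loop or stream over rows is needed.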
Dylan Kane
@dmkaner
Hi everyone! Does anyone know of any working Java example repositories with smile? I'm trying to learn how to use it and some examples would be super helpful.
Dylan Kane
@dmkaner
@haifengl ?
Pierre Nodet
@pierrenodet
@dmkaner In the official documentation there is a button to choose between Scala, Java, or Kotlin for code examples.
Dylan Kane
@dmkaner
@pierrenodet Thanks, but I was looking for something with a little more context maybe? Those examples were pretty limited
Haifeng Li
@haifengl
@dmkaner check out unit tests
Dylan Kane
@dmkaner
@haifengl thanks for the response Haifeng. Where can I find these?
Lukas Braach
@lukasbraach
@haifengl One more question regarding Random Forest: Does Smile's decision tree implementation adapt to the input field's measure, e.g. categorical vs. numerical scale?
implisci
@implisci
Hello @haifengl Are you considering something like this https://github.com/gpu/JOCLSamples/tree/master/src/main/java/org/jocl/samples for algorithms that can benefit from GPU? Or is there something else (jcuda?)
Haifeng Li
@haifengl
@dmkaner it is in the same code base
@lukasbraach yes
@implisci we will leverage gpu
Dylan Kane
@dmkaner
Anyone know why smile.io can't be found when using smile 2.5.3 with Maven in Java?
Also thanks for the response @haifengl
Dylan Kane
@dmkaner
^ The above question is no longer pertinent. If anyone else has the same issue, just add the smile.io code from the GitHub repository to some local classes.
Haifeng Li
@haifengl
@dmkaner smile.io is in its own package (smile-io)
Nino
@weinino
Hi @haifengl
I have a situation:
1.) I've trained a random forest with a DataFrame object (target, feature 1, ... , feature 5).
2.) I would like to use RandomForest::predict with a new sample (production), that I get as double[5] {1,2,3,4,5}.
3.) I generate a Tuple t with schema (feature 1, ..., feature 5) and data [1,2,3,4,5]
4.) When I run predict(t), an array index out of bounds exception occurs. The problem is that it tries to access feature 5 with index 5. This was correct in the schema of the dataset but not for the schema of the tuple t.
I would have expected predict to read the data for feature 5 from the tuple t at index 4.
Is there something I'm doing fundamentally wrong, or might there be some inconsistencies? I couldn't work it out myself, since all the cases I found in the documentation predict on a tuple coming from the original DataFrame.
Thank you :-)
Nino
@weinino
Two solutions seem possible in my opinion:
1.) Bind the new schema (feature 1, ... , feature 5). I've tried this, but since "response != null" in the RF's formula, I get an NPE. So, is there any support for predicting on samples without targets?
2.) I artificially change my sample to include some dummy labels, so that the targets have a value as well and the schema corresponds to the training case again.
Haifeng Li
@haifengl
#2 should work. Option #1 should work too with v2.5.3. Which version are you using?
Nino
@weinino
Ok, thank you!
I'll try #1 first, after updating to v2.5.3. At the moment I'm on v2.3.0
Haifeng Li
@haifengl
On v2.5.3, you don't need to bind the schema manually. Smile handles it automatically.
Nino
@weinino
@haifengl I've updated the version and now all looks fine. Thank you :-)
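The failure described above comes down to looking up a field by its position in the training schema rather than by its name in the tuple's own schema. The following plain-Java sketch reproduces that mismatch with a list of column names and a double array. It is an illustration of the failure mode only, not Smile's internal code, and the names SchemaLookupSketch, byName, and byTrainingIndex are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrates positional vs. name-based field lookup; not Smile's implementation. */
final class SchemaLookupSketch {

    /** Resolves the field through the tuple's own schema, so the layout may differ from training. */
    static double byName(List<String> tupleSchema, double[] tuple, String field) {
        int index = tupleSchema.indexOf(field);
        if (index < 0) {
            throw new IllegalArgumentException("no such field: " + field);
        }
        return tuple[index];
    }

    /** Reuses the index from the training schema; breaks when the tuple lacks the target column. */
    static double byTrainingIndex(List<String> trainingSchema, double[] tuple, String field) {
        return tuple[trainingSchema.indexOf(field)];
    }

    public static void main(String[] args) {
        List<String> trainingSchema = Arrays.asList(
                "target", "feature1", "feature2", "feature3", "feature4", "feature5");
        List<String> tupleSchema = trainingSchema.subList(1, trainingSchema.size());
        double[] tuple = {1, 2, 3, 4, 5};

        // Name-based lookup finds feature5 at index 4 of the tuple.
        System.out.println(byName(tupleSchema, tuple, "feature5"));

        try {
            // Positional lookup asks for index 5, which the 5-element tuple does not have.
            byTrainingIndex(trainingSchema, tuple, "feature5");
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("index mismatch: " + e.getMessage());
        }
    }
}
```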
Lukas Braach
@lukasbraach
Hey @haifengl, my Random Forest model is working (mostly) as expected, thank you! I still have one question: On the first inference with the freshly trained model, Smile logs "The response variable Classification doesn't exist in the schema [...]". Should I pay attention to this message? How do I get rid of it?
Martin Zazvorka
@MZazvor_gitlab
Hi, is it possible to compute OLS without the additional statistics? We are using it in an application where the statistics have to be computed separately anyway, so skipping them would be a major performance improvement. I have read somewhere, in a comparison with Apache, that the OLS statistics take a significant part of the computation time. Thanks for any hint. Martin
Haifeng Li
@haifengl
@lukasbraach It is a DEBUG level log message. You see it only if your log level is debug or lower.
@MZazvor_gitlab stderr = false
rikima
@rikima_twitter
Yesterday I discovered that SMILE is an awesome ML library implemented in Java/Scala, so I am planning to use it for our system, which has ML-based functionality.
I am studying the framework now. Are there functions to get ROC/AUC metrics?
Can all classification algorithms output a score as well as the predicted label?
Haifeng Li
@haifengl
@rikima_twitter you can find answers at http://haifengl.github.io/. There are API docs too.
For AUC (and many other metrics), check out the smile.validation.metric package. The classification algorithms report posterior probabilities if they are SoftClassifier. Some algorithms also have a score() method, which does not necessarily return probabilities though.
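For readers unfamiliar with the metric, AUC is the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it directly from that definition in plain Java. It is an independent illustration, not the implementation in smile.validation.metric, and the class name AucSketch is hypothetical. The quadratic pair count is chosen for clarity over speed.

```java
/** Pairwise definition of AUC in plain Java; not Smile's implementation. */
final class AucSketch {

    /**
     * @param labels 1 for positive samples, 0 for negative samples
     * @param scores classifier scores, where higher means "more likely positive"
     */
    static double auc(int[] labels, double[] scores) {
        if (labels.length != scores.length) {
            throw new IllegalArgumentException("labels and scores must have the same length");
        }
        double wins = 0.0;
        long pairs = 0;
        for (int i = 0; i < labels.length; i++) {
            if (labels[i] != 1) continue;          // i ranges over positives
            for (int j = 0; j < labels.length; j++) {
                if (labels[j] != 0) continue;      // j ranges over negatives
                pairs++;
                if (scores[i] > scores[j]) {
                    wins += 1.0;                   // positive ranked above negative
                } else if (scores[i] == scores[j]) {
                    wins += 0.5;                   // ties count as half
                }
            }
        }
        if (pairs == 0) {
            throw new IllegalArgumentException("need at least one positive and one negative sample");
        }
        return wins / pairs;
    }

    public static void main(String[] args) {
        int[] labels = {0, 0, 1, 1};
        double[] scores = {0.1, 0.4, 0.35, 0.8};
        System.out.println(auc(labels, scores));
    }
}
```

The input must be a score or probability per sample, not a hard label, which is why the distinction between predicted labels and scores matters for this metric.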
rikima
@rikima_twitter
@haifengl Thank you so much for your support. I will check the docs and the info you suggested.
Again, SMILE is an awesome ML library! It has a clean dataframe implementation and a classification framework that makes it easy to add other algorithms. So I am thinking of implementing missing algorithms such as factorization machines, field-aware FM, NGBoost, and so on.
Haifeng Li
@haifengl
@rikima_twitter check out CrossValidation, Bootstrap, etc. in smile.validation. They calculate all the metrics automatically.
@rikima_twitter Look forward to your contributions of new algorithms. Thanks a lot in advance!
rikima
@rikima_twitter
Thanks!
I am looking for SparseArray.java now. I need a sparse vector implementation for example-based classifiers like SVM or SparseLogisticRegression. I could not find the smile.util package in the repository.
rikima
@rikima_twitter
The smile-math module has a smile.util package, and SparseArray.java is in there.
Ryan Bennett
@rwbennett

Hi, I have a question about the new(?) OLS.fit() method. If I try to use it to predict housing sale prices, like OLS.fit(Formula.lhs("SalePrice"), X_train_dataframe), it fails with "no response variable".

So it seems I can only use OLS if I pass it a dataframe that includes X and y together. However, if I use it like OLS.fit(Formula.lhs("SalePrice"), training_dataframe), which includes both the dependent and independent columns (X and y), then predict requires an array of the same size, including a column for the value I wish to predict.

However, doing that and passing, say, 0 for the y value results in wildly incorrect predictions, and changing the value affects the predictions. Is there no way to use OLS without commingling X and y?
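What the question asks for, fitting on features and response and then predicting from features alone, can be sketched in plain Java for the one-predictor case. This is the textbook closed-form least squares fit, not Smile's OLS class, and it computes only the coefficients with no standard errors or other statistics. The class name SimpleOlsSketch and its methods are hypothetical.

```java
/** One-predictor least squares in plain Java; coefficients only, not Smile's OLS. */
final class SimpleOlsSketch {
    final double intercept;
    final double slope;

    private SimpleOlsSketch(double intercept, double slope) {
        this.intercept = intercept;
        this.slope = slope;
    }

    /** Fits y = intercept + slope * x; the feature x and the response y are passed separately. */
    static SimpleOlsSketch fit(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need at least two (x, y) pairs of equal length");
        }
        double meanX = 0.0, meanY = 0.0;
        for (int i = 0; i < x.length; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= x.length;
        meanY /= y.length;

        double sxy = 0.0, sxx = 0.0;
        for (int i = 0; i < x.length; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        if (sxx == 0.0) {
            throw new IllegalArgumentException("x must not be constant");
        }
        double slope = sxy / sxx;
        return new SimpleOlsSketch(meanY - slope * meanX, slope);
    }

    /** Prediction needs only the feature value, never the response. */
    double predict(double x) {
        return intercept + slope * x;
    }

    public static void main(String[] args) {
        SimpleOlsSketch model = fit(new double[]{1, 2, 3}, new double[]{3, 5, 7});
        System.out.println(model.predict(4));
    }
}
```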