Haifeng Li
@haifengl
@implisci your understanding is correct. matrix.Matrix is for numeric computation. cas.Matrix is for symbolic computation.
implisci
@implisci
Thanks @haifengl. In matrix.Matrix, is there a way to suppress very small values (1E-15 or less) so they display as 0.0? These values in off-diagonal elements happen due to floating-point precision in matrix multiplication, for example when a matrix is multiplied by its inverse. Also, is it possible to use numerical matrices as input to cas.Matrix?
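A tiny-value clamp like the one asked about here can be written without any Smile support at all. A minimal sketch in plain Java (the class name, helper name, and the 1e-12 threshold are all my own choices, not Smile APIs or defaults):

```java
import java.util.Arrays;

public class Clamp {
    // Replace entries whose magnitude is below eps with exact 0.0,
    // e.g. to clean up A * inv(A) before printing.
    static double[][] clamp(double[][] a, double eps) {
        double[][] out = new double[a.length][];
        for (int i = 0; i < a.length; i++) {
            out[i] = Arrays.stream(a[i])
                           .map(v -> Math.abs(v) < eps ? 0.0 : v)
                           .toArray();
        }
        return out;
    }

    public static void main(String[] args) {
        // A product that should be the identity, up to rounding noise.
        double[][] nearIdentity = {{1.0, 3.1e-16}, {-2.2e-16, 1.0}};
        System.out.println(Arrays.deepToString(clamp(nearIdentity, 1e-12)));
        // -> [[1.0, 0.0], [0.0, 1.0]]
    }
}
```

Checking the clamped result against an exact identity matrix would also answer the later "recognize it as the identity" question without a library function.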
implisci
@implisci
The package.scala in cas says "substitution of symbols or numeric values for certain expressions". Could you clarify how this substitution works? Regarding my earlier question about small non-zero values (that are actually zero but are non-zero due to precision limitations), I wonder if such a matrix can be recognized as, for example, an identity matrix by some library function, without writing a transform for that.
Haifeng Li
@haifengl
@implisci what are you trying to do?
Tatsuaki KOBAYASHI
@tatsunidas
Hi, everyone,
I would like to use GradientTreeBoost for feature selection.
Could you show me how, with simple sample code?
Tatsuaki KOBAYASHI
@tatsunidas

SVM

Hi,
I want to use a linear SVM with the OneVsOne strategy for binary classification.
I tried it following the reference page (https://haifengl.github.io/classification.html), but I got errors from the fit() methods.
T[] is a generic object array, but I have prepared a DataFrame; what should I do?
public void trainSVM(DataFrame data, DataFrame label, String[] headerArrayThatSortedByImportance, int numOfSelection) {
    String[] selectedFeatures = new String[numOfSelection];
    for (int i = 0; i < numOfSelection; i++) {
        selectedFeatures[i] = headerArrayThatSortedByImportance[i];
    }
    double[][] x = data.select(selectedFeatures).toArray();
    double[] y = label.toArray()[0];
    double min = Arrays.stream(y).min().getAsDouble();
    if (min == 0) {
        for (int i = 0; i < y.length; i++) {
            if (y[i] == 0) {
                y[i] = -1;
            }
        }
    }
    LinearKernel kernel = new LinearKernel(); // GaussianKernel(8.0);
    // I can not run here...
    SVM model = OneVersusOne.fit(x, y, (x, y) -> SVM.fit(x, y, kernel, 1, 1E-3));
}
2 replies
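Two things in the snippet above prevent it from compiling: the lambda parameters (x, y) shadow the method's own x and y, and Smile's classifiers take int[] labels, not double[]. Also, for a two-class problem OneVersusOne is unnecessary; SVM.fit can be called directly. A hedged sketch against the Smile 2.x API (the class name and helper are mine, and the exact fit signature should be checked against your Smile version; binary SVM expects labels in {-1, +1}):

```java
import smile.classification.SVM;
import smile.math.kernel.LinearKernel;

public class SvmSketch {
    // Map 0/1 class labels to the -1/+1 convention binary SVM expects.
    static int[] toPlusMinus(double[] y) {
        int[] labels = new int[y.length];
        for (int i = 0; i < y.length; i++) {
            labels[i] = y[i] == 0 ? -1 : +1;
        }
        return labels;
    }

    public static void main(String[] args) {
        // Toy data, linearly separable on the first coordinate.
        double[][] x = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
        int[] y = toPlusMinus(new double[]{0, 0, 1, 1});
        // Binary problem: call SVM.fit directly with distinct lambda-free
        // arguments; OneVersusOne is only needed for three or more classes.
        SVM<double[]> model = SVM.fit(x, y, new LinearKernel(), 1.0, 1E-3);
        System.out.println(model.predict(x[3]));
    }
}
```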
Tatsuaki KOBAYASHI
@tatsunidas

@haifengl,
and user/developers

Hello,
SMILE is a great product.

If you do not mind, please tell me how to do grouped k-fold cross-validation.
I found GroupKFold in the API: http://haifengl.github.io/api/java/smile/validation/GroupKFold.html.
I tried it, but I do not understand the following:
1. I want to use SVM; what parameters should I set?
int[] pred = new GroupKFold(1000, 10, groups).classification(df, (??, ??) -> SVM.fit(????));
2. If I want to visualize the ROC curve and calculate the AUC of each fold, how do I do that? With scikit-learn's GroupKFold I can load each fold's dataset and visualize every fold's ROC curve with its AUC. How can I do the same with SMILE?

Best regards,

13 replies
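One plausible shape for grouped cross-validation, sketched against the Smile 2.x validation API: I am assuming GroupKFold exposes train/test index arrays and a fold count k (mirroring KFold), and that SVM.fit(double[][], int[], C, tol) and Accuracy.of exist as shown; the rows/elements helpers are my own. Note that SVM as fit here only yields hard labels, so ROC/AUC per fold would additionally need a scoring classifier (e.g. logistic regression posteriors):

```java
import smile.classification.SVM;
import smile.validation.Accuracy;
import smile.validation.GroupKFold;

public class GroupCvSketch {
    // Gather the rows of x at the given indices.
    static double[][] rows(double[][] x, int[] index) {
        double[][] out = new double[index.length][];
        for (int i = 0; i < index.length; i++) out[i] = x[index[i]];
        return out;
    }

    // Gather the elements of y at the given indices.
    static int[] elements(int[] y, int[] index) {
        int[] out = new int[index.length];
        for (int i = 0; i < index.length; i++) out[i] = y[index[i]];
        return out;
    }

    public static void main(String[] args) {
        // Toy data: 6 samples, 3 groups, labels in {-1, +1} for binary SVM.
        double[][] x = {{0, 0}, {0, 1}, {1, 0}, {1, 1}, {0.1, 0}, {0.9, 1}};
        int[] y = {-1, -1, +1, +1, -1, +1};
        int[] groups = {0, 0, 1, 1, 2, 2};

        GroupKFold cv = new GroupKFold(x.length, 3, groups);
        for (int i = 0; i < cv.k; i++) {
            // Train on the fold's train indices, score on its test indices.
            SVM<double[]> model = SVM.fit(rows(x, cv.train[i]), elements(y, cv.train[i]), 1.0, 1E-3);
            double[][] xtest = rows(x, cv.test[i]);
            int[] truth = elements(y, cv.test[i]);
            int[] pred = new int[truth.length];
            for (int j = 0; j < pred.length; j++) {
                pred[j] = model.predict(xtest[j]);
            }
            System.out.println("fold " + i + " accuracy = " + Accuracy.of(truth, pred));
        }
    }
}
```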
Tatsuaki KOBAYASHI
@tatsunidas

Sensitivity and Specificity

@haifengl

Hi, I have one question,

  1. Why does SMILE's Sensitivity.of() return the PPV?
  2. Why does SMILE's Specificity.of() return the NPV?

Or is my code wrong? (If so, I am sorry...)

The validation code follows:

        int[] truth = new int[] {0,0,0,0,0,1,1,1,1,1};
        int[] pred = new int[] {1,0,0,1,0,1,1,1,1,0};
        ConfusionMatrix mtx = ConfusionMatrix.of(truth, pred);
        System.out.println(mtx.toString());
        int[][] mat = mtx.matrix;
        double tn = mat[0][0];//tn
        double fn = mat[0][1];//fn
        double fp = mat[1][0];//fp
        double tp = mat[1][1];//tp
        double ppv = tp/(tp+fp);
        double tpr = tp/(tp+fn);
        double npv = tn/(tn+fn);
        double spc = tn/(fp+tn);
        System.out.println("TPR is "+tpr);//0.66
        System.out.println("PPV is "+ppv);//0.8
        System.out.println("SPC is "+spc);//0.75
        System.out.println("NPV is "+npv);//0.6
        System.out.println("Sensitivity is "+Sensitivity.of(truth, pred));//0.8
        System.out.println("Specificity is "+Specificity.of(truth, pred));//0.6
1 reply
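Counting directly from the two arrays suggests the confusion-matrix indexing above is transposed rather than a Smile bug: if ConfusionMatrix.matrix[i][j] counts rows as truth and columns as prediction (an assumption to verify against the Javadoc), then mat[0][1] is FP, not FN. A direct tally gives TP=4, FN=1, FP=2, TN=3, so sensitivity is 4/5 = 0.8 and specificity is 3/5 = 0.6, exactly what Sensitivity.of() and Specificity.of() report. A self-contained check in plain Java (no Smile needed):

```java
public class ConfusionCheck {
    // Count {tp, fn, fp, tn} with class 1 as the positive class.
    static int[] counts(int[] truth, int[] pred) {
        int tp = 0, fn = 0, fp = 0, tn = 0;
        for (int i = 0; i < truth.length; i++) {
            if (truth[i] == 1) { if (pred[i] == 1) tp++; else fn++; }
            else               { if (pred[i] == 1) fp++; else tn++; }
        }
        return new int[]{tp, fn, fp, tn};
    }

    public static void main(String[] args) {
        int[] truth = {0, 0, 0, 0, 0, 1, 1, 1, 1, 1};
        int[] pred  = {1, 0, 0, 1, 0, 1, 1, 1, 1, 0};
        int[] c = counts(truth, pred); // tp=4, fn=1, fp=2, tn=3
        System.out.println("sensitivity = " + (double) c[0] / (c[0] + c[1])); // 0.8
        System.out.println("specificity = " + (double) c[3] / (c[3] + c[2])); // 0.6
    }
}
```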
Lukas Braach
@lukasbraach
Hey @haifengl , thank you for open-sourcing smile. I have a question regarding Random Forest classification training and inference: How am I supposed to represent missing values inside the input vector, using the DataFrame and Tuple classes?
Haifeng Li
@haifengl
@lukasbraach missing values should be null. Note that the column type should be DoubleObjectType (instead of DoubleType). Also, random forest does not handle missing values, so you should impute them first.
2 replies
Tatsuaki KOBAYASHI
@tatsunidas
Hello,
I'd like to compare SMILE's GradientTreeBoost with scikit-learn's GradientBoostingClassifier.
How do you think the parameters of GradientTreeBoost should be set?
I know I cannot get identical results, but I do not understand why they differ. Maybe the maxNodes setting...?
Haifeng Li
@haifengl
There are many implementation differences. You won't get exactly matching results from any two GBM packages.
1 reply
PedroSena
@PedroSena

Hi everyone, sorry for the newbie question, but I'm kind of stuck here and didn't find a good way to work around the problem.
Basically, I'm just trying to create a DataFrame from a SQL query, but I keep getting an error:

        val url = "jdbc:postgresql://localhost/test"
        val props = Properties()
        props.setProperty("user", "test")
        props.setProperty("password", "test")
        val conn: Connection = DriverManager.getConnection(url, props)
        val rs = conn.createStatement().executeQuery("select 1 from homes")
        val df = DataFrame.of(rs)
        println(df)

and the error:

Exception in thread "main" java.lang.IllegalArgumentException: No enum constant java.sql.JDBCType.int4
    at java.base/java.lang.Enum.valueOf(Enum.java:240)
    at java.sql/java.sql.JDBCType.valueOf(JDBCType.java:34)
    at smile.data.type.DataTypes.struct(DataTypes.java:168)
    at smile.data.type.DataTypes.struct(DataTypes.java:158)
    at smile.data.DataFrame.of(DataFrame.java:1261)

I'm using org.postgresql:postgresql:42.2.12

10 replies
Jingwei-THU
@Jingwei-THU

Hi Haifeng, I tried to use MultivariateGaussianMixture but got the following exception:
Warning: Could not load Loader: java.lang.UnsatisfiedLinkError: no jnijavacpp in java.library.path
Exception in thread "main" java.lang.UnsatisfiedLinkError: no jniopenblas_nolapack in java.library.path

I import the dependencies in Maven like this:

<dependency>
    <groupId>com.github.haifengl</groupId>
    <artifactId>smile-core</artifactId>
    <version>2.5.3</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.bytedeco/javacpp -->
<dependency>
    <groupId>org.bytedeco</groupId>
    <artifactId>javacpp</artifactId>
    <version>1.5.3</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.bytedeco/openblas -->
<dependency>
    <groupId>org.bytedeco</groupId>
    <artifactId>openblas</artifactId>
    <version>0.3.9-1.5.3</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.bytedeco/arpack-ng -->
<dependency>
    <groupId>org.bytedeco</groupId>
    <artifactId>arpack-ng</artifactId>
    <version>3.7.0-1.5.3</version>
</dependency>

Haifeng Li
@haifengl
@Jingwei-THU can you try javacpp 1.5.4 (and accordingly openblas and arpack-ng)?
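Applied to the pom above, that suggestion would look roughly like this; the 0.3.10-1.5.4 and 3.7.0-1.5.4 version strings are my best guess at the openblas and arpack-ng builds paired with the javacpp 1.5.4 line, so confirm them against Maven Central before using:

```xml
<!-- Versions paired with javacpp 1.5.4 (verify on Maven Central) -->
<dependency>
    <groupId>org.bytedeco</groupId>
    <artifactId>javacpp</artifactId>
    <version>1.5.4</version>
</dependency>
<dependency>
    <groupId>org.bytedeco</groupId>
    <artifactId>openblas</artifactId>
    <version>0.3.10-1.5.4</version>
</dependency>
<dependency>
    <groupId>org.bytedeco</groupId>
    <artifactId>arpack-ng</artifactId>
    <version>3.7.0-1.5.4</version>
</dependency>
```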
Tatsuaki KOBAYASHI
@tatsunidas
@haifengl
Hi,
Can I run my code with SMILE on a GPU?
Haifeng Li
@haifengl
no
ynjacobs
@ynjacobs
Hi @haifengl thanks again for this library! I'm wondering how I can go about reading lines from a dataframe, doing some computations and then making that into a new column of a dataframe. In python I could do something like:
new_col = lambda each_row: math.sqrt((3*each_row.first_column)**4 + (3*each_row.second_column)**4 + (3*each_row.third_column)**4)
my_dataframe.apply(new_col)
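For comparison, the row-wise arithmetic itself is straightforward in plain Java over parallel column arrays. A Smile-free sketch (class and method names are mine; the column roles follow the Python snippet above):

```java
public class RowWise {
    // sqrt((3*a)^4 + (3*b)^4 + (3*c)^4) applied to each row.
    static double[] newColumn(double[] first, double[] second, double[] third) {
        double[] out = new double[first.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = Math.sqrt(Math.pow(3 * first[i], 4)
                             + Math.pow(3 * second[i], 4)
                             + Math.pow(3 * third[i], 4));
        }
        return out;
    }

    public static void main(String[] args) {
        double[] col = newColumn(new double[]{1}, new double[]{0}, new double[]{0});
        System.out.println(col[0]); // sqrt(81) = 9.0
    }
}
```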
Haifeng Li
@haifengl
Use smile.data.formula.Formula
import smile.data.formula.Formula;
import static smile.data.formula.Terms.*;
In Scala, we have a DSL to write formulas close to their math form.
In Java, you have to assemble the formula with the methods provided in Terms.
ynjacobs
@ynjacobs
Thank you! And how would I get each row of the dataframe to make the computation on? Is that included in Formula? I'm using Java.
Haifeng Li
@haifengl
df.apply(formula) will return a new dataframe
ynjacobs
@ynjacobs
Thank you! But how would I get each of the rows in each column of the original dataframe to apply the formula on? For example, if I want to take all rows of the first column and add 2, then all rows of the second column and add 2, and then add those together. Would I need to use something like df.stream().map(row -> { ... })?
Haifeng Li
@haifengl
no need. df.apply(formula) will do all the magic.
ynjacobs
@ynjacobs
Thank you!
Dylan Kane
@dmkaner
Hi everyone! Does anyone know of any working Java example repositories with smile? I'm trying to learn how to use it and some examples would be super helpful.
Dylan Kane
@dmkaner
@haifengl ?
Pierre Nodet
@pierrenodet
@dmkaner In the official documentation there is a button to choose between Scala, Java, or Kotlin for the code examples
Dylan Kane
@dmkaner
@pierrenodet Thanks, but I was looking for something with a little more context, maybe? Those examples were pretty limited
Haifeng Li
@haifengl
@dmkaner check out unit tests
Dylan Kane
@dmkaner
@haifengl thanks for the response Haifeng. Where can I find these?
Lukas Braach
@lukasbraach
@haifengl One more question regarding Random Forest: does smile's decision tree implementation adapt to each input field's scale of measurement, e.g. categorical vs. numerical?
implisci
@implisci
Hello @haifengl, are you considering something like this https://github.com/gpu/JOCLSamples/tree/master/src/main/java/org/jocl/samples for algorithms that can benefit from a GPU? Or is there something else (jcuda?)
Haifeng Li
@haifengl
@dmkaner it is in the same code base
@lukasbraach yes
@implisci we will leverage gpu
Dylan Kane
@dmkaner
Anyone know why smile.io can't be found when using smile 2.5.3 with Maven in Java?
Also thanks for the response @haifengl
Dylan Kane
@dmkaner
^ The above question is no longer pertinent; if anyone else has the same issue, just add the smile.io code from the GitHub repository into some local classes.
Haifeng Li
@haifengl
@dmkaner smile.io is in its own package (smile-io)
Nino
@weinino
Hi @haifengl
I have a situation:
1.) I've trained a random forest with a DataFrame object (target, feature 1, ..., feature 5).
2.) I would like to use RandomForest::predict with a new sample (in production) that I get as a double[5] {1,2,3,4,5}.
3.) I generate a Tuple t with schema (feature 1, ..., feature 5) and data [1,2,3,4,5].
4.) When I run predict(t), an array-out-of-bounds exception occurs. The problem is that it tries to access feature 5 at index 5. This was correct in the schema of the dataset, but not for the schema of the tuple t.
I would have expected predict to access the data for feature 5 from the tuple t at index 4.
Is there something I'm doing fundamentally wrong, or might there be some inconsistency? I couldn't figure it out, since every example I found in the documentation predicts on a tuple coming from the original DataFrame.
Thank you :-)
Nino
@weinino
Two solutions seem possible in my opinion:
1.) Bind the new schema (feature 1, ..., feature 5). I've tried this, but since "response != null" in the RF's formula, I get an NPE. So, is there any support for predicting on samples without targets?
2.) I artificially change my sample to include some dummy labels, so that the targets have a value as well and the schema corresponds again to the training case.
Haifeng Li
@haifengl
#2 should work. option #1 should work too with v2.5.3. Which version are you using?
Nino
@weinino
Ok, thank you!
I'll try #1 first, after updating to v2.5.3. At the moment I'm on v2.3.0
Haifeng Li
@haifengl
On v2.5.3, you don't need to bind the schema manually. Smile handles it automatically.
Nino
@weinino
@haifengl I've updated the version and now all looks fine. Thank you :-)
Lukas Braach
@lukasbraach
Hey @haifengl, my Random Forest model is working (mostly) as expected, thank you! I still have one question: on the first inference with the freshly trained model, smile logs "The response variable Classification doesn't exist in the schema [...]". Should I pay attention to this message? How do I get rid of it?