Christopher Small
@metasoarous
Again, thanks for all of your work on this :-)
Haifeng Li
@haifengl
we already support customized distance
Christopher Small
@metasoarous
Oh! Wonderful! I didn't see that in the options. Must have missed it.
OK; I actually discovered UMAP by looking at the code. Now that I'm looking at the docs, it's clearer: https://haifengl.github.io/api/java/smile/manifold/UMAP.html
I see that you can specify the adjacency graph on initialization, and also pass the actual data and a distance to the of method.
Christopher Small
@metasoarous
This seems a little odd to me as in my understanding (and at least in the python implementation) the distance metric is only (primarily?) used for computing the nearest neighbors graph. But in smile, the NN graph is passed in directly in the UMAP initialization. Please let me know if I'm misunderstanding something. Thanks!
Christopher Small
@metasoarous
Oh... is the distance parameter of the of method needed for the projection?
Haifeng Li
@haifengl
The constructor doesn't run the algorithm. It is only there so the builder methods can create the result object.
The customized distance is used to create the nearest neighbor graph.
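To make the point about the custom distance concrete, here is a minimal sketch of a brute-force k-nearest-neighbor graph built from a caller-supplied distance, in plain Java with no Smile dependency. KnnGraphSketch, knn and manhattan are hypothetical names chosen for illustration and are not part of the Smile API; how Smile builds its graph internally is not shown in this conversation.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.function.ToDoubleBiFunction;

/** Brute-force kNN graph under a caller-supplied distance (hypothetical helper, not Smile API). */
final class KnnGraphSketch {
    /** For each point i, returns the indices of its k nearest other points, closest first. */
    static int[][] knn(double[][] data, int k, ToDoubleBiFunction<double[], double[]> distance) {
        int n = data.length;
        if (k < 1 || k >= n) {
            throw new IllegalArgumentException("k must be in [1, n-1]");
        }
        int[][] neighbors = new int[n][];
        for (int i = 0; i < n; i++) {
            // Distances from point i to every point; a point is never its own neighbor.
            final double[] d = new double[n];
            for (int j = 0; j < n; j++) {
                d[j] = (i == j) ? Double.POSITIVE_INFINITY : distance.applyAsDouble(data[i], data[j]);
            }
            // Sort candidate indices by distance and keep the first k.
            Integer[] order = new Integer[n];
            for (int j = 0; j < n; j++) {
                order[j] = j;
            }
            Arrays.sort(order, Comparator.comparingDouble(j -> d[j]));
            neighbors[i] = new int[k];
            for (int r = 0; r < k; r++) {
                neighbors[i][r] = order[r];
            }
        }
        return neighbors;
    }

    /** Manhattan distance, standing in for any custom metric. */
    static double manhattan(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {1, 0}, {5, 5}, {6, 5}};
        int[][] graph = knn(points, 1, KnnGraphSketch::manhattan);
        System.out.println(Arrays.deepToString(graph));
    }
}
```

The distance only influences which edges end up in the graph, which matches the description above. The sketch is quadratic in the number of points, so it is only meant for small inputs.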
Christopher Small
@metasoarous
OK; Then if you pass distance to of, why is it still necessary to specify an AdjacencyList to the UMAP constructor?
Haifeng Li
@haifengl
You shouldn't call the constructor.
Christopher Small
@metasoarous
Oh!
Right; Those are static methods
Does that mean it's not possible to specify a specific graph, vs KNN?
Haifeng Li
@haifengl
nope
Christopher Small
@metasoarous
OK; Got it.
Thanks for explaining all of that.
It shouldn't be necessary at the moment for what I'm working on, since we're using KNN for our graphs, but would you be open to supporting UMAP on a custom AdjacencyList/graph object?
Happy to throw up a feature request issue for tracking if this is something you'd consider.
Christopher Small
@metasoarous
Separately, any chance that you'd be interested in adding the Leiden clustering algorithm (https://www.nature.com/articles/s41598-019-41695-z)? :-)
implisci
@implisci

@haifengl It appears that the native lib jars have changed from 2.4.0 to 2.5.0. Earlier they were using netlib, now it's a combination of openblas and arpack. Is there a difference in functionality or performance? I am on Linux. In the Scala REPL, sometimes there are name clashes like

 reference to dot is ambiguous; it is imported twice in the same scope by  import smile.data.formula._   and import smile.math.MathEx.

What are the recommended interfaces for Scala users to leverage the linear algebra and math routines? I did an import smile.math.MathEx.dot in the case above. Thanks.

Christopher Small
@metasoarous
@haifengl I did some more digging on the PCA performance issues and was able to rule out something upstream of smile causing the slowdown. Looking at the implementation, it seems as if the only thing upstream of the SVD itself is the recentering step (and copying of data into a matrix):
        double[] mu = MathEx.colMeans(data);
        Matrix x = new Matrix(data);
        for (int j = 0; j < n; j++) {
            for (int i = 0; i < m; i++) {
                x.sub(i, j, mu[j]);
            }
        }
Please let me know if anything comes to mind about why this might be so slow. Thanks!
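For anyone who wants to time the recentering step in isolation, here is a minimal sketch of the same column-centering logic on plain Java arrays. It has no Smile dependency: CenterColumnsSketch, colMeans and center are hypothetical stand-ins for MathEx.colMeans and the Matrix.sub loop quoted above, and the matrix size in main is an arbitrary assumption.

```java
import java.util.Random;

/** Column-centering on plain arrays (hypothetical helper, not Smile API). */
final class CenterColumnsSketch {
    /** Mean of each column of a non-empty rectangular matrix. */
    static double[] colMeans(double[][] data) {
        int m = data.length;
        int n = data[0].length;
        double[] mu = new double[n];
        for (double[] row : data) {
            for (int j = 0; j < n; j++) {
                mu[j] += row[j];
            }
        }
        for (int j = 0; j < n; j++) {
            mu[j] /= m;
        }
        return mu;
    }

    /** Returns a copy of data with each column shifted to zero mean. */
    static double[][] center(double[][] data) {
        double[] mu = colMeans(data);
        double[][] x = new double[data.length][];
        for (int i = 0; i < data.length; i++) {
            x[i] = data[i].clone();
            for (int j = 0; j < mu.length; j++) {
                x[i][j] -= mu[j];
            }
        }
        return x;
    }

    public static void main(String[] args) {
        // Arbitrary test size; adjust to match the real data set.
        int m = 2000, n = 500;
        Random rng = new Random(42);
        double[][] data = new double[m][n];
        for (double[] row : data) {
            for (int j = 0; j < n; j++) {
                row[j] = rng.nextDouble();
            }
        }
        long start = System.nanoTime();
        double[][] centered = center(data);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("centered " + centered.length + "x" + n + " in " + elapsedMs + " ms");
    }
}
```

If this step is fast on data of the same shape, the slowdown is more likely in the decomposition that follows than in the centering.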
Haifeng Li
@haifengl
@implisci Scala has a DSL for matrix computation in smile.math.matrix. Check out the online documentation for usage.
Kamil Kloch
@kamilkloch
Hello all, what happened to smile-netlib? Last release is 2.4.0, smile-core 2.5.0 no longer depends on it.
Haifeng Li
@haifengl
@kamilkloch we don't need smile-netlib any more. OpenBLAS/MKL is in use from 2.5.0
jansiroky
@jansiroky
Hello, we are facing a problem with OLS and SVD convergence. We call OLS.fit(Formula.lhs("y"), data, "svd", false, false) and for some data sets it ends with the error "no convergence in 30 iterations". We are using Smile 2.4.0. Is there a way to avoid this convergence problem?
Haifeng Li
@haifengl
@jansiroky stack trace?
Kamil Kloch
@kamilkloch

@kamilkloch we don't need smile-netlib any more. OpenBLAS/MKL is in use from 2.5.0

@haifengl Thanks, how do I now check if native BLAS/LAPACK/ARPACK libraries are loaded? With 2.2.x I would do [BLAS/LAPACK/ARPACK].getInstance().getClass.getName contains "NativeSystem". In 2.5.0 [BLAS/LAPACK].getInstance().getClass.getName returns smile.math.blas.openblas.OpenBLAS and ARPACK.getInstance() does not exist...
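
One generic way to check that a binding's Java classes are at least visible on the classpath is a Class.forName probe. The sketch below uses only the JDK. ClasspathCheckSketch and isOnClasspath are hypothetical names, and the class name org.bytedeco.openblas.global.openblas is taken from the NoClassDefFoundError messages in this thread rather than from any documentation. It does not prove that the native library itself loads, only that the wrapper class can be found.

```java
/** Classpath probe using only the JDK (hypothetical helper, not Smile API). */
final class ClasspathCheckSketch {
    /** Returns true if the named class can be located without initializing it. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false avoids running static initializers (and native loading).
            Class.forName(className, false, ClasspathCheckSketch.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name assumed from the error messages in this conversation.
        String openblas = "org.bytedeco.openblas.global.openblas";
        System.out.println(openblas + " on classpath: " + isOnClasspath(openblas));
        System.out.println("java.lang.String on classpath: " + isOnClasspath("java.lang.String"));
    }
}
```

A false result for the bytedeco class would point at dependency resolution (for example a jar that was never downloaded) rather than at the native code.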

Haifeng Li
@haifengl
The native library is built in. you don't need to check it
Kamil Kloch
@kamilkloch

The native library is built in. you don't need to check it

Hm, I am trying to run

BLAS.getInstance().gemm(...)

and get

java.lang.NoClassDefFoundError: org/bytedeco/openblas/global/openblas
project dependencies:
```
libraryDependencies ++= Seq(
  "com.github.haifengl" %% "smile-scala" % "2.5.1",
  "org.bytedeco" % "javacpp" % "1.5.3" classifier "windows-x86_64" classifier "linux-x86_64",
  "org.bytedeco" % "openblas" % "0.3.9-1.5.3" classifier "windows-x86_64" classifier "linux-x86_64",
  "org.bytedeco" % "arpack-ng" % "3.7.0-1.5.3" classifier "windows-x86_64" classifier "linux-x86_64"
)
```
Kamil Kloch
@kamilkloch
Also, adding the smile-mkl dependency ends up with
[error] (update) sbt.librarymanagement.ResolveException: download failed: org.bytedeco#mkl;2020.1-1.5.3!mkl.jar
Haifeng Li
@haifengl
You are missing classifier "" for arpack-ng
"org.bytedeco" % "arpack-ng" % "3.7.0-1.5.3" classifier "windows-x86_64" classifier "linux-x86_64" classifier ""
Haifeng Li
@haifengl
BTW, why do you call BLAS.getInstance().gemm(...) directly? It is better to use Matrix class.
Kamil Kloch
@kamilkloch
You are missing classifier "" for arpack-ng
It did not help, same error
java.lang.NoClassDefFoundError: org/bytedeco/openblas/global/openblas

BTW, why do you call BLAS.getInstance().gemm(...) directly? It is better to use Matrix class.

Many things have changed from 2.2.x to 2.5.x, and to be honest I am struggling a lot to update the code. If there is any sample/doc/changelog, let me know, that would be very helpful.

Kamil Kloch
@kamilkloch
For example, smile.plot.swing.ScatterPlot.plot is gone and ScatterPlot.of does not accept an array of colors.
Haifeng Li
@haifengl
What SBT version are you using?
ScatterPlot.of automatically chooses colors if you pass an array y.
The Javadoc is always kept up to date. Check it out.
Kamil Kloch
@kamilkloch
@haifengl I just found a pretty nasty behaviour in sbt... sbt/sbt#5775
Kamil Kloch
@kamilkloch

@haifengl perhaps you could take a quick look at how to migrate the following piece of smile 2.2.x code to 2.5.x...

val canvas: PlotCanvas = smile.plot.swing.plot(data: Array[Array[Double]], label: Array[Int], legend: Array[Char], palette: Array[Color])

canvas.setAxisLabels(...)
canvas.setTitle(...)
PlotWindow.show("Clusters", canvas)

In 2.5.1

  • there is no corresponding signature of plot
  • the return value changed from PlotCanvas to Canvas, which is no longer a JPanel
    Thanks!
Haifeng Li
@haifengl
@kamilkloch Have you checked the project website? All examples were updated with 2.5. For example
    val iris = read.arff("data/weka/iris.arff")
    val canvas = plot(iris, "sepallength", "sepalwidth", "class", '*')
    canvas.setAxisLabels("sepallength", "sepalwidth")
    show(canvas)
Kamil Kloch
@kamilkloch
@haifengl Thanks for the example. I still cannot solve my problem, though... I do not want to use smile.plot.show but rather assemble the swing components myself. In 2.2.x it was possible with PlotCanvas, in 2.5.x panel.add(canvas) no longer works. How do I make it work? Thanks!
Haifeng Li
@haifengl
new PlotCanvas(canvas)
Kamil Kloch
@kamilkloch

new PlotCanvas(canvas)

I cannot find PlotCanvas in 2.5.1