Christopher Small
@metasoarous
Hi folks. I've been having trouble getting smile to link to blas. I'm doing a PCA on a ~ 5k x 7.5k matrix and it's taking an insane amount of time (40-50 min). I'm using org.clojars.haifengl/smile "2.5.0" and com.github.haifengl/smile-mkl "2.5.0" on Pop!_OS (an Ubuntu fork; version 20.04 LTS on an i9-9880H processor), using @cnuernber's tech.ml.dataset lib. Others have subsecond times for matrices in this range, so I'm assuming I'm doing something wrong. However, my understanding from reading the smile docs is that smile-mkl should include all of the blas-related binaries. I am seeing INFO smile.math.blas.BLAS - smile-mkl module is available. (and similarly for smile.math.blas.LAPACK). Can someone please help me figure out what I'm missing? Thanks!
Haifeng Li
@haifengl
@metasoarous mkl is optional. What's the time if you don't include it?
Christopher Small
@metasoarous
@haifengl Approximately the same (I previously ran with 2.4.0 without mkl, and couldn't seem to get it to speed up with libblas or libopenblas installed via apt-get)
Haifeng Li
@haifengl
@metasoarous in 2.5, openblas is included in the jars. no need to install.
How do you call PCA?
Matthew Giannini
@mgiannini_gitlab
@haifengl - is there a pure-java implementation of the new matrix API in 2.5.0? We do not want to have to depend on platform-specific natives in our application (i.e. we don't want to include BLAS or MKL dependencies). Both are pretty big (MKL is huge, >120MB) and are not suitable for deployment on embedded devices. Prior to 2.5.0 we were able to deploy ML tools rather nicely and compactly using only SMILE jars.
Haifeng Li
@haifengl
@mgiannini_gitlab no JMatrix in 2.5.0. If you don't use algorithms with matrix operations, you don't need to include dependency jars
Christopher Small
@metasoarous
@haifengl Thanks for confirming that openblas is included in the jars. I'm now becoming suspicious that the performance issues are upstream of smile. The INFO smile.math.blas.BLAS - smile-mkl module is available. messages are not getting printed out until the very end of the computations. Am I correct in assuming that these messages should get printed out more or less as soon as the SVD starts? If so, that would suggest the issues are indeed upstream. Thanks again!
Haifeng Li
@haifengl
@metasoarous yes, BLAS is loaded when it is needed.
Christopher Small
@metasoarous
OK; Thanks for confirming @haifengl! And for all your work on SMILE!
Was blown away when I realized you had implemented UMAP :-)
Question though: In the future might it be possible to support custom nearest-neighbor graphs, as it is with the python implementation (via igraph)?
Reason is that I'm dealing with somewhat skewed data in terms of sparseness (some rows are relatively dense, while others have only a handful of results), and a custom distance metric has proved very helpful for teasing apart the relationships in these datasets.
If that's something you'd consider, I'd be happy to file an issue for tracking.
Again, thanks for all of your work on this :-)
Haifeng Li
@haifengl
we already support customized distance
Christopher Small
@metasoarous
Oh! Wonderful! I didn't see that in the options. Must have missed it.
OK; I actually discovered UMAP by looking at the code. Now that I'm looking at the docs, it's clearer: https://haifengl.github.io/api/java/smile/manifold/UMAP.html
I see that you can specify both the adjacency graph on initialization as well as the actual data and distance to the of method.
Christopher Small
@metasoarous
This seems a little odd to me as in my understanding (and at least in the python implementation) the distance metric is only (primarily?) used for computing the nearest neighbors graph. But in smile, the NN graph is passed in directly in the UMAP initialization. Please let me know if I'm misunderstanding something. Thanks!
Christopher Small
@metasoarous
Oh... is the distance parameter of the of method needed for the projection?
Haifeng Li
@haifengl
The constructor doesn't run the algorithm. It is only for builder methods to create the result object.
customized distance is to create nearest neighbor graph
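To illustrate what "a customized distance is used to create the nearest neighbor graph" can mean, here is a minimal brute-force sketch in plain Java. It is not SMILE's implementation and does not use SMILE's API; the class name `KnnGraphSketch`, the method `knn`, and the Manhattan distance in `main` are all hypothetical names chosen for this example.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.function.ToDoubleBiFunction;

/** Hypothetical illustration, not part of SMILE: brute-force k-nearest-neighbor graph. */
class KnnGraphSketch {
    /**
     * For each point, returns the indices of its k nearest other points
     * under the supplied distance function. Costs O(n^2) distance calls.
     */
    static int[][] knn(double[][] data, int k,
                       ToDoubleBiFunction<double[], double[]> distance) {
        int n = data.length;
        if (k < 1 || k >= n) {
            throw new IllegalArgumentException("k must be in [1, n-1]");
        }
        int[][] neighbors = new int[n][k];
        for (int i = 0; i < n; i++) {
            // Distances from point i to every point (including itself).
            double[] d = new double[n];
            Integer[] order = new Integer[n];
            for (int j = 0; j < n; j++) {
                d[j] = distance.applyAsDouble(data[i], data[j]);
                order[j] = j;
            }
            // Sort candidate indices by distance (stable, so ties keep index order).
            Arrays.sort(order, Comparator.comparingDouble(j -> d[j]));
            // Keep the first k indices that are not the point itself.
            int count = 0;
            for (int j : order) {
                if (j != i && count < k) {
                    neighbors[i][count++] = j;
                }
            }
        }
        return neighbors;
    }

    public static void main(String[] args) {
        double[][] data = {{0, 0}, {1, 0}, {5, 5}, {6, 5}};
        // A custom distance: Manhattan (L1) instead of Euclidean.
        ToDoubleBiFunction<double[], double[]> manhattan = (a, b) -> {
            double sum = 0.0;
            for (int t = 0; t < a.length; t++) sum += Math.abs(a[t] - b[t]);
            return sum;
        };
        System.out.println(Arrays.deepToString(knn(data, 1, manhattan)));
    }
}
```

Swapping the lambda passed as `distance` changes which points count as neighbors, and therefore the graph, without touching anything downstream of the graph.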
Christopher Small
@metasoarous
OK; Then if you pass distance to of, why is it still necessary to specify an AdjacencyList to the UMAP constructor?
Haifeng Li
@haifengl
you shouldn't call constructor
Christopher Small
@metasoarous
Oh!
Right; Those are static methods
Does that mean it's not possible to specify a specific graph, vs KNN?
Haifeng Li
@haifengl
nope
Christopher Small
@metasoarous
OK; Got it.
Thanks for explaining all of that.
It shouldn't be necessary at the moment for what I'm working on, since we're using KNN for our graphs, but would you be open to supporting UMAP on a custom AdjacencyList/graph object?
Happy to throw up a feature request issue for tracking if this is something you'd consider.
Christopher Small
@metasoarous
Separately, any chance that you'd be interested in adding the Leiden clustering algorithm (https://www.nature.com/articles/s41598-019-41695-z)? :-)
implisci
@implisci

@haifengl It appears that the native lib jars have changed from 2.4.0 to 2.5.0. Earlier they were using netlib; now it's a combination of openblas and arpack. Is there a difference in functionality or performance? I am on Linux. In the Scala REPL, sometimes there are name clashes like

 reference to dot is ambiguous; it is imported twice in the same scope by  import smile.data.formula._   and import smile.math.MathEx.

What are the recommended interfaces for Scala users to leverage the linear algebra and math routines? In the case above I worked around it with an explicit import smile.math.MathEx.dot. Thanks.

Christopher Small
@metasoarous
@haifengl I did some more digging on the PCA performance issues and was able to rule out something upstream of smile causing the slowdown. Looking at the implementation, it seems as if the only thing upstream of the SVD itself is the recentering step (and copying of data into a matrix):
        double[] mu = MathEx.colMeans(data);
        Matrix x = new Matrix(data);
        for (int j = 0; j < n; j++) {
            for (int i = 0; i < m; i++) {
                x.sub(i, j, mu[j]);
            }
        }
Please let me know if anything comes to mind about why this might be so slow. Thanks!
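One way to check whether the recentering step alone could account for the slowdown is to time it in isolation. The following is a minimal sketch on plain `double[][]` arrays, not SMILE's `Matrix` class; the class name `CenteringTiming` and its methods are hypothetical, and the default sizes are deliberately smaller than the 5k x 7.5k matrix discussed above.

```java
import java.util.Random;

/** Hypothetical helper, not part of SMILE: times column centering on plain arrays. */
class CenteringTiming {
    /** Column means of an m x n row-major array. */
    static double[] colMeans(double[][] data) {
        int m = data.length, n = data[0].length;
        double[] mu = new double[n];
        for (double[] row : data) {
            for (int j = 0; j < n; j++) mu[j] += row[j];
        }
        for (int j = 0; j < n; j++) mu[j] /= m;
        return mu;
    }

    /** Returns a centered copy: every column has its mean subtracted. */
    static double[][] center(double[][] data) {
        double[] mu = colMeans(data);
        int m = data.length, n = mu.length;
        double[][] x = new double[m][n];
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                x[i][j] = data[i][j] - mu[j];
            }
        }
        return x;
    }

    public static void main(String[] args) {
        // Pass "5000 7500" to mimic the matrix in the thread
        // (two such arrays need roughly 600 MB of heap).
        int m = args.length > 0 ? Integer.parseInt(args[0]) : 1000;
        int n = args.length > 1 ? Integer.parseInt(args[1]) : 1500;
        Random rng = new Random(42);
        double[][] data = new double[m][n];
        for (double[] row : data) {
            for (int j = 0; j < n; j++) row[j] = rng.nextGaussian();
        }
        long start = System.nanoTime();
        double[][] x = center(data);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("centered " + x.length + " x " + n + " in " + elapsedMs + " ms");
    }
}
```

If this step finishes quickly at the full size, the time is more likely being spent in the decomposition that follows it, which is where the choice of BLAS/LAPACK backend matters.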
Haifeng Li
@haifengl
@implisci scala has a DSL for matrix computation in smile.math.matrix. Check out the online documentation for usage.
Kamil Kloch
@kamilkloch
Hello all, what happened to smile-netlib? Last release is 2.4.0, smile-core 2.5.0 no longer depends on it.
Haifeng Li
@haifengl
@kamilkloch we don't need smile-netlib any more. OpenBLAS/MKL is in use from 2.5.0
jansiroky
@jansiroky
Hello, we are facing a problem with OLS and SVD convergence. We call OLS.fit(Formula.lhs("y"), data, "svd", false, false) and for some data sets it ends with the error "no convergence in 30 iterations". We are using SMILE 2.4.0. Is there a way to avoid this convergence problem?
Haifeng Li
@haifengl
@jansiroky stack trace?
Kamil Kloch
@kamilkloch

> @kamilkloch we don't need smile-netlib any more. OpenBLAS/MKL is in use from 2.5.0

@haifengl Thanks, how do I now check if native BLAS/LAPACK/ARPACK libraries are loaded? With 2.2.x I would check whether [BLAS/LAPACK/ARPACK].getInstance().getClass.getName contains "NativeSystem". In 2.5.0 [BLAS/LAPACK].getInstance().getClass.getName returns smile.math.blas.openblas.OpenBLAS and ARPACK.getInstance() does not exist...

Haifeng Li
@haifengl
The native library is built in. you don't need to check it
Kamil Kloch
@kamilkloch

> The native library is built in. you don't need to check it

Hm, I am trying to run

BLAS.getInstance().gemm(...)

and get

java.lang.NoClassDefFoundError: org/bytedeco/openblas/global/openblas
project dependencies:
```
libraryDependencies ++= Seq(
  "com.github.haifengl" %% "smile-scala" % "2.5.1",
  "org.bytedeco" % "javacpp" % "1.5.3" classifier "windows-x86_64" classifier "linux-x86_64",
  "org.bytedeco" % "openblas" % "0.3.9-1.5.3" classifier "windows-x86_64" classifier "linux-x86_64",
  "org.bytedeco" % "arpack-ng" % "3.7.0-1.5.3" classifier "windows-x86_64" classifier "linux-x86_64"
)
```
Kamil Kloch
@kamilkloch
Also, adding the smile-mkl dependency ends up with
[error] (update) sbt.librarymanagement.ResolveException: download failed: org.bytedeco#mkl;2020.1-1.5.3!mkl.jar
Haifeng Li
@haifengl
You are missing classifier "" for arpack-ng