Felipe Fronchetti
@fronchetti
Hi folks, I am a master's student in CS and I have a question for you. I am working on a multi-class text classification problem and using scikit-learn to implement my solution. Given a paragraph x, I want to predict which one of seven categories of information it belongs to. I have already implemented my solution using your library, but I am not sure whether the steps I am following are correct or whether I am missing something. Could you please take a look at the image below and give your opinion? If this is not the right place for this kind of question, please let me know. Thank you in advance for your contribution! [Image]
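The image with the original steps is not available in this log, so the following is only a minimal sketch of one common scikit-learn setup for this kind of problem, not the poster's actual pipeline. The toy corpus, the three labels (standing in for the seven categories), and the choice of TfidfVectorizer with LogisticRegression are all assumptions made for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus: 3 made-up categories with 4 paragraphs each.
paragraphs = [
    "install the package with pip",
    "run the installer and restart",
    "download and install the binary",
    "installation requires python three",
    "the function raises a value error",
    "this bug crashes the parser",
    "error when loading the file",
    "the test fails with an exception",
    "please add support for sparse input",
    "feature request for a new metric",
    "it would be nice to have a progress bar",
    "add an option to disable logging",
]
labels = ["install"] * 4 + ["bug"] * 4 + ["feature"] * 4

# Keeping the vectorizer inside the pipeline means it is re-fitted on the
# training part of every fold, so no vocabulary leaks in from the test part.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))

# Macro-averaged F1 treats every category equally, which can matter when
# the categories are imbalanced.
scores = cross_val_score(model, paragraphs, labels, cv=2, scoring="f1_macro")
print(scores.mean())
```

With real data, a larger `cv` and a held-out test set would give a more trustworthy estimate than this two-fold toy run.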
stimils2
@stimils2
Hi, I want to start working on scikit-learn bug fixes. If anyone is already working on them, can I team up with you?
Stanimir Ivanov
@Stanimir-Ivanov
Hi all! We're working on a generic implementation of a discrete-time survival model for random forests. Similar to this and this. Basically, the idea is to split on hazard curves, which are a bit like the class probabilities of regular classification random forests but stratified per duration since inception of an observation. We want to use scikit-learn as a base. Is anyone here familiar with the random forest code? Tips for a good PR are also very welcome.
um_duaa
@um_duaa:matrix.org
[m]
hi
الحمدلله
@um_duaa123_twitter
I have only one question, please!!!
lesshaste
@lesshaste
What would people recommend for clustering strings (e.g. English words) of the same length?
lesshaste
@lesshaste
or is this better off at GitHub Discussions?
Nicolas Hug
@NicolasHug

It really depends on the kind of data that you have. If you have a corpus of documents, LDA would be one way to get clusters/topics: https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.LatentDirichletAllocation.html
You could also try pre-trained embeddings like word2vec and the like.

Why do they have to be of the same length?
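One possible reading of the "same length" constraint is that a position-by-position (Hamming) distance is wanted. Under that assumption, which is not confirmed in the chat, a rough sketch is to build the pairwise distance matrix and hand it to a clusterer that accepts precomputed distances. DBSCAN, the word list, and the `eps` value are illustrative choices here, not a recommendation from the thread.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical equal-length words.
words = ["cat", "cot", "cut", "dog", "dig", "dug"]

# One row per word, one column per character position (as code points).
codes = np.array([[ord(c) for c in w] for w in words])

# Pairwise Hamming distance: number of positions where two words differ.
dist = (codes[:, None, :] != codes[None, :, :]).sum(axis=2).astype(float)

# eps=1.0 links words that differ in at most one position.
clusterer = DBSCAN(eps=1.0, min_samples=2, metric="precomputed")
labels = clusterer.fit_predict(dist)

for word, label in zip(words, labels):
    print(word, label)
```

The full distance matrix grows quadratically with the number of words, so this only suits small vocabularies.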

Ariel Silvio Norberto RAMOS
@asnramos
Hello, greetings to all! I will participate in the sprint on Saturday, June 26!
Nicolas Hug
@NicolasHug
Welcome @asnramos !
José Chacón
@jchaconm
Hello all. I'm also participating in the sprint next Saturday. I'm excited to be able to help with checking and fixing an issue!
Temiloluwa Awoyele
@temmyzeus
Hello, how can I join the sprint?
Aditya Acharya
@acharya_aditya_mi_gitlab
@um_duaa123_twitter sure ask
Temiloluwa Awoyele
@temmyzeus
Thanks
Harsh Kumar
@HarshVardhanKumar
why isn't the website working?
Adrin Jalali
@adrinjalali
works for me
Harsh Kumar
@HarshVardhanKumar
Now it works for me too. I had tried from two different networks yesterday and it didn't work then.
Anyway, I wanted to ask which version of LAPACK (libblas.so) sklearn uses, assuming it uses one. If not, which BLAS library is used?
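As far as I know, scikit-learn does not bundle a BLAS of its own but goes through the BLAS/LAPACK that NumPy and SciPy are linked against, so the answer depends on how those packages were installed. A small sketch for inspecting what is actually loaded in a given environment, using threadpoolctl (a scikit-learn dependency):

```python
import numpy  # noqa: F401  (importing loads NumPy's BLAS/LAPACK)
import scipy.linalg  # noqa: F401  (importing loads SciPy's BLAS/LAPACK)
from threadpoolctl import threadpool_info

# Each entry describes one native library with a thread pool (BLAS, OpenMP)
# that is currently loaded in this Python process.
info = threadpool_info()

for lib in info:
    if lib.get("user_api") == "blas":
        # internal_api is e.g. "openblas" or "mkl"; version may be None
        # if the library does not expose it.
        print(lib.get("internal_api"), lib.get("version"), lib.get("filepath"))
```

`sklearn.show_versions()` prints a broader summary of the installation and is usually what maintainers ask for in bug reports.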
lesshaste
@lesshaste
If I have a neural network classifier, I can easily simulate data from the probability distribution implied by the classifier. Can this be done for any of the classifiers in scikit-learn?
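If "simulate" here means drawing labels from the conditional distribution p(y | x), which is an assumption about the question, then any scikit-learn classifier that exposes `predict_proba` can be used in roughly this way. The dataset and the choice of LogisticRegression below are placeholders. Sampling new feature vectors x would need a generative model and is not covered by this sketch.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic 3-class problem, for illustration only.
X, y = make_classification(
    n_samples=200, n_features=5, n_informative=3, n_classes=3, random_state=0
)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Each row of proba is a probability vector over clf.classes_.
proba = clf.predict_proba(X[:10])

# Draw one label per row according to the predicted probabilities,
# instead of always taking the most likely class.
rng = np.random.default_rng(0)
sampled = np.array([rng.choice(clf.classes_, p=p) for p in proba])
print(sampled)
```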
Harsh Kumar
@HarshVardhanKumar
scikit-learn custom compilation: is it possible to pass custom gcc flags during the scikit-learn build described here: https://scikit-learn.org/stable/developers/advanced_installation.html
The documentation uses pip install. But since the .pyx files are first compiled to C and only then into a shared object (.so), I wondered whether custom gcc flags can be passed at that intermediate stage.
Harsh Kumar
@HarshVardhanKumar
Is there any option to build scikit-learn with DEBUG symbols?
Roman Yurchak
@rthy:matrix.org
[m]
Yes, you can pass them through the CFLAGS environment variable: https://stackoverflow.com/a/10867041/1791279
Harsh Kumar
@HarshVardhanKumar
thanks @rthy:matrix.org
Reshama Shaikh
@reshamas

Hello, I am wondering why this PR (scikit-learn/scikit-learn#18758) doesn't show up at the top here:
https://github.com/scikit-learn/scikit-learn/pulls

Is it because I had submitted it a long time ago, but my recent changes are considered updates?

Guillaume Lemaitre
@glemaitre
I think that PRs are ordered by PR number by default on GitHub
Reshama Shaikh
@reshamas
Oh, wow, that's interesting. Not what I would have expected.
Guillaume Lemaitre
@glemaitre
However, you have the option of "Sort by recently updated". This would actually be a good default while reviewing :)
Reshama Shaikh
@reshamas
Yes, that's what I was looking for (and expecting as a default). Thanks!
Roman Yurchak
@rthy:matrix.org
[m]
The "Refined GitHub" browser extension makes it the default among other improvements.
Harsh Kumar
@HarshVardhanKumar
I see that binary_tree.pxi passes num_samples as an argument to _recursive_build(). num_samples is computed from self.data_arr.shape[0], whereas _recursive_build() expects an ITYPE_t argument. ITYPE_t is defined as np.intp_t, which I assume is only a 32-bit signed integer. So how is the binary tree built when n_samples is greater than this value, let's say 10 million data points?
In plain Python this would be handled implicitly by promoting the integer to a larger size (long long). Does Cython take care of it? I don't see anything in the scikit-learn implementation that handles this scenario.
Harsh Kumar
@HarshVardhanKumar
Except that idx_end - idx_start < 2 will be true in this case (due to signed integer overflow?), so node 0 will be made a leaf node. But this is unexpected behaviour, right?
Roman Yurchak
@rthy:matrix.org
[m]
@HarshVardhanKumar: maximum value for int32 is ~2e9 not 2e6. So probably no one has tried using it with more than 2 billion samples. Not sure it's really an issue for the near future.
+1 to check for that overflow though.
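The limits discussed above can be checked directly with NumPy. Note that NumPy describes np.intp as an integer wide enough to index memory, so its width depends on the platform (typically 64 bits on a 64-bit build) and is not fixed at 32 bits. A quick check:

```python
import numpy as np

# Largest value a 32-bit signed integer can hold (about 2.1e9).
int32_max = np.iinfo(np.int32).max
print(int32_max)

# Width of np.intp on this machine, in bits (platform dependent).
intp_bits = np.dtype(np.intp).itemsize * 8
print(intp_bits)
print(np.iinfo(np.intp).max)

# 10 million samples is far below even the 32-bit limit.
n_samples = 10_000_000
print(n_samples <= int32_max)
```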
Harsh Kumar
@HarshVardhanKumar
@rthy:matrix.org thanks for pointing it out. A silly mistake on my part.
Reshama Shaikh
@reshamas
Hello, I ran pytest sklearn and see the following. Is this ok, or is there something wrong with my build:
SKIPPED [16] sklearn/utils/tests/test_validation.py:1374: could not import 'pandas': No module named 'pandas'
==== 355 failed, 19625 passed, 1443 skipped, 117 xfailed, 37 xpassed, 3371 warnings in 2380.84s (0:39:40) ====
(sklearn-dev)
Reshama Shaikh
@reshamas
OK, it works now. Thanks Thomas Fan.
Yashasvi Misra
@yashasvimisra2798
Hello, I was working on this issue: scikit-learn/scikit-learn#20435
However, I was not able to find the file where the documentation should be contributed. Can someone please help me with that?
Adrin Jalali
@adrinjalali
@yashasvimisra2798 the documentation is generated from the docstrings in the .py files where those classes are implemented. You should look for classes which inherit that class, and find the relevant part of the docstring there.
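One way to find the file behind a given class, sketched with the standard `inspect` module. LogisticRegression is only a stand-in here; the class relevant to the issue is not named in this chat.

```python
import inspect

from sklearn.linear_model import LogisticRegression

cls = LogisticRegression  # stand-in; replace with the class from the issue

# Path of the .py file where the class (and its docstring) is defined.
source_path = inspect.getsourcefile(cls)
print(source_path)

# First line of the docstring that the rendered documentation is built from.
print(inspect.getdoc(cls).splitlines()[0])

# Classes that inherit from it and are currently imported.
for sub in cls.__subclasses__():
    print(sub.__module__, sub.__name__)
```

`__subclasses__()` only lists direct subclasses whose modules have already been imported, so it can miss some.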
Yashasvi Misra
@yashasvimisra2798
Thanks, @adrinjalali I will look into it.
bmoroz82
@bmoroz82:matrix.org
[m]
Does scikit-learn support regression on multidimensional data, e.g. 3-dimensional point data? I would like to run a regression that predicts the position of a 3-dimensional point (x, y, z) from other known 3-dimensional points, weighted by inverse distance. I have a small sample of dependent points Y, each of the form (xi, yi, zi), and a complete set of independent points X1, X2, X3, ..., each also of the form (xi, yi, zi). I would like to test a simple model Y = X1 + X2 + ...
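A rough sketch of how such a setup might be expressed, assuming each sample's known points are flattened into one feature row and the 3-dimensional target is passed as a multi-output `y`. The synthetic data and the particular weighting scheme are assumptions for illustration. Note that `sample_weight` in scikit-learn weights whole samples (rows), not individual predictors.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_samples, n_known = 50, 4

# known[i, j] is the (x, y, z) position of known point j for sample i.
known = rng.normal(size=(n_samples, n_known, 3))

# Synthetic target: the centroid of the known points plus a little noise.
target = known.mean(axis=1) + 0.01 * rng.normal(size=(n_samples, 3))

# Flatten the known points into one feature row per sample: shape (50, 12).
X = known.reshape(n_samples, -1)

# Hypothetical weighting: inverse of the mean distance between the target
# and its known points, one weight per training sample.
mean_dist = np.linalg.norm(known - target[:, None, :], axis=2).mean(axis=1)
weights = 1.0 / (mean_dist + 1e-9)

# LinearRegression accepts a 2-D y and fits one linear model per output.
model = LinearRegression().fit(X, target, sample_weight=weights)
pred = model.predict(X[:5])
print(pred.shape)
```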
Guillaume Lemaitre
@glemaitre
bmoroz82
@bmoroz82:matrix.org
[m]
🙏 appreciated
tejaswivg
@tejaswivg
Can I use a confusion matrix to see the accuracy of SVR (support vector regression)?
or is it only for classification ?
Guillaume Lemaitre
@glemaitre
confusion matrix and derived metrics are only for classification
look at the regression metrics instead
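A minimal sketch of what evaluating an SVR with regression metrics could look like. The synthetic dataset and the SVR hyperparameters are arbitrary choices for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Synthetic regression problem, for illustration only.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = SVR(kernel="linear", C=10.0).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Regression metrics compare continuous predictions with continuous targets.
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(mae, mse, r2)
```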