Andreas Mueller
@amueller
Hi @KiranHipparagi, thanks for wanting to contribute! Have you read the contributor's guide? I would suggest you look for issues tagged "good first issue". Many of those are multi-part issues where you can pick just one part to work on.
this one might be good to get started: scikit-learn/scikit-learn#20308
Aura09c
@Aura09c
Hello, I am starting out on this project and some of the parts are really complicated, but I am happy to learn during the complicated parts. I feel I need more people, though. If anyone is interested in working together and learning together, feel free to DM me.
my discord is Aura#5549
Reshama Shaikh
@reshamas

Deleted this message from the /dev/ channel. Copying and pasting here:

I am Bhavya Bhardwaj (https://github.com/Bhavya1705). I am a student of Electronics and Communication at Amrita Vishwa Vidyapeetham, India. My thanks to you and the team for sklearn. I have been trying to make some contributions to the scikit-learn library: scikit-learn/scikit-learn#5516. I have written the code and made the necessary changes to the init file and test files, in addition to the _classification file. Here is the link to my commits: scikit-learn/scikit-learn#20861. As you will see, I have made many mistakes. Any help that you can give me would be much appreciated and would be a wonderful learning experience.
Thank You

Bhavya Bhardwaj
@Bhavya1705
@reshamas Thank You, I have managed to solve the issue.
adriente
@adriente

Hi. I am trying to develop my own estimator based on TransformerMixin and BaseEstimator. To make sure I am doing things right, I have added a test to my project:

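from sklearn.utils.estimator_checks import check_estimator

from my_module import MyEstimator  # hypothetical module path for the custom estimator

def test():
    params = {}  # hypothetical: constructor arguments for MyEstimator
    me = MyEstimator(**params)
    check_estimator(me)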

If I run the test, I get the following error message :

AssertionError: The error message should contain one of the following patterns:
               0 feature\(s\) \(shape=\(\d*, 0\)\) while a minimum of \d* is required.

I don't understand how I am supposed to take care of that. I am even more surprised because my fit_transform method calls self._validate_data at the beginning. I would expect that function to take care of cases like these. Could someone help me with that issue?
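One plausible cause, assuming MyEstimator validates its input only in fit_transform: the common check that produces this message calls `fit` directly on an array of shape (n_samples, 0), so validation placed only in `fit_transform` is never reached. The sketch below uses a hypothetical `CenteringTransformer` (not the poster's code) that validates in `fit` with `sklearn.utils.check_array`, whose default `ensure_min_features=1` raises the expected ValueError.

```python
import re

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.utils import check_array


class CenteringTransformer(TransformerMixin, BaseEstimator):
    """Hypothetical transformer that subtracts the per-feature mean."""

    def fit(self, X, y=None):
        # Validate inside fit: the failing check calls fit() directly,
        # so validation done only in fit_transform() is skipped.
        X = check_array(X)  # default ensure_min_features=1 rejects shape (n, 0)
        self.n_features_in_ = X.shape[1]
        self.mean_ = X.mean(axis=0)
        return self

    def transform(self, X):
        X = check_array(X)
        return X - self.mean_


# Reproduce only the failing part of check_estimator: fit on 0 features.
pattern = r"0 feature\(s\) \(shape=\(\d*, 0\)\) while a minimum of \d* is required."
message = None
try:
    CenteringTransformer().fit(np.empty((12, 0)))
except ValueError as exc:
    message = str(exc)
print(re.search(pattern, message) is not None)
```

TransformerMixin provides `fit_transform` as `fit(X).transform(X)`, so validating in `fit` covers both entry points.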

Roman Yurchak
@rthy:matrix.org
[m]
@adriente: Could you please open a GitHub issue with the full traceback and tag me (@rth) in it?
Freddy Boulton
@freddyaboulton
Hello! I opened this feature request a week ago. Just bumping it here in case it got lost: scikit-learn/scikit-learn#20890
Guillaume Lemaitre
@glemaitre
@freddyaboulton I can assure you it is not lost :) I saw it, but I have not looked at it yet because we are busy releasing 1.0. Once the release is done, you might get some attention from core devs.
Loki The Great
@makingglitches
Morning all
I was kind of curious. I’ve been dragging my feet using torch and I was wondering: does this lib offer anything over torch? Maybe it is better suited for the low-level scientist trying to learn theory? Or is it just an alternative?
Omg I’m trapped
Nicolas Hug
@NicolasHug
@makingglitches pytorch and scikit-learn operate at different levels of abstraction, but simply put, in terms of scope, pytorch is for deep learning while scikit-learn is for the rest of ML that's not deep learning. So one might be better suited than the other, depending on the theory that you're interested in.
Siddhant Khare
@Siddhant-K-code

[FEATURE REQUEST] Add GitHub Organisation README profile

I just found out about this new GitHub feature for GitHub organisations.

Like this: https://twitter.com/vinzvinci/status/1438033675313025024

Andrew Knyazev
@lobpcg
When a PR generates html doc, where to see it?
Adrin Jalali
@adrinjalali
@lobpcg do you have a PR number?
Andrew Knyazev
@lobpcg
@adrinjalali #21148
Guillaume Lemaitre
@glemaitre
When the documentation CIs are finished
there will be a "ci/circleci: doc artifact — Link to 0/doc/_changed.html " line
You can click on "Details"
It will redirect to an HTML page where you will have hyperlinks to each documentation page that has been generated
in PRs we only generate documentation pages where there is a modification
Andrew Knyazev
@lobpcg
@glemaitre great, found it - thanks!
Pavel Yakovlev
@Pahandrovich
Morning all
I want to know if scikit-learn 1.1 will be released in late 2021?
Adrin Jalali
@adrinjalali
There will be minor releases this year (1.0.1 for instance), but the next major release will be next year.
Olivier Grisel
@ogrisel
Hello everyone, we are having a live community office hour on discord. Feel free to join to discuss your PRs!
Julien Jerphanion
@jjerphan
:boom: :+1:
Kevin
@kslader8
Has anyone tried to implement GAMs via sklearn pipelines before?
razou
@razou
Hello
I'm using scikit-learn 0.22.2.post1 and getting the following error
AttributeError: '_CalibratedClassifier' object has no attribute 'classes_' when I try to use predict_proba on a calibrated classifier.
Do you know if this issue is related to the scikit-learn version?
Thanks
Noah Wöhler
@NoahWoehler_twitter
Hi, can I post a call for participants in an interview study on open source projects here? If any mod wants more details via DM first, then I'm happy to oblige :)
Adam Li
@adam2392

Are cohen_kappa_score and balanced_accuracy_score supposed to work with multiclass labels?

I have a 3-class classification and I'm trying to use cross_validate, but it returns nans for all my scores. I tested the problem by running cross_val_score on all scores individually and isolated it to those 2 metrics.

X has shape (100, 5)
y has shape (100, 3)
clf is a RandomForestClassifier

from sklearn.model_selection import cross_val_score

cross_val_score(clf, X, y, cv=5, scoring='balanced_accuracy')
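A possible explanation, assuming the (100, 3) y is a one-hot encoding of the 3 classes: scikit-learn infers the target type "multilabel-indicator" for it, not "multiclass", and both metrics reject that type, which would be consistent with nan scores when cross-validation replaces scoring errors with nan. The sketch below uses synthetic data in place of the real X and y, and passes 1-D class labels instead. There is no built-in scoring string assumed for Cohen's kappa here; it is wrapped with `make_scorer`.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import cohen_kappa_score, make_scorer
from sklearn.model_selection import cross_val_score
from sklearn.utils.multiclass import type_of_target

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 5))  # synthetic stand-in for the real features
labels = rng.randint(0, 3, size=100)
y_onehot = np.eye(3, dtype=int)[labels]  # assumed form of the original y: (100, 3)

print(type_of_target(y_onehot))  # multilabel-indicator
# Convert the one-hot matrix back to 1-D class labels.
y_1d = y_onehot.argmax(axis=1)
print(type_of_target(y_1d))  # multiclass

clf = RandomForestClassifier(n_estimators=50, random_state=0)
balanced = cross_val_score(clf, X, y_1d, cv=5, scoring="balanced_accuracy")
kappa = cross_val_score(clf, X, y_1d, cv=5, scoring=make_scorer(cohen_kappa_score))
print(balanced, kappa)
```

The scores themselves are meaningless here because X is random noise; the point is that they are finite numbers rather than nan.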
Adrin Jalali
@adrinjalali
@razou could you please paste a fully reproducible piece of code?
@NoahWoehler_twitter we get quite a bit of these requests these days (which is a good thing, shows people are looking into issues). But it would help people decide if they want to spend time on it, if you give a tiny bit of intro on what it is. Also, feel free to send an email to the mailing list with that information if you want to reach more people.
Noah Wöhler
@NoahWoehler_twitter
Sure, I wasn't sure whether this falls under advertising. We are looking for open source contributors who are willing to talk to us about how security and trust are handled within their projects' communities. This is the landing page with more info: https://research.teamusec.de/2021-interviews-oss/
Adrin Jalali
@adrinjalali
ah interesting. I don't think we do much of that in this project, but others may think differently.
razou
@razou
> @razou could you please paste a fully reproducible piece of code?
from sklearn.calibration import CalibratedClassifierCV
from sklearn.multioutput import ClassifierChain
from lightgbm import LGBMClassifier

base_estimator = LGBMClassifier()
calibrator = CalibratedClassifierCV(base_estimator=base_estimator)
clf = ClassifierChain(base_estimator=calibrator, order='random', random_state=20)
clf.fit(X=train_x, Y=train_y)

y_pred_proba = clf.predict_proba(validation_x)
The aim was to perform multi-label classification and return probability scores for each (label, profile) pair.
NB: y was encoded with MultiLabelBinarizer
Olivier Grisel
@ogrisel
@razou please provide a minimal reproducible piece of code, that is, a piece of code that we can just copy and paste into a Python shell or Python script and run to trigger the problem. The code you provided does not include the definition of train_x and train_y, which is probably the core of the problem. Use minimal random data, for instance from np.random.normal(size=(n_samples, n_features)) or np.random.randint(low=0, high=10, size=n_samples),
and also add the necessary code to preprocess train_y and the code that computes the cross-validation with the score you want.
Minimal means removing anything that is not necessary. For instance, are CalibratedClassifierCV and ClassifierChain necessary to reproduce the problem? Or can you reproduce the problem by cross-validating the base estimator directly? If so, simplify the code snippet.
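Following that advice, a minimal reproducer might look like the sketch below: seeded random data built with the NumPy calls mentioned above, and CalibratedClassifierCV tested on its own. It swaps LGBMClassifier for LogisticRegression (an assumption, to avoid the extra dependency), and passes the wrapped estimator positionally because the keyword was renamed from base_estimator to estimator in later scikit-learn releases.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import LogisticRegression

# Seeded random data so anyone can rerun the snippet and get the same arrays.
rng = np.random.RandomState(42)
n_samples, n_features = 200, 5
X = rng.normal(size=(n_samples, n_features))
y = rng.randint(low=0, high=2, size=n_samples)

# Step 1: does the calibrated classifier alone show the problem?
calibrated = CalibratedClassifierCV(LogisticRegression(), cv=3)
calibrated.fit(X, y)
proba = calibrated.predict_proba(X[:3])
print(proba.shape)  # (3, 2)
```

If this runs cleanly, the next step would be to add ClassifierChain and the target preprocessing back one at a time until the error reappears.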
razou
@razou

Thank you guys for your answers

  1. Libraries

    pip install lightgbm==3.2.1
    pip install scikit-learn==0.22.2.post1
  2. Code snippet

from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.calibration import CalibratedClassifierCV
from sklearn.multioutput import ClassifierChain

from lightgbm import LGBMClassifier

X, y = make_multilabel_classification(n_samples=2000, n_classes=10, n_labels=2, allow_unlabeled=True)
train_x, validation_x, train_y, validation_y = train_test_split(X, y, test_size=0.25)

mlb = MultiLabelBinarizer()
train_y_encoded = mlb.fit_transform(train_y)
validation_y_encoded = mlb.transform(validation_y)

base_estimator = LGBMClassifier()
calibrator = CalibratedClassifierCV(base_estimator=base_estimator)
clf = ClassifierChain(base_estimator=calibrator, order='random', random_state=20)
clf.fit(X=train_x, Y=train_y_encoded)

y_pred_proba = clf.predict_proba(validation_x)
print(y_pred_proba[:3])
Olivier Grisel
@ogrisel

I don't understand why you are using MultiLabelBinarizer here because y is already a binary representation of the target variable since in this snippet you used make_multilabel_classification. Please provide a snippet that causes the same error message as the problem you observe with cross-validation cohen kappa score.
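To make the point concrete: `make_multilabel_classification` already returns y as a dense 0/1 indicator matrix, and MultiLabelBinarizer then treats each row as a set of "labels", which are just the values 0 and 1. The re-encoded target therefore has 2 columns instead of one per class. The sample size below is illustrative.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.preprocessing import MultiLabelBinarizer

X, y = make_multilabel_classification(
    n_samples=20, n_classes=10, n_labels=2, random_state=0
)
print(y.shape)  # (20, 10): already one binary column per class

mlb = MultiLabelBinarizer()
y_reencoded = mlb.fit_transform(y)  # each row is read as the set of its values
print(mlb.classes_)  # the only "labels" found are the values 0 and 1
print(y_reencoded.shape)  # (20, 2)
```

The second column of the re-encoded target only says whether a row contains any 1 at all, so the per-class information is lost.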

Anyway, from reading the scikit-learn documentation https://scikit-learn.org/stable/modules/model_evaluation.html#cohen-s-kappa I don't see how this would work for binary-encoded multilabel data.

The scikit-learn error message is actually quite explicit:
>>> from sklearn.metrics import cohen_kappa_score
>>> cohen_kappa_score([[0, 1], [1, 1]], [[0, 0], [1, 0]])
Traceback (most recent call last):
  File "<ipython-input-19-2a87559cbf88>", line 1, in <module>
    cohen_kappa_score([[0, 1], [1, 1]], [[0, 0], [1, 0]])
  File "/Users/ogrisel/code/scikit-learn/sklearn/metrics/_classification.py", line 639, in cohen_kappa_score
    confusion = confusion_matrix(y1, y2, labels=labels, sample_weight=sample_weight)
  File "/Users/ogrisel/code/scikit-learn/sklearn/metrics/_classification.py", line 304, in confusion_matrix
    raise ValueError("%s is not supported" % y_type)
ValueError: multilabel-indicator is not supported
razou
@razou
Where does the Kappa metric come from? I did not use it...
Olivier Grisel
@ogrisel
@razou sorry I mixed 2 conversations. Ignore the bit on Cohen's Kappa then.
@razou your code snippet works with clf.fit(X=train_x, Y=train_y) instead of clf.fit(X=train_x, Y=train_y_encoded).
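Putting that fix together, a self-contained version of the snippet might look like the sketch below. It again assumes LogisticRegression as a stand-in for LGBMClassifier, fixes the random seeds, and passes the wrapped estimators positionally because the base_estimator keywords were renamed in later scikit-learn releases. The indicator matrix from make_multilabel_classification goes straight into fit, with no MultiLabelBinarizer.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multioutput import ClassifierChain

X, y = make_multilabel_classification(
    n_samples=500, n_classes=10, n_labels=2, random_state=0
)
train_x, validation_x, train_y, validation_y = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# One calibrated binary classifier per label, chained in a random order.
calibrator = CalibratedClassifierCV(LogisticRegression(max_iter=1000), cv=3)
clf = ClassifierChain(calibrator, order="random", random_state=20)
clf.fit(train_x, train_y)  # train_y is already a (n_samples, n_classes) 0/1 matrix

y_pred_proba = clf.predict_proba(validation_x)
print(y_pred_proba.shape)  # (125, 10): one probability per (sample, label) pair
print(y_pred_proba[:3])
```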