Guillaume Lemaitre
@glemaitre
in PRs we only generate documentation pages where there is a modification
Andrew Knyazev
@lobpcg
@glemaitre great, found it - thanks!
Pavel Yakovlev
@Pahandrovich
Morning all
I want to know if scikit-learn 1.1 will be released in late 2021?
Adrin Jalali
@adrinjalali
There will be minor releases this year (1.0.1 for instance), but the next major release will be next year.
Olivier Grisel
@ogrisel
Hello everyone, we are having a live community office hour on discord. Feel free to join to discuss your PRs!
Julien Jerphanion
@jjerphan
:boom: :+1:
Kevin
@kslader8
has anyone tried to implement GAM's via sklearn pipeline's before?
razou
@razou
Hello
I'm using scikit-learn 0.22.2.post1 and getting the following error when I try to use predict_proba on a calibrated classifier:
AttributeError: '_CalibratedClassifier' object has no attribute 'classes_'
Do you know if this issue is related to the scikit-learn version?
Thanks
Noah Wöhler
@NoahWoehler_twitter
Hi, can I post a call for participants in an interview study on open source projects here? If any mod wants more details via DM first, then I'm happy to oblige :)
Adam Li
@adam2392

Is cohen kappa score and balanced accuracy score supposed to work w/ multiclass labels?

I have a 3-class classification and I'm trying to use cross_validate, but it returns nans for all my scores. I tested the problem by running cross_val_score on all scores individually and isolated it to those 2 metrics.

X = (100, 5)
y = (100, 3)
clf is a Random Forest Classifier

from sklearn.model_selection import cross_val_score

cross_val_score(clf, X, y, cv=5, scoring='balanced_accuracy')
Adrin Jalali
@adrinjalali
@razou could you please paste a fully reproducible piece of code?
@NoahWoehler_twitter we get quite a bit of these requests these days (which is a good thing, shows people are looking into issues). But it would help people decide if they want to spend time on it, if you give a tiny bit of intro on what it is. Also, feel free to send an email to the mailing list with that information if you want to reach more people.
Noah Wöhler
@NoahWoehler_twitter
Sure, I wasn't sure whether this falls under advertising. We are looking for open source contributors who are willing to talk to us about how security and trust are handled within their projects' communities. This is the landing page with more info: https://research.teamusec.de/2021-interviews-oss/
Adrin Jalali
@adrinjalali
ah interesting. I don't think we do much of that in this project, but others may think differently.
razou
@razou
@razou could you please paste a fully reproducible piece of code?
from sklearn.calibration import CalibratedClassifierCV
from sklearn.multioutput import ClassifierChain
from lightgbm import LGBMClassifier

base_estimator = LGBMClassifier()
calibrator = CalibratedClassifierCV(base_estimator=base_estimator)
clf = ClassifierChain(base_estimator=calibrator, order='random', random_state=20)
clf.fit(X=train_x, Y=train_y)

y_pred_proba = clf.predict_proba(validation_x)
The aim was to train a multi-label classifier and return probability scores for each (label, profile) pair.
NB: y was encoded with MultiLabelBinarizer
Olivier Grisel
@ogrisel
@razou please provide a minimal reproducible piece of code, that is, a piece of code that we can just copy and paste into a Python shell or Python script and run to trigger the problem. Here the code you provide does not include the definition of train_x and train_y, which is probably the core of the problem. Use minimal random data from np.random.normal(size=(n_samples, n_features)) or np.random.randint(low=0, high=10, size=n_samples),
and also add the necessary code to preprocess train_y and the code that computes the cross-validation with the score you want.
Minimal stands for removing anything that is not necessary. For instance, are CalibratedClassifierCV and ClassifierChain necessary to reproduce the problem? Or can you reproduce the problem by cross-validating the base estimator directly? If so, simplify the code snippet.
razou
@razou

Thank you for your answers

  1. libraries

    pip install lightgbm==3.2.1
    pip install scikit-learn==0.22.2.post1
  2. Code snippet

from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.calibration import CalibratedClassifierCV
from sklearn.multioutput import ClassifierChain

from lightgbm import LGBMClassifier

X, y = make_multilabel_classification(n_samples=2000, n_classes=10, n_labels=2, allow_unlabeled=True)
train_x, validation_x, train_y, validation_y = train_test_split(X, y, test_size=0.25)

mlb = MultiLabelBinarizer()
train_y_encoded = mlb.fit_transform(train_y)
validation_y_encoded = mlb.transform(validation_y)

base_estimator = LGBMClassifier()
calibrator = CalibratedClassifierCV(base_estimator=base_estimator)
clf = ClassifierChain(base_estimator=calibrator, order='random', random_state=20)
clf.fit(X=train_x, Y=train_y_encoded)

y_pred_proba = clf.predict_proba(validation_x)
print(y_pred_proba[:3])
Olivier Grisel
@ogrisel

I don't understand why you are using MultiLabelBinarizer here because y is already a binary representation of the target variable since in this snippet you used make_multilabel_classification. Please provide a snippet that causes the same error message as the problem you observe with cross-validation cohen kappa score.

Anyway, from reading the scikit-learn documentation https://scikit-learn.org/stable/modules/model_evaluation.html#cohen-s-kappa I don't see how this would work for binary-encoded multilabel data.

The scikit-learn error message is actually quite explicit:
>>> from sklearn.metrics import cohen_kappa_score
>>> cohen_kappa_score([[0, 1], [1, 1]], [[0, 0], [1, 0]])
Traceback (most recent call last):
  File "<ipython-input-19-2a87559cbf88>", line 1, in <module>
    cohen_kappa_score([[0, 1], [1, 1]], [[0, 0], [1, 0]])
  File "/Users/ogrisel/code/scikit-learn/sklearn/metrics/_classification.py", line 639, in cohen_kappa_score
    confusion = confusion_matrix(y1, y2, labels=labels, sample_weight=sample_weight)
  File "/Users/ogrisel/code/scikit-learn/sklearn/metrics/_classification.py", line 304, in confusion_matrix
    raise ValueError("%s is not supported" % y_type)
ValueError: multilabel-indicator is not supported
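A minimal sketch related to the traceback above and to the earlier cross_val_score question. It assumes (this is not confirmed in the chat) that the y of shape (100, 3) was a one-hot encoding of a 3-class target. In that case scikit-learn treats it as multilabel-indicator data, which these metrics reject; converting it back to a 1-D vector of class labels lets both scorers run. The random data and the RandomForestClassifier settings are illustrative only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import cohen_kappa_score, make_scorer
from sklearn.model_selection import cross_val_score

# Illustrative random data: 100 samples, 5 features, 3 classes.
rng = np.random.RandomState(0)
X = rng.normal(size=(100, 5))
labels = rng.randint(low=0, high=3, size=100)

# Assumption: the original target looked like this one-hot matrix of shape (100, 3).
y_onehot = np.eye(3, dtype=int)[labels]

# Convert the one-hot rows back to a 1-D vector of class labels.
y_1d = y_onehot.argmax(axis=1)

clf = RandomForestClassifier(n_estimators=20, random_state=0)

# Both metrics accept 1-D multiclass targets.
balanced = cross_val_score(clf, X, y_1d, cv=5, scoring="balanced_accuracy")
kappa = cross_val_score(clf, X, y_1d, cv=5, scoring=make_scorer(cohen_kappa_score))

print(balanced)
print(kappa)
```

If the target really is multilabel (several labels per sample), this conversion does not apply and a multilabel-aware metric would be needed instead.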
razou
@razou
Where does the Kappa metric come from? I did not use it ...
Olivier Grisel
@ogrisel
@razou sorry I mixed 2 conversations. Ignore the bit on Cohen's Kappa then.
@razou your code snippet works with clf.fit(X=train_x, Y=train_y) instead of clf.fit(X=train_x, Y=train_y_encoded).
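A small sketch of why the extra encoding step is unnecessary in that snippet: the y returned by make_multilabel_classification is already a binary indicator matrix, so passing it through MultiLabelBinarizer re-encodes each row as a set of the values 0 and 1. The sample size and random_state below are illustrative, and type_of_target is used only as a convenient way to inspect the target.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.utils.multiclass import type_of_target

X, y = make_multilabel_classification(
    n_samples=200, n_classes=10, n_labels=2,
    allow_unlabeled=True, random_state=0,
)

# y is already one column per label, with 0/1 entries.
print(y.shape)
print(type_of_target(y))

# Re-encoding treats every row as a collection of the values {0, 1},
# so the result no longer has one column per original label.
encoded = MultiLabelBinarizer().fit_transform(y)
print(encoded.shape)
```

MultiLabelBinarizer is meant for targets given as lists or sets of label names per sample, not for a matrix that is already binarized.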
razou
@razou
Thanks @ogrisel for answers (y)
quant12345
@quant12345
How can I get precision and recall for class 0 from the function 'precision_recall_curve'? I posted a question with code on this topic here
Guillaume Lemaitre
@glemaitre
You should use either
with pos_label=0
quant12345
@quant12345

@glemaitre Solved the problem by turning class 0 into 1.
But it's still not clear what kind of data I get by setting label=0.
Updated the code and added two videos with label=0 and label=1.
I put the code and videos here

It is quite possible that I am difficult to understand, since English is not my native language. There is no opportunity to practice in English.

Guillaume Lemaitre
@glemaitre
Turning class 0 into 1 is equivalent to setting pos_label=0 without changing the labels.
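A minimal sketch of that equivalence on made-up data. One assumption that is not spelled out in the chat: the score passed to precision_recall_curve must be the score for class 0 (here 1 minus the class-1 probability), otherwise the curve describes a ranking for the wrong class.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Made-up binary labels and predicted probabilities of class 1.
y_true = np.array([0, 0, 1, 1, 0, 1])
proba_class1 = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])

# Score for class 0, so that higher means "more likely class 0".
proba_class0 = 1.0 - proba_class1

# Option A: keep the labels and declare class 0 as the positive class.
p_a, r_a, t_a = precision_recall_curve(y_true, proba_class0, pos_label=0)

# Option B: swap the labels so that class 0 becomes 1, keep the default pos_label.
p_b, r_b, t_b = precision_recall_curve(1 - y_true, proba_class0)

print(np.allclose(p_a, p_b), np.allclose(r_a, r_b), np.allclose(t_a, t_b))
```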
quant12345
@quant12345
Thanks!
Rishabh S.
@anonymousr007

If we run the command pytest maint_tools/test_docstrings.py -k sklearn.utils.extmath.cartesian after setup, we get:
platform linux -- Python 3.8.10, pytest-6.2.5, py-1.10.0, pluggy-1.0.0
rootdir: /workspaces/scikit-learn, configfile: setup.cfg
plugins: cov-3.0.0
collected 0 items / 2 skipped

=================================================== short test summary info ===================================================
SKIPPED [2] maint_tools/test_docstrings.py:12: could not import 'numpydoc.validate': No module named 'numpydoc'
===================================================== 2 skipped in 0.47s ======================================================

Guillaume Lemaitre
@glemaitre
you need to install numpydoc via pip or conda
Rishabh S.
@anonymousr007
Okay
Guillaume Lemaitre
@glemaitre
otherwise the test is skipped
we don’t impose it because this is an optional dependency
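A small sketch for checking up front whether the optional dependency is available, before running the docstring tests. The helper name is_installed is hypothetical and is not part of scikit-learn or pytest; it only uses the standard library.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("numpydoc"):
    print("numpydoc is available: the docstring tests can run")
else:
    print("numpydoc is missing: the docstring tests will be skipped")
```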
Rishabh S.
@anonymousr007
Working on #21350 issue
Rishabh S.
@anonymousr007

===================================================== test session starts =====================================================
platform linux -- Python 3.8.10, pytest-6.2.5, py-1.10.0, pluggy-1.0.0
rootdir: /workspaces/scikit-learn, configfile: setup.cfg
plugins: cov-3.0.0
collected 2110 items / 2109 deselected / 1 selected

maint_tools/test_docstrings.py . [100%]

============================================= 1 passed, 2109 deselected in 0.98s ==============================================

Now the test passes. Should we then make a PR for it?
Guillaume Lemaitre
@glemaitre
yes
Rishabh S.
@anonymousr007

yes

Done!

zack
@zacchiro:matrix.org
hey, i'm unclear on whether this channel is for user- or developer-related questions (or both)? can someone clarify? i don't want to bother others with off-topic questions :-)

oh, i guess the topic answers that (thanks)

zack
@zacchiro:matrix.org
so, for the actual Q: