Hi. I am trying to develop my own Estimator based on TransformerMixin and BaseEstimator. To make sure I am doing things right I have added a test to my project :
import MyEstimator
from sklearn.utils.estimator_checks import check_estimator

def test():
    me = MyEstimator(**params)
    check_estimator(me)
If I run the test, I get the following error message :
AssertionError: The error message should contain one of the following patterns: 0 feature\(s\) \(shape=\(\d*, 0\)\) while a minimum of \d* is required.
I don't understand how I am supposed to take care of that. I am even more surprised because my fit_transform method calls self._validate_data at the beginning. I would expect that function to take care of cases like this. Could someone help me with this issue?
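For reference, here is a minimal sketch of the message that check is looking for. It assumes the check feeds the estimator an array with zero features and that the input goes through scikit-learn's standard validation. check_array is used as a stand-in for whatever validation the estimator performs, and the helper name validation_message is hypothetical. It does not show which method of the estimator the check actually calls, so if fit_transform validates but another public method does not, the check could still fail.

```python
import re

import numpy as np
from sklearn.utils.validation import check_array

# Pattern quoted in the AssertionError above.
PATTERN = r"0 feature\(s\) \(shape=\(\d*, 0\)\) while a minimum of \d* is required."


def validation_message(X):
    """Return the ValueError message raised while validating X, or None."""
    try:
        check_array(X)
    except ValueError as exc:
        return str(exc)
    return None


# An array with 3 samples and 0 features, similar to what the check passes in.
msg = validation_message(np.empty((3, 0)))
print(msg)
print(bool(re.search(PATTERN, msg)))
```

If the estimator raises its own error before validation runs, or overrides fit without validating X, the message will not match the pattern.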
[FEATURE REQUEST] Add GitHub Organisation README profile
Just found out about this new GitHub feature for GitHub organisations.
Are cohen kappa score and balanced accuracy score supposed to work w/ multiclass labels?
I have a 3-class classification and I'm trying to use cross_validate, but it returns NaNs for all my scores. I tested the problem by running cross_val_score on all scores individually and isolated it to those 2 metrics.
X = (100, 5)
y = (100, 3)
clf is a Random Forest Classifier
from sklearn.model_selection import cross_val_score
cross_val_score(clf, X, y, cv=5, scoring='balanced_accuracy')
@razou could you please paste a fully reproducible piece of code?
from sklearn.calibration import CalibratedClassifierCV
from sklearn.multioutput import ClassifierChain
from lightgbm import LGBMClassifier

base_estimator = LGBMClassifier()
calibrator = CalibratedClassifierCV(base_estimator=base_estimator)
clf = ClassifierChain(base_estimator=calibrator, order='random', random_state=20)
clf.fit(X=train_x, Y=train_y)
y_pred_proba = clf.predict_proba(validation_x)
train_y which is probably the core of the problem. Using minimal random data from np.random.normal(size=(n_samples, n_features)) or np.random.randint(low=0, high=10, size=n_samples)
Thank you guys for your answers
pip install lightgbm==3.2.1
pip install scikit-learn==0.22.2.post1
from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.calibration import CalibratedClassifierCV
from sklearn.multioutput import ClassifierChain
from lightgbm import LGBMClassifier

X, y = make_multilabel_classification(n_samples=2000, n_classes=10, n_labels=2, allow_unlabeled=True)
train_x, validation_x, train_y, validation_y = train_test_split(X, y, test_size=0.25)

mlb = MultiLabelBinarizer()
train_y_encoded = mlb.fit_transform(train_y)
validation_y_encoded = mlb.transform(validation_y)

base_estimator = LGBMClassifier()
calibrator = CalibratedClassifierCV(base_estimator=base_estimator)
clf = ClassifierChain(base_estimator=calibrator, order='random', random_state=20)
clf.fit(X=train_x, Y=train_y_encoded)
y_pred_proba = clf.predict_proba(validation_x)
print(y_pred_proba[:3])
I don't understand why you are using MultiLabelBinarizer here because y is already a binary representation of the target variable since in this snippet you used make_multilabel_classification. Please provide a snippet that causes the same error message as the problem you observe with cross-validation cohen kappa score.
Anyway, from reading the scikit-learn documentation https://scikit-learn.org/stable/modules/model_evaluation.html#cohen-s-kappa I don't see how this would work for binary-encoded multilabel data.
>>> from sklearn.metrics import cohen_kappa_score
>>> cohen_kappa_score([[0, 1], [1, 1]], [[0, 0], [1, 0]])
Traceback (most recent call last):
  File "<ipython-input-19-2a87559cbf88>", line 1, in <module>
    cohen_kappa_score([[0, 1], [1, 1]], [[0, 0], [1, 0]])
  File "/Users/ogrisel/code/scikit-learn/sklearn/metrics/_classification.py", line 639, in cohen_kappa_score
    confusion = confusion_matrix(y1, y2, labels=labels, sample_weight=sample_weight)
  File "/Users/ogrisel/code/scikit-learn/sklearn/metrics/_classification.py", line 304, in confusion_matrix
    raise ValueError("%s is not supported" % y_type)
ValueError: multilabel-indicator is not supported
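For the 3-class case with y of shape (100, 3), a possible workaround is sketched below. It assumes that y is a one-hot encoding with exactly one active class per row, which the messages above do not confirm. Under that assumption the 2-D array can be collapsed to 1-D class labels with argmax, and both metrics then accept it as ordinary multiclass input. The data and the variable names (labels, y_indicator, y_1d, y_pred) are made up for illustration.

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, cohen_kappa_score

rng = np.random.RandomState(0)
labels = rng.randint(0, 3, size=100)         # hypothetical 3-class labels
y_indicator = np.eye(3, dtype=int)[labels]   # shape (100, 3), one column per class
y_1d = y_indicator.argmax(axis=1)            # back to shape (100,)

y_pred = y_1d.copy()  # stand-in for real predictions (perfect here)

# With 1-D multiclass labels both metrics work.
bal_acc = balanced_accuracy_score(y_1d, y_pred)
kappa = cohen_kappa_score(y_1d, y_pred)
print(bal_acc, kappa)

# With the 2-D indicator array the metric raises, which a cross-validation
# helper may report as NaN depending on its error handling.
try:
    balanced_accuracy_score(y_indicator, y_indicator)
    indicator_supported = True
except ValueError as exc:
    indicator_supported = False
    print(exc)
```

If rows can contain several active classes (true multilabel data), argmax would silently drop labels, so this conversion would not be appropriate.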
clf.fit(X=train_x, Y=train_y) instead of
@glemaitre Solved the problem by turning class 0 into 1.
But it's still not clear what kind of data I get by setting label=0.
Updated the code and added two videos with label=0 and label=1.
I put the code and videos here
It is quite possible that I am hard to understand, since English is not my native language and I have no opportunity to practice it.