Amanda Dsouza
@amy12xx
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import DistanceMetric as dm
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data
y = iris.target
clf = KNeighborsClassifier(metric="euclidean")
clf.fit(X, y)
clf = KNeighborsClassifier(metric=dm.get_metric("euclidean"))
clf.fit(X, y)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Amanda\Miniconda3\envs\mlenv\lib\site-packages\sklearn\neighbors\_classification.py", line 198, in fit
    return self._fit(X, y)
  File "C:\Users\Amanda\Miniconda3\envs\mlenv\lib\site-packages\sklearn\neighbors\_base.py", line 437, in _fit
    self._check_algorithm_metric()
  File "C:\Users\Amanda\Miniconda3\envs\mlenv\lib\site-packages\sklearn\neighbors\_base.py", line 374, in _check_algorithm_metric
    raise ValueError(
ValueError: Metric '<sklearn.metrics._dist_metrics.EuclideanDistance object at 0x0000018099BB9780>' not valid. Use sorted(sklearn.neighbors.VALID_METRICS['brute']) to get valid options. Metric can also be a callable function.
Adrin Jalali
@adrinjalali
That's indeed curious @amy12xx , I've opened an issue, and you can follow the discussion there: scikit-learn/scikit-learn#22348
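In the meantime, a minimal sketch of the metric forms that are accepted (a string name or a plain callable); passing a DistanceMetric instance is what triggers the ValueError above:

```python
from scipy.spatial.distance import euclidean
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# a metric given as a string name works
clf = KNeighborsClassifier(metric="euclidean").fit(X, y)

# a plain callable also works (much slower, but valid)
clf2 = KNeighborsClassifier(metric=euclidean).fit(X, y)
```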
linlin
@jajupmochi

Hi, I am using TransformedTargetRegressor with KNeighborsRegressor with a precomputed metric; however, when I do cross-validation with GridSearchCV, an error is raised saying that the dimensions of the metric are not correct. The code is like this:

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.compose import TransformedTargetRegressor
from sklearn.neighbors import KNeighborsRegressor

target_scaler = MinMaxScaler()
estimator = Pipeline([
        ('scaler', MinMaxScaler()),
        ('model', TransformedTargetRegressor(
          KNeighborsRegressor(metric='precomputed'),
          transformer=target_scaler
        ))])

clf = GridSearchCV(estimator, param_grid=grid_params,
                       scoring=scoring,
                       cv=cv, return_train_score=True, refit=True,
                       error_score='raise')
clf.fit(D_app, y_app)
...

May I ask what the problem may be? In case it is not supported, are there other ways to correctly scale targets in GridSearchCV (as well as HalvingGridSearchCV, etc.)?
Thank you very much!

Shubham1450
@Shubham1450
I am getting this error: Could not find conda environment: sklearn-env.
You can list all discoverable environments with conda info --envs.
Shubham1450
@Shubham1450
E ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
ogrisel
@ogrisel:matrix.org
[m]
@Shubham1450: can you please open an issue with this error message and the full output of conda list?
rebelCoder
@RebelCoder:matrix.org
[m]

Hey smart people. I am trying to figure out/understand a warning. The solutions I found just tell you to disable the warning. Could someone give a hint as to why I am seeing the following warning in this super simple Multiple Regression example?

import numpy as np
import pandas as pd
from sklearn import linear_model

data_file = pd.read_csv("FuelConsumption.csv")

data_frame = data_file[
    [
        'ENGINESIZE',
        'CYLINDERS',
        'FUELCONSUMPTION_CITY',
        'FUELCONSUMPTION_HWY',
        'FUELCONSUMPTION_COMB',
        'CO2EMISSIONS'
    ]
]

data_set_x = ['ENGINESIZE', 'CYLINDERS', 'FUELCONSUMPTION_COMB']
data_set_y = ['CO2EMISSIONS']

mask = np.random.rand(len(data_frame)) < 0.8
train = data_frame[mask]
test = data_frame[~mask]

lr_regression = linear_model.LinearRegression()
train_x = np.asanyarray(train[data_set_x])
train_y = np.asanyarray(train[data_set_y])
lr_regression.fit(train_x, train_y)

y_hat = lr_regression.predict(test[data_set_x])
test_x = np.asanyarray(test[data_set_x])
test_y = np.asanyarray(test[data_set_y])

Line 26:

y_hat = lr_regression.predict(test[data_set_x])

Produces this warning:

sklearn/base.py:443: UserWarning: X has feature names, but LinearRegression was fitted without feature names
  warnings.warn(
Thomas J. Fan
@thomasjpfan

In your example, lr_regression.fit was called with an ndarray, while lr_regression.predict was called with a DataFrame. To prevent the warning, you can fit with the DataFrame directly:

lr_regression.fit(train[data_set_x], train[data_set_y])

without casting to a ndarray.

rebelCoder
@RebelCoder:matrix.org
[m]

Interesting! Thanks. I tried that, and it still gives a warning though. I think I figured it out: I use the np array for all of them now.

lr_regression = linear_model.LinearRegression()
train_x = np.asanyarray(train[data_set_x])
train_y = np.asanyarray(train[data_set_y])
lr_regression.fit(train_x, train_y)

test_x = np.asanyarray(test[data_set_x])
test_y = np.asanyarray(test[data_set_y])
y_hat = lr_regression.predict(test_x)

This works. I am also wondering (yet to look into it) why np arrays are used at all if the data frame can be used directly, as you suggested?
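On that last question: nothing requires the np arrays here; fitting and predicting with the same DataFrame columns also avoids the warning, since the feature names then match on both sides. A minimal sketch, with made-up values standing in for FuelConsumption.csv:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# toy frame standing in for FuelConsumption.csv (hypothetical values)
df = pd.DataFrame({'ENGINESIZE': [2.0, 3.5, 1.6, 4.0],
                   'CYLINDERS': [4, 6, 4, 8],
                   'FUELCONSUMPTION_COMB': [8.5, 11.0, 7.2, 13.1],
                   'CO2EMISSIONS': [196, 255, 166, 301]})
features = ['ENGINESIZE', 'CYLINDERS', 'FUELCONSUMPTION_COMB']

lr = LinearRegression()
lr.fit(df[features], df['CO2EMISSIONS'])  # fitted WITH feature names
y_hat = lr.predict(df[features])          # predicted with the same names: no warning
```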

-bo
@ctc-er
I have a question about sklearn PCA. Does the data need to be standardized, e.g. scaled to unit variance, before creating the PCA object?
Thank you for answering
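(For context: PCA is sensitive to feature scale, so when features have very different variances the common pattern is to standardize first, e.g. StandardScaler before PCA in a pipeline. A minimal sketch of that pattern:)

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)

# standardize to unit variance first, so no feature dominates the
# components merely because of its scale
pca = make_pipeline(StandardScaler(), PCA(n_components=2))
X_2d = pca.fit_transform(X)
```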
Maren Westermann
@marenwestermann
Hi scikit-learn team! I've got a new computer (MacBookPro, chip: Apple M1 Pro) on which I installed the development version of scikit-learn. When I ran pytest I encountered the following:
(sklearn-dev) ➜  scikit-learn git:(main) pytest
=========================================================================================== test session starts ============================================================================================
platform darwin -- Python 3.9.10, pytest-7.0.1, pluggy-1.0.0
rootdir: /Users/maren/Documents/scikit-learn, configfile: setup.cfg, testpaths: sklearn
plugins: xdist-2.5.0, forked-1.4.0, cov-3.0.0
collecting ... [1]    54294 killed     pytest
Guillaume Lemaitre
@glemaitre
this does not look good :)
Maren Westermann
@marenwestermann
I then tried to check what's going on and found the following. Do you have an idea of what I need to do?
(sklearn-dev) ➜  scikit-learn git:(main) python -vvv -c "import sklearn"
import _frozen_importlib # frozen
import _imp # builtin
import '_thread' # <class '_frozen_importlib.BuiltinImporter'>
import '_warnings' # <class '_frozen_importlib.BuiltinImporter'>
import '_weakref' # <class '_frozen_importlib.BuiltinImporter'>
import '_io' # <class '_frozen_importlib.BuiltinImporter'>
import 'marshal' # <class '_frozen_importlib.BuiltinImporter'>
import 'posix' # <class '_frozen_importlib.BuiltinImporter'>
import '_frozen_importlib_external' # <class '_frozen_importlib.FrozenImporter'>
# installing zipimport hook
import 'time' # <class '_frozen_importlib.BuiltinImporter'>
import 'zipimport' # <class '_frozen_importlib.FrozenImporter'>
# installed zipimport hook
# /Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/encodings/__pycache__/__init__.cpython-39.pyc matches /Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/encodings/__init__.py
# code object from '/Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/encodings/__pycache__/__init__.cpython-39.pyc'
# trying /Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/codecs.cpython-39-darwin.so
# trying /Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/codecs.abi3.so
# trying /Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/codecs.so
# trying /Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/codecs.py
# /Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/__pycache__/codecs.cpython-39.pyc matches /Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/codecs.py
# code object from '/Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/__pycache__/codecs.cpython-39.pyc'
import '_codecs' # <class '_frozen_importlib.BuiltinImporter'>
import 'codecs' # <_frozen_importlib_external.SourceFileLoader object at 0x101613be0>
# trying /Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/encodings/aliases.cpython-39-darwin.so
# trying /Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/encodings/aliases.abi3.so
# trying /Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/encodings/aliases.so
# trying /Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/encodings/aliases.py
# /Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/encodings/__pycache__/aliases.cpython-39.pyc matches /Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/encodings/aliases.py
# code object from '/Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/encodings/__pycache__/aliases.cpython-39.pyc'
import 'encodings.aliases' # <_frozen_importlib_external.SourceFileLoader object at 0x101643190>
import 'encodings' # <_frozen_importlib_external.SourceFileLoader object at 0x1016139d0>
# trying /Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/encodings/utf_8.cpython-39-darwin.so
# trying /Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/encodings/utf_8.abi3.so
# trying /Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/encodings/utf_8.so
# trying /Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/encodings/utf_8.py
# /Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/encodings/__pycache__/utf_8.cpython-39.pyc matches /Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/encodings/utf_8.py
# code object from '/Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/encodings/__pycache__/utf_8.cpython-39.pyc'
import 'encodings.utf_8' # <_frozen_importlib_external.SourceFileLoader object at 0x1016138b0>
import '_signal' # <class '_frozen_importlib.BuiltinImporter'>
# trying /Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/encodings/latin_1.cpython-39-darwin.so
# trying /Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/encodings/latin_1.abi3.so
# trying /Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/encodings/latin_1.so
# trying /Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/encodings/latin_1.py
# /Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/encodings/__pycache__/latin_1.cpython-39.pyc matches /Users/maren/mambaforg
Guillaume Lemaitre
@glemaitre
which compilers are you using when installing the dev version
Maren Westermann
@marenwestermann
It looks like I can't post the full error message because it's too long. I followed the installation here: https://scikit-learn.org/stable/developers/advanced_installation.html#macos-compilers-from-conda-forge
So I installed compilers and llvm-openmp.
And I used the Miniforge3-MacOSX-arm64 download from here: https://github.com/conda-forge/miniforge#miniforge
Guillaume Lemaitre
@glemaitre
let me try with the latest version of compilers on my M1 machine
I assume that you forced installing python 3.9 and not 3.10?
ogrisel
@ogrisel:matrix.org
[m]
you can use https://gist.github.com to post the full error log and give a link here
Maren Westermann
@marenwestermann
Guillaume Lemaitre
@glemaitre
I can reproduce
clang and llvm have been updated
Guillaume Lemaitre
@glemaitre
so temporarily I think that installing compilers=1.3 should fix the problem. I will give it a try.
Then we need to understand why the new compilers are failing. But I can see that clang and llvm have been updated
hmm, it is not the compilers :(
Maren Westermann
@marenwestermann
I just tried using compilers=1.3 but it didn't solve the problem
ogrisel
@ogrisel:matrix.org
[m]
@marenwestermann: can you please open an issue?
Maren Westermann
@marenwestermann
Yes, will do
I'm about to head off to a PyLadies Berlin open source hack night that I'm hosting, so will do it then. This is actually a good example case that I can show. :)
ogrisel
@ogrisel:matrix.org
[m]
nice :)
hopefully you will be able to find a workaround if it proves too complex to fix
Taksh Panchal
@TakshPanchal
Hello all. I'm new to contributing here. Can anybody guide me on how I should start contributing to sklearn?
Andrew Knyazev
@lobpcg
I am trying to rejuvenate scikit-learn/scikit-learn#14636, which would then close scikit-learn/scikit-learn#8834 and scikit-learn/scikit-learn#8842. That requires merging scipy/scipy#15391. Could someone please help by reviewing? Even though it's a SciPy PR, merging it is a must for sklearn spectral embedding and clustering, which rely on SciPy to construct the graph Laplacian.
linlin
@jajupmochi

@adrinjalali Hi, thanks for your advice. Here is my code:

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.compose import TransformedTargetRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import GridSearchCV

target_scaler = MinMaxScaler()
estimator = Pipeline([
        ('scaler', MinMaxScaler()),
        ('model', TransformedTargetRegressor(
          KNeighborsRegressor(metric='precomputed'),
          transformer=target_scaler
        ))])

grid_params = {'model__regressor__n_neighbors': [3, 5, 7]}
scoring = 'accuracy'
clf = GridSearchCV(estimator, param_grid=grid_params,
                       scoring=scoring,
                       cv=5, return_train_score=True, refit=True,
                       error_score='raise')

D_app = np.random.rand(10, 10)
y_app = np.random.rand(10)
clf.fit(D_app, y_app)

The following error is raised:

File "/media/ljia/DATA/research-repo/projects/202110 Redox/codes/Redox/issues/target_scaling.py", line 34, in <module>
    clf.fit(D_app, y_app)

  File "/home/ljia/.local/lib/python3.8/site-packages/sklearn/model_selection/_search.py", line 891, in fit
    self._run_search(evaluate_candidates)

  File "/home/ljia/.local/lib/python3.8/site-packages/sklearn/model_selection/_search.py", line 1392, in _run_search
    evaluate_candidates(ParameterGrid(self.param_grid))

  File "/home/ljia/.local/lib/python3.8/site-packages/sklearn/model_selection/_search.py", line 838, in evaluate_candidates
    out = parallel(

  File "/home/ljia/.local/lib/python3.8/site-packages/joblib/parallel.py", line 1043, in __call__
    if self.dispatch_one_batch(iterator):

  File "/home/ljia/.local/lib/python3.8/site-packages/joblib/parallel.py", line 861, in dispatch_one_batch
    self._dispatch(tasks)

  File "/home/ljia/.local/lib/python3.8/site-packages/joblib/parallel.py", line 779, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)

  File "/home/ljia/.local/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
    result = ImmediateResult(func)

  File "/home/ljia/.local/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 572, in __init__
    self.results = batch()

  File "/home/ljia/.local/lib/python3.8/site-packages/joblib/parallel.py", line 262, in __call__
    return [func(*args, **kwargs)

  File "/home/ljia/.local/lib/python3.8/site-packages/joblib/parallel.py", line 262, in <listcomp>
    return [func(*args, **kwargs)

  File "/home/ljia/.local/lib/python3.8/site-packages/sklearn/utils/fixes.py", line 211, in __call__
    return self.function(*args, **kwargs)

  File "/home/ljia/.local/lib/python3.8/site-packages/sklearn/model_selection/_validation.py", line 681, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)

  File "/home/ljia/.local/lib/python3.8/site-packages/sklearn/pipeline.py", line 394, in fit
    self._final_estimator.fit(Xt, y, **fit_params_last_step)

  File "/home/ljia/.local/lib/python3.8/site-packages/sklearn/compose/_target.py", line 246, in fit
    self.regressor_.fit(X, y_trans, **fit_params)

  File "/home/ljia/.local/lib/python3.8/site-packages/sklearn/neighbors/_regression.py", line 213, in fit
    return self._fit(X, y)

  File "/home/ljia/.local/lib/python3.8/site-packages/sklearn/neighbors/_base.py", line 489, in _fit
    raise ValueError(

ValueError: Precomputed matrix must be square. Input is a 8x10 matrix.

May I ask what may be the problem? In case it is not supported, are there other ways to correctly scale targets in GridSearchCV (as well as HalvingGridSearchCV, etc.)?
Thank you very much!

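One possible workaround (my reading of the error, not a confirmed fix): when KNeighborsRegressor(metric='precomputed') is passed to GridSearchCV directly, the search detects the pairwise estimator and slices both rows and columns of the distance matrix for each split, so the submatrices stay square; wrapping it in a Pipeline/TransformedTargetRegressor hides that property, which would be consistent with the 8x10 error above. Targets can then be scaled by hand, e.g.:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.RandomState(0)
X = rng.rand(20, 2)
D = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=-1)  # 20x20 distance matrix
y = rng.rand(20)

# scale the targets manually instead of via TransformedTargetRegressor
y_scaled = (y - y.min()) / (y.max() - y.min())

# the bare pairwise estimator lets GridSearchCV slice D on both axes
gs = GridSearchCV(KNeighborsRegressor(metric='precomputed'),
                  param_grid={'n_neighbors': [3, 5]}, cv=3)
gs.fit(D, y_scaled)
```

Caveat: scaling y outside the search leaks the min/max across folds; for strictly fold-local target scaling the pairwise behavior would need to survive the wrapper, which is exactly what seems to fail here.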
Freddy Boulton
@freddyaboulton
Hello! I hope everyone is well. Just wondering, when is the next sklearn release happening, 1.0.3?
Vinayaka Kamath
@craterkamath
Hello Everyone, I have a very specific use case designed around scikit-learn and wanted to see if it is possible to code it up. The overall idea is to train a 1D embedding using a pretrained GLM. For this, I take a pretrained Poisson Regressor model trained using scikit-learn (this is GIVEN and, say, has been trained on 50 features). I want to add a new feature to it, i.e. a random variable X ~ N(0, 1), and retrain the model to get a 51-parameter model. Once this is done, I treat the input X as a parameter, freeze the model, and get the optimal value for each X_i using gradient-based approaches. Finally, upon having the optimal set of X_i, I want to retrain end to end using all 51 features. This is the gist of the algorithm that is designed. Any leads towards APIs, or whether this is achievable or not, would be really helpful. Thanks!
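I can't say whether the full pipeline is supported out of the box, but the retrain-then-freeze-then-optimize steps can be sketched with plain scikit-learn plus SciPy. Everything below (the toy data, the 3-feature stand-in for the 50 given features, the neg_ll helper) is hypothetical illustration, not an established API:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from sklearn.linear_model import PoissonRegressor

rng = np.random.RandomState(0)
X = rng.rand(100, 3)                      # stand-in for the 50 given features
y = rng.poisson(np.exp(X @ np.array([0.5, -0.2, 0.3])))

# step 1: the "pretrained" GLM on the original features
base = PoissonRegressor().fit(X, y)

# step 2: append a random feature X_new ~ N(0, 1) and retrain
x_new = rng.randn(100, 1)
model = PoissonRegressor().fit(np.hstack([X, x_new]), y)

# step 3: freeze the model and optimize the extra input for one sample
# by minimizing the (unnormalized) Poisson negative log-likelihood
w, b = model.coef_, model.intercept_
i = int(np.argmax(y > 0))                 # pick a sample with a positive count

def neg_ll(xi_new):
    mu = np.exp(X[i] @ w[:-1] + w[-1] * xi_new + b)
    return mu - y[i] * np.log(mu)

opt = minimize_scalar(neg_ll, bounds=(-5.0, 5.0), method='bounded')
x_new_opt = opt.x
```

In practice step 3 would loop over all samples, and the final step would refit PoissonRegressor on the optimized 51st column.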
lesshaste
@lesshaste
in sklearn.inspection.permutation_importance(estimator, X, y, *, scoring=None, n_repeats=5, n_jobs=None, random_state=None, sample_weight=None, max_samples=1.0), what is the default for scoring?
A "baseline metric, defined by scoring, is evaluated on a (potentially different) dataset defined by the X", so it has to be something
ogrisel
@ogrisel:matrix.org
[m]
I think it's using estimator.score(X_test, y_test) by default.
So accuracy for classifiers and R² for regressors.
lesshaste
@lesshaste
@ogrisel:matrix.org thank you
I am really confused though...
model = xgbr.fit(X_train, y_train)
print(model.score(X_test, y_test))
r = permutation_importance(model, X_test, y_test, n_repeats=30, random_state=0)
for i in r.importances_mean.argsort()[::-1]:
    print(f"{r.importances_mean[i]:.3f}" f" +/- {r.importances_std[i]:.3f}")
That simple code that gives feature importances gives me a value over 1.20 for the top one. But how can you have a permutation feature importance higher than 1?
lesshaste
@lesshaste
At least a partial answer is that R² can be arbitrarily negative
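Exactly: the importance is the drop in score (baseline minus permuted), and since R² is unbounded below, the drop can exceed 1. A small sketch using LinearRegression in place of the xgbr model above:

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = rng.rand(200, 1)
y = 10 * X[:, 0]                       # target driven entirely by one feature

model = LinearRegression().fit(X, y)   # baseline R² is 1.0 here

r = permutation_importance(model, X, y, n_repeats=30, random_state=0)
# importance = baseline R² minus permuted R²; shuffling the only useful
# feature pushes the permuted R² well below 0, so the importance exceeds 1
```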
Maren Westermann
@marenwestermann
Hi! I promoted the scikit-learn office hours on the PyLadies and WiMLDS Slacks because many people don't seem to be aware of them. I also did a tweet via PyLadies Berlin (https://twitter.com/PyLadiesBer/status/1519981569343164417). I think that the biweekly office hours are a great initiative that is especially helpful for folks belonging to groups which are underrepresented in tech and open source. I hope that the office hours, and also spreading the word about them in these communities, will help with contributor retention.
Guillaume Lemaitre
@glemaitre
Thanks Maren. Indeed, I assume that it could be motivating to have a closer follow-up on some PR.
Alex Cuof
@AlexBSX_gitlab
Hi everyone, I'm new to this community and I want to apologize in advance for any errors I might make in asking the following question.

I have to implement a classification task using scikit-multiflow for a big dataset (84 features x 2.5 million examples), processed like a stream. After many attempts my code finally runs without warnings or errors, but there is a problem: I am using the class EvaluatePrequential and its methods for the classification and, having set adequate metrics to evaluate the goodness of this classification, I obtain very high values for each metric used. This is strange considering the dataset I am working on, which is why I want to generate the confusion matrix in order to understand on which classes my classification algorithm works better and on which classes it makes more misclassifications.

Generating a confusion matrix is very easy using scikit-learn, but that method needs true labels and predicted labels as input parameters, and here is the problem: I cannot isolate the predicted labels from EvaluatePrequential, in particular from its "evaluate" method, so I have no way to generate the confusion matrix, since I have no predicted labels to compare with the true labels. For sure there is a trick to get around this problem, but all of my attempts over the last two days have failed and I have no more ideas on how to do it. Please, do you have an idea of how to solve this problem? Thank you a lot.
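One way around it (a sketch, not scikit-multiflow's own API): run the prequential test-then-train loop by hand, keep every prediction, and build the confusion matrix with scikit-learn afterwards. SGDClassifier and make_classification stand in here for the streaming model and dataset; any estimator with partial_fit should fit the same loop:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import confusion_matrix

# hand-rolled prequential evaluation: for each incoming sample, predict
# first (test), then learn from it (train), keeping the predictions so a
# confusion matrix can be built at the end
X, y = make_classification(n_samples=500, n_classes=3, n_informative=5,
                           random_state=0)
clf = SGDClassifier(random_state=0)
classes = np.unique(y)

y_true, y_pred = [], []
for i in range(len(X)):
    xi, yi = X[i:i + 1], y[i:i + 1]
    if i > 0:                                 # predict first (test) ...
        y_pred.append(clf.predict(xi)[0])
        y_true.append(yi[0])
    clf.partial_fit(xi, yi, classes=classes)  # ... then train

cm = confusion_matrix(y_true, y_pred, labels=classes)
```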