scikit-learn: machine learning in Python. Please feel free to ask specific questions about scikit-learn. Please try to keep the discussion focused on scikit-learn usage and immediately related open source projects from the Python ecosystem.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import DistanceMetric as dm
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data
y = iris.target
clf = KNeighborsClassifier(metric="euclidean")
clf.fit(X, y)
clf = KNeighborsClassifier(metric=dm.get_metric("euclidean"))
clf.fit(X, y)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\Amanda\Miniconda3\envs\mlenv\lib\site-packages\sklearn\neighbors\_classification.py", line 198, in fit
return self._fit(X, y)
File "C:\Users\Amanda\Miniconda3\envs\mlenv\lib\site-packages\sklearn\neighbors\_base.py", line 437, in _fit
self._check_algorithm_metric()
File "C:\Users\Amanda\Miniconda3\envs\mlenv\lib\site-packages\sklearn\neighbors\_base.py", line 374, in _check_algorithm_metric
raise ValueError(
ValueError: Metric '<sklearn.metrics._dist_metrics.EuclideanDistance object at 0x0000018099BB9780>' not valid. Use sorted(sklearn.neighbors.VALID_METRICS['brute']) to get valid options. Metric can also be a callable function.
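As the last sentence of the error hints, the metric parameter accepts a metric name (string) or a plain callable, not a DistanceMetric instance. A minimal sketch of both accepted forms (assuming a recent scikit-learn; the callable route forces the brute-force algorithm and is much slower):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data, iris.target

# 1) A metric name as a string works with every algorithm:
clf_str = KNeighborsClassifier(metric="euclidean").fit(X, y)

# 2) A plain callable taking two 1-D vectors also works:
def my_euclidean(a, b):
    return np.sqrt(np.sum((a - b) ** 2))

clf_call = KNeighborsClassifier(metric=my_euclidean, algorithm="brute").fit(X, y)
```

Passing dm.get_metric("euclidean") fails because the parameter validation only recognizes strings and callables, exactly as the error message states.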
Hi, I am using TransformedTargetRegressor with KNeighborsRegressor and a precomputed metric; however, when I do cross-validation with GridSearchCV, an error is raised saying that the dimension of the metric matrix is not correct. The code is like this:
from sklearn.preprocessing import MinMaxScaler
target_scaler = MinMaxScaler()
estimator = Pipeline([
    ('scaler', MinMaxScaler()),
    ('model', TransformedTargetRegressor(
        KNeighborsRegressor(metric='precomputed'),
        transformer=target_scaler
    ))
])
clf = GridSearchCV(estimator, param_grid=grid_params,
                   scoring=scoring,
                   cv=cv, return_train_score=True, refit=True,
                   error_score='raise')
clf.fit(D_app, y_app)
...
May I ask what the problem may be? In case this is not supported, are there other ways to correctly scale targets in GridSearchCV (as well as HalvingGridSearchCV, etc.)?
Thank you very much!
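One likely explanation (an assumption worth checking against your scikit-learn version): GridSearchCV only knows to slice a precomputed matrix on both axes when the estimator it sees exposes the "pairwise" tag, and wrapping the regressor inside Pipeline/TransformedTargetRegressor hides that tag, so each fold gets a row-sliced, non-square matrix. The bare estimator runs fine:

```python
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.RandomState(0)
X = rng.rand(20, 3)
y = rng.rand(20)
D = pairwise_distances(X)  # square (20, 20) distance matrix

# The bare estimator exposes the "pairwise" tag, so GridSearchCV slices
# the precomputed matrix on both axes for every fold:
grid = GridSearchCV(
    KNeighborsRegressor(metric="precomputed"),
    param_grid={"n_neighbors": [3, 5]},
    cv=5,
)
grid.fit(D, y)  # no shape error
```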
conda list
?
Hey smart people. I am trying to figure out/understand a warning. The solutions I found just tell you to disable the warning. Can someone give a hint as to why I am seeing the following warning in this super simple multiple regression example?
import numpy as np
import pandas as pd
from sklearn import linear_model

data_file = pd.read_csv("FuelConsumption.csv")
data_frame = data_file[
    [
        'ENGINESIZE',
        'CYLINDERS',
        'FUELCONSUMPTION_CITY',
        'FUELCONSUMPTION_HWY',
        'FUELCONSUMPTION_COMB',
        'CO2EMISSIONS'
    ]
]
data_set_x = ['ENGINESIZE', 'CYLINDERS', 'FUELCONSUMPTION_COMB']
data_set_y = ['CO2EMISSIONS']
mask = np.random.rand(len(data_frame)) < 0.8
train = data_frame[mask]
test = data_frame[~mask]
lr_regression = linear_model.LinearRegression()
train_x = np.asanyarray(train[data_set_x])
train_y = np.asanyarray(train[data_set_y])
lr_regression.fit(train_x, train_y)
y_hat = lr_regression.predict(test[data_set_x])
test_x = np.asanyarray(test[data_set_x])
test_y = np.asanyarray(test[data_set_y])
Line 26:
y_hat = lr_regression.predict(test[data_set_x])
produces this warning:
sklearn/base.py:443: UserWarning: X has feature names, but LinearRegression was fitted without feature names
warnings.warn(
Interesting! Thanks. I tried that, and it still gives a warning though. I think I figured it out: I now use the np array
on all of them.
lr_regression = linear_model.LinearRegression()
train_x = np.asanyarray(train[data_set_x])
train_y = np.asanyarray(train[data_set_y])
lr_regression.fit(train_x, train_y)
test_x = np.asanyarray(test[data_set_x])
test_y = np.asanyarray(test[data_set_y])
y_hat = lr_regression.predict(test_x)
This works. I am also wondering (yet to look into it): why are np arrays
used at all, if the data frame can just be used, as you have suggested?
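On that last question: scikit-learn estimators accept DataFrames directly, so the np.asanyarray calls are optional and only strip the column labels. A tiny sketch with hypothetical numbers:

```python
import pandas as pd
from sklearn import linear_model

train = pd.DataFrame({
    "ENGINESIZE": [2.0, 3.0, 1.6, 2.4],
    "CYLINDERS": [4, 6, 4, 4],
    "FUELCONSUMPTION_COMB": [8.5, 11.1, 6.2, 9.6],
    "CO2EMISSIONS": [196, 255, 144, 221],
})
x_cols = ["ENGINESIZE", "CYLINDERS", "FUELCONSUMPTION_COMB"]

# DataFrames for both fit and predict: no conversion, no feature-name warning
reg = linear_model.LinearRegression().fit(train[x_cols], train["CO2EMISSIONS"])
y_hat = reg.predict(train[x_cols])
```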
(sklearn-dev) ➜ scikit-learn git:(main) pytest
=========================================================================================== test session starts ============================================================================================
platform darwin -- Python 3.9.10, pytest-7.0.1, pluggy-1.0.0
rootdir: /Users/maren/Documents/scikit-learn, configfile: setup.cfg, testpaths: sklearn
plugins: xdist-2.5.0, forked-1.4.0, cov-3.0.0
collecting ... [1] 54294 killed pytest
(sklearn-dev) ➜ scikit-learn git:(main) python -vvv -c "import sklearn"
import _frozen_importlib # frozen
import _imp # builtin
import '_thread' # <class '_frozen_importlib.BuiltinImporter'>
import '_warnings' # <class '_frozen_importlib.BuiltinImporter'>
import '_weakref' # <class '_frozen_importlib.BuiltinImporter'>
import '_io' # <class '_frozen_importlib.BuiltinImporter'>
import 'marshal' # <class '_frozen_importlib.BuiltinImporter'>
import 'posix' # <class '_frozen_importlib.BuiltinImporter'>
import '_frozen_importlib_external' # <class '_frozen_importlib.FrozenImporter'>
# installing zipimport hook
import 'time' # <class '_frozen_importlib.BuiltinImporter'>
import 'zipimport' # <class '_frozen_importlib.FrozenImporter'>
# installed zipimport hook
# /Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/encodings/__pycache__/__init__.cpython-39.pyc matches /Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/encodings/__init__.py
# code object from '/Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/encodings/__pycache__/__init__.cpython-39.pyc'
# trying /Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/codecs.cpython-39-darwin.so
# trying /Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/codecs.abi3.so
# trying /Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/codecs.so
# trying /Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/codecs.py
# /Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/__pycache__/codecs.cpython-39.pyc matches /Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/codecs.py
# code object from '/Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/__pycache__/codecs.cpython-39.pyc'
import '_codecs' # <class '_frozen_importlib.BuiltinImporter'>
import 'codecs' # <_frozen_importlib_external.SourceFileLoader object at 0x101613be0>
# trying /Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/encodings/aliases.cpython-39-darwin.so
# trying /Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/encodings/aliases.abi3.so
# trying /Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/encodings/aliases.so
# trying /Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/encodings/aliases.py
# /Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/encodings/__pycache__/aliases.cpython-39.pyc matches /Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/encodings/aliases.py
# code object from '/Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/encodings/__pycache__/aliases.cpython-39.pyc'
import 'encodings.aliases' # <_frozen_importlib_external.SourceFileLoader object at 0x101643190>
import 'encodings' # <_frozen_importlib_external.SourceFileLoader object at 0x1016139d0>
# trying /Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/encodings/utf_8.cpython-39-darwin.so
# trying /Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/encodings/utf_8.abi3.so
# trying /Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/encodings/utf_8.so
# trying /Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/encodings/utf_8.py
# /Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/encodings/__pycache__/utf_8.cpython-39.pyc matches /Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/encodings/utf_8.py
# code object from '/Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/encodings/__pycache__/utf_8.cpython-39.pyc'
import 'encodings.utf_8' # <_frozen_importlib_external.SourceFileLoader object at 0x1016138b0>
import '_signal' # <class '_frozen_importlib.BuiltinImporter'>
# trying /Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/encodings/latin_1.cpython-39-darwin.so
# trying /Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/encodings/latin_1.abi3.so
# trying /Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/encodings/latin_1.so
# trying /Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/encodings/latin_1.py
# /Users/maren/mambaforge/envs/sklearn-dev/lib/python3.9/encodings/__pycache__/latin_1.cpython-39.pyc matches /Users/maren/mambaforg
You need compilers and llvm-openmp. Use Miniforge3-MacOSX-arm64, downloaded from here: https://github.com/conda-forge/miniforge#miniforge
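A hypothetical sketch of that setup on Apple Silicon (package names from the advice above; the editable-install command follows the scikit-learn developer docs and may differ for your version):

```shell
# Create a conda-forge build environment with the compiler toolchain and OpenMP
conda create -n sklearn-dev -c conda-forge python=3.9 compilers llvm-openmp \
    numpy scipy cython joblib threadpoolctl pytest
conda activate sklearn-dev

# From the scikit-learn checkout, build in place
pip install --no-build-isolation -e .
```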
@adrinjalali Hi, thanks for your advice. Here is my code:
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.compose import TransformedTargetRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import GridSearchCV
target_scaler = MinMaxScaler()
estimator = Pipeline([
    ('scaler', MinMaxScaler()),
    ('model', TransformedTargetRegressor(
        KNeighborsRegressor(metric='precomputed'),
        transformer=target_scaler
    ))
])
grid_params = {'model__regressor__n_neighbors': [3, 5, 7]}
scoring = 'accuracy'
clf = GridSearchCV(estimator, param_grid=grid_params,
                   scoring=scoring,
                   cv=5, return_train_score=True, refit=True,
                   error_score='raise')
D_app = np.random.rand(10, 10)
y_app = np.random.rand(10)
clf.fit(D_app, y_app)
The following error is raised:
File "/media/ljia/DATA/research-repo/projects/202110 Redox/codes/Redox/issues/target_scaling.py", line 34, in <module>
clf.fit(D_app, y_app)
File "/home/ljia/.local/lib/python3.8/site-packages/sklearn/model_selection/_search.py", line 891, in fit
self._run_search(evaluate_candidates)
File "/home/ljia/.local/lib/python3.8/site-packages/sklearn/model_selection/_search.py", line 1392, in _run_search
evaluate_candidates(ParameterGrid(self.param_grid))
File "/home/ljia/.local/lib/python3.8/site-packages/sklearn/model_selection/_search.py", line 838, in evaluate_candidates
out = parallel(
File "/home/ljia/.local/lib/python3.8/site-packages/joblib/parallel.py", line 1043, in __call__
if self.dispatch_one_batch(iterator):
File "/home/ljia/.local/lib/python3.8/site-packages/joblib/parallel.py", line 861, in dispatch_one_batch
self._dispatch(tasks)
File "/home/ljia/.local/lib/python3.8/site-packages/joblib/parallel.py", line 779, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "/home/ljia/.local/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
result = ImmediateResult(func)
File "/home/ljia/.local/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 572, in __init__
self.results = batch()
File "/home/ljia/.local/lib/python3.8/site-packages/joblib/parallel.py", line 262, in __call__
return [func(*args, **kwargs)
File "/home/ljia/.local/lib/python3.8/site-packages/joblib/parallel.py", line 262, in <listcomp>
return [func(*args, **kwargs)
File "/home/ljia/.local/lib/python3.8/site-packages/sklearn/utils/fixes.py", line 211, in __call__
return self.function(*args, **kwargs)
File "/home/ljia/.local/lib/python3.8/site-packages/sklearn/model_selection/_validation.py", line 681, in _fit_and_score
estimator.fit(X_train, y_train, **fit_params)
File "/home/ljia/.local/lib/python3.8/site-packages/sklearn/pipeline.py", line 394, in fit
self._final_estimator.fit(Xt, y, **fit_params_last_step)
File "/home/ljia/.local/lib/python3.8/site-packages/sklearn/compose/_target.py", line 246, in fit
self.regressor_.fit(X, y_trans, **fit_params)
File "/home/ljia/.local/lib/python3.8/site-packages/sklearn/neighbors/_regression.py", line 213, in fit
return self._fit(X, y)
File "/home/ljia/.local/lib/python3.8/site-packages/sklearn/neighbors/_base.py", line 489, in _fit
raise ValueError(
ValueError: Precomputed matrix must be square. Input is a 8x10 matrix.
May I ask what may be the problem? In case it is not supported, are there other ways to correctly scale targets in GridSearchCV (as well as HalvingGridSearchCV, etc.)?
Thank you very much!
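If the wrappers hiding the precomputed ("pairwise") nature of the estimator really is the blocker, one workaround (a sketch under that assumption, not an official API) is to run the loop manually, slicing the precomputed matrix on both axes and scaling the targets per fold:

```python
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import MinMaxScaler

rng = np.random.RandomState(0)
X = rng.rand(30, 4)
y = rng.rand(30)
D = pairwise_distances(X)  # full square matrix, as the regressor expects

scores = []
for train, test in KFold(n_splits=5).split(D):
    # Rows = query samples, columns = indexed (training) samples:
    D_train = D[np.ix_(train, train)]
    D_test = D[np.ix_(test, train)]
    # Fit the target scaler on the training fold only, invert at predict time:
    ts = MinMaxScaler().fit(y[train].reshape(-1, 1))
    y_train = ts.transform(y[train].reshape(-1, 1)).ravel()
    model = KNeighborsRegressor(n_neighbors=3, metric="precomputed")
    model.fit(D_train, y_train)
    y_pred = ts.inverse_transform(model.predict(D_test).reshape(-1, 1)).ravel()
    scores.append(np.mean((y_pred - y[test]) ** 2))  # fold MSE
```

This mirrors what TransformedTargetRegressor does per fold (fit the transformer on the training targets, inverse-transform the predictions), just without the wrapper.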