scikit-learn: machine learning in Python. Please feel free to ask specific questions about scikit-learn. Please try to keep the discussion focused on scikit-learn usage and immediately related open source projects from the Python ecosystem.
Hello guys, maybe someone can help me out here. I am running the following validation code:
from sklearn.model_selection import validation_curve

train_scores, valid_scores = validation_curve(
    estimator=pipeline,                    # estimator (a Pipeline; see definition below)
    X=features,                            # feature matrix
    y=target,                              # target vector
    param_name='pca__n_components',        # vary the PCA step's n_components
    param_range=range(1, 50),              # n_components values to test
    cv=5,                                  # 5-fold cross-validation
    scoring='neg_mean_absolute_error')     # negated MAE, so higher is better
in the same .py file on different machines, which I would name #1 localhost, #2 staging, #3 live, #4 live.
localhost and staging both have i7 CPUs; localhost needs around 40 seconds for the validation, staging around 13-14 seconds.
live (#3) and live (#4) need almost 10 minutes to execute the validation; both of these servers have Intel CPUs with 48 threads.
To get more trustworthy numbers I dockerized the setup and ran the images on the servers. Does anyone have an idea why the speed is so different?
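One thing worth checking before trusting those numbers is whether all four machines actually run the same numerical stack (NumPy/SciPy build, BLAS library, thread settings). A minimal diagnostic sketch, assuming threadpoolctl is installed (it ships as a dependency of recent scikit-learn releases):

# Print the Python/NumPy/SciPy and BLAS details on each machine and diff them.
import sklearn
sklearn.show_versions()

# Show which BLAS/OpenMP libraries are loaded and how many threads each will use.
from threadpoolctl import threadpool_info
for pool in threadpool_info():
    print(pool)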
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

poly_transformer = PolynomialFeatures(degree=2, include_bias=False)
model = LinearRegression()
pipeline = Pipeline([('poly', poly_transformer), ('reg', model)])
After profiling, I saw this (sorted by the per-call time in the third column, slowest at the bottom):
   ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
     4150  208.706    0.050  208.706    0.050  {built-in method numpy.dot}
      245   13.112    0.054   13.360    0.055  decomp_svd.py:16(svd)
     2170  142.567    0.066  143.360    0.066  decomp_lu.py:153(lu)
I just executed python -m cProfile validation.py
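The hot spots (numpy.dot plus SciPy's svd and lu) all end up in BLAS/LAPACK, so one plausible explanation for the 48-thread servers being slower is that the BLAS library parallelizes many small matrix operations across 48 threads and the synchronization overhead dominates. A quick experiment to test that hypothesis, assuming threadpoolctl is available:

from sklearn.model_selection import validation_curve
from threadpoolctl import threadpool_limits

# Re-run the same validation with BLAS capped at a single thread and compare
# the wall-clock time against the unrestricted run on the 48-thread servers.
with threadpool_limits(limits=1, user_api='blas'):
    train_scores, valid_scores = validation_curve(
        estimator=pipeline, X=features, y=target,
        param_name='pca__n_components', param_range=range(1, 50),
        cv=5, scoring='neg_mean_absolute_error')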