@amueller when I run this code:
```python
train_scores, valid_scores = validation_curve(
    estimator=pipeline,                 # estimator (pipeline)
    X=features,                         # feature matrix
    y=target,                           # target vector
    param_name='pca__n_components',
    param_range=range(1, 50),           # test these k-values
    cv=5,                               # 5-fold cross-validation
    scoring='neg_mean_absolute_error')  # use negative MAE
```
directly on the host (with 24 cores) I get ~30 seconds. When I run it on localhost (4 cores, 8 threads) I also get around 30-40 seconds. But when I run it inside Docker with a CPU limit of 6 cores and 6 GB RAM, it needs almost 10 minutes. Inside a VirtualBox VM with 2 cores it is around 30 seconds again. It seems scikit-learn does not play well with Docker CPU limits, which are enforced via the CFS scheduler: link
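For what it's worth, the slowdown pattern above is consistent with thread oversubscription: a `--cpus` limit is enforced as a CFS quota, so Python inside the container still sees the host's core count, and the BLAS/OpenMP pools are sized accordingly. A minimal diagnostic sketch (my own assumption about the cause; the affinity call is Linux-only):

```python
import os

# Logical CPUs Python sees -- inside a container with a CFS quota
# (docker run --cpus=6) this is still the *host's* core count, which is
# what NumPy's BLAS backends use to size their thread pools.
logical = os.cpu_count()

# CPUs the process may actually be scheduled on (cpuset/affinity aware).
# A pure --cpus quota does not show up here either, which is why the
# libraries end up spawning far more threads than the quota can serve.
try:
    schedulable = len(os.sched_getaffinity(0))
except AttributeError:  # not available on macOS/Windows
    schedulable = logical

print(f"logical={logical} schedulable={schedulable}")
```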
With `range(1, 5)` the code runs much faster (I am no data scientist)
`validation_curve` does not really profit from multithreading/multiprocessing. I get almost the same results on an Intel i7 (4 cores) and an Intel Xeon (24 cores). The problem is that when the validation curve runs on the Xeon machine, it uses all cores and the machine is overloaded, which makes no sense, really :)
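One workaround (my own suggestion, not something `validation_curve` exposes directly) is to cap the native thread pools before NumPy/scikit-learn are imported, via the environment variables that OpenMP, OpenBLAS, and MKL read at startup:

```python
import os

# Cap the BLAS/OpenMP worker pools; this must happen before numpy or
# sklearn are imported, because the pools are sized once when the native
# libraries load.
for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
    os.environ[var] = "4"
```

With recent scikit-learn versions the same can be done at runtime with `threadpoolctl.threadpool_limits`; `validation_curve`'s own process-level parallelism is controlled separately through its `n_jobs` parameter.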
`conda install numpy scipy cython matplotlib pytest flake8 sphinx sphinx-gallery` or something like that
`mkl` (from conda or pip)
should work on all OSes, I think
`cd doc && make html`
`_build/html` folder and you can search for the
`sklearn/metrics/pairwise.py`. My question is: are the examples run during the doc build so the output is generated automatically, or am I supposed to write the output of the example into the function's docstring manually?
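As far as I can tell, the docstring examples are doctests: the author writes the expected output by hand, and the test suite (rather than the HTML build itself) executes the `>>>` lines and compares. A self-contained sketch with a hypothetical helper:

```python
import doctest

def toy_identity(x):
    """Hypothetical helper, just to show the docstring-example format.

    Examples
    --------
    >>> toy_identity(3)
    3
    """
    return x

# Mimic what pytest --doctest-modules does: collect the >>> lines and run
# them, comparing real output against the hand-written expected output.
runner = doctest.DocTestRunner()
for test in doctest.DocTestFinder().find(toy_identity, "toy_identity"):
    runner.run(test)
print(runner.failures)  # 0 -> the hand-written output matched
```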
`KDTree` regarding the issue. But I looked into `sklearn/neighbors/kd_tree.pyx` and it looks like `KDTree` inherits its docstring from `BinaryTree`. So can someone tell me an elegant way to append my note to the inherited docstring of `KDTree`, or suggest something else I could do to solve this issue?
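For plain Python classes, one option is to materialise the inherited docstring and assign an extended copy back, roughly like this (stand-in class names, not the real sklearn internals; whether `__doc__` is writable on a compiled Cython extension type like the real `KDTree` is a separate question, so the note may have to go into the `.pyx` source itself):

```python
import inspect

class BinaryTree:
    """Shared binary-tree docstring."""

class KDTree(BinaryTree):
    pass  # no docstring of its own: KDTree.__doc__ is None

# help()/inspect fall back to the parent docstring for subclasses:
inherited = inspect.getdoc(KDTree)

# Appending a note means building a new string on the subclass:
KDTree.__doc__ = inherited + "\n\nNote\n----\nExtra caveat goes here."
```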