Loïc Estève
@lesteve
@sameshl note this part of the contributing scikit-learn doc: https://scikit-learn.org/stable/developers/contributing.html#documentation
If you see ways the contributing doc can be improved while you face these "setup" issues, let us know and/or open PRs to improve the contributing docs!
Samesh Lakhotia
@sameshl
@lesteve Sure. Thanks for the help.
As a beginner contributor to this organisation, I did find the arrangement of the docs a bit tough to navigate. I will put my thoughts about it together more concisely and then open an issue and a PR for the same.
We're working on improving our contributing docs @sameshl, there's some discussion under #14582
Samesh Lakhotia
@sameshl
That's great. Would love to contribute to scikit-learn/scikit-learn#14582
I am working on scikit-learn/scikit-learn#14575, and I found the corresponding example under sklearn/metrics/pairwise.py. My question is: are the examples run during the doc build so that the output is generated automatically, or am I supposed to write the output of the example manually in the function's docstring?
You should write the output in the example. The doc build will run the code and check that the generated output matches what you put there. See https://docs.python.org/3.5/library/doctest.html for more info.
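A minimal sketch of the doctest workflow described above, using only the standard library. The function `manhattan` and the helper `run_docstring_examples` are hypothetical names made up for illustration; they are not part of scikit-learn, and the real doc build may invoke doctest differently.

```python
import doctest


def manhattan(u, v):
    """Sum of absolute coordinate differences between two points.

    Hypothetical helper for illustration; not part of scikit-learn.
    The expected output is written by hand below each example.

    >>> manhattan([0, 0], [3, 4])
    7
    >>> manhattan([1.5, 2.0], [1.5, 2.0])
    0.0
    """
    return sum(abs(a - b) for a, b in zip(u, v))


def run_docstring_examples(func):
    """Run the examples in func's docstring and return (failed, attempted)."""
    parser = doctest.DocTestParser()
    # Build a DocTest from the raw docstring; the function itself is the
    # only name made available to the examples.
    test = parser.get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    result = runner.run(test)
    return result.failed, result.attempted


if __name__ == "__main__":
    failed, attempted = run_docstring_examples(manhattan)
    print(f"{attempted} examples run, {failed} failed")
```

If the hand-written output in the docstring drifts from what the code actually returns, the runner reports that example as failed, which is the check the doc build relies on.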
Vishesh Mangla
@XtremeGood
Does anyone here know a good source to learn RNN structure?
Is it like replacing every hidden node with an RNN cell?
Samesh Lakhotia
@sameshl
I am working on scikit-learn/scikit-learn#14131. I thought that I could append a note to the docstring of KDTree regarding the issue, but I looked into sklearn/neighbors/kd_tree.pyx and it looks like KDTree inherits its docstring from BinaryTree. Can someone tell me an elegant way to append my note to the inherited docstring of KDTree, or whether I could do something else to solve this issue?
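One possible approach to the docstring question above, sketched in plain Python. This assumes ordinary Python classes; the real KDTree is a Cython class and its source may assemble the docstring differently, so treat the class bodies and the note text here as placeholders.

```python
class BinaryTree:
    """Base tree docstring (placeholder text, not the real one)."""


class KDTree(BinaryTree):
    # Assigning __doc__ in the class body gives the subclass its own
    # docstring: the parent's text plus an extra section. The parent's
    # docstring is left untouched.
    __doc__ = BinaryTree.__doc__ + (
        "\n\nNotes\n-----\nHypothetical note that applies to KDTree only.\n"
    )


if __name__ == "__main__":
    print(KDTree.__doc__)
```

The note lives only on the subclass, so any other class that reuses the base docstring is unaffected.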
Currently working on #14081.
I am supposed to create a pitfalls section that covers practices users should not follow. I'm quite confused about how to approach it: should I create a whole new section in documentation.html, or is there another way to do this?
Thanks for the help!!!
Peng Yu
@yupbank
Hey channel, I've been working on vectorizing regression trees with NumPy, and I have achieved some speedup over the Cython version in sklearn. In case anyone is interested, here is the link: https://github.com/yupbank/np_decision_tree#regression-with-mae
Peng Yu
@yupbank
On median data (10000*100), with the MAE criterion, I achieved a 20x speedup :)
Still haven't checked the code in depth, but it's definitely interesting @yupbank. What do you think @NicolasHug?
Peng Yu
@yupbank
I haven't cleaned up the code yet. I'm also working on a blog post explaining what I did, and adding some CI to it. But I would love to have some extra input before I proceed, e.g. reviews.
I don't think it'd be easy, but I'd love to see if it actually passes our tree tests, and if it doesn't why not and which tests. Feel free to ping me when you write the blog post.
Peng Yu
@yupbank
sure.. that would be nice,
Nicolas Hug
@NicolasHug
@yupbank pretty cool stuff! I took a quick glance at the tree grower and the greedy_split function and it looks good as far as I can tell. I wouldn't advertise benchmarks with only max_depth=1 though ;)
Please definitely ping us when you write the blog post!!
Peng Yu
@yupbank
lol, you are right, actually with max_depth=10, i only get 5 times faster.
Peng Yu
@yupbank
@NicolasHug @adrinjalali hey.. i have a draft version here.. comments are very welcome :) https://yupbank.github.io/learning/2019/08/08/faster-regression-tree.html
Peng Yu
@yupbank
omg omg omg, for L2 loss, if I replace import numpy as np with import cupy as np, I get another 10x speedup for 1 split, but I lose the edge when the tree gets too deep.. I need to refactor my code…
+1
Peng Yu
@yupbank
but i really like the fact that, switching to GPU is so trivial …
Matthew Bowling
Have a question maybe someone can answer. Trying to use a simple model on a set of data. About a couple thousand rows and only a dozen features, most are binary. I'm training on Logistic Regression, and found my model overfits. So when I try to tune my hyperparameters, my accuracy remains entirely unchanged. Has anyone seen this before or know why this is happening?
Guillaume Lemaitre
@glemaitre
Do you have imbalanced classes?
Samesh Lakhotia
@sameshl
I want to rebuild the scikit-learn project. I tried running pip install --editable . as stated in the docs https://scikit-learn.org/stable/developers/advanced_installation.html#building-from-source but I am getting this error. Can someone help me out?
ERROR: Cannot uninstall 'scikit-learn'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.
Roman Yurchak
@rth
@sameshl See https://github.com/pypa/pip/issues/5247#issuecomment-381550610; it's probably best to reinstall in a new virtual environment.
Andreas Mueller
@amueller
@thomasjpfan you like puzzles, right? scikit-learn/scikit-learn#14704
PLT
@Pakigya

I can't seem to create the virtual environment with sphinx-gallery. I tried

conda create -n sklearndev numpy scipy matplotlib pytest sphinx cython ipykernel sphinxgallery

or

conda create -n sklearndev numpy scipy matplotlib pytest sphinx cython ipykernel sphinx-gallery

Never mind! The solution is in the other Gitter chat!
Bin Wang
@biwa7636

Hi team, I am new to Cython but really want to play with the internals of sklearn. I want to test out some of the cdef classes in the pyx files, but it looks like their methods are inaccessible from Python. Any thoughts?

For example:

from sklearn.tree import _utils
ph = _utils.PriorityHeap(100)
dir(ph)

And I cannot find or call methods like pop and push.
What does the workflow usually look like if I want to play with the internals of sklearn within a Jupyter notebook?

FritzPeleke
@FritzPeleke
Hello everyone. I'm really new to machine learning in general and I have been working with some sklearn regressors. I need some help :). My question is: how do I know if the RMSE I have is low enough for good predictions? What do I compare this RMSE to?
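One common reference point for the RMSE question above (a general practice, not something stated in this chat) is a trivial baseline that always predicts the mean of the training targets. A rough sketch with made-up numbers; all values and names below are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up data for illustration only.
y_train = [2.0, 4.0, 6.0, 8.0, 10.0]
y_test = [3.0, 5.0, 7.0, 9.0]
model_pred = [2.5, 5.5, 6.5, 9.5]  # pretend output of a fitted regressor

# Baseline: always predict the mean of the training targets.
train_mean = sum(y_train) / len(y_train)
baseline_pred = [train_mean] * len(y_test)

model_rmse = rmse(y_test, model_pred)
baseline_rmse = rmse(y_test, baseline_pred)

if __name__ == "__main__":
    print(f"model RMSE:    {model_rmse:.3f}")
    print(f"baseline RMSE: {baseline_rmse:.3f}")
```

A model whose RMSE is not clearly below the baseline's is adding little. Because RMSE is in the units of the target, it also helps to compare it with the spread of the target values.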
enoch-sun
@enoch-sun
I was able to create a model by curve fitting a set of data that has 5 variables using GaussianProcessRegressor. The problem is that I am unable to export/load this model into an older version of Python (version 2.5.2). Is there a way to dump the equation/formula in mathematical terms, as a function of these 5 variables, so that I can use this prediction on the older Python? Thanks
@enoch-sun We don't really support those Python versions anymore. You can try to figure it out with some other model persistence formats such as ONNX or PMML, but you'll be mostly on your own.
Thomas J Fan
@thomasjpfan
@biwa7636 The PriorityHeap functions pop and push are cdef, which means they are not available from Python.
Jesse Leigh Patsolic
@MrAE
Is there a scikit-learn preferred way to store a vector using Cython? I've seen libcpp.vector, array.array and numpy used in the code base. @NicolasHug @amueller
Nicolas Hug
@NicolasHug
The way we do it now is to allocate numpy arrays (in python or in cython), and then use a memory view for pure cython parts. You can take a look at how we do it in e.g. ensemble/_hist_gradient_boosting
Sakitha
@Sakitha
Hi, does apply in df.apply(fun) iterate over each column in the 'df' data frame and pass it to the 'fun' function as a Series?
Bin Wang
@biwa7636
@thomasjpfan, you are right. However, I also tried to execute the above code using the %%cython magic, with from sklearn.tree cimport _utils, but it still did not work. Is it supposed to be like that?
%%cython
from sklearn.tree._utils cimport Stack
s = Stack(10)
print(s.top)
>>> AttributeError: 'sklearn.tree._utils.Stack' object has no attribute 'top'
I find the source code so well written and fascinating, and I really want to get the development environment up and running.
Bin Wang
@biwa7636
Weird, the above code will work if I replace s = Stack(10) with cdef Stack s = Stack(10), I believe this must have something to do with static type declaration.
Jesse Leigh Patsolic
@MrAE
Does anyone know why the base estimator for ExtraTreesClassifier is ExtraTreeClassifier, instead of DecisionTreeClassifier with splitter='random'? I am working on adding a new type of tree. @NicolasHug @amueller
Nicolas Hug
@NicolasHug
No idea. It doesn't make much sense for ExtraTreeClassifier to allow for a splitter that isn't 'random' IMO.
Would you want to submit a PR to deprecate the parameter?
motmoti
@motmoti

Hi all, I'm getting the following error while executing python setup.py install:
error: Command "cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MT -IC:\Users\Moti\Anaconda3\envs\motidevs\lib\site-packages\numpy\core\include /EHsc /Tpsklearn\svm\src\libsvm\libsvm_template.cpp /Fobuild\temp.win-amd64-3.7\sklearn\svm\src\libsvm\libsvm_template.obj" failed with exit status 127

Do you have any idea? Thanks!

kirk86
@kirk86
Any scikit devs who can shed some light on why calibration_curve is only for binary estimators?
Anjali Singh
@Anj-ali
How can I start contributing to open source?