Matthew Bowling
@Ryuhphino_twitter
I have a question maybe someone can answer. I'm trying to use a simple model on a dataset with a couple thousand rows and only a dozen features, most of them binary. I'm training a logistic regression model and found that it overfits. But when I try to tune my hyperparameters, my accuracy remains entirely unchanged. Has anyone seen this before or know why this is happening?
Guillaume Lemaitre
@glemaitre
Do you have imbalanced classes?
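The question about imbalanced classes points at a common cause: when one class dominates, a model that always predicts the majority class already gets high accuracy, so changing hyperparameters can leave accuracy flat. A minimal pure-Python sketch (the toy labels are made up for illustration) comparing plain accuracy with balanced accuracy, i.e. the mean of per-class recall, which is how scikit-learn's balanced_accuracy_score defines it by default:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall (unadjusted definition)."""
    recalls = []
    for cls in sorted(set(y_true)):
        # Indices of the samples whose true label is `cls`
        idx = [i for i, t in enumerate(y_true) if t == cls]
        hits = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical toy data: 95 negatives, 5 positives
y_true = [0] * 95 + [1] * 5
# A degenerate "model" that always predicts the majority class
y_majority = [0] * 100

print(accuracy(y_true, y_majority))           # 0.95
print(balanced_accuracy(y_true, y_majority))  # 0.5
```

If a metric like this shows the minority class is being ignored, tuning against scoring="balanced_accuracy" or trying class_weight="balanced" in LogisticRegression are reasonable next steps to check.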
Samesh Lakhotia
@sameshl
I want to rebuild the 'scikit-learn' project. I tried running pip install --editable . as described in the docs https://scikit-learn.org/stable/developers/advanced_installation.html#building-from-source, but I am getting the error below. Can someone help me out?
ERROR: Cannot uninstall 'scikit-learn'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.
Roman Yurchak
@rth
@sameshl See https://github.com/pypa/pip/issues/5247#issuecomment-381550610; it's probably best to reinstall in a new virtual environment.
Andreas Mueller
@amueller
@thomasjpfan you like puzzles, right? scikit-learn/scikit-learn#14704
PLT
@Pakigya

I can't seem to create the virtual environment with sphinxgallery:

conda create -n sklearndev numpy scipy matplotlib pytest sphinx cython ipykernel sphinxgallery

or

conda create -n sklearndev numpy scipy matplotlib pytest sphinx cython ipykernel sphinx-gallery

Never mind! The solution is in the other Gitter chat!
Bin Wang
@biwa7636

Hi team, I am new to Cython but really want to play with the internals of sklearn. I want to test out some of the cdef classes in the .pyx files, but it looks like their methods are inaccessible from Python. Any thoughts?

For example:

from sklearn.tree import _utils
ph = _utils.PriorityHeap(100)
dir(ph)

And I cannot find or call methods like pop and push.
What does the usual workflow look like if I want to play with the internals of sklearn within a Jupyter notebook?

FritzPeleke
@FritzPeleke
Hello everyone. I'm really new to machine learning in general, and I have been working with some sklearn regressors. I need some help :). How do I know whether my RMSE is low enough for good predictions? What should I compare this RMSE to?
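An RMSE only means something relative to a reference. One common reference point (a suggestion, not something from the thread) is a trivial baseline that always predicts the mean of the training targets, which is what scikit-learn's DummyRegressor does with its default strategy. A minimal pure-Python sketch with made-up numbers:

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


# Hypothetical training targets, test targets and model predictions
y_train = [10.0, 12.0, 14.0, 16.0]
y_test = [11.0, 15.0]
y_model = [11.5, 14.0]

# Baseline: predict the training mean (13.0) for every test sample
mean_train = sum(y_train) / len(y_train)
y_baseline = [mean_train] * len(y_test)

model_rmse = rmse(y_test, y_model)
baseline_rmse = rmse(y_test, y_baseline)  # 2.0 for these numbers
print(model_rmse, baseline_rmse)
```

A model whose RMSE is not clearly below the baseline's is not learning much. Whether a given RMSE is "good enough" beyond that depends on the units of the target and on how much error the application can tolerate.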
enoch-sun
@enoch-sun
I fitted a model with GaussianProcessRegressor to a dataset with 5 variables. The problem is that I am unable to export/load this model into an older version of Python (2.5.2). Is there a way to dump the equation/formula in terms of these 5 variables so that I can make predictions with the older Python? Thanks
Adrin Jalali
@adrinjalali
@enoch-sun We don't really support those Python versions anymore. You can try to figure it out with other model persistence formats such as ONNX or PMML, but you'll mostly be on your own.
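For the specific case in the question, the predictive mean of a fitted GaussianProcessRegressor can be written out by hand: it is a weighted sum of kernel evaluations between the query point x and the training points x_i, i.e. y(x) = sum_i alpha_i * k(x, x_i). A minimal sketch, assuming normalize_y=False, a single target, and a kernel of the form ConstantKernel * RBF (the form of the default kernel); other kernels would need their own formula. It only uses the math module and avoids newer syntax so it has a chance of running on an old interpreter, though it has not been tested on Python 2.5. The function names are hypothetical:

```python
import math


def rbf(x, z, length_scale):
    """RBF kernel exp(-0.5 * sum(((x_d - z_d) / l_d) ** 2)).

    `length_scale` may be a single number (isotropic) or one value per feature.
    """
    if not isinstance(length_scale, (list, tuple)):
        length_scale = [length_scale] * len(x)
    sq = 0.0
    for xd, zd, ld in zip(x, z, length_scale):
        # float() avoids integer division on Python 2
        sq += (float(xd - zd) / ld) ** 2
    return math.exp(-0.5 * sq)


def gp_mean(x, X_train, alpha, constant, length_scale):
    """Predictive mean: sum_i alpha_i * constant * rbf(x, x_i).

    Hypothetical helper; assumes normalize_y=False and a
    ConstantKernel * RBF kernel.
    """
    total = 0.0
    for xi, ai in zip(X_train, alpha):
        total += ai * constant * rbf(x, xi, length_scale)
    return total
```

On the scikit-learn side, the numbers to export would come from the fitted attributes X_train_ and alpha_ (raveled to a flat list), and from kernel_ (for a ConstantKernel * RBF product, kernel_.k1.constant_value and kernel_.k2.length_scale); they can be written to a plain text file and read back on the old interpreter. This covers only the mean: the predictive standard deviation needs additional fitted state.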
Thomas J Fan
@thomasjpfan
@biwa7636 The PriorityHeap methods pop and push are cdef methods, which means they are not accessible from Python.
Jesse Leigh Patsolic
@MrAE
Is there a preferred scikit-learn way to store a vector in Cython? I've seen libcpp.vector, array.array and numpy used in the code base. @NicolasHug @amueller
Nicolas Hug
@NicolasHug
The way we do it now is to allocate numpy arrays (in python or in cython), and then use a memory view for pure cython parts. You can take a look at how we do it in e.g. ensemble/_hist_gradient_boosting
Sakitha
@Sakitha
Hi, does apply in df.apply(fun) iterate over each column of the 'df' DataFrame and pass it to the 'fun' function as a Series?
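With the default axis=0, DataFrame.apply calls the function on each column and passes it as a Series whose name is the column label; with axis=1 it passes each row instead. A small check with made-up data (the function name fun mirrors the question):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

seen = []


def fun(col):
    # Record what apply hands over: the Series name and its type
    seen.append((col.name, type(col).__name__))
    return col.sum()


col_sums = df.apply(fun)                            # axis=0: columns
row_sums = df.apply(lambda row: row.sum(), axis=1)  # axis=1: rows

print(seen)
print(col_sums.tolist(), row_sums.tolist())
```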
Bin Wang
@biwa7636
@thomasjpfan, you are right. However, I also tried executing the code above with the %%cython magic, and also with from sklearn.tree cimport _utils, but it still did not work. Is it supposed to behave like that?
%%cython
# requires numpy headers
from sklearn.tree._utils cimport Stack
s = Stack(10)
print(s.top)
>>> AttributeError: 'sklearn.tree._utils.Stack' object has no attribute 'top'
I find the source code very well written and fascinating, and I really want to get the development environment up and running.
Bin Wang
@biwa7636
Weird: the code above works if I replace s = Stack(10) with cdef Stack s = Stack(10). I believe this has something to do with static type declaration.
Jesse Leigh Patsolic
@MrAE
Does anyone know why the base estimator for ExtraTreesClassifier is ExtraTreeClassifier, instead of DecisionTreeClassifier with splitter='random'? I am working on adding a new type of tree. @NicolasHug @amueller
Nicolas Hug
@NicolasHug
No idea. It doesn't make much sense for ExtraTreeClassifier to allow for a splitter that isn't 'random' IMO.
Would you want to submit a PR to deprecate the parameter?
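For context on the question: in scikit-learn, ExtraTreeClassifier is a subclass of DecisionTreeClassifier that mainly differs in its defaults (splitter="random", and a max_features default of sqrt(n_features) instead of all features). A quick check, assuming a recent scikit-learn and using the iris dataset, that both classes grow the same tree when given the same settings and random_state:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, ExtraTreeClassifier

X, y = load_iris(return_X_y=True)

# Same hyperparameters and seed for both; max_features is set explicitly
# because the two classes use different defaults for it.
params = dict(splitter="random", max_features="sqrt", random_state=0)
extra = ExtraTreeClassifier(**params).fit(X, y)
decision = DecisionTreeClassifier(**params).fit(X, y)

print(issubclass(ExtraTreeClassifier, DecisionTreeClassifier))
print(np.array_equal(extra.predict(X), decision.predict(X)))
print(extra.tree_.node_count == decision.tree_.node_count)
```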
motmoti
@motmoti

Hi all, I'm getting the following error while running python setup.py install:
error: Command "cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MT -IC:\Users\Moti\Anaconda3\envs\motidevs\lib\site-packages\numpy\core\include /EHsc /Tpsklearn\svm\src\libsvm\libsvm_template.cpp /Fobuild\temp.win-amd64-3.7\sklearn\svm\src\libsvm\libsvm_template.obj" failed with exit status 127

Do you have any idea? Thanks!

kirk86
@kirk86
Can any scikit-learn devs shed some light on why calibration_curve only works for binary classifiers?
Anjali Singh
@Anj-ali
How can I start contributing to open source?
Adrin Jalali
@adrinjalali
@Anj-ali you can start by going through our contributing guides: https://scikit-learn.org/dev/developers/contributing.html#contributing
Anjali Singh
@Anj-ali
Thank you, sir. I will surely do that.
Olivier Grisel
@ogrisel
Heads up: if you use conda and upgrade your env, you might get a crash when using n_jobs>=2. This is caused by an updated version of intel-openmp in the default channel of conda. I reported the issue upstream as ContinuumIO/anaconda-issues#11294 and the problem is tracked in this PR on the scikit-learn side: scikit-learn/scikit-learn#15020
The error message is OMP: Error #13: Assertion failure at z_Linux_util.cpp(2361) reported by the dying worker process.
This in turn causes loky to raise: TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker. The exit codes of the workers are {SIGABRT(-6)}.
Samesh Lakhotia
@sameshl
If someone is free to review, please take a look at scikit-learn/scikit-learn#14993 and scikit-learn/scikit-learn#15045.
Andreas Mueller
@amueller
hm is there a pandas gitter? Or is @jorisvandenbossche around lol? For a pandas dtype, how do I get the closest numpy dtype to cast to?
Joris Van den Bossche
@jorisvandenbossche
yep
there is pandas gitter actually (pydata/pandas)
I don't think there is a typical way to do it
If I remember correctly, there is an issue about it
Basically, you would like to know np.asarray(obj).dtype, right? (but without needing to do the actual conversion?)
Andreas Mueller
@amueller
indeed
it's for scikit-learn/scikit-learn#15094, which is currently failing because np.result_type(pd.CategoricalDtype) raises an error
Joris Van den Bossche
@jorisvandenbossche
the issue that I remembered is pandas-dev/pandas#22791
Andreas Mueller
@amueller
ok. so no solution :-/ is there a work-around?
like what does actually happen when you do the conversion?
is it from the pd.DataFrame.__array__ method or something?
Andreas Mueller
@amueller
yeah it is, no way to figure that one out :-/
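One rough heuristic, not proposed in the thread and not an official pandas API, is to convert a zero-row slice: for ordinary NumPy-backed columns, DataFrame.to_numpy() chooses the result dtype from the column dtypes rather than from the values, so the empty conversion is cheap and gives the same dtype as the full one. It may not match for every extension dtype (for example, nullable dtypes whose NumPy conversion can depend on missing values), so treat it as an approximation. Sketch with made-up data; the helper name is hypothetical:

```python
import numpy as np
import pandas as pd


def approx_numpy_dtype(df):
    """Hypothetical helper: dtype that np.asarray(df) would likely produce,
    estimated from a zero-row slice so the data itself is not converted."""
    return df.iloc[:0].to_numpy().dtype


numeric = pd.DataFrame({"i": [1, 2], "f": [0.5, 1.5]})
mixed = pd.DataFrame({"i": [1, 2], "s": ["x", "y"]})

print(approx_numpy_dtype(numeric), np.asarray(numeric).dtype)
print(approx_numpy_dtype(mixed), np.asarray(mixed).dtype)
```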
Jesse Leigh Patsolic
@MrAE

Hello all (I am new to Cython),

I am currently working on adding an augmented version of Breiman's Forest-RC algorithm (similar to RandomForest) into my fork of scikit-learn. In short, the algorithm takes linear combinations of features, with weights randomly selected from {-1, 1}, to form a new feature to split on. The number of features combined at each split is a random variable.

The current SplitRecord only holds one feature; I need something to store a vector of features and a vector of weights.

  1. I tried initializing an np.ndarray and using memoryviews, but ran into GIL issues.
  2. I tried to make an ObliqueSplitRecord class, but that can't be passed as a pointer into functions because it is a Python object.
  3. I tried to augment the SplitRecord struct in _splitter.pxd but that didn't seem to work because vectors would then be of fixed length.
  4. I tried to use something similar to the tree/_utils:Stack but fell into the same problem as it was a class and couldn't be passed as a pointer into a function.

I am looking into using cppclass, but am not sure whether that will solve my problem.

Does anyone have suggestions on how to best implement this in a Cythonic way? i.e. storing a vector of things while avoiding the GIL and not using python objects?
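The projection step described above can be sketched in NumPy, independently of the Cython/GIL question: at each split, sample a set of feature indices and a weight in {-1, 1} for each, then compute the new feature as the weighted sum of those columns. This is only an illustration of the idea, not the fork's code: drawing the number of combined features uniformly from 1..max_features is an assumption (the message only says it is random), and the names ObliqueSplit and sample_projection are hypothetical. The record holds two arrays of equal but variable length, which is what a split record for this algorithm would need to store.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class ObliqueSplit:
    """Hypothetical record: which features are combined, and with what weights."""
    features: np.ndarray  # integer column indices, variable length
    weights: np.ndarray   # same length as `features`, entries in {-1, 1}


def sample_projection(n_features, max_features, rng):
    # Number of features to combine: assumed uniform on 1..max_features
    k = rng.integers(1, max_features + 1)
    features = rng.choice(n_features, size=k, replace=False)
    weights = rng.choice(np.array([-1.0, 1.0]), size=k)
    return ObliqueSplit(features=features, weights=weights)


def project(X, split):
    # New feature = X[:, features] @ weights, one value per sample
    return X[:, split.features] @ split.weights


rng = np.random.default_rng(0)
X = rng.normal(size=(6, 5))
split = sample_projection(n_features=X.shape[1], max_features=3, rng=rng)
new_feature = project(X, split)
```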

Adrin Jalali
@adrinjalali
@MrAE you can use a cpp vector in cython. But since you're changing the splitrecord struct, you'll need to change the code in quite a lot of places.
Mateusz Sokół
@mtsokol
Hi, I have a basic question about building the scikit-learn docs locally. I've been trying to modify the API docs for a file in sklearn/linear_model and followed the instructions in the Contributor's Guide. But after a few attempts, running make inside /docs does not seem to update the local docs build in _build: in the browser, the API docs didn't change even though I modified the sources. Am I missing something?
Nicolas Hug
@NicolasHug
@mtsokol it seems that you're doing it right... Maybe double-check that 1. you're actually changing the sources, i.e. not anything in the _build folder, 2. the doc you're changing is for a public estimator/tool (private tools aren't rendered in the docs anyway), and 3. you're looking at the generated HTML in doc/_build/html/stable/
Nicolas Hug
@NicolasHug

@MrAE

re 1. you can't use (let alone allocate) numpy arrays when the GIL is released because these are Python objects. Is there a way for you to allocate the arrays somewhere where the GIL is held, and use memory views when the GIL is released? Memory views are safe to use without the GIL

re 2. is it still considered a Python object if you use a cdefed class and all the attributes are cdefed as well?

re 3. what vectors? can't you use a view as a field of the struct?

Nicolas Hug
@NicolasHug
Also @MrAE I happen to have been writing about Cython over the weekend... maybe that could help http://nicolas-hug.com/blog/cython_notes
Jan-Benedikt Jagusch
@janjagusch
Could somebody share a good example of a class docstring in scikit-learn that we could use as a sort of template? Thanks!
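scikit-learn's docstrings follow the numpydoc format. A minimal sketch of a class docstring in that style, for a hypothetical estimator (MeanRegressor is made up, and the sections shown are only a subset of what real estimators document; the existing estimators in the code base remain the authoritative reference):

```python
import numpy as np


class MeanRegressor:
    """Predict the mean of the training targets.

    Hypothetical estimator used only to illustrate a numpydoc-style
    class docstring.

    Parameters
    ----------
    offset : float, default=0.0
        Constant added to every prediction.

    Attributes
    ----------
    mean_ : float
        Mean of the training targets, set by :meth:`fit`.

    See Also
    --------
    sklearn.dummy.DummyRegressor : Baseline regressor in scikit-learn.

    Examples
    --------
    >>> import numpy as np
    >>> reg = MeanRegressor().fit(np.zeros((3, 1)), np.array([1.0, 2.0, 3.0]))
    >>> reg.predict(np.zeros((2, 1)))
    array([2., 2.])
    """

    def __init__(self, offset=0.0):
        self.offset = offset

    def fit(self, X, y):
        # Store the fitted state in a trailing-underscore attribute
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        # One prediction per row of X
        return np.full(len(X), self.mean_ + self.offset)
```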