Guillaume Lemaitre
@glemaitre
can you check the version of scikit-learn?
are you using 0.22.0? we corrected the bug in 0.22.1
Keith
@DrEhrfurchtgebietend
yup hold on
0.22
0.22.1 not ready in conda yet
This issue only happens if X_train already has a float column in it
I have a minimal example.
import numpy as np
import pandas as pd
from sklearn.utils import check_X_y

raw_data = {'Binary 1': [True, True, False, False, True],
            'Binary 2': [False, False, True, True, False],
            'age': [42, 52, 36, 24, 73],
            'preTestScore': [4.4, 24.1, 31.3, 2.2, 3.1],
            'postTestScore': [25.7, 94.5, 57.0, 62.2, 70.9]}
df = pd.DataFrame(raw_data)

# Case 1: X_train already contains a float column
X_train = df[['Binary 1', 'Binary 2', 'age', 'preTestScore']]
y_train = df['postTestScore']

print(X_train.dtypes)

X, y = check_X_y(X_train, y_train, dtype=[np.float64], force_all_finite=False)
print(X.dtype)

X, y = check_X_y(X_train.values, y_train.values, dtype=[np.float64], force_all_finite=False)
print(X.dtype)

# Case 2: no float column in X_train
X_train = df[['Binary 1', 'Binary 2', 'age']]
y_train = df['postTestScore']

X, y = check_X_y(X_train, y_train, dtype=[np.float64], force_all_finite=False)
print(X.dtype)

X, y = check_X_y(X_train.values, y_train.values, dtype=[np.float64], force_all_finite=False)
print(X.dtype)
Sorry the markdown is not working as I would expect. Does this work for you?
Guillaume Lemaitre
@glemaitre
add a line break after the three backticks
you can install from conda-forge
we uploaded the packages yesterday
or via PyPI
yes it was the bug
Keith
@DrEhrfurchtgebietend
so in 0.22.1 the last 4 print statements all give float64?
Guillaume Lemaitre
@glemaitre
let me try but it should
Keith
@DrEhrfurchtgebietend
OK sure. Thanks so much for the real time tech support. This is some high quality service
Guillaume Lemaitre
@glemaitre
Binary 1           bool
Binary 2           bool
age               int64
preTestScore    float64
dtype: object
float64
float64
float64
float64
Keith
@DrEhrfurchtgebietend
Awesome. I will update.
Keith
@DrEhrfurchtgebietend

I cannot update

conda install scikit-learn=0.22.1

does not work

Guillaume Lemaitre
@glemaitre
conda install scikit-learn -c conda-forge
the packages are only uploaded to conda-forge for now
conda manages the default channel directly and it can take a bit more time
Sandeep Aswathnarayana
@SandeepAswathnarayana

Curious To Learn & Contribute To Scikit-learn At The 'Paris Scikit-learn Sprint Of The Decade' | Jan 28 - 31, 2020

It was quite insightful listening to Reshama Shaikh's recent podcast: https://www.listennotes.com/podcasts/the-banana-data/bdn-15-finding-community-in-uK-yL2tf_S4/

It was quite helpful to broaden my horizon and perspective on open-source when I learned the challenges the organization faces in finding the sponsors and fundraisers for scikit-learn sprint events.

As an avid user of scikit-learn for my research projects in the recent past, I’m excited about the potential of contributing and working alongside other attendees at the Paris scikit-learn sprint. Reshama's comments about funding & accessibility have made me even more eager to join the team.

Would you all mind letting me know if I could connect with the other participants remotely from Bengaluru, India?

Best,
Sandeep Aswathnarayana

Chiara Marmo
@cmarmo
@SandeepAswathnarayana, sprints are meant to allow people to meet in person, remote participation is not planned. There will be other sprints I'm sure you will be able to attend. In the meanwhile, thanks for your enthusiasm... if you check the contributors guidelines (https://scikit-learn.org/stable/developers/contributing.html) you could probably start helping already.
Sandeep Aswathnarayana
@SandeepAswathnarayana
@cmarmo, Thanks for replying to my query. I was aware of the already existing ways to contribute. I was only curious to see if I could be part of the scikit-learn sprint, which would let me do pair programming with individuals from diverse backgrounds attending the event.
Sandeep Aswathnarayana
@SandeepAswathnarayana
@cmarmo, Any leads or inputs on future possibilities for remote participation are greatly appreciated. Thank you!
Chiara Marmo
@cmarmo

@SandeepAswathnarayana

Any leads or inputs on future possibilities for remote participation are greatly appreciated. Thank you!

noted: indeed, there is always room for improvements.

Saad Hameed
@ScottHameed_twitter
hey folks. I have started my Data Science journey. In the process of completing the DataQuest online Data Science bootcamp . Is SciKit Learn & specifically Auto Sklearn a good set of tools to learn to help accelerate my journey and on the way to becoming an expert?
Saad Hameed
@ScottHameed_twitter
@ScottHameed_twitter I know it's not a dev/Git related question, but appreciate the help
AakashSingh
@405Found_gitlab
@ScottHameed_twitter It's a necessary library used in machine learning. Learn it
Guillaume Chevalier
@guillaume-chevalier

Hey, I opened the PR to add Neuraxle to the Related Projects page:
scikit-learn/scikit-learn#16100

I've put it under the category for Auto-ML as it seems better suited here. I'm still developing the serialization plugin/extra libraries for TensorFlow and PyTorch as of right now, so those plugins could go into the Model export for production category later on (to allow saving / reloading / then continue training / partial_fit whenever after).

Guillaume Chevalier
@guillaume-chevalier
I also corrected the bit about "the creator of scikit-learn" in my article :) sorry again for the mistake haha
Siddharth Gupta
@sid21g
I tried building scikit-learn from source; it's giving me an import error for conftest.py
pytest sklearn/metrics/_classification.py 
ImportError while loading conftest '/media/sid21g/Dev/github-dev/scikit-learn/conftest.py'.
conftest.py:15: in <module>
    from sklearn import set_config
sklearn/__init__.py:81: in <module>
    from . import __check_build  # noqa: F401
sklearn/__check_build/__init__.py:46: in <module>
    raise_build_error(e)
sklearn/__check_build/__init__.py:41: in raise_build_error
    %s""" % (e, local_dir, ''.join(dir_content).strip(), msg))
E   ImportError: No module named 'sklearn.__check_build._check_build'
E   ___________________________________________________________________________
E   Contents of /media/sid21g/Dev/github-dev/scikit-learn/sklearn/__check_build:
E   setup.py                  _check_build.c            _check_build.pyx
E   __init__.py               __pycache__
E   ___________________________________________________________________________
E   It seems that scikit-learn has not been built correctly.
E
E   If you have installed scikit-learn from source, please do not forget
E   to build the package before using it: run `python setup.py install` or
E   `make` in the source directory.
E
E   If you have used an installer, please check that it is suited for your
E   Python version, your operating system and your platform.
Nicolas Hug
@NicolasHug
@sid21g try maybe make clean and start over following the build guidelines
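A minimal sketch of what "make clean and start over" usually amounts to, following the scikit-learn contributing guide of that era (the exact checkout path and virtualenv are your own; adapt as needed):

```shell
# From the root of the scikit-learn checkout
cd scikit-learn

# Remove stale compiled extensions left over from the failed build
make clean

# Rebuild the Cython extensions and install in editable mode
pip install --editable . --verbose

# Re-run the test that previously failed to import conftest.py
pytest sklearn/metrics/_classification.py
```

The `ImportError: No module named 'sklearn.__check_build._check_build'` in the traceback above is scikit-learn's way of saying the compiled extensions are missing, so a clean rebuild is the standard fix.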
quant12345
@quant12345
To keep this short, I'm placing a link to my question on Stack Exchange.
No one answered me there, but I would really like to understand this.
I apologize in advance for my poor English.
Giuseppe Broccolo
@gbroccolo

@quant12345 as already replied to your post, have a read to this: https://scikit-learn.org/stable/auto_examples/inspection/plot_permutation_importance.html#sphx-glr-auto-examples-inspection-plot-permutation-importance-py

In RF, feature importance is affected by features able to overfit the model.
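A minimal sketch of the workaround that linked example describes: compare impurity-based importances (computed on training data, so inflatable by overfit features) with `permutation_importance` on a held-out set. Data here is synthetic; parameter choices are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data standing in for a real problem
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# Impurity-based importances come from the training set
print(rf.feature_importances_)

# Permutation importance on held-out data avoids the overfitting bias:
# each feature is shuffled and the drop in test score is measured
result = permutation_importance(rf, X_te, y_te, n_repeats=10, random_state=0)
print(result.importances_mean)
```

`permutation_importance` was added in scikit-learn 0.22, so it was available at the time of this conversation.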

quant12345
@quant12345
@gbroccolo Thanks!
Jacob Carey
@jacobcvt12
Why does GradientBoostingClassifier use DecisionTreeRegressor instead of DecisionTreeClassifier?
Siddharth Gupta
@sid21g
@NicolasHug Worked!
Nicolas Hug
@NicolasHug
@jacobcvt12 because gradient boosting tries to predict gradients, which are always continuous targets, even in the case of classification. With a log loss (as used in sklearn), these gradients are homogeneous to a log-odds ratio and are then passed through a sigmoid function to become a probability in [0, 1]
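The two claims in that answer can be checked directly on a fitted model: the boosting stages are regression trees even for a classifier, and the sigmoid of the raw log-odds reproduces `predict_proba`. A small sketch on synthetic data:

```python
import numpy as np
from scipy.special import expit  # the logistic sigmoid
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import DecisionTreeRegressor

X, y = make_classification(n_samples=200, random_state=0)
clf = GradientBoostingClassifier(n_estimators=20, random_state=0).fit(X, y)

# Every boosting stage is a regression tree, even for classification
assert all(isinstance(t, DecisionTreeRegressor)
           for t in clf.estimators_.ravel())

# decision_function returns the accumulated log-odds; the sigmoid maps
# them to the probability of the positive class
log_odds = clf.decision_function(X)
proba = clf.predict_proba(X)[:, 1]
assert np.allclose(expit(log_odds), proba)
```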
miha-
@miha-
Hello, I would be very grateful for help or a hint. What I would like to do is find patterns in text. Let's say you have a forum and people are posting on it. I would like to find patterns that indicate what they are talking about the most. Thank you for any hint.
Keith
@DrEhrfurchtgebietend
Try TF-IDF
DiamondKesha
@DiamondKesha
Hello. How can I make the program predict better? When I enter the numbers 771, 322, 344, 632, 10, the program predicts 234168, but I need it to be in the range 200000-210000. The linear regression is already trained on more than 1000 examples.
Jacob Carey
@jacobcvt12
Thanks @NicolasHug . I'm trying to figure out why sklearn's GradientBoostingClassifier gives different estimates from R's GBM. I had thought it might be the criterion for splitting, but maybe not. Any suggestions?
Nicolas Hug
@NicolasHug
@jacobcvt12 I'm not familiar with R's gbm. The splitting criterion will definitely be a major factor. I'd suggest checking the parameters of each implementation and try to find equivalent settings.
In a vanilla implementation of gradient boosting (ignoring the sub-estimator which is a tree in our case), the only parameters are the learning rate / shrinkage, the loss, and the number of iterations.
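One way to start that comparison is to line up R gbm's argument names against scikit-learn's. The mapping below is approximate (the two libraries grow trees differently, e.g. gbm's `interaction.depth` counts splits rather than depth), and the values are illustrative defaults, not a verified equivalence:

```python
from sklearn.ensemble import GradientBoostingClassifier

# Rough mapping from R's gbm arguments to sklearn's:
#   n.trees           -> n_estimators
#   shrinkage         -> learning_rate
#   interaction.depth -> max_depth   (approximate: gbm counts splits)
#   n.minobsinnode    -> min_samples_leaf
#   bag.fraction      -> subsample
clf = GradientBoostingClassifier(
    n_estimators=100,
    learning_rate=0.1,
    max_depth=3,
    min_samples_leaf=10,
    subsample=0.5,
)
print(clf.get_params()["learning_rate"])
```

Even with matched parameters, small differences in split criteria, tie-breaking, and random seeds will keep the two implementations from agreeing exactly.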
lucasmarinsnave
@lucasmarinsnave
Hi
Andreas Mueller
@amueller
@jacobcvt12 gbm supports categorical variables, I think