but output_dict=True doesn't seem to work: I am getting an error saying this parameter does not exist on the classification_report function. I also don't trust precision_recall_fscore_support, and it doesn't report accuracy
Thomas J. Fan
@thomasjpfan
@piotr-mamenas Please check the version of sklearn you are using. I believe output_dict was added in 0.20.
Ghost
@ghost~5bc98094d73408ce4fabf741
@thomasjpfan yup, I figured it out yesterday and after some fight with tensorflow dependencies I got it running
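For anyone hitting the same error, here is a minimal sketch of the check suggested above: print the installed version, then request the report as a dictionary. The toy labels are invented for illustration, and the `accuracy` key is assumed to be present as it is in recent scikit-learn releases for single-label problems.

```python
import sklearn
from sklearn.metrics import classification_report

# output_dict is only accepted by scikit-learn 0.20 and later,
# so checking the installed version is the first step.
print("scikit-learn version:", sklearn.__version__)

# Toy labels, invented purely for illustration.
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 2, 2, 2, 1, 0]

report = classification_report(y_true, y_pred, output_dict=True)

# Per-class metrics are nested dicts keyed by the label as a string.
print(report["2"]["precision"])

# In recent releases the overall accuracy sits under its own key;
# .get() avoids a KeyError on versions that do not provide it.
print(report.get("accuracy"))
```

The dictionary form is convenient when the numbers need to go into a DataFrame or a log rather than being printed as text.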
freepancakes
@OudarjyaS_twitter
hi everyone
Dillon Niederhut
@deniederhut
Hello from the SciPy sprints!
Meghann Agarwal
@mepa
Hi All, also from the SciPy sprints :)
Andreas Mueller
@amueller
Welcome everybody :)
Andreas Mueller
@amueller
@thomasjpfan wanna look at scikit-learn/scikit-learn#14326 ?
Andreas Mueller
@amueller
anyone wanna look at scikit-learn/scikit-learn#14320 ?
Vishesh Mangla
@Teut2711
How can I use YOLO to detect the numbers in a sudoku puzzle?
I want to read those numbers, but Canny, Hough, and contour detection aren't working well
Venkatachalam N
@venkyyuvy
Aditya Padwal
@adityap31
Hello People,
Do we have any package like NLTK that supports languages other than English?
Krishna Sangeeth
@whiletruelearn
@venkyyuvy this looks like a good feature to have in LabelEncoder. Would this make sense as a feature, @amueller?
Jainil Patel
@jainilpatel
Hi
Nicolas Hug
@NicolasHug
@adityap31 we choose to only officially support English in our documentation, to avoid having to maintain different versions
Aditya Padwal
@adityap31
Thanks @NicolasHug
Emoruwa
@Emoruwa
Please recommend the best C# tutorial online
Give me ideas
Andreas Mueller
@amueller
@Emoruwa since you're not the first one asking this here: what gave you the idea of asking about C# in a channel about a Python library for machine learning?
Manish Aradwad
@ManishAradwad
Hi, everyone. My name is Manish and it's nice to meet you all. I used scikit-learn for one of my projects this summer and I really love this library. I want to start contributing to it. I'm new to open source and I don't know how to get started. I checked the issues under the "good first issue" label, but I wasn't able to understand any of them. Can anyone please guide me with this?
Andreas Mueller
@amueller
@ManishAradwad welcome! the easiest way is probably to ask directly on the issue. Have you checked out the contributors guide?
Manish Aradwad
@ManishAradwad
Yes, I'm now going through the repo first. I'll then go for the issues. Thanks for the reply!
Andreas Mueller
@amueller
I wouldn't try going to the repo, it's a lot. I would start with the contributor docs
even figuring out how we set up and run tests would probably take me a week
lesshaste
@lesshaste
is there something in scikit-learn for 4000-dimensional regression where I know only one or two of the coefficients are non-zero?
lesshaste
@lesshaste
something like forward stepwise regression?
Andreas Mueller
@amueller
not yet. mlxtend has it and there's a PR
lesshaste
@lesshaste
@amueller Thanks! I will take a look at mlxtend, which I didn't know about
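As a related option that already lives in scikit-learn, a greedy forward method such as orthogonal matching pursuit can be pointed at this kind of problem. The sketch below is not the stepwise regression discussed above, only a neighbouring technique; the synthetic data, the sample count, and the two chosen feature indices are assumptions made up for the example.

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
n_samples, n_features = 200, 4000  # sample count is an arbitrary assumption

# Synthetic design matrix with only two truly active features.
X = rng.standard_normal((n_samples, n_features))
true_coef = np.zeros(n_features)
true_coef[[17, 2500]] = [3.0, -2.0]  # hypothetical non-zero coefficients
y = X @ true_coef + 0.01 * rng.standard_normal(n_samples)

# Greedily add one feature at a time until two coefficients are non-zero.
omp = OrthogonalMatchingPursuit(n_nonzero_coefs=2)
omp.fit(X, y)

selected = np.flatnonzero(omp.coef_)
print("selected features:", selected)
print("estimated coefficients:", omp.coef_[selected])
```

Newer scikit-learn releases also provide `SequentialFeatureSelector` in `sklearn.feature_selection`, which is closer to classic forward stepwise selection but refits the estimator many times and can be slow with thousands of features.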
Girraj Jangid
@Girrajjangid
Can anyone please point me to a good source on how to deal with categorical data? It would be very helpful. Thank you
Manish Aradwad
@ManishAradwad
@amueller Hi! As you said, I've gone through the contributing guide and set up the development environment. Can you please tell me what I should do next? Thanks for the help!
Andreas Mueller
@amueller
@ManishAradwad look at things tagged as "good first issue" and "help wanted" as outlined in the contributing guide
Kristiyan Katsarov
@katsar0v

Hello guys, maybe someone can help me out here. I am running the following validation code:

from sklearn.model_selection import validation_curve

train_scores, valid_scores = validation_curve(
    estimator=pipeline,  # estimator (pipeline)
    X=features,  # features matrix
    y=target,  # target vector
    param_name='pca__n_components',
    param_range=range(1, 50),  # test these k-values
    cv=5,  # 5-fold cross-validation
    scoring='neg_mean_absolute_error',  # negative MAE, so higher is better
)

in the same .py file on different machines, which I would name #1 localhost, #2 staging, #3 live, #4 live

localhost and staging both have i7 CPUs; localhost needs around 40 s for the validation, staging needs around 13-14 s

live (#3) and live (#4) need almost 10 minutes to execute the validation. Both of these servers have Intel CPUs with 48 threads.

To get more "trustworthy" numbers I built Docker images and ran them on the servers. Does anyone have an idea why the speed is so different?

Andreas Mueller
@amueller
how many cores do you have in localhost and staging?
could be that you're overallocating processes in the estimator and parallelization actually hurts you
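One way to probe this hypothesis, assuming the extra parallelism comes from a multithreaded BLAS behind NumPy rather than from scikit-learn itself, is to cap the BLAS thread count with `threadpoolctl` and compare timings. The matrix size and repeat count below are arbitrary choices for illustration.

```python
import time

import numpy as np
from threadpoolctl import threadpool_info, threadpool_limits

# List the native thread pools that are loaded and their current thread counts.
for pool in threadpool_info():
    print(pool.get("user_api"), pool.get("num_threads"))

rng = np.random.default_rng(0)
a = rng.standard_normal((500, 500))  # arbitrary size, for illustration only


def time_dot(repeats=20):
    """Time repeated matrix products with the current thread settings."""
    start = time.perf_counter()
    for _ in range(repeats):
        np.dot(a, a)
    return time.perf_counter() - start


default_threads = time_dot()

# Restrict BLAS to one thread inside this block only.
with threadpool_limits(limits=1, user_api="blas"):
    single_thread = time_dot()

print(f"default: {default_threads:.3f}s, one BLAS thread: {single_thread:.3f}s")
```

If the single-threaded run is not slower, or is even faster, on the 48-thread servers, oversubscription becomes a plausible explanation for the gap.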
Kristiyan Katsarov
@katsar0v
@amueller localhost and staging are both with i7 (4 cores and 8 threads)
Andreas Mueller
@amueller
what's pipeline?
so the number of cores is the likely difference, right?
Kristiyan Katsarov
@katsar0v
yeah, live 3 and live 4 have 48 threads, 24 cores. Pipeline:
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

model = LinearRegression()
poly_transformer = PolynomialFeatures(degree=2, include_bias=False)
pipeline = Pipeline([('poly', poly_transformer), ('reg', model)])
Kristiyan Katsarov
@katsar0v

After profiling, I saw this (slowest time on bottom, sorted by 3rd column):

     4150  208.706    0.050  208.706    0.050 {built-in method numpy.dot}
      245   13.112    0.054   13.360    0.055 decomp_svd.py:16(svd)
     2170  142.567    0.066  143.360    0.066 decomp_lu.py:153(lu)

Just executed python -m cProfile validation.py
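For reference, the same kind of table can be produced from inside Python with the standard-library `cProfile` and `pstats` modules, sorted by total time so the heaviest calls come first. The `workload` function here is a hypothetical stand-in for the real validation script.

```python
import cProfile
import io
import pstats

import numpy as np


def workload():
    # Hypothetical stand-in for the real script: a few matrix products.
    rng = np.random.default_rng(0)
    a = rng.standard_normal((300, 300))
    for _ in range(10):
        np.dot(a, a)


profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Collect the report in a string, sorted by time spent inside each function.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("tottime").print_stats(5)  # show only the top 5 entries
report = buffer.getvalue()
print(report)
```

On the command line, `python -m cProfile -s tottime validation.py` gives a similarly sorted listing for a whole script.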

Andreas Mueller
@amueller
can you try to benchmark just calling svd directly without any sklearn around it?
if that's a pure scipy issue that would be good to isolate
Kristiyan Katsarov
@katsar0v
how can I isolate it, make a separate .py and run cProfile on it?
Andreas Mueller
@amueller
make a py file that calls scipy.linalg.svd without using sklearn
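A standalone script along those lines might look like the sketch below. The matrix shape and repeat count are assumptions; in practice they should mirror the data used in the real validation run.

```python
import time

import numpy as np
import scipy.linalg

rng = np.random.default_rng(0)
# Hypothetical shape; replace with the shape of the real feature matrix.
matrix = rng.standard_normal((500, 200))

timings = []
for _ in range(5):
    start = time.perf_counter()
    u, s, vh = scipy.linalg.svd(matrix, full_matrices=False)
    timings.append(time.perf_counter() - start)

# Report the best run to reduce noise from other processes on the machine.
print(f"best of {len(timings)} runs: {min(timings):.4f}s")
```

Running the same file on each machine, inside the same Docker image, shows whether the slowdown already appears in SciPy alone.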
Andreas Mueller
@amueller
lol I am killing the sorting of the pull requests in the issue tracker by adding tags. sorry lol