Christopher Chavez
@chrstphrchvz
What version of OpenMP does scikit-learn require? Is 2.5 sufficient?
Olivier Grisel
@ogrisel
Probably. We use OpenMP via Cython's prange construct.
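As an aside for anyone checking their own install: `sklearn.show_versions()` (available since 0.20) prints Python, dependency, and build information for the installed scikit-learn, and recent versions also report whether the wheel was built with OpenMP:

```python
# Print build/runtime information for the installed scikit-learn,
# including dependency versions and (in recent releases) OpenMP status.
import sklearn

sklearn.show_versions()
```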
Dhwani Shah
@dhwanishah
Hey guys, I'm new to scikit-learn, so please bear with me. I have a pandas dataframe that looks like this: [email, businessId, manager, app1, app2, app3, ... , app170]. Essentially, one row defines one user, with either a 1 or NaN in each of the appX columns specifying whether they have that app.
What I want is a classifier that, given email, businessId, and manager, would return a list of apps the user should have.
I've got the data in the format I specified; which models do you think would be good for this type of classifier? And how would I go about this in general?
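One common way to frame this is as a multi-label problem: treat the appX columns as a 0/1 target matrix and fit a classifier that supports multiple outputs. A minimal sketch with a hypothetical toy frame mirroring the described layout (column values and sizes are made up; NaN is filled with 0):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical data in the layout described above
df = pd.DataFrame({
    'email':      ['a@x.com', 'b@x.com', 'c@y.com', 'd@y.com'],
    'businessId': [1, 1, 2, 2],
    'manager':    ['m1', 'm1', 'm2', 'm2'],
    'app1':       [1, np.nan, 1, np.nan],
    'app2':       [np.nan, 1, 1, 1],
})

app_cols = ['app1', 'app2']
# Encode the identifier columns as integers for the model
X = OrdinalEncoder().fit_transform(df[['email', 'businessId', 'manager']].astype(str))
# Multi-label target: one 0/1 column per app
Y = df[app_cols].fillna(0).astype(int)

# RandomForestClassifier natively handles multi-output targets
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, Y)
pred = clf.predict(X)  # shape (n_users, n_apps): predicted app memberships
```

Whether this generalizes well depends heavily on how predictive email/businessId/manager actually are; encoding raw emails as categories mostly memorizes users, so grouping features (e.g. business and manager only) may be more useful in practice.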
Ghost
@ghost~5bc98094d73408ce4fabf741
How do I get the individual components from classification_report?
output_dict=True doesn't seem to work; I get an error stating that parameter does not exist on the classification_report function. I also don't trust precision_recall_fscore_support, and it misses accuracy
Thomas J. Fan
@thomasjpfan
@piotr-mamenas Please check the version of sklearn you are using. I believe output_dict was added in 0.20.
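For reference, on scikit-learn >= 0.20 the call looks like this (toy labels for illustration); accuracy can be computed separately with accuracy_score:

```python
from sklearn.metrics import accuracy_score, classification_report

y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

# output_dict=True returns a nested dict instead of a formatted string
report = classification_report(y_true, y_pred, output_dict=True)
precision_class_1 = report['1']['precision']  # per-class metrics, keyed by label
acc = accuracy_score(y_true, y_pred)          # accuracy reported separately
```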
Ghost
@ghost~5bc98094d73408ce4fabf741
@thomasjpfan yup, I figured it out yesterday and after some fight with tensorflow dependencies I got it running
freepancakes
@OudarjyaS_twitter
hi everyone
Dillon Niederhut
@deniederhut
Hello from the SciPy sprints!
Meghann Agarwal
@mepa
Hi All, also from the SciPy sprints :)
Andreas Mueller
@amueller
Welcome everybody :)
Andreas Mueller
@amueller
@thomasjpfan wanna look at scikit-learn/scikit-learn#14326 ?
Andreas Mueller
@amueller
anyone wanna look at scikit-learn/scikit-learn#14320 ?
Vishesh Mangla
@Teut2711
how can I use YOLO to detect the numbers in a sudoku puzzle?
I want to read those numbers, but Canny, Hough, and contour detection aren't working well
Venkatachalam N
@venkyyuvy
Aditya Padwal
@adityap31
Hello People,
Do we have any package like NLTK that supports languages other than English?
Krishna Sangeeth
@whiletruelearn
@venkyyuvy this looks like a good feature to have in LabelEncoder. Would this make sense as a feature @amueller?
Jainil Patel
@jainilpatel
Hi
Nicolas Hug
@NicolasHug
@adityap31 we choose to only officially support English in our documentation, to avoid having to maintain different versions
Aditya Padwal
@adityap31
Thanks @NicolasHug
Emoruwa
@Emoruwa
Please recommend the best C# tutorial online
Give me ideas
Andreas Mueller
@amueller
@Emoruwa since you're not the first one asking this here: what gave you the idea of asking about C# in a channel about a Python library for machine learning?
Manish Aradwad
@ManishAradwad
Hi, everyone. My name is Manish and it's nice to meet you all. I used scikit-learn for one of my projects this summer and I really love this library. I want to start contributing to it. I'm new to open source and I don't know how to get started. I checked issues under the good first issue label but I'm not able to understand anything. Can anyone please guide me with this?
Andreas Mueller
@amueller
@ManishAradwad welcome! the easiest way is probably to ask directly on the issue. Have you checked out the contributors guide?
Manish Aradwad
@ManishAradwad
Yes, I'm now going through the repo first. I'll then go for the issues. Thanks for the reply!
Andreas Mueller
@amueller
I wouldn't try going through the whole repo, it's a lot. I would start with the contributor docs
even how we set up and run tests would probably take me a week to understand
lesshaste
@lesshaste
is there something in scikit-learn for 4000-dimensional regression where I know only one or two of the coefficients are non-zero?
lesshaste
@lesshaste
something like forward stepwise regression?
Andreas Mueller
@amueller
not yet. mlxtend has it and there's a PR
lesshaste
@lesshaste
@amueller Thanks! I will take a look at mlxtend, which I didn't know about
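Worth noting that for the "known sparsity" case described above, scikit-learn does already ship a greedy sparse solver, OrthogonalMatchingPursuit, where you can fix the number of non-zero coefficients directly. A small synthetic sketch (dimensions and coefficient positions are made up):

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.RandomState(0)
X = rng.randn(100, 4000)           # 100 samples, 4000 features
coef = np.zeros(4000)
coef[[10, 200]] = [3.0, -2.0]      # only two true non-zero coefficients
y = X @ coef + 0.01 * rng.randn(100)

# Ask OMP for exactly two non-zero coefficients
omp = OrthogonalMatchingPursuit(n_nonzero_coefs=2).fit(X, y)
support = np.flatnonzero(omp.coef_)  # indices of the selected features
```

Lasso is the other standard option when the sparsity level is not known exactly.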
Girraj Jangid
@Girrajjangid
Can anyone please provide a good source on how to deal with categorical data? It would be very helpful, thank you
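The usual scikit-learn answer is OneHotEncoder (or OrdinalEncoder for tree models), typically wired up with a ColumnTransformer so numeric columns pass through untouched. A minimal sketch with a made-up toy frame:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Toy data: two categorical columns, one numeric
df = pd.DataFrame({
    'color': ['red', 'blue', 'red'],
    'size':  ['S', 'M', 'L'],
    'price': [1.0, 2.0, 3.0],
})

# One-hot encode the categorical columns, pass the numeric one through;
# handle_unknown='ignore' avoids errors on unseen categories at predict time
ct = ColumnTransformer(
    [('cat', OneHotEncoder(handle_unknown='ignore'), ['color', 'size'])],
    remainder='passthrough',
)
X = ct.fit_transform(df)  # 2 color + 3 size dummies + 1 numeric = 6 columns
```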
Manish Aradwad
@ManishAradwad
@amueller Hi!! As you said, I've gone through the contributing guides and set up the development environment. Can you please tell me what I should do next? Thanks for the help!!
Andreas Mueller
@amueller
@ManishAradwad look at things tagged as "good first issue" and "help wanted" as outlined in the contributing guide
Kristiyan Katsarov
@katsar0v

Hello guys, maybe anyone can help me out here. I am running following validation code:

train_scores, valid_scores = validation_curve(
    estimator=pipeline,                    # estimator (pipeline)
    X=features,                            # features matrix
    y=target,                              # target vector
    param_name='pca__n_components',
    param_range=range(1, 50),              # test these k-values
    cv=5,                                  # 5-fold cross-validation
    scoring='neg_mean_absolute_error')     # use negative mean absolute error

in the same .py file on different machines, which I would name #1 localhost, #2 staging, #3 live, #4 live

localhost and staging have both i7 cpus, localhost needs around 40s for the validation, staging needs around 13-14 seconds

live (#3) and live (#4) need almost 10 minutes to execute the validation - both of these servers have Intel CPUs with 48 threads.

In order to get more "trustworthy" numbers I dockerized the code and ran the images on the servers. Does anyone have an idea why the speed is so different?

Andreas Mueller
@amueller
how many cores do you have in localhost and staging?
could be that you're overallocating processes in the estimator and parallelization actually hurts you
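One way to test the oversubscription hypothesis is to cap the native thread pools (OpenMP/BLAS) on the 48-thread machines and re-time the run. threadpoolctl is a scikit-learn dependency, so it should already be installed; a sketch:

```python
from threadpoolctl import threadpool_info, threadpool_limits

# Inside this block, every loaded OpenMP/BLAS pool is capped at 4 threads,
# mimicking the smaller machines; time validation_curve inside it.
with threadpool_limits(limits=4):
    limited = {info['num_threads'] for info in threadpool_info()}

print(limited)  # each pool reports at most 4 threads inside the block
```

Setting the environment variable OMP_NUM_THREADS before starting the container achieves the same cap without code changes.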
Kristiyan Katsarov
@katsar0v
@amueller localhost and staging are both with i7 (4 cores and 8 threads)
Andreas Mueller
@amueller
what's pipeline?
so the number of cores is the likely difference, right?
Kristiyan Katsarov
@katsar0v
yeah, live 3 and live 4 have 48 threads, 24 cores. Pipeline:
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

poly_transformer = PolynomialFeatures(degree=2, include_bias=False)
model = LinearRegression()
pipeline = Pipeline([('poly', poly_transformer), ('reg', model)])