veerlosar
@veerlosar

Hey guys, who is veerlosar on GitHub? I just want to talk about the OneVsRestClassifier example.

@zioalex what did you want to talk about?

rajnish1642
@rajnish1642
How can I learn scikit-learn thoroughly? Could you please suggest some resources?
Jérémie du Boisberranger
@jeremiedbb
Andreas Mueller's book, Introduction to Machine Learning with Python: A Guide for Data Scientists, is quite complete.
You can also look at the user guides: https://scikit-learn.org/stable/user_guide.html
Andreas Mueller
@amueller
there's also my lecture series: https://youtube.com/AndreasMueller The only complete resource is the user guide though
Ghost
@ghost~5a09ec4ed73408ce4f7e6c27
Hello there!
Jesse Leigh Patsolic
@MrAE

Hey guys, me again. Regarding my previous message (:point_up: October 4, 2019 5:28 PM): I've gone through some more attempts that don't quite work.

@NicolasHug The blog post helped a bit with my understanding of memory-views, however I still have a few questions: Can a memory-view be initialized with nogil? And no, a struct member cannot be a memory view.

I tried to make my own class but then got yelled at because it's not of type Splitter, so that was a bust.

I augmented the SplitRecord with 2 cpp vectors, but that caused things to go wonky requiring cpp in files that I'm not willing to touch.

I ended up augmenting SplitRecord with 2 Cython vectors with hard-coded length, but then can't seem to initialize a memory-view into them inside of the node_split. I'm pretty much stuck (in my current view of things), because I'm trying to do as little modification as possible, but it seems that in order to accomplish my task I'll have to re-write a big chunk of ensemble methods. I'd have to add an input argument to the node_split method? That doesn't sound like a good idea.

Any ideas? Much appreciated.

Peter Hadlaw
@peterhadlaw

Hi all, I'm trying to help my team reduce creating new code when leveraging existing libraries might get the job done. Does anyone have thoughts on how the following can be accomplished? https://stackoverflow.com/q/58533004/1566074

Basically finding the optimal subgroups for a dataset to then feed into an estimator to reduce noise.

Guillaume Chevalier
@guillaume-chevalier
Hello the scikit-learn community! I'd like to have your thoughts on what I coded. It's a way to do automatic machine learning on scikit-learn pipelines. It allows for handling hyperparameter spaces as well as hyperparameters. Example: https://www.neuraxio.com/en/neuraxle/stable/examples/hyperparams.html#sphx-glr-examples-hyperparams-py
Adrin Jalali
@adrinjalali
Any takers on scikit-learn-contrib/imbalanced-learn#616? It's a good first issue.
Guillaume Lemaitre
@glemaitre
a good first issue?
I find it a bit harsh :)
Adrin Jalali
@adrinjalali
lol, I'm just a messenger, Joel tagged it as such :D
Guillaume Lemaitre
@glemaitre
Basically, I started working on the issue yesterday.
While making master work with master is easy (just change the import path), the challenging part is making an out-of-date version work with a newer scikit-learn.
In the latter case, we need some try/except ImportError blocks, as you suggested, I think.
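A minimal sketch of the try/except ImportError pattern mentioned above. The module names are stand-ins, not real scikit-learn paths: `newlib_private_location` is a hypothetical module representing the import path of a newer release, and the standard-library `math` module plays the role of the older path.

```python
# Hypothetical module names: "newlib_private_location" stands in for the path
# used by a newer release, and the stdlib "math" module for the older path.
try:
    from newlib_private_location import sqrt  # newer layout (does not exist here)
except ImportError:
    from math import sqrt  # fall back to the older layout

# Code below this point uses `sqrt` without caring where it came from.
print(sqrt(9.0))
```

In a real compatibility layer, both branches would import the same object from the two locations it has lived in, so the rest of the package needs a single import from that shim.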
Adrin Jalali
@adrinjalali
Yep. If you're already at it, please leave a comment so that others don't start working on it ;)
Guillaume Lemaitre
@glemaitre
Yep, I just cross-referenced my PR
Guillaume Lemaitre
@glemaitre
For the people joining the MAN-AHL sprint, you can find the instructions to install scikit-learn from source at the following documentation page: https://scikit-learn.org/dev/developers/advanced_installation.html#building-from-source
Guillaume Lemaitre
@glemaitre
In addition, you can find the contributing guide at the following address: https://scikit-learn.org/dev/developers/contributing.html
Finally, if you are searching for an issue to work on, several issues have been tagged specifically for sprints: https://github.com/scikit-learn/scikit-learn/issues?utf8=%E2%9C%93&q=is%3Aopen+is%3Aissue+label%3ASprint
You can filter by other tags if you want ("good first issue", etc.). You are also free to search the issue tracker for any issue that interests you.
Roman Yurchak
@rth
One example of a "good first issue", particularly if you have never contributed to large open-source projects before, is scikit-learn/scikit-learn#15440, which aims to improve docstrings. That would allow you to see how the contribution workflow works before tackling more complex issues.
Roman Yurchak
@rth
Also it's useful to read the contribution guide at https://scikit-learn.org/dev/developers/contributing.html
Giuseppe Broccolo
@gbroccolo
Hi @rth, I just tried to run test_docstrings and it looks like just 13 out of 1619 tests pass. I suppose I can pick any estimator to start improving docstrings, am I right? Is there any order of priority?
Roman Yurchak
@rth
@gbroccolo Yes, you can pick any estimator that fails :)
Giuseppe Broccolo
@gbroccolo
thanks
Paolo
@paoloturati
Hi @rth I'd like to pick RadiusNeighborsClassifier
Roman Yurchak
@rth
Sure, please comment about it in the issue
Giuseppe Broccolo
@gbroccolo
I would like to take over one of the PRs labeled as stalled and help wanted. I suppose a new contributor can pick it up and finish the PR, taking the reviewers' comments into account. What's the best approach here? Create a new PR that refers to the existing one?
norvan
@norvan
Trying to take a look at scikit-learn/scikit-learn#13045 -- Does this seem like a decent issue to tackle?
Andreas Mueller
@amueller
@norvan sure please comment there. Are you part of the wimlds sprint?
norvan
@norvan
I'm at the man hackathon
Andreas Mueller
@amueller
ah :)
I didn't realize there were two today lol
Guillaume Lemaitre
@glemaitre
:)
Eliseo Ortiz
@eliseo
Hi there, any clue on how to identify text from one document in another?
we are working on a prototype for fake news
qazi1002
@qazi1002
hi
anyone there to help
Giuseppe Broccolo
@gbroccolo
Hi @qazi1002 @eliseo, it looks like you need some NLP for this project... Can you provide more info about what kind of help you need? Do you specifically intend to use scikit-learn for this? Also, I'm not sure this is the proper place to discuss this: topics should be strictly focused on scikit-learn development, bug fixing, etc.
Adrin Jalali
@adrinjalali
I don't think we have a strict policy for this channel being related to the dev only. But in the interest of the rest of the community being able to use the answers we give to your questions related to the usage, posting them on stackoverflow or other related forums may be more appropriate.
qazi1002
@qazi1002
@gbroccolo I need help with software development... As I am a beginner, I want to get some tips for developing software... I want to develop software that reads smart ID cards using a card reader.
Hakim
@h4k1m0u
Hi there, is k-means clustering stochastic even when the initial centers are given? I'm noticing different results when I run my code multiple times
Andreas Mueller
@amueller
@h4k1m0u I don't think it should be
Hakim
@h4k1m0u
@amueller That's weird, I can't find out why in this short piece of code (https://bpaste.net/show/ORPVW) the centroids found by kmeans are sometimes located at the center of the 3 samples and sometimes not
Andreas Mueller
@amueller
you don't fix the random seed so the data changes
Hakim
@h4k1m0u
Oh sorry, I'd completely forgotten the np.random above. Thanks a lot for reminding me about that.
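The point of the exchange above can be sketched in plain Python: with fixed initial centers, Lloyd-style k-means has no randomness of its own, so run-to-run differences come from unseeded data generation. This is an illustrative sketch, not scikit-learn's KMeans; `make_points` and `kmeans` are hypothetical helpers and the data is synthetic.

```python
import random


def make_points(seed=None, n=30):
    """Draw n 2-D points; a fixed seed makes the sample reproducible."""
    rng = random.Random(seed)
    return [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]


def kmeans(points, centers, n_iter=20):
    """Plain Lloyd iterations from the given initial centers (no randomness)."""
    centers = list(centers)
    for _ in range(n_iter):
        clusters = [[] for _ in centers]
        for p in points:
            # Assign each point to its nearest center (squared Euclidean distance).
            nearest = min(
                range(len(centers)),
                key=lambda k: (p[0] - centers[k][0]) ** 2 + (p[1] - centers[k][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Move each center to its cluster mean; keep it if the cluster is empty.
        centers = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else old
            for c, old in zip(clusters, centers)
        ]
    return centers


init = [(-1.0, 0.0), (1.0, 0.0)]
same_a = kmeans(make_points(seed=0), init)
same_b = kmeans(make_points(seed=0), init)
print(same_a == same_b)  # compares two runs on identically seeded data
```

Calling `make_points()` without a seed draws a fresh sample each time, which is the situation that produced the changing centroids.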
quant12345
@quant12345
I'm using random forest and gradient boosting.
To the main features I add the feature shown in the picture.
[image]
The data is clearly not stationary.
To make the series stationary, I apply first-order differencing to the data (increments).
After that, I normalize the data.
[image]
Why are the classes separated better if I don't use increments (first-order differencing)?
[image]
This is despite all the textbooks saying that non-stationary data should be converted to increments.
For training, I use the first 5000 samples. If you look at the data, the extreme values start after 5000.
That is, the model never even sees such large values in the training sample.
chet
@chetkhatri

I have two files that contain Event Name, Event City, Event Venue, and Event State, but the values are written differently in each file; you can assume the files come from different sources.
I want to create a machine-learning-based algorithm that can do the matching.

I have tried fuzzy-wuzzy to get string similarity.
Can anyone please tell me what the approach would be if I wanted to solve this with deep learning? Thanks @amueller
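For reference, a string-similarity baseline of the kind described above might look like the sketch below. It uses the standard library's `difflib.SequenceMatcher` as a stand-in for fuzzy-wuzzy, it is not a deep-learning approach, and the field names and sample records are invented for illustration.

```python
from difflib import SequenceMatcher

# Assumed field names; the real files may label these columns differently.
FIELDS = ("name", "city", "venue", "state")


def field_similarity(a, b):
    """Similarity in [0, 1] between two strings, ignoring case and outer spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def record_similarity(rec_a, rec_b):
    """Unweighted mean of the per-field similarities."""
    return sum(field_similarity(rec_a[f], rec_b[f]) for f in FIELDS) / len(FIELDS)


def best_match(record, candidates):
    """Return (index, score) of the candidate most similar to `record`."""
    scores = [record_similarity(record, c) for c in candidates]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return best, scores[best]


# Invented sample records from two hypothetical sources.
event = {"name": "Rock Fest 2019", "city": "Austin", "venue": "Zilker Park", "state": "TX"}
other_source = [
    {"name": "ROCK FEST 2019", "city": "austin", "venue": "Zilker Pk.", "state": "TX"},
    {"name": "Jazz Night", "city": "Boston", "venue": "Symphony Hall", "state": "MA"},
]
index, score = best_match(event, other_source)
print(index, round(score, 3))
```

A baseline like this gives a score to compare any learned matcher against; per-field weights or a decision threshold would have to be tuned on labeled pairs.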

Hakim
@h4k1m0u
Hi, what does it mean when linear_model.Ridge returns n_iter_ = None? Does it mean it didn't even perform a single iteration?
Nicolas Hug
@NicolasHug

@h4k1m0u the doc says
n_iter_ : None or array of shape (n_targets,)
Actual number of iterations for each target. Available only for
sag and lsqr solvers. Other solvers will return None.

you're probably not using sag or lsqr?