Guillaume Chevalier
@guillaume-chevalier
Hello, scikit-learn community! I'd like to hear your thoughts on something I coded. It's a way to do automatic machine learning on scikit-learn pipelines. It handles hyperparameter spaces as well as hyperparameters. Example: https://www.neuraxio.com/en/neuraxle/stable/examples/hyperparams.html#sphx-glr-examples-hyperparams-py
Adrin Jalali
@adrinjalali
Any takers on scikit-learn-contrib/imbalanced-learn#616? It's a good first issue.
Guillaume Lemaitre
@glemaitre
a first good issue?
I find it a bit harsh :)
Adrin Jalali
@adrinjalali
lol, I'm just a messenger, Joel tagged it as such :D
Guillaume Lemaitre
@glemaitre
Basically, I was starting to solve the issue yesterday
While making master work with master is easy (just change the import path), the challenging part is making an out-of-date version work with a newer scikit-learn.
In the latter case, we need some try/except ImportError blocks, as you suggested I think
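The try/except ImportError fallback mentioned above can be sketched roughly as follows. This is only an illustration, not the code from the PR: the helper `import_first_available` and the module names in the usage are assumptions, chosen so that the example runs with the standard library alone.

```python
import importlib


def import_first_available(candidates):
    """Return the first attribute that can be imported from `candidates`.

    `candidates` is a list of (module_name, attribute_name) pairs, ordered
    from the newest location to the oldest one.
    """
    for module_name, attribute_name in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # This location does not exist in the installed version.
            continue
        if hasattr(module, attribute_name):
            return getattr(module, attribute_name)
    raise ImportError(f"none of {candidates!r} could be imported")


# Hypothetical usage: the first path does not exist, so the helper falls
# back to the second one (a real standard-library location).
Mapping = import_first_available([
    ("hypothetical_new_package.abc", "Mapping"),
    ("collections.abc", "Mapping"),
])
```

The inline form of the same idea is a plain `try: from new.path import X` / `except ImportError: from old.path import X`, where both paths are placeholders.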
Adrin Jalali
@adrinjalali
Yep. If you're already at it, please leave a comment so that others don't start working on it ;)
Guillaume Lemaitre
@glemaitre
Yep, I just cross-referenced my PR
Guillaume Lemaitre
@glemaitre
For the people joining the MAN-AHL sprint, you can find the instructions to install scikit-learn from source at the following documentation page: https://scikit-learn.org/dev/developers/advanced_installation.html#building-from-source
Guillaume Lemaitre
@glemaitre
In addition, you can find the contributing guide at the following address: https://scikit-learn.org/dev/developers/contributing.html
Finally, if you are searching for an issue to work on, several issues have been tagged specifically for sprints: https://github.com/scikit-learn/scikit-learn/issues?utf8=%E2%9C%93&q=is%3Aopen+is%3Aissue+label%3ASprint
You can set some other tags if you want ("good first issues", etc.). You are also free to search the issue tracker for any issue that interests you.
Roman Yurchak
@rth
One example of a "good first issue", particularly if you have never contributed to large open-source projects before, is scikit-learn/scikit-learn#15440, which aims to improve docstrings. That would allow you to see how the contribution workflow works before tackling more complex issues.
Roman Yurchak
@rth
Also it's useful to read the contribution guide at https://scikit-learn.org/dev/developers/contributing.html
Giuseppe Broccolo
@gbroccolo
Hi @rth, I just tried to run test_docstrings and it looks like only 13 out of 1619 tests pass. I suppose I can pick any estimator to start improving docstrings, am I right? Is there any order of priority?
Roman Yurchak
@rth
@gbroccolo Yes, you can pick any estimator that fails :)
Giuseppe Broccolo
@gbroccolo
thanks
Paolo
@paoloturati
Hi @rth I'd like to pick RadiusNeighborsClassifier
Roman Yurchak
@rth
Sure, please comment about it in the issue
Giuseppe Broccolo
@gbroccolo
I would like to take over one of the PRs that have been labeled as stalled and help wanted. I suppose a newcomer can pick one up and finish it, taking the reviewers' comments into account. What's the best approach here? Create a new PR that refers to the existing one?
norvan
@norvan
Trying to take a look at scikit-learn/scikit-learn#13045 -- Does this seem like a decent issue to tackle?
Andreas Mueller
@amueller
@norvan sure please comment there. Are you part of the wimlds sprint?
norvan
@norvan
I'm at the man hackathon
Andreas Mueller
@amueller
ah :)
I didn't realize there were two today lol
Guillaume Lemaitre
@glemaitre
:)
Eliseo Ortiz
@eliseo
Hi there, any clue on how to identify text from one document in another?
we are working on a prototype for fake news
qazi1002
@qazi1002
hi
anyone there to help
Giuseppe Broccolo
@gbroccolo
Hi @qazi1002 @eliseo, it looks like you need some NLP for this project... Can you provide more info about what kind of help you need? Are you supposed to use scikit-learn specifically for this? Also, I'm not sure this is the proper place to talk about this: topics should be strictly focused on scikit-learn development, bug fixing, etc.
Adrin Jalali
@adrinjalali
I don't think we have a strict policy for this channel being related to the dev only. But in the interest of the rest of the community being able to use the answers we give to your questions related to the usage, posting them on stackoverflow or other related forums may be more appropriate.
qazi1002
@qazi1002
@gbroccolo I need help with software development... As I am a beginner, I want to get some tips for developing software... I want to develop software that reads smart ID cards using a card reader.
Hakim
@h4k1m0u
Hi there, is k-means clustering stochastic even when the initial centers are given? I'm noticing different results when I run my code multiple times
Andreas Mueller
@amueller
@h4k1m0u I don't think it should be
Hakim
@h4k1m0u
@amueller That's weird, I can't find out why in this short piece of code (https://bpaste.net/show/ORPVW) the centroids found by kmeans are sometimes located at the center of the 3 samples and sometimes not
Andreas Mueller
@amueller
you don't fix the random seed so the data changes
Hakim
@h4k1m0u
Oh sorry, I've completely forgotten the np.random above. Thanks a lot for reminding about that.
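The point of this exchange can be illustrated with a small sketch: with fixed initial centers the Lloyd iterations themselves are deterministic, so run-to-run differences come from unseeded data. The code below is a plain-Python stand-in, not scikit-learn's KMeans and not the code from the paste; `make_data` and `lloyd` are hypothetical helpers.

```python
import random


def make_data(seed=None, n_points=30):
    """Draw 2-D points; pass a seed to get the same data on every run."""
    rng = random.Random(seed)
    return [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n_points)]


def lloyd(points, centers, n_iter=20):
    """Plain Lloyd iterations from fixed initial centers (no randomness)."""
    centers = list(centers)
    for _ in range(n_iter):
        clusters = [[] for _ in centers]
        for p in points:
            # Assign each point to its nearest center.
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centers]
            clusters[distances.index(min(distances))].append(p)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [
            (sum(q[0] for q in cl) / len(cl), sum(q[1] for q in cl) / len(cl)) if cl else c
            for cl, c in zip(clusters, centers)
        ]
    return centers


init = [(-1.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
same_a = lloyd(make_data(seed=0), init)
same_b = lloyd(make_data(seed=0), init)  # identical to same_a: same data, same init
```

Calling `make_data()` without a seed gives different points on each run, and therefore different centroids, which matches the behaviour described above.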
quant12345
@quant12345
I am using random forest and gradient boosting.
To the main features, I add the feature shown in the picture.
[image]
The data is clearly not stationary.
To make the series stationary, I apply first-order differencing (increments) to the data.
After that, I normalize the data.
[image]
Why are the classes separated better if I don't use increments (first-order differencing)?
[image]
Yet all the textbooks say that non-stationary data should be converted to increments.
For training, I use the first 5000 data points. If you look at the data, the extreme values start after 5000.
That is, the model never sees such large values in the training sample.
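For reference, the preprocessing described here (first-order differencing followed by normalization) can be sketched as below. This is a minimal illustration under assumptions: the toy series stands in for the real data, and computing the scaling statistics on the training part only is one common choice, not necessarily what was done in the original experiment.

```python
from statistics import mean, pstdev


def first_difference(series):
    """Return the increments x[t] - x[t-1]; the result is one element shorter."""
    return [b - a for a, b in zip(series, series[1:])]


def standardize(values, n_train):
    """Scale with the mean and standard deviation of the first `n_train` values only."""
    mu = mean(values[:n_train])
    sigma = pstdev(values[:n_train])
    return [(v - mu) / sigma for v in values]


# Toy trending series standing in for the real data (an assumption).
series = [float(t) + (t % 3) for t in range(12)]
increments = first_difference(series)
scaled = standardize(increments, n_train=8)
```

Note that differencing removes the level of the series, so any class information carried by the level itself is no longer visible to the model.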
chet
@chetkhatri

I have two files that contain Event Name, Event City, Event Venue, and Event State, but the values are written differently in each file; you can assume the files come from different sources.
I want to create a machine-learning-based algorithm that can do the matching.

I have tried fuzzy-wuzzy to get string similarity.
Can anyone please tell me what the approach would be if I wanted to solve this with deep learning? Thanks @amueller
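As a point of comparison for any learned model, the string-similarity approach described above can be sketched as a field-wise baseline. The example uses the standard library's `difflib` as a stand-in for fuzzy-wuzzy; the field names, the equal weighting of fields, and the records are all made up for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity ratio in [0, 1] between two strings."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def record_score(rec_a, rec_b, fields=("name", "city", "venue", "state")):
    """Average the per-field similarity of two event records."""
    return sum(similarity(rec_a[f], rec_b[f]) for f in fields) / len(fields)


def best_match(record, candidates):
    """Return (score, candidate) for the most similar candidate record."""
    scored = ((record_score(record, c), c) for c in candidates)
    return max(scored, key=lambda pair: pair[0])


# Made-up records for illustration only.
file_a = {"name": "PyData Meetup", "city": "New York", "venue": "Civic Hall", "state": "NY"}
file_b = [
    {"name": "pydata meet-up", "city": "New York City", "venue": "Civic Hall NYC", "state": "NY"},
    {"name": "Jazz Night", "city": "Austin", "venue": "The Parish", "state": "TX"},
]
score, match = best_match(file_a, file_b)
```

The per-field similarities could also serve as input features for a classifier trained on labeled matching and non-matching pairs, if such labels are available.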

Hakim
@h4k1m0u
Hi, what does it mean when linear_model.Ridge returns n_iter_ = None? Does it mean it didn't even perform a single iteration?
Nicolas Hug
@NicolasHug

@h4k1m0u the doc says
n_iter_ : None or array of shape (n_targets,)
Actual number of iterations for each target. Available only for
sag and lsqr solvers. Other solvers will return None.

you're probably not using sag or lsqr?

Hakim
@h4k1m0u
Thanks @NicolasHug, you were right: I was not even setting that parameter (solver='auto'). With solver='sag', it returns the number of iterations
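A small sketch of the behaviour discussed here, assuming scikit-learn is installed. The data is made up, and `lsqr` is used instead of `sag` only because it needs no random seed; both are listed in the quoted docstring as solvers that report iterations.

```python
from sklearn.linear_model import Ridge

# Tiny made-up regression problem.
X = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 3.0], [4.0, 2.0]]
y = [1.0, 2.0, 4.0, 7.0, 8.0]

# A direct solver does not iterate, so n_iter_ is None.
direct = Ridge(alpha=1.0, solver="cholesky").fit(X, y)

# An iterative solver records how many iterations it ran.
iterative = Ridge(alpha=1.0, solver="lsqr").fit(X, y)

print(direct.n_iter_)     # None
print(iterative.n_iter_)  # an array with one entry per target
```

So `n_iter_ = None` says nothing about convergence; it only means the selected solver does not report an iteration count.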
hannah
@story645
Hi, semi random question but I can't find it in the docs - do y'all implement a consensus clustering evaluator that's not the bicluster one?
Antony Lee
@anntzer
Hi, I am the "maintainer" (more like caretaker) of hmmlearn (which was split out of sklearn a couple of years ago); I tried moving the CI to azure and realized that the macOS tests were failing (previously testing was only done on linux (travis) and windows (appveyor)) but can't test locally on macOS, would anyone be willing to have a look? https://dev.azure.com/anntzer/hmmlearn/_build/results?buildId=170 Thanks!
Nicolas Hug
@NicolasHug
@anntzer , running the test locally on my linux I get a bunch of zero division warnings on the failing test, so you might be able to debug locally still
Antony Lee
@anntzer
I get a single warning running tests locally but they still pass...
Nicolas Hug
@NicolasHug
yeah they pass but they probably should not. Unless you do expect to get a zero division in the test, in which case you need to protect the call.
the macOS CIs probably use different versions of numpy or scipy so that's why they fail while the others don't
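The two ideas above (not letting a warning pass silently in a test, and protecting the call that divides) can be sketched as follows. This is not hmmlearn code: `normalize` and `run_strict` are hypothetical helpers, and falling back to a uniform distribution is just one possible guard.

```python
import warnings


def normalize(weights):
    """Divide by the total, guarding against a zero denominator."""
    total = sum(weights)
    if total == 0:
        # Guarded case: fall back to a uniform distribution instead of dividing by zero.
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


def run_strict(func, *args):
    """Call `func` with RuntimeWarning promoted to an exception."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", RuntimeWarning)
        return func(*args)


uniform = run_strict(normalize, [0.0, 0.0])
```

NumPy reports division by zero as a `RuntimeWarning`, so running a test under a filter like the one in `run_strict` makes it fail on any platform instead of only where the result happens to differ.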
Antony Lee
@anntzer
I'll look into it, but it would be strange if different versions of numpy were being used