Guillaume Lemaitre
@glemaitre
:)
Eliseo Ortiz
@eliseo
Hi there, any clue for identifying text from one document to another?
We are working on a prototype for fake news detection.
qazi1002
@qazi1002
hi
anyone there to help
Giuseppe Broccolo
@gbroccolo
Hi @qazi1002 @eliseo, looks like you need some NLP for this project... can you provide more info about what kind of help you need? Are you meant to use scikit-learn specifically? Also, not sure that this is the proper place to talk about this - topics should be strictly focused on scikit-learn development/bug fixing/etc.
Adrin Jalali
@adrinjalali
I don't think we have a strict policy that this channel is dev-only. But in the interest of the rest of the community being able to reuse the answers we give to usage questions, posting them on Stack Overflow or other related forums may be more appropriate.
qazi1002
@qazi1002
@gbroccolo I need help regarding software development... as I am a beginner, I want to get some tips for developing software... I want to develop software that reads smart ID cards using a card reader.
Hakim
@h4k1m0u
Hi there, is k-means clustering stochastic even when the initial centers are given? I'm noticing different results when I run my code multiple times
Andreas Mueller
@amueller
@h4k1m0u I don't think it should be
Hakim
@h4k1m0u
@amueller That's weird, I can't figure out why in this short piece of code (https://bpaste.net/show/ORPVW) the centroids found by kmeans are sometimes located at the center of the 3 samples and sometimes not
Andreas Mueller
@amueller
you don't fix the random seed so the data changes
Hakim
@h4k1m0u
Oh sorry, I'd completely forgotten the np.random above. Thanks a lot for reminding me about that.
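As an editor's aside, a minimal sketch (toy data, not the bpaste snippet) of the point above: once both the data generation and KMeans are seeded, the centroids are reproducible across runs.

```python
import numpy as np
from sklearn.cluster import KMeans

# Fix the data: without a seed, np.random.rand gives new samples each run,
# so the fitted centroids move even though KMeans itself is deterministic
# once random_state is set.
rng = np.random.RandomState(0)
X = rng.rand(30, 2)

# Two fits with the same random_state on the same data give identical centroids.
km1 = KMeans(n_clusters=3, random_state=42).fit(X)
km2 = KMeans(n_clusters=3, random_state=42).fit(X)
assert np.allclose(km1.cluster_centers_, km2.cluster_centers_)
```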
quant12345
@quant12345
When using "random forest" and "gradient boosting",
I add to the main features a feature shown in the picture.
[image]
The data is clearly non-stationary.
To make the series stationary, I apply a one-time difference to the data (increments).
After that, I normalize the data.
[image]
Why do the classes separate better if I don't use increments (the one-time difference)?
[image]
Although all the textbooks say that non-stationary data should be decomposed into increments.
For training, I use the first 5000 samples. If you look at the data, the extreme values start after 5000.
That is, the model never sees such large values in the training sample.
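For readers following along, a toy sketch of the differencing-then-normalizing procedure described above (all data here is made up):

```python
import numpy as np

# A toy non-stationary series: a drifting random walk.
rng = np.random.RandomState(0)
series = np.cumsum(rng.normal(loc=0.1, scale=1.0, size=1000))

# First-order difference ("increments"): x[t] - x[t-1].
# The result is one sample shorter than the original series.
increments = np.diff(series)
assert increments.shape[0] == series.shape[0] - 1

# Normalize after differencing, as described above (simple standardization).
normalized = (increments - increments.mean()) / increments.std()
```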
chet
@chetkhatri

I have two files that contain Event Name, Event City, Event Venue, and Event State, but in each file it's written in a different way - you can assume the files come from different sources.
I want to create a machine-learning-based algorithm that can do the matching.

I have tried fuzzywuzzy to get string similarity.
Can anyone please tell me, if I want to solve this with deep learning, what would be the approach? Thanks @amueller
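As a reference point, a minimal string-similarity sketch using only the standard library's difflib (the event strings below are hypothetical); fuzzywuzzy's ratio score is based on the same idea:

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1]; 1.0 means identical strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Hypothetical records of the same event from the two files.
event_a = "Intl. Jazz Festival, New Orleans, Saenger Theatre, LA"
event_b = "International Jazz Festival - New Orleans - Saenger Theater - Louisiana"

score = similarity(event_a, event_b)
assert 0.0 <= score <= 1.0
```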

Hakim
@h4k1m0u
Hi, what does it mean when linear_model.Ridge returns n_iter_ = None? Does it mean it didn't even perform a single iteration?
Nicolas Hug
@NicolasHug

@h4k1m0u the doc says
n_iter_ : None or array of shape (n_targets,)
    Actual number of iterations for each target. Available only for
    sag and lsqr solvers. Other solvers will return None.

you're probably not using sag or lsqr?

Hakim
@h4k1m0u
Thanks @NicolasHug, you were right - I was actually not even setting that parameter (solver='auto'). With solver='sag', it returns the number of iterations.
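A small sketch of the behavior discussed above (toy data): with the default solver, n_iter_ stays None, while solver='sag' populates it.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X = rng.rand(100, 3)
y = X @ np.array([1.0, 2.0, 3.0]) + rng.normal(scale=0.1, size=100)

# Default solver ('auto' picks a direct solver on dense data): no iteration count.
ridge_auto = Ridge().fit(X, y)
assert ridge_auto.n_iter_ is None

# Iterative solver: n_iter_ reports the actual number of iterations.
ridge_sag = Ridge(solver='sag').fit(X, y)
assert ridge_sag.n_iter_ is not None
```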
hannah
@story645
Hi, semi random question but I can't find it in the docs - do y'all implement a consensus clustering evaluator that's not the bicluster one?
Antony Lee
@anntzer
Hi, I am the "maintainer" (more like caretaker) of hmmlearn (which was split out of sklearn a couple of years ago). I tried moving the CI to Azure and realized that the macOS tests were failing (previously testing was only done on Linux (travis) and Windows (appveyor)), but I can't test locally on macOS - would anyone be willing to have a look? https://dev.azure.com/anntzer/hmmlearn/_build/results?buildId=170 Thanks!
Nicolas Hug
@NicolasHug
@anntzer , running the test locally on my linux I get a bunch of zero division warnings on the failing test, so you might be able to debug locally still
Antony Lee
@anntzer
I get a single warning running tests locally but they still pass...
Nicolas Hug
@NicolasHug
yeah they pass but they probably should not. Unless you do expect to get a zero division in the test, in which case you need to protect the call.
the macOS CIs probably use different versions of numpy or scipy so that's why they fail while the others don't
Antony Lee
@anntzer
I'll look into it but it would be strange that different versions of numpy are being used
Antony Lee
@anntzer
wait, are you really getting zero division warnings from TestGMMHMMWithTiedCovars::test_fit_zero_variance (which is the failing test on osx)? I only get warnings on TestGMMHMMWithDiagCovars::test_fit_zero_variance (another test)
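One generic way to chase such warnings locally (an editor's sketch, not hmmlearn-specific) is to promote numpy floating-point warnings to errors, so the offending line raises with a full traceback; pytest's `filterwarnings = error` setting achieves a similar effect for Python-level warnings.

```python
import numpy as np

# Make divide-by-zero and invalid-value operations raise instead of warn,
# so the debugger/pytest shows the exact failing line. The same effect is
# available per-block via the np.errstate context manager.
np.seterr(divide='raise', invalid='raise')

try:
    np.log(np.array([0.0]))  # normally emits a RuntimeWarning and returns -inf
except FloatingPointError as exc:
    print("caught:", exc)
```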
duskotovilovic
@duskotovilovic

Dear all, I tried to tune the hyperparameters of the scikit-learn GradientBoostingRegressor model using the Hyperopt optimizer. I set the search space for the learning_rate parameter in the range [0.01, 1] in many ways (for example: 'learning_rate': hp.quniform('learning_rate', 0.01, 1, 0.05), or as a simple array [0.01, 0.02, 0.03, 0.1]), but when I run the code, hyperopt starts calculating and I get the error "ValueError: learning_rate must be greater than 0 but was 0".

I do not know what the problem in the code is, because the zero value is not in the parameter's scope. How does a zero value reach the function?

Please help me to solve this problem.

Olivier Grisel
@ogrisel
This looks like a bug in hyperopt, no? Can you add print statements (or a debugger breakpoint) in the hyperopt and scikit-learn code to check where this zero comes from?
Actually, hp.quniform rounds to discrete multiples of q. You probably want hp.loguniform(-3, 0) or something similar.
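For context: hyperopt documents hp.quniform(label, low, high, q) as returning round(uniform(low, high) / q) * q, so with q=0.05 any draw below 0.025 rounds down to exactly 0, even though low is 0.01. A quick numpy simulation of that documented rounding (hyperopt itself is not used here):

```python
import numpy as np

rng = np.random.RandomState(0)
low, high, q = 0.01, 1.0, 0.05

# Simulate hyperopt's documented quniform: round(uniform(low, high) / q) * q.
draws = np.round(rng.uniform(low, high, size=10_000) / q) * q

# Some draws land on exactly 0, which GradientBoostingRegressor rejects.
assert (draws == 0).any()
assert draws.min() == 0.0
```

hp.loguniform draws exp(uniform(low, high)), which is strictly positive, so it avoids the zero entirely.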
Nicolas Hug
@NicolasHug

@anntzer I get an underflow for TestGMMHMMWithTiedCovars::test_fit_sparse_data and indeed most of the zero div warnings come from TestGMMHMMWithDiagCovars, not the tied version

Maybe addressing the existing warnings would fix the one that's failing?

Antony Lee
@anntzer
that's interesting, I don't get any warning with Tied::test_fit_sparse_data and you don't see it on Azure either; what's your numpy/scipy/anythingelse relevant version?
Nicolas Hug
@NicolasHug
System:
    python: 3.7.4 (default, Oct  4 2019, 06:57:26)  [GCC 9.2.0]
executable: /home/nico/.virtualenvs/sklearn/bin/python
   machine: Linux-5.3.1-arch1-1-ARCH-x86_64-with-arch

Python dependencies:
       pip: 19.0.3
setuptools: 40.8.0
   sklearn: 0.23.dev0
     numpy: 1.17.1
     scipy: 1.3.0
    Cython: 0.29.10
    pandas: 0.24.2
matplotlib: 3.0.0
    joblib: 0.13.2

Built with OpenMP: True
Here's my pytest output. I locally installed the master branch of hmmlearn
lib/hmmlearn/tests/test_gmm_hmm_new.py::TestGMMHMMWithSphericalCovars::test_fit_zero_variance
lib/hmmlearn/tests/test_gmm_hmm_new.py::TestGMMHMMWithTiedCovars::test_fit_sparse_data
  /home/nico/dev/hmmlearn/lib/hmmlearn/hmm.py:849: RuntimeWarning: underflow encountered in multiply
    post_comp_mix = post_comp[:, :, np.newaxis] * post_mix

lib/hmmlearn/tests/test_gmm_hmm_new.py::TestGMMHMMWithDiagCovars::test_fit_zero_variance
  /home/nico/dev/hmmlearn/lib/hmmlearn/stats.py:47: RuntimeWarning: divide by zero encountered in log
    + np.dot(X ** 2, (1.0 / covars).T))

lib/hmmlearn/tests/test_gmm_hmm_new.py::TestGMMHMMWithDiagCovars::test_fit_zero_variance
  /home/nico/dev/hmmlearn/lib/hmmlearn/stats.py:47: RuntimeWarning: divide by zero encountered in true_divide
    + np.dot(X ** 2, (1.0 / covars).T))

lib/hmmlearn/tests/test_gmm_hmm_new.py::TestGMMHMMWithDiagCovars::test_fit_zero_variance
  /home/nico/dev/hmmlearn/lib/hmmlearn/stats.py:47: RuntimeWarning: invalid value encountered in add
    + np.dot(X ** 2, (1.0 / covars).T))

-- Docs: https://docs.pytest.org/en/latest/warnings.html

Results (20.38s):
      93 passed
       3 xpassed
      15 xfailed
Antony Lee
@anntzer
that's... curiouser and curiouser. I don't get the warnings with the exact same versions of everything (AFAICT, except that cpython is from conda), whether with pip-installed numpy and scipy or conda-forge ones.
Neha Gupta
@nehargupta
Hi, does anyone know of any plans involving the release cycle or timeline for moving IterativeImputer out of its experimental status? Thanks - I'm hoping to use it and it looks great for my use case!
albertopiva
@albertopiva
[image: Screenshot 2019-12-28 14.29.36.png]

Hi, I want to apply multinomial logistic regression to compute winning probabilities for each contestant in my races.
The data I want to feed into my model looks like the image above.
I'm trying to understand how I should feed the target class to my model, because every race can have a different number of runners: the target class for race A has 5 contestants, while the target class for race B has just 4.

Is there a way to model this using scikit-learn?

Adrin Jalali
@adrinjalali
@guptane6 we hope to fix some issues by the next release. But no guarantees
Guillaume Lemaitre
@glemaitre
I would find it cool to entitle the blog post something like "Limitations and Caveats ..." instead of "What's wrong ...". That said, I think there are some criticisms that should be discussed by opening issues, to come up with adequate solutions.
Andreas Mueller
@amueller
ugh they credit me as the creator of sklearn
Adrin Jalali
@adrinjalali
haha, yeah I saw :D THE creator :P to be fair, you're the sole maintainer contact on pypi (IIRC)
Andreas Mueller
@amueller
that means nothing lol
Guillaume Chevalier
@guillaume-chevalier

@amueller Thanks for the feedback haha! I'll edit the post soon to correct what you just pointed out. I sincerely thought you were the main creator of sklearn, as you are the top contributor, and also that you are very very involved. I'd love to know if there is anything I could do to help, or if you have any idea of things you'd like to see in Neuraxle to help with making sklearn more integrated in Deep Learning projects.

For instance, I think the following code snippet is really expressive as a way to do Deep Learning pipelines using the pipe and filter design pattern: https://www.neuraxle.org/stable/Neuraxle/README.html#deep-learning-pipelines

Would you have any ideas to share, or things you'd like to point out for me to work on next with Neuraxle?

@glemaitre "Limitations and Caveats ..." sounds cool! I could rename the article. I wanted it to catch the eye, seems like it worked hehe. I love sklearn tho :)
On my side, I've already fixed 95% of the issues I listed, in Neuraxle (as per the links to the Neuraxle website documentation for each problem listed).
Issues regarding serialization and hyperparameter search could be discussed, however.
I think that sklearn-onnx provides a nice way to deploy scikit-learn models in production
Adrin Jalali
@adrinjalali
yeah, that's the goal (sklearn-onnx), but it still needs a bit of work. I'm all in favor of focusing a bit on partial_fit (mini-batches) though.
Guillaume Lemaitre
@glemaitre
but it is rather challenging to retrain models and update models across versions. This might not be in the scope of scikit-learn, but having a third-party library to manage those could be nice
@adrinjalali Incremental learning, early stopping, and callbacks are things which would be nice
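For context on the partial_fit discussion above, a toy sketch of scikit-learn's existing mini-batch API using SGDClassifier (data, batch size, and epoch count are made up); note that the first call must declare the full set of classes:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Toy linearly separable data.
rng = np.random.RandomState(0)
X = rng.rand(1000, 5)
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

clf = SGDClassifier(random_state=0)

# Stream the data in mini-batches over a few epochs; classes must be
# declared up front because a given batch may not contain every label.
classes = np.array([0, 1])
for epoch in range(5):
    for start in range(0, len(X), 100):
        batch = slice(start, start + 100)
        clf.partial_fit(X[batch], y[batch], classes=classes)

print("train accuracy:", clf.score(X, y))
```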