Hakim
@h4k1m0u
Thanks @NicolasHug, you were right: I was actually not even setting that parameter (it was left at solver='auto'). With solver='sag', it returns the # of iterations
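A minimal sketch of that behavior on toy data (Ridge only populates n_iter_ for the iterative solvers such as 'sag' and 'lsqr'; other solvers leave it as None):

import numpy as np
from sklearn.linear_model import Ridge

X, y = np.random.rand(100, 5), np.random.rand(100)

ridge = Ridge(solver="sag", max_iter=1000).fit(X, y)
print(ridge.n_iter_)  # actual number of SAG iterations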
hannah
@story645
Hi, semi-random question, but I can't find it in the docs - do y'all implement a consensus clustering evaluator that's not the bicluster one?
Antony Lee
@anntzer
Hi, I am the "maintainer" (more like caretaker) of hmmlearn (which was split out of sklearn a couple of years ago). I tried moving the CI to Azure and realized that the macOS tests were failing (previously, testing was only done on Linux (Travis) and Windows (AppVeyor)), but I can't test locally on macOS. Would anyone be willing to have a look? https://dev.azure.com/anntzer/hmmlearn/_build/results?buildId=170 Thanks!
Nicolas Hug
@NicolasHug
@anntzer , running the tests locally on my Linux machine, I get a bunch of zero-division warnings on the failing test, so you might still be able to debug locally
Antony Lee
@anntzer
I get a single warning running tests locally but they still pass...
Nicolas Hug
@NicolasHug
yeah, they pass, but they probably should not. Unless you do expect to get a zero division in the test, in which case you need to protect the call.
the macOS CIs probably use different versions of numpy or scipy; that's why they fail while the others don't
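Both options sketched below; the covars value is an assumption that mirrors the divide-by-zero in hmmlearn's stats.py, and the pytest flag reproduces the CI behavior by promoting warnings to errors:

import numpy as np

covars = np.array([1.0, 0.0])  # a zero variance, as in test_fit_zero_variance

# If the zero division is expected, protect the call explicitly:
with np.errstate(divide="ignore"):
    log_covars = np.log(covars)  # yields -inf without a RuntimeWarning

# To make the warnings fail locally the way they fail on the macOS CI:
#   pytest -W error::RuntimeWarning lib/hmmlearn/tests/test_gmm_hmm_new.py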
Antony Lee
@anntzer
I'll look into it, but it would be strange if different versions of numpy were being used
Antony Lee
@anntzer
wait, are you really getting zero-division warnings from TestGMMHMMWithTiedCovars::test_fit_zero_variance (which is the failing test on osx)? I only get warnings on TestGMMHMMWithDiagCovars::test_fit_zero_variance (another test)
duskotovilovic
@duskotovilovic

Dear all, I tried to tune the hyperparameters of the scikit-learn GradientBoostingRegressor model using the Hyperopt optimizer. I set the search space for the learning_rate parameter in the range [0.01, 1] in several ways (for example: 'learning_rate': hp.quniform('learning_rate', 0.01, 1, 0.05), or as a simple array [0.01, 0.02, 0.03, 0.1]), but when I run the code, hyperopt starts the calculation and I get the error "ValueError: learning_rate must be greater than 0 but was 0".

I do not know where the problem in the code is, because the zero value is not in the parameter's range. How does a zero value reach the function?

Please help me solve this problem.

Olivier Grisel
@ogrisel
This looks like a bug in hyperopt, no? Can you add print statements (or a debugger breakpoint) in the hyperopt and scikit-learn code to check where this zero comes from?
Actually, hp.quniform rounds the draw to multiples of q, so with q=0.05 a value near 0.01 gets rounded down to exactly 0. You probably want hp.loguniform(-3, 0) or something similar.
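A minimal sketch of that suggestion (the np.log(0.01) and np.log(1.0) bounds are an assumption matching the intended [0.01, 1] range, and the objective is a dummy):

import numpy as np
from hyperopt import fmin, hp, tpe

# loguniform samples exp(uniform(low, high)), so these bounds keep
# learning_rate strictly inside [0.01, 1] and never at 0:
space = {"learning_rate": hp.loguniform("learning_rate", np.log(0.01), np.log(1.0))}

best = fmin(lambda params: 0.0, space, algo=tpe.suggest, max_evals=5)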
Nicolas Hug
@NicolasHug

@anntzer I get an underflow for TestGMMHMMWithTiedCovars::test_fit_sparse_data and indeed most of the zero div warnings come from TestGMMHMMWithDiagCovars, not the tied version

Maybe addressing the existing warnings would fix the one that's failing?

Antony Lee
@anntzer
that's interesting, I don't get any warning with Tied::test_fit_sparse_data, and you don't see it on Azure either; what are your versions of numpy, scipy, and anything else relevant?
Nicolas Hug
@NicolasHug
System:
    python: 3.7.4 (default, Oct  4 2019, 06:57:26)  [GCC 9.2.0]
executable: /home/nico/.virtualenvs/sklearn/bin/python
   machine: Linux-5.3.1-arch1-1-ARCH-x86_64-with-arch

Python dependencies:
       pip: 19.0.3
setuptools: 40.8.0
   sklearn: 0.23.dev0
     numpy: 1.17.1
     scipy: 1.3.0
    Cython: 0.29.10
    pandas: 0.24.2
matplotlib: 3.0.0
    joblib: 0.13.2

Built with OpenMP: True
Here's my pytest output. I locally installed the master branch of hmmlearn
lib/hmmlearn/tests/test_gmm_hmm_new.py::TestGMMHMMWithSphericalCovars::test_fit_zero_variance
lib/hmmlearn/tests/test_gmm_hmm_new.py::TestGMMHMMWithTiedCovars::test_fit_sparse_data
  /home/nico/dev/hmmlearn/lib/hmmlearn/hmm.py:849: RuntimeWarning: underflow encountered in multiply
    post_comp_mix = post_comp[:, :, np.newaxis] * post_mix

lib/hmmlearn/tests/test_gmm_hmm_new.py::TestGMMHMMWithDiagCovars::test_fit_zero_variance
  /home/nico/dev/hmmlearn/lib/hmmlearn/stats.py:47: RuntimeWarning: divide by zero encountered in log
    + np.dot(X ** 2, (1.0 / covars).T))

lib/hmmlearn/tests/test_gmm_hmm_new.py::TestGMMHMMWithDiagCovars::test_fit_zero_variance
  /home/nico/dev/hmmlearn/lib/hmmlearn/stats.py:47: RuntimeWarning: divide by zero encountered in true_divide
    + np.dot(X ** 2, (1.0 / covars).T))

lib/hmmlearn/tests/test_gmm_hmm_new.py::TestGMMHMMWithDiagCovars::test_fit_zero_variance
  /home/nico/dev/hmmlearn/lib/hmmlearn/stats.py:47: RuntimeWarning: invalid value encountered in add
    + np.dot(X ** 2, (1.0 / covars).T))

-- Docs: https://docs.pytest.org/en/latest/warnings.html

Results (20.38s):
      93 passed
       3 xpassed
      15 xfailed
Antony Lee
@anntzer
that's... curiouser and curiouser. I don't get the warnings with the exact same versions of everything (AFAICT, except that CPython is from conda), whether with pip-installed numpy and scipy or conda-forge ones.
Neha Gupta
@nehargupta
Hi, does anyone know of any plans regarding the release cycle or timeline for moving the IterativeImputer out of its experimental state? Thanks - I'm hoping to use it, and it looks great for my use case!
albertopiva
@albertopiva
[image: Screenshot 2019-12-28 14.29.36.png]

Hi, I want to apply multinomial logistic regression to compute winning probabilities for each contestant in my races.
The data I want to feed into my model looks like the image above.
I'm trying to understand how I should feed the target class to my model, because every race can have a different number of runners: the target class for race A has 5 contestants, while the target class for race B has just 4.

Is there a way to model this using scikit-learn?

Adrin Jalali
@adrinjalali
@guptane6 we hope to fix some issues by the next release. But no guarantees
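For reference, a minimal sketch of using it in the meantime; the experimental import flag is all that's needed to unlock it:

import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

X = np.array([[1.0, 2.0], [3.0, np.nan], [5.0, 6.0]])
print(IterativeImputer().fit_transform(X))  # the NaN gets filled in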
Guillaume Lemaitre
@glemaitre
I would find it cool to have the blog post titled something like "Limitations and Caveats ..." instead of "What's wrong ...". That said, I think some of the criticisms should be discussed by opening issues, to come up with adequate solutions.
Andreas Mueller
@amueller
ugh they credit me as the creator of sklearn
Adrin Jalali
@adrinjalali
haha, yeah I saw :D THE creator :P to be fair, you're the sole maintainer contact on PyPI (IIRC)
Andreas Mueller
@amueller
that means nothing lol
Guillaume Chevalier
@guillaume-chevalier

@amueller Thanks for the feedback haha! I'll edit the post soon to correct what you just pointed out. I sincerely thought you were the main creator of sklearn, as you are the top contributor and also very, very involved. I'd love to know if there is anything I could do to help, or if you have any ideas of things you'd like to see in Neuraxle to help make sklearn more integrated in Deep Learning projects.

For instance, I think the following code snippet is really expressive as a way to do Deep Learning pipelines using the pipe-and-filter design pattern: https://www.neuraxle.org/stable/Neuraxle/README.html#deep-learning-pipelines

Would you have any ideas to share, or things you'd like to point out for me to work on next with Neuraxle?

@glemaitre "Limitations and Caveats ..." sounds cool! I could rename the article. I wanted it to catch the eye, and it seems like that worked hehe. I love sklearn tho :)
On my side, I've already fixed 95% of the issues I listed in Neuraxle (as per the links to the Neuraxle website documentation for each problem listed).
Issues regarding serialization and hyperparameter search could be discussed, however.
I think that onnx-sklearn provides a nice way to deploy scikit-learn models in production
Adrin Jalali
@adrinjalali
yeah, that's the goal (onnx-sklearn), but it still needs a bit of work. I'm all in favor of focusing a bit on partial_fit (mini-batches), though.
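A rough sketch of that prediction-only path, assuming skl2onnx and onnxruntime are installed (the estimator and tensor shapes are placeholders):

import numpy as np
import onnxruntime as rt
from sklearn.linear_model import LogisticRegression
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

X = np.random.rand(20, 4).astype(np.float32)
y = np.random.randint(2, size=20)
model = LogisticRegression().fit(X, y)

# Export the fitted model to ONNX...
onx = convert_sklearn(model, initial_types=[("input", FloatTensorType([None, 4]))])

# ...then serve predictions with onnxruntime, no scikit-learn needed at runtime:
sess = rt.InferenceSession(onx.SerializeToString(), providers=["CPUExecutionProvider"])
print(sess.run(None, {"input": X[:3]})[0])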
Guillaume Lemaitre
@glemaitre
but it is rather challenging to retrain and update models across versions. This might not be in the scope of scikit-learn, but having a third-party library to manage those could be nice
@adrinjalali Incremental learning, early stopping, and callbacks are things which would be nice
they are on the roadmap, I think
Adrin Jalali
@adrinjalali
yeah they are, they're just hard :P
Guillaume Lemaitre
@glemaitre
:) yes indeed
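For what it's worth, the mini-batch half already works today for estimators that implement partial_fit; the hard part is the early stopping and callback hooks around this loop. A minimal sketch (estimator choice and batch sizes are placeholders):

import numpy as np
from sklearn.linear_model import SGDClassifier

clf = SGDClassifier()
classes = np.array([0, 1])  # all classes must be declared on the first call
for _ in range(10):
    X_batch = np.random.rand(32, 4)
    y_batch = np.random.randint(2, size=32)
    clf.partial_fit(X_batch, y_batch, classes=classes)  # one incremental step per batch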
Guillaume Chevalier
@guillaume-chevalier
Nice to confirm that you scope scikit-learn like that. I feared I'd be playing a bit too much in your backyard, but it seems fine; I'm glad you have this opinion. I'm totally down to make Neuraxle a way to handle all those callbacks and things required for doing deep learning, plus serialization. I don't know much about ONNX, but there could be a way for me to adapt to it and save every neural net using that instead of building custom savers. For now I'm already building two other libraries, Neuraxle-TensorFlow and Neuraxle-PyTorch, to provide default neural net savers that allow serialization and checkpointing, and to give those models their special callbacks. Might also do Neuraxle-Keras and so forth.
Guillaume Lemaitre
@glemaitre
you also have keras-onnx
and pytorch-onnx
which work the same way as sklearn-onnx
but this is for prediction only
Andreas Mueller
@amueller
They are for serialization and deployment, though. I think @guillaume-chevalier wants training as well
lol ok you beat me to it ;)
Guillaume Chevalier
@guillaume-chevalier

I'll need to look into that. For now, with Neuraxle, someone could do this using three TF functions that build TF graphs:

# create_model, create_loss, and create_optimizer are the three user-defined
# functions that build the TF graph pieces; hp/hps are the hyperparameters
# and their search space.
model = TensorflowV2ModelStep(
    create_model, create_loss, create_optimizer,
    has_expected_outputs=False
).set_hyperparams(hp).set_hyperparams_space(hps)

And I have savers that allow saving, reloading, and continuing a fit (already!)

The same API would work for TF v1 using a TensorflowV1ModelStep instead, and also for PyTorch (using some nn.Modules), and eventually Keras in some way.
I also have a ParallelTransform class which uses the savers for parallelizing instead of using joblib, so all the PyTorch, TF, and Keras code is parallelizable. I'm also building a ClusteringWrapper right now, which acts like the ParallelTransform using savers, but sends the saved wrapped pipeline to a worker that has a REST API. So the ClusteringWrapper can split a batch of data across N workers, by first sending the model and then sending the split data in parallel.
Guillaume Chevalier
@guillaume-chevalier
The same concept applies to a new StreamingPipeline class I'm creating right now :D It has the ability to have some steps (e.g. sub-pipelines) run in different threads, with queues between each thread, like a consumer-producer design pattern. I also already have a MiniBatchSequentialPipeline that is just like a single-threaded Pipeline but already uses mini-batches: it splits the batches into mini-batches, so it's just like having a normal Pipeline but calling .fit many times in a row (sorry, I didn't name it partial_fit; my fit is already thought of as potentially always a partial one).
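A generic sketch of that consumer-producer wiring in plain Python (not Neuraxle code; the step function is a placeholder):

import queue
import threading

SENTINEL = object()  # marks the end of the stream

def producer(batches, q):
    for batch in batches:
        q.put(batch)
    q.put(SENTINEL)

def consumer(q, step):
    while True:
        batch = q.get()
        if batch is SENTINEL:
            break
        step(batch)  # e.g. one partial fit/transform per mini-batch

q = queue.Queue(maxsize=4)  # a bounded queue applies backpressure between threads
t1 = threading.Thread(target=producer, args=([[1, 2], [3, 4]], q))
t2 = threading.Thread(target=consumer, args=(q, print))
t1.start(); t2.start()
t1.join(); t2.join()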
Guillaume Chevalier
@guillaume-chevalier

@glemaitre You said:

@adrinjalali Incremental learning, early stopping, and callbacks are things which would be nice

If you look closely here, I already have incremental learning (e.g. if you Ctrl+F for MiniBatchSequentialPipeline). I'd love to add early stopping and other callbacks soon - good idea. I opened an issue, Neuraxio/Neuraxle#228, for such things; I'll add callbacks to it!

So in the issue Neuraxio/Neuraxle#228 I just linked to, there is some example API code, but it might not be enough. I'd like to really discover the good design patterns for that, although I have at least found something that seems like it would work properly.
Keith
@DrEhrfurchtgebietend
Does HistGradientBoostingRegressor have an equivalent of subsample and max_features in GradientBoostingRegressor? I have a GradientBoostingRegressor model with tuned hyperparameters, and I want to see if HistGradientBoostingRegressor is better.
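A quick sketch of running that comparison (toy data stands in for the real dataset; on 0.2x releases HistGradientBoostingRegressor also needs the enable_hist_gradient_boosting experimental import):

from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, HistGradientBoostingRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=2000, n_features=20, noise=10.0, random_state=0)
for est in (GradientBoostingRegressor(), HistGradientBoostingRegressor()):
    print(type(est).__name__, cross_val_score(est, X, y, cv=5).mean())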