scikit-learn: machine learning in Python. Please feel free to ask specific questions about scikit-learn. Please try to keep the discussion focused on scikit-learn usage and immediately related open source projects from the Python ecosystem.
I have two files that both contain Event Name, Event City, Event Venue, and Event State, but the values are written differently in each file; you can assume the two files come from different sources.
I want to create a machine-learning-based algorithm that can do the matching.
I have tried fuzzywuzzy to get string similarity.
Can anyone please tell me what the approach would be if I wanted to solve this with deep learning? Thanks @amueller
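For context, here is roughly what a fuzzywuzzy baseline for this looks like; the file names, column names, and the score threshold of 85 are illustrative assumptions, not part of the original setup:

import pandas as pd
from fuzzywuzzy import fuzz, process

a = pd.read_csv("events_source_a.csv")  # hypothetical file names
b = pd.read_csv("events_source_b.csv")

def record_key(row):
    # Concatenate the fields so one score covers name, city, venue, and state.
    cols = ["Event Name", "Event City", "Event Venue", "Event State"]
    return " | ".join(str(row[c]) for c in cols)

b_keys = b.apply(record_key, axis=1).tolist()
for _, row in a.iterrows():
    best, score = process.extractOne(record_key(row), b_keys,
                                     scorer=fuzz.token_sort_ratio)
    if score >= 85:  # keep only confident matches
        print(record_key(row), "->", best, "(score:", score, ")")

A deep learning variant would typically replace token_sort_ratio with a learned similarity, e.g. cosine similarity between sentence embeddings of the concatenated fields.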
Dear all, I tried to tune the hyperparameters of a scikit-learn GradientBoostingRegressor model using the Hyperopt optimizer. I set the search space for the learning_rate parameter to the range [0.01, 1] in several ways (for example 'learning_rate': hp.quniform('learning_rate', 0.01, 1, 0.05), or as a simple array [0.01, 0.02, 0.03, 0.1]), but when I run the code, Hyperopt starts calculating and I get the error "ValueError: learning_rate must be greater than 0 but was 0".
I do not know what the problem in the code is, because zero is not within the parameter's range. How does a zero value reach the function?
Please help me solve this problem.
hp.loguniform('learning_rate', -3, 0)
or something similar. (hp.quniform rounds each draw to the nearest multiple of q, so with q=0.05 a sample near 0.01 gets rounded down to 0; that's where the zero comes from.)
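For anyone hitting this later, a minimal sketch of why the zero appears and of the log-scale fix; the np.log bounds are my assumption to keep the original [0.01, 1] range:

import numpy as np
from hyperopt import hp

# hp.quniform(label, low, high, q) returns round(uniform(low, high) / q) * q,
# so with q=0.05 a draw near 0.01 is rounded down to 0.0, which
# GradientBoostingRegressor rejects.
space = {
    # hp.loguniform samples exp(uniform(low, high)), so every value is
    # strictly positive; these bounds cover [0.01, 1].
    'learning_rate': hp.loguniform('learning_rate', np.log(0.01), np.log(1.0)),
}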
System:
python: 3.7.4 (default, Oct 4 2019, 06:57:26) [GCC 9.2.0]
executable: /home/nico/.virtualenvs/sklearn/bin/python
machine: Linux-5.3.1-arch1-1-ARCH-x86_64-with-arch
Python dependencies:
pip: 19.0.3
setuptools: 40.8.0
sklearn: 0.23.dev0
numpy: 1.17.1
scipy: 1.3.0
Cython: 0.29.10
pandas: 0.24.2
matplotlib: 3.0.0
joblib: 0.13.2
Built with OpenMP: True
lib/hmmlearn/tests/test_gmm_hmm_new.py::TestGMMHMMWithSphericalCovars::test_fit_zero_variance
lib/hmmlearn/tests/test_gmm_hmm_new.py::TestGMMHMMWithTiedCovars::test_fit_sparse_data
/home/nico/dev/hmmlearn/lib/hmmlearn/hmm.py:849: RuntimeWarning: underflow encountered in multiply
post_comp_mix = post_comp[:, :, np.newaxis] * post_mix
lib/hmmlearn/tests/test_gmm_hmm_new.py::TestGMMHMMWithDiagCovars::test_fit_zero_variance
/home/nico/dev/hmmlearn/lib/hmmlearn/stats.py:47: RuntimeWarning: divide by zero encountered in log
+ np.dot(X ** 2, (1.0 / covars).T))
lib/hmmlearn/tests/test_gmm_hmm_new.py::TestGMMHMMWithDiagCovars::test_fit_zero_variance
/home/nico/dev/hmmlearn/lib/hmmlearn/stats.py:47: RuntimeWarning: divide by zero encountered in true_divide
+ np.dot(X ** 2, (1.0 / covars).T))
lib/hmmlearn/tests/test_gmm_hmm_new.py::TestGMMHMMWithDiagCovars::test_fit_zero_variance
/home/nico/dev/hmmlearn/lib/hmmlearn/stats.py:47: RuntimeWarning: invalid value encountered in add
+ np.dot(X ** 2, (1.0 / covars).T))
-- Docs: https://docs.pytest.org/en/latest/warnings.html
Results (20.38s):
93 passed
3 xpassed
15 xfailed
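For what it's worth, those warnings are ordinary numpy behavior once a fitted variance collapses to zero; a minimal reproduction with an illustrative covars array:

import numpy as np

covars = np.array([1.0, 0.0])   # one zero variance, as in test_fit_zero_variance
np.log(covars)                  # RuntimeWarning: divide by zero encountered in log
1.0 / covars                    # RuntimeWarning: divide by zero in true_divide
np.log(covars) + 1.0 / covars   # -inf + inf -> RuntimeWarning: invalid value in add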
Hi, I want to apply multinomial logistic regression to compute winning probabilities for each contestant in my races.
The data I want to feed into my model looks like the image above.
I'm trying to understand how I should feed the target class to my model, because every race can have a different number of runners: the target class for race A has 5 contestants, while the target class for race B has just 4.
Is there a way to model this using scikit-learn?
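One common workaround (my suggestion, not an official scikit-learn recipe for this) is to give each (race, runner) pair its own row with a binary "won" target, fit an ordinary classifier, and renormalize the predicted probabilities within each race; the feature names and data below are made up:

import pandas as pd
from sklearn.linear_model import LogisticRegression

# One row per (race, runner); "won" is 1 only for the race winner.
df = pd.DataFrame({
    "race_id": ["A", "A", "A", "B", "B"],
    "speed":   [12.1, 11.8, 12.4, 11.9, 12.0],
    "form":    [3, 1, 2, 2, 1],
    "won":     [1, 0, 0, 0, 1],
})

clf = LogisticRegression().fit(df[["speed", "form"]], df["won"])

# Per-runner win scores, renormalized so probabilities sum to 1 per race.
df["p_raw"] = clf.predict_proba(df[["speed", "form"]])[:, 1]
df["p_win"] = df.groupby("race_id")["p_raw"].transform(lambda p: p / p.sum())
print(df[["race_id", "p_win"]])

This sidesteps the varying number of classes, since the model only ever sees a binary problem.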
@amueller Thanks for the feedback haha! I'll edit the post soon to correct what you just pointed out. I sincerely thought you were the main creator of sklearn, since you are the top contributor and also very, very involved. I'd love to know if there is anything I could do to help, or if you have any ideas of things you'd like to see in Neuraxle to help make sklearn more integrated with deep learning projects.
For instance, I think the following code snippet is really expressive as a way to build deep learning pipelines using the pipe-and-filter design pattern: https://www.neuraxle.org/stable/Neuraxle/README.html#deep-learning-pipelines
Would you have any ideas to share, or things you'd like to point out for me to work on next with Neuraxle?
I'll need to look into that. For now, with Neuraxle, someone could do this using 3 TF functions that build TF graphs:
model = TensorflowV2ModelStep(
    create_model, create_loss, create_optimizer,
    has_expected_outputs=False
).set_hyperparams(hp).set_hyperparams_space(hps)
And I have savers that allow saving, reloading, and continuing a fit (already!), including for PyTorch (nn.Modules), and eventually Keras in some ways.
I also have a ParallelTransform class which uses the savers for parallelizing instead of using joblib, so all the PyTorch, TF, and Keras code is parallelizable. I'm also building right now a ClusteringWrapper, which acts like the ParallelTransform using savers, but sends the saved wrapped pipeline to a worker that has a REST API. So the ClusteringWrapper can split a batch of data across N workers, by first sending the model, and then sending the split data in parallel.