Loïc Estève
@lesteve
If you go there: https://github.com/JosephTLucas/scikit-learn/pulls and click "New Pull Request", it creates the PR against scikit-learn/scikit-learn from the JosephTLucas fork, and as you noticed I was not able to change that to use the JosephTLucas fork as the target (edit: it turns out clicking on the blue arrow between the base and head fork is another work-around)...
Anaël Beaugnon
@ab-anssi
I have sent the pull request to Joseph. Thanks for the help.
This is the first time I've sent a pull request to a contributor to update his pull request to the main repo. If he accepts my pull request (to the branch of his pull request), will it automatically update the pull request (to the main repo) with my commit?
Loïc Estève
@lesteve
Short answer: yes. Longer answer: his PR is tied to his branch, so as soon as he merges your PR on his fork, his branch will be updated and your changes will appear on the PR to scikit-learn/scikit-learn.
Side-comment: if you collaborate from time to time with JosephTLucas a reasonable way to make that easier is that JosephTLucas gives you write access to his fork. This way you can push directly to his branch without doing a PR on his fork.
Anaël Beaugnon
@ab-anssi
@lesteve Thanks for the "side-comment". I will ask him if he agrees to do so. It would be much easier : )
Joseph Lucas
@JosephTLucas
Sounds like a plan!
C4rst3n
@C4rst3n
Hello, I hope this is the right place for my question.
I have a problem with a 1D classification.
I simplified my problem to the following:
there are 2 classes, +1 and -1, and I have one training value for each class:
0.0293294646367189 (class -1) and 0.025545042466768184 (class 1).
Now that I have trained a LinearSVC with those values, I throw some random values at the classifier to predict.
I expect the decision boundary to be in the middle of the two example values,
but even 0.025545042466768184, the training value for class 1, is predicted as -1.
C4rst3n
@C4rst3n
I even tried to move this to a 2D problem by adding a 2nd feature to the values, [0.0293294646367189, 0] and [0.025545042466768184, 0], but this didn't work either.
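One plausible explanation (an assumption, since the exact code above isn't shown) is feature scale: with raw values this close to zero, LinearSVC's default C=1.0 regularization dominates and the learned boundary need not sit at the midpoint. A minimal sketch, standardizing the two hypothetical training values first:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# The two training values from the question, one per class.
X = np.array([[0.0293294646367189], [0.025545042466768184]])
y = np.array([-1, 1])

# Standardizing puts the two points at roughly +1 and -1, so the default
# regularization no longer overwhelms the tiny raw feature scale.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

clf = LinearSVC(C=1.0).fit(X_scaled, y)
print(clf.predict(X_scaled))  # both training points now classified correctly
```

Remember to apply the same fitted scaler to any values passed to `predict` later.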
lesshaste
@lesshaste
I have some data I want to do linear regression on. When I use LinearRegression().fit(X_scaled, y[policy]) I get a score of over 0.96
when I use LassoLarsCV(cv=5).fit(X_scaled, y[policy]) I get a score of 0
what am I doing wrong?
Andreas Mueller
@amueller
what's the complete code? Training set score or test set score?
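Andreas's question matters because `model.score()` on the data the model was fit on and `cross_val_score` measure different things. A hedged sketch on toy data (a hypothetical stand-in for the X_scaled / y[policy] above) showing how the two can diverge wildly:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Toy data with many features relative to samples (hypothetical sizes).
X, y = make_regression(n_samples=100, n_features=80, noise=10.0, random_state=0)

model = LinearRegression().fit(X, y)
train_score = model.score(X, y)  # R^2 on the data the model was fit on
cv_scores = cross_val_score(LinearRegression(), X, y, cv=5)  # R^2 on held-out folds

# The training R^2 can look excellent while the cross-validated R^2
# collapses -- the same gap the question describes.
print(train_score, cv_scores.mean())
```

If the 0.96 above is a training-set score, comparing it with a cross-validated score is not apples to apples.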
Francisco Palomares
@FranciscoPalomares
Hi, I have a large dataset: 600K rows and 2 target columns. Multi-output xgboost works well, but Random Forest is very slow. How can I speed it up?
It's a regression problem.
imtejagst
@imtejagst_gitlab
I am writing a neural network without any external libraries for MNIST with only 4 labels. In the train_labels.csv file I have one-hot encoded data (1000) for all samples; my doubt is how to feed the data directly to the function in the code. I am using softmax as the activation function in the output layer.
Giuseppe Broccolo
@gbroccolo
Hi @FranciscoPalomares, a Random Forest is a collection of models which can be trained independently, so training can be parallelised. I guess for classification you would be using RandomForestClassifier; you can scale the training through the n_jobs parameter when you instantiate it, something like
RandomForestClassifier(n_estimators=100, n_jobs=-1)
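Since Francisco mentioned it's a regression problem with 2 target columns, the same n_jobs parameter applies to RandomForestRegressor, which also handles multi-output targets natively. A minimal sketch on toy data (the sizes here are hypothetical stand-ins for the real 600K-row dataset):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = rng.normal(size=(1000, 2))   # a 2D target works directly, no wrapper needed

# n_jobs=-1 trains the independent trees on all available CPU cores.
reg = RandomForestRegressor(n_estimators=100, n_jobs=-1, random_state=0)
reg.fit(X, y)
print(reg.predict(X[:3]).shape)  # (3, 2): one prediction per target column
```

If it is still too slow at 600K rows, reducing `n_estimators` or setting `max_depth` / `max_samples` are the usual knobs to trade accuracy for speed.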
lesshaste
@lesshaste
does the default FeatureAgglomeration just leave you with 2 features?
Adrin Jalali
@adrinjalali
as stated in the docs, n_clusters=2 is the default value.
lesshaste
@lesshaste
does that mean you end up with 2 features?
I am not clear what a cluster means here
@adrinjalali (thank you)
Adrin Jalali
@adrinjalali
yes, the examples and the user guide will give you a better idea on how it works and what it does: https://scikit-learn.org/dev/modules/clustering.html#hierarchical-clustering
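To make the answer concrete: FeatureAgglomeration clusters the *columns* of X rather than the samples, and its transform pools each cluster of features into a single new feature, so the default n_clusters=2 does leave you with 2 features. A small sketch on made-up data:

```python
import numpy as np
from sklearn.cluster import FeatureAgglomeration

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))    # 50 samples, 10 features (hypothetical data)

# Clusters the 10 feature columns into n_clusters groups, then pools each
# group (mean by default) into one output feature.
agglo = FeatureAgglomeration()   # default n_clusters=2
X_reduced = agglo.fit_transform(X)
print(X_reduced.shape)  # (50, 2)
```

Raise `n_clusters` to keep more (pooled) features.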
lesshaste
@lesshaste
thank you
I am surprised it works so well for my regression problem!
Andreas Mueller
@amueller
I think we could add it to the docstring as well. I think it's clear from the user guide but why not add it to the docstring?
Adrin Jalali
@adrinjalali
I agree that its docstring can be substantially improved :D the examples could also use some love
Joshua Newton
@joshuacwnewton
Hullo! I took on #9602 as a first issue for the MLH Fellowship. Looking into it further, it seems like the scope of the issue is much larger than clarifying a single docstring. I'm unsure of whether to start poking away at it, or whether to start a larger discussion on the direction of multiclass/multilabel learning in sklearn. Would love to hear thoughts. :)
Andreas Mueller
@amueller
hm honestly I think we're pretty consistent with the terms in the glossary
In some places, multiclass/multilabel functionality is explicit and indicated by the name of the function.
can you give an example of that?
that's for meta-estimators, I guess?
Joshua Newton
@joshuacwnewton
Ah, I guess I went a bit overboard with my comment, seeing an issue where there was none.
I'll focus on clarifying the OVR docstring, then.
Joshua Newton
@joshuacwnewton

Does anyone have any tips for making an API proposal go smoothly? (e.g. the structure of a good proposal, good examples of existing proposals)

Context: I was hoping to start working on a proposal for adding Gibbs sampling to LatentDirichletAllocation, going off of Thomas's recommendations.

Ah, just found the SLEP template.
Adrin Jalali
@adrinjalali
I think adding features like that is usually discussed in the issues, @joshuacwnewton. SLEPs have been a bit more about the general API rather than specific features
it can be just an issue like this one, for example scikit-learn/scikit-learn#15346
Joshua Newton
@joshuacwnewton
Ah, thanks for the clarification, @adrinjalali!
Beau-Yang
@Beau-Yang
Hello, the question I have is that the scores returned from 'cross_val_score' seem very different from the result of the actual fitting process. In the last training run most of the validation losses are about 0.1, but the score I get from 'cross_val_score' is about 0.49. The scoring I use is 'neg_mse', which is similar to the loss function of the network ('mse'). I want to know why this happens and how to fix it. Thanks a lot.
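One thing worth ruling out (an assumption, since the full code isn't shown): 'neg_mean_squared_error' scores are *negated* MSE so that higher is better, and they are computed on held-out folds rather than on training batches, so they are not directly comparable to a training loss without flipping the sign. A small sketch:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Hypothetical toy data standing in for the network's training set.
X, y = make_regression(n_samples=200, n_features=5, noise=1.0, random_state=0)

# cross_val_score returns negated MSE (all values <= 0, higher is better).
scores = cross_val_score(Ridge(), X, y, cv=5, scoring="neg_mean_squared_error")
mse_per_fold = -scores  # flip the sign before comparing with a training loss
print(mse_per_fold.mean())
```

Even after the sign flip, held-out MSE is usually worse than training loss; a large gap suggests overfitting rather than a bug in `cross_val_score`.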
spinningcat
@spinningcat
hola
Parthiv Chigurupati
@parthivc
Just submitted a new PR for the cross-validation documentation #17781
GFHuang
@GF-Huang
Hello, guys. How can I use a custom Distance Function for OPTICS clustering algorithm?
Andreas Mueller
@amueller
@GF-Huang you can use the metric parameter
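For illustration, `metric` accepts either a string or any callable taking two 1D arrays and returning a float. A minimal sketch with a hypothetical custom distance on made-up data:

```python
import numpy as np
from sklearn.cluster import OPTICS

def manhattan(a, b):
    # Any callable of two 1D sample vectors returning a float works as a metric.
    return np.sum(np.abs(a - b))

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, size=(20, 2)),
               rng.normal(5, 0.1, size=(20, 2))])  # two well-separated blobs

clust = OPTICS(min_samples=5, metric=manhattan).fit(X)
print(clust.labels_)  # one label per sample; -1 marks noise points
```

Note that a Python callable is evaluated pairwise and is much slower than the built-in string metrics, so prefer a named metric when one fits.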
Daniel Kaminski de Souza
@DanielAtKrypton
Upvote the vscode issue to allow colorful output when fitting with sklearn: microsoft/vscode-python#12615
Ray Bell
@raybellwaves
Can I request that I/we/someone work on getting scikit-learn/scikit-learn#15007 merged as part of the SciPy sprint?
Roman Yurchak
@rth
@raybellwaves That PR is now merged.