Raphael Vallat
@raphaelvallat
Note that it is actually possible to return the predicted values as a hidden attribute of the dataframe, but I think it's less confusing to say to Pingouin's users that if they want a fuller output they should just use as_dataframe=False to get a dictionary instead of a "summary" dataframe
Florin Andrei
@FlorinAndrei
with lm = pg.linear_regression() is there a simple predict() method like with the scikit models?
Raphael Vallat
@raphaelvallat
No, because the linear regression is implemented as a function and not as a Python class (as in scikit-learn).
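Because the function returns coefficients rather than a fitted-model object, predictions for new data can be computed by hand. The following is a minimal sketch, not part of Pingouin: `predict` is a hypothetical helper, and it assumes the coefficient vector is ordered with the intercept first (as in the `coef` column of the summary when an intercept is added). To keep the example runnable without Pingouin, the coefficients are estimated with `np.linalg.lstsq` as a stand-in for the fit.

```python
import numpy as np

def predict(coef, X_new):
    """Hypothetical helper: apply fitted coefficients (intercept first) to new data."""
    X_new = np.asarray(X_new, dtype=float)
    if X_new.ndim == 1:
        X_new = X_new.reshape(-1, 1)  # a single predictor passed as a 1-D array
    return coef[0] + X_new @ np.asarray(coef[1:], dtype=float)

# Toy data following y = 1 + 2*x exactly.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x

# Stand-in for the fit: ordinary least squares with an explicit intercept column.
# Assumption: with Pingouin, the same vector would be read from the `coef`
# column of the linear_regression output.
design = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

# Predictions for two new values of x.
y_new = predict(coef, [4.0, 5.0])
```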
Norbert Wilkens
@BikeNW
Hi Raphael, I just wanted to add my question to the current topic. I want to compute interaction and moderation with pingouin.linear_regression. This still does not seem to be possible. Is it planned to be implemented? Thanks, Norbert
Raphael Vallat
@raphaelvallat
Hi @BikeNW, while this is not directly possible, I believe that you can create additional columns in your dataframe that would represent the contrast that you want to try (e.g. interaction between two terms). Otherwise, I'd recommend checking statsmodels if you'd like to have R-style formulas. Thanks!
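The "additional column" approach can be sketched with pandas alone. This is an illustration under assumptions: the column names (`x1`, `x2`, `x1_x_x2`) and the toy data are made up, and mean-centering the predictors before multiplying is an optional, common convention for moderation analyses rather than something the message above prescribes.

```python
import numpy as np
import pandas as pd

# Toy data with a true interaction between x1 and x2 (hypothetical example).
rng = np.random.default_rng(0)
df = pd.DataFrame({"x1": rng.normal(size=50), "x2": rng.normal(size=50)})
df["y"] = 0.5 * df["x1"] - 0.3 * df["x2"] + 0.8 * df["x1"] * df["x2"]

# Optional: mean-center the predictors so the main effects stay interpretable.
df["x1_c"] = df["x1"] - df["x1"].mean()
df["x2_c"] = df["x2"] - df["x2"].mean()

# The interaction term is simply the product of the two (centered) predictors.
df["x1_x_x2"] = df["x1_c"] * df["x2_c"]

# These columns form the predictor matrix to pass to the regression,
# e.g. pg.linear_regression(df[predictors], df["y"]).
predictors = ["x1_c", "x2_c", "x1_x_x2"]
```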
MacAskill Lab
@macaskill-lab
hello! first - thanks for the fantastic package! i have come across an issue and it may be because of my lack of skills, but i can't work it out - it seems that for a mixed anova design you can't have multiple within-subject comparisons. e.g.: pg.mixed_anova(dv='value', within=['variable_a', 'variable_b'], between='group', subject='subject', data=df, effsize="ng2") returns a dtype error. is this me being an idiot or is it a limitation? i couldn't find a direct mention of it anywhere so thought i'd check before deep diving
Raphael Vallat
@raphaelvallat
Hi @macaskill-lab! You're not an idiot: the mixed ANOVA only allows for one between + one within, or two within, or two between factors. It doesn't work with two within + one between, or two between + one within! For more complex ANOVA designs, I'd highly recommend using the free JASP software!
AtK42
@AtK42
hi all. I'm having some problems getting started with pingouin. I've installed the module via the cmd line and also via conda but when I try to import it in my jupyter notebook it says it cannot find the module, even though it has been successfully installed. any ideas as to why that's the case?
Raphael Vallat
@raphaelvallat
Hi @AtK42, are you sure that you have installed Pingouin in the same Python environment as your Jupyter Notebook? (the default is "base" in Anaconda)
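One way to check this from inside the notebook is to print the interpreter the kernel is running on and ask whether the package is importable from it. This sketch uses only the standard library.

```python
import importlib.util
import sys

# Path of the Python interpreter that this kernel or script is running on.
print(sys.executable)

# None means the package is not importable from this interpreter's environment.
spec = importlib.util.find_spec("pingouin")
print("pingouin found" if spec is not None else "pingouin not found in this environment")
```

If the printed path is not the environment where the package was installed, running `%pip install pingouin` in a notebook cell installs it into the environment of the running kernel.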
AtK42
@AtK42
Hi @raphaelvallat , thanks for your reply. I did check and it was the case. In fact, it works now after I restarted my laptop (even though it still didn't work after restarting my jupyter kernel as you have suggested in your message from Jan. 19). So I have no idea how or why but it's all good now :)
Peach Cobbler 🍑
@johnmustin1_twitter
hi! i was hoping to get help with pingouin, sorry if this is a novice question but i can't seem to find it anywhere online
does anyone know how to return a p-value? not the t-test function, i specifically want the p-value
Raphael Vallat
@raphaelvallat
Hi @johnmustin1_twitter, you want the p-value of a T-test? You can get it with this function: https://pingouin-stats.org/generated/pingouin.ttest.html#pingouin.ttest (the p-val column)
mmolet
@mmolet
Hi all, I thought you might be interested in knowing that Pingouin has been featured on towardsdatascience.com: https://towardsdatascience.com/statistical-tests-t-test-andanova-674b242a5274
Irene Garcia-Marti
@Irene-GM_gitlab
Hi there! I would like to know if it is possible to tell pingouin to return the list of outliers found after the robust Shepherd's pi correlation. I checked the source code in master/pingouin/correlation.py, and the function seems to return an array of booleans, indicating whether they are an outlier or not. However, I suppose this function is wrapped in others, so what I get back in the end is the number of outliers in total. It would be useful to find who are those outliers and inspect them further. Is this possible? Thanks for the good work! :-)
Raphael Vallat
@raphaelvallat
Hi @Irene-GM_gitlab ! It is possible by directly calling the lower-level pingouin.correlation.shepherd function (https://github.com/raphaelvallat/pingouin/blob/b369de9b8607e55061bc408e534e7d44443c744a/pingouin/correlation.py#L146-L184): r, p, outliers = shepherd(x, y)
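Once the boolean array is available, it can be used as a mask to pull out the flagged observations. In this sketch the `outliers` array is hard-coded as a stand-in for the third value returned by `shepherd(x, y)`, and the data frame is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 50.0],
                   "y": [1.1, 1.9, 3.2, 3.8, -20.0]})

# Stand-in for the mask returned by shepherd(x, y): one boolean per
# observation, True where the observation was flagged as an outlier.
outliers = np.array([False, False, False, False, True])

outlier_positions = np.flatnonzero(outliers)  # integer positions of flagged rows
outlier_rows = df[outliers]                   # the flagged observations themselves
clean_rows = df[~outliers]                    # everything that was not flagged
```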
juliatessler
@juliatessler
Hello, everyone!
I'm a novice user of Pingouin and I have a question regarding the encoding of categorical variables in the anova method. I noticed that, once I add a column of categorical values (in str format), the code runs without errors, but I wasn't able to figure out what kind of encoding was applied to the data. Could you help me understand that?
Raphael Vallat
@raphaelvallat
Hi @juliatessler, there's no particular encoding of the categorical variable in pingouin.anova, so it should work with different dtypes: int, str or even pd.Categorical
juliatessler
@juliatessler
I have noticed that it does work with whatever I choose :sweat_smile: I'm just curious about how it deals with categorical data
(also, awesome work on Pingouin! I'm having a blast working with it)
Raphael Vallat
@raphaelvallat
Thanks :smiley: ! I haven't tested it extensively with pd.Categorical yet, but generally any kind of groupby operation on the between variable is done with: grp = data.groupby(between, observed=True), which ensures that categorical levels with no values are excluded. I would however recommend using strings whenever possible!
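The effect of `observed=True` can be seen with pandas alone. In this small sketch the data frame and the unused level `"c"` are made up for illustration.

```python
import pandas as pd

data = pd.DataFrame({
    # "c" is a declared level that never occurs in the data.
    "group": pd.Categorical(["a", "a", "b", "b"], categories=["a", "b", "c"]),
    "score": [1.0, 2.0, 3.0, 4.0],
})

# observed=True keeps only the levels that actually occur in the data.
means_observed = data.groupby("group", observed=True)["score"].mean()

# observed=False keeps every declared level; the empty one ("c") gets NaN.
means_all = data.groupby("group", observed=False)["score"].mean()
```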
juliatessler
@juliatessler
:rocket:
Bettina Bustos
@bettinanicole
Hi everyone! I am trying to run a repeated measures anova that looks like this: aov = mixed_anova(dv='rt', between='experiment', within= ['congruent','PC'], subject='subid', data=df) but I am getting the error: "ValueError: Grouper and axis must be same length". Any ideas? Many thanks!
Raphael Vallat
@raphaelvallat
Hi @bettinanicole! Only one-way mixed design ANOVA are supported, i.e. one between factor and one within factor. Here you have two within factors which is why you get this error. For more complex ANOVA designs, I really recommend using the free software JASP. Thanks!
Bettina Bustos
@bettinanicole
Thank you loads -- eternally grateful for the package :)
xzllxls
@xzllxls
I am using Python 3.5, which version of pingouin can run based on Py3.5 ?
Raphael Vallat
@raphaelvallat
@xzllxls Pingouin is no longer tested on Py3.5, so I cannot tell you exactly which functions will fail and which will work, but my guess is that most basic functions should work (as long as you're using recent versions of pandas, numpy and scipy). I would however recommend upgrading your Python version to >3.7. Thanks
xzllxls
@xzllxls
@raphaelvallat OK, I see. Thanks!
Charlesepfl
@Charlesepfl
Hello, everyone! I am using multivariate_normality, and the result is HZResults(hz=9600, pval=nan, normal=False). Why does the 'nan' occur? Thanks in advance for your kind help!
Raphael Vallat
@raphaelvallat
Hi @Charlesepfl, it's hard to say without seeing your original data. Do you have a lot of missing values in your input data perhaps?
Charlesepfl
@Charlesepfl
@raphaelvallat Hi, thanks for your reply. Actually, no data is missing.
@raphaelvallat the data is 11790 rows × 2400 columns
Raphael Vallat
@raphaelvallat
Ah, then I think it's probably related to the fact that you have 2400 (!) columns... Can you try on a smaller subset of columns, i.e. 5-10? I don't think that this test was designed for such a large number of columns
Charlesepfl
@Charlesepfl
@raphaelvallat Thank you! Is there another way to test multivariate-normality for such a large number of columns?
Raphael Vallat
@raphaelvallat
@Charlesepfl I do not know of one, unfortunately. As you can see in this paper (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3927875/), multivariate normality tests are usually applied on data ranging from 2 to 10 columns (dimensions). Do you really need to test for multivariate normality?
Charlesepfl
@Charlesepfl
Thanks for your information! Best regards! @raphaelvallat
DForDespair
@DForDespair
does the pairwise corr function not work with nan values?
Raphael Vallat
@raphaelvallat
Hi @DForDespair, the pairwise_corr function automatically removes NaN values in a pairwise fashion, i.e. separately for each correlation.
For more details see the nan_policy argument of the pairwise_corr function https://pingouin-stats.org/generated/pingouin.pairwise_corr.html
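Pairwise removal can be illustrated with pandas alone. This is a sketch of the idea, not Pingouin's internal code: `pairwise_complete` is a hypothetical helper and the data are made up. For each pair of columns, only the rows that are complete for that pair are kept, so different correlations may be based on different rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, np.nan],
    "b": [2.0, 4.0, 6.0, np.nan, 10.0],
    "c": [5.0, 3.0, np.nan, 1.0, 0.0],
})

def pairwise_complete(df, col1, col2):
    """Hypothetical helper: Pearson r using only rows complete for this pair."""
    pair = df[[col1, col2]].dropna()
    return len(pair), pair[col1].corr(pair[col2])

n_ab, r_ab = pairwise_complete(df, "a", "b")  # uses rows 0, 1, 2
n_ac, r_ac = pairwise_complete(df, "a", "c")  # uses rows 0, 1, 3

# For contrast, listwise deletion keeps only rows complete across all columns.
n_listwise = len(df.dropna())
```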
DForDespair
@DForDespair
hmm i was doing the pearson correlation and it said i can't use nan or inf values. but when i changed to spearman it was fine. probably because i'm new to this. @raphaelvallat thanks a bunch
DForDespair
@DForDespair
also @raphaelvallat , is there a variance inflation factor that we can get or is that not offered.
csamuel11
@csamuel11
Hi, I would like to do the Bonferroni and Sidak multiple comparisons tests following two- and three-way ANOVA. Prism explains these tests at the following link: https://www.graphpad.com/guides/prism/latest/statistics/stat_the_method_of_bonferroni.htm. I have data in two columns (two genotypes) and want to compare them at each row (different levels of my factor). Can Pingouin do these tests?
Raphael Vallat
@raphaelvallat
@DForDespair the VIF is not currently implemented in Pingouin. It can be calculated in statsmodels however: https://www.statsmodels.org/stable/generated/statsmodels.stats.outliers_influence.variance_inflation_factor.html
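For reference, the VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on all the others. The sketch below computes it by hand with NumPy; `vif` is a hypothetical helper, not the statsmodels implementation, and it assumes an intercept should be included in each auxiliary regression.

```python
import numpy as np

def vif(X):
    """Hypothetical helper: VIF of each column of X, computed as 1 / (1 - R^2)."""
    X = np.asarray(X, dtype=float)
    n_obs, n_cols = X.shape
    out = np.empty(n_cols)
    for j in range(n_cols):
        target = X[:, j]
        # Regress column j on an intercept plus all remaining columns.
        others = np.column_stack([np.ones(n_obs), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        r2 = 1.0 - resid @ resid / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Toy predictors: x3 is nearly collinear with x1, x2 is independent.
rng = np.random.default_rng(42)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + 0.1 * rng.normal(size=200)
vifs = vif(np.column_stack([x1, x2, x3]))
```

In this toy setup the first and third values come out large, while the second stays close to 1.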
@csamuel11 Three-way ANOVAs are not supported in Pingouin. For 2-way ANOVA, you can use pingouin.pairwise_ttests to calculate the post-hoc pairwise T-tests and use the padjust parameter to specify no correction / bonferroni / sidak. For more complex ANOVA designs however, I strongly recommend using JASP instead of Pingouin. Hope this helps!
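To make the two corrections concrete, here is what they do to a set of p-values. The helpers `bonferroni` and `sidak` are hypothetical names written for illustration in plain Python, not Pingouin functions: with m comparisons, Bonferroni multiplies each p-value by m (capped at 1), and Sidak uses 1 - (1 - p)^m.

```python
def bonferroni(pvals):
    """Hypothetical helper: Bonferroni-adjusted p-values, min(p * m, 1)."""
    m = len(pvals)
    return [min(p * m, 1.0) for p in pvals]

def sidak(pvals):
    """Hypothetical helper: Sidak-adjusted p-values, 1 - (1 - p) ** m."""
    m = len(pvals)
    return [1.0 - (1.0 - p) ** m for p in pvals]

# Made-up uncorrected p-values from three pairwise comparisons.
pvals = [0.01, 0.04, 0.5]
p_bonf = bonferroni(pvals)
p_sidak = sidak(pvals)
```

Sidak is always slightly less conservative than Bonferroni, which is visible when comparing the two adjusted lists element by element.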
csamuel11
@csamuel11
Thank you! That helps.
csamuel11
@csamuel11
For the pingouin.pairwise_ttests, specifically for two between-subject factors, why does the order of the between subject factors matter? My two factors are genotype and sex, so how do I choose the order?
Raphael Vallat
@raphaelvallat
The order of the between subject factor matters for the interaction only. You should choose depending on which interaction order is the most relevant to your data. Alternatively, you can use JASP to calculate all possible pairwise combinations (full interaction as opposed to Pingouin which only reports a "partial" interaction).
Tao Xia
@XiaTaopsycho
Hi Raphael~, Thanks for making another strong and easy-to-use toolbox. Does Pingouin have any plan to implement the linear mixed model analysis?
Raphael Vallat
@raphaelvallat
Hi @XiaTaopsycho ! Please see: raphaelvallat/pingouin#103. Note that I don't anticipate that LMM will be implemented in the near future. In the meantime, I would recommend using statsmodels or pymer4 (https://eshinjolly.com/pymer4/)