
On another note, I'm going through this tutorial written by @raphaelvallat (https://raphaelvallat.com/pingouin.html), but the methods for the ANOVA are no longer working. I'm getting an error when the function tries to access a pandas function:

`pandas/_libs/lib.pyx in pandas._libs.lib.map_infer() TypeError: 'NoneType' object is not callable`

Any suggestions on what to do here?
Hello @rrfd_twitter!

I think it's better to report the bug in a new issue. Try to describe the exact steps to reproduce it, since the automated tests don't seem to be covering your case.

Hi all, I got the following warning:

```
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pingouin/parametric.py", line 177
UserWarning: x and y have unequal sizes. Switching to paired == False.
```

Could you please tell me what the implications are? Is this automatically corrected?

Hi @mmolet! It seems that you are trying to run a paired T-test but with different sample sizes for `x` and `y`. Pingouin therefore automatically switched to an independent T-test (i.e. no assumption that the data are repeated measures from the same individuals). Paired T-tests have more statistical power than independent T-tests (so you should expect higher T-values if there is indeed an effect), but the results from the two tests should go in the same direction and be consistent. I'd recommend checking your input data: is it paired or not?
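The power difference described above is easy to see with `scipy.stats` directly (a sketch, not Pingouin's internals; the simulated data and effect size are made up for illustration):

```python
# Contrast paired vs. independent t-tests on the same correlated data.
# With truly paired data, ttest_rel removes between-subject variance,
# so |t| is typically much larger than with ttest_ind.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
baseline = rng.normal(10, 2, size=30)                 # e.g. pre-treatment scores
followup = baseline + rng.normal(0.5, 0.5, size=30)   # correlated post-treatment scores

t_paired, p_paired = stats.ttest_rel(baseline, followup)
t_indep, p_indep = stats.ttest_ind(baseline, followup)

print(f"paired:      t = {t_paired:.2f}, p = {p_paired:.4f}")
print(f"independent: t = {t_indep:.2f}, p = {p_indep:.4f}")
```

The paired test exploits the within-subject correlation, which is why silently switching to an independent test can hide a real effect.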
Hello @raphaelvallat, thank you for your response. Yes, it is paired. It is possible that I will get missing values when I run the study, and I wanted to know how Pingouin would behave by entering the data myself according to different scenarios (including missing values). Obviously, I would like to keep a paired T-test; is there a way to deal with different sample sizes?

@mmolet If you pass two arrays / lists of the same length to the paired T-test function, with one or both having missing values, Pingouin will choose a conservative approach and apply a listwise deletion of the missing values. In other words, Pingouin deletes the whole row that contains at least one missing value. However, your case is a different problem: `x` and `y` do not have the same size from the beginning, and therefore the data cannot be interpreted as paired. Do you see what I mean? I would check all the steps of your code to make sure that you are not removing some values at any point (careful: putting a missing value in a list is NOT the same as removing that value!). If you still cannot resolve your issue, please post your data and code here so that I can have a look. Thanks
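The listwise deletion described here can be sketched in a few lines of NumPy (the sample values are invented; this shows the idea, not Pingouin's exact implementation):

```python
# Listwise deletion: drop any pair where either x or y is NaN, keeping
# the arrays aligned, then run the paired test on what survives.
import numpy as np
from scipy import stats

x = np.array([4.2, 5.1, np.nan, 6.3, 5.8])
y = np.array([4.8, np.nan, 5.5, 6.9, 6.1])

keep = ~(np.isnan(x) | np.isnan(y))   # rows where BOTH values are present
x_clean, y_clean = x[keep], y[keep]

print(len(x_clean))                   # 3 pairs survive out of 5
t, p = stats.ttest_rel(x_clean, y_clean)
```

Note how entire rows are removed, so the two arrays stay the same length and the pairing is preserved.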
Hello everyone, I've tried the different examples for repeated measures ANOVA, and the two-way repeated-measures ANOVA example from the documentation (https://pingouin-stats.org/generated/pingouin.rm_anova.html) doesn't work for me. I wonder if I'm the only one with this problem. I use Python 3.6.6, numpy 1.16.4, scipy 1.3.0, pandas 0.23.4, matplotlib 3.1.1 and seaborn 0.9.0.

Hello guys, first of all: this is an awesome package. Thanks. I have been looking for something like that for a long time. However, I have tried mixed ANOVA and it produces results very different from SPSS and R's ezANOVA, which are consistent with each other. The F values are negative. How is that possible? I have one between-subjects variable, one within-subjects variable, and different numbers of subjects in each category of the between-subjects variable. There are no missing values. For the checks I used Type II sums of squares.

Hi @dannemro_twitter ! This is surprising since I have tested and validated the mixed_anova function against JASP and ezANOVA in several scenarios (including missing values, unbalanced data, etc, see here: https://github.com/raphaelvallat/pingouin/blob/11fa8e2379eacf89184f67d471c5477ac94f44c2/pingouin/tests/test_parametric.py#L194). But it would be impossible to test all possible situations, so perhaps there is something different and unique about the data that you are using. If that's an option, you can send me your data as well as a jupyter notebook and R script so that I can try to understand what's going on. My email is raphaelvallat9 at gmail dot com.

Hi @raphaelvallat, indeed I have a similar problem with rm_anova() (crazy result and negative F values). You can download the data here, and the code to reproduce the error is:

```
import pandas as pd
import pingouin as pg
intrusions = pd.read_csv('Intrusions.csv', sep=';')
pg.rm_anova(data=intrusions[intrusions.Condition == 'No-Think'],
            subject='Subject',
            within=['Emotion', 'Sessions'],
            dv='Intrusions_per')
```

aov_ez() output is:

```
Anova Table (Type 3 tests)
Response: Intrusions_per
Effect df MSE F ges p.value
1 Emotion 1.60, 43.18 593.70 3.82 * .007 .04
2 Sessions 3.21, 86.62 724.90 21.00 *** .08 <.0001
3 Emotion:Sessions 7.41, 200.10 276.18 0.78 .003 .62
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘+’ 0.1 ‘ ’ 1
Sphericity correction method: GG
```

Hi @LegrandNico! So the issue for Dan was that the "Subject" was not properly coded to fit a mixed-design model. Therefore, I would suggest that you take a look at your "Subject" column and make sure that it really fits the design you're trying to test (i.e. different subjects for between-factors but the same subjects for within-factors). Also, try removing any non-finite (missing, inf) values that you may have, in both the dependent and independent variables. I don't have the time to look in more detail at the data today, but happy to have a look later this week if you don't find the cause by then! Thanks

Hi, I'm using the partial_corr function and am wondering if there is a way to incorporate multiple columns for y. I have a column in my dataframe, X, as well as a covariate, x1, and I want to calculate the relationship between X and several other columns in my dataframe (Y1...Yn), each time partialling out the effect of x1. Is there a way to do this without resorting to just looping over Y?

Hi @LindenParkes_twitter! Yes, you can use the pairwise_corr function (https://pingouin-stats.org/generated/pingouin.pairwise_corr.html):

`pg.pairwise_corr(data=df, columns=[['X'], ['Y1', 'Y2', ..., 'Yn']], covar=['x1'], method='pearson')`

Let me know if that solved your issue, thanks!

Hi @LindenParkes_twitter! Not yet, this function is designed to handle a lot of cases, but for this very reason it is not the most efficient. I am planning to implement a "fast_pearson" mode in a future release. If performance is really a concern, I suggest you look at the Pingouin pcorr() function instead: less flexible, but definitely faster.
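For a single covariate, a fast alternative is the residualization trick, which gives the same Pearson partial correlation (a hedged sketch with simulated data, not Pingouin's `pcorr` implementation; the variable names mirror the question above):

```python
# Partial correlation via residuals: regress X and Y each on the
# covariate x1 (plus an intercept), then correlate the residuals.
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)                  # covariate
X = 2.0 * x1 + rng.normal(size=n)        # variable of interest
Y = -1.5 * x1 + rng.normal(size=n)       # one of the Y1...Yn columns

def residualize(v, covar):
    """Residuals of v after least-squares regression on covar + intercept."""
    A = np.column_stack([np.ones_like(covar), covar])
    beta, *_ = np.linalg.lstsq(A, v, rcond=None)
    return v - A @ beta

r_partial = np.corrcoef(residualize(X, x1), residualize(Y, x1))[0, 1]
print(round(r_partial, 3))  # near 0: X and Y are related only through x1
```

Because the only operations are two least-squares fits and a correlation, this vectorizes easily over many Y columns at once.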

Hello, I got the following message after running an 8×12 two-way ANOVA with repeated measures on both factors:

```
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pingouin/distribution.py", line 464
UserWarning: Epsilon values might be innaccurate in two-way repeated measures design where each factor has more than 2 levels. Please double-check your results.
```

Any idea(s) about the implication(s)?

Hi @mmolet! Yes, it means that you should be extra careful about the `eps` value of the interaction term in the output dataframe and, by extension, about the Greenhouse-Geisser corrected p-value of the interaction term (`p-GG-corr`). Results are slightly different from R's, and I still haven't figured out exactly why. That said, given the number of levels in both of your factors (8 and 12), I don't think an ANOVA is the most appropriate analysis here; instead, you might want to check linear mixed modeling, as implemented in the lme4 R package (no implementation yet in Python). Hope this helps! Thanks!
Hi @raphaelvallat & team,

First of all, thanks for creating this helpful library. I faced an error while using the ANCOVA function.

This is the error:

--> 168 set_use_numexpr(get_option("compute.use_numexpr"))

OptionError: "No such keys(s): 'compute.use_numexpr'"

My pandas is version '0.24.2'

Need your kind suggestion on this. Thanks!

Hi @blazetamareborn_gitlab! How many covariates are you using in the ANCOVA? If more than one, Pingouin will make an internal call to statsmodels, in which case I am not sure I can be of great help. But please can you start by 1) updating the following packages: `pip install --upgrade numpy statsmodels pandas pingouin`, 2) checking that your input dataframe looks "right", and 3) trying to run the ANCOVA with only one covariate. Thanks
Thanks for the help @raphaelvallat.

- Updated the packages you listed above, but the error still persists.
- These are the columns of the dataframe: customer_id, group, trx_cnt_pre, trx_cnt_post. I also made sure that each group has the same number of customers.
- I ran the function with only one covariate, trx_cnt_pre, from the dataframe above.

Hi guys,

I just have to say thanks for wrapping statsmodels into such a great interface for N-Way ANOVA.

I have the data from a full factorial 3-factor, 3-level experiment in a df, and it just runs fine.

But I'm wondering if, in the background, it does the same thing as:

```
model = ols('Shearforce ~ C(US_Power)*C(BondForce)*C(BondTime)', df_shear).fit()
sm.stats.anova_lm(model, typ=2)
```

Hi @Flydroid! Yes, that's exactly what it does! You can check the full code here: https://github.com/raphaelvallat/pingouin/blob/424a65e0dd0f4c4b64aa1c98012747adceff2975/pingouin/parametric.py#L1147 Thanks for your positive feedback, really appreciate it :)

Btw, you should be able to check the final formula of your output dataframe using `print(aov.formula_)` (`formula_` is a hidden attribute of the dataframe and should be accessible for as long as you don't directly modify your output ANOVA dataframe). Thanks!
Hi @Flydroid! I added the `Sum` to be consistent with the statsmodels documentation (https://www.statsmodels.org/stable/anova.html). My understanding is that it changes the way the Patsy design matrix is created, by forcing a sum (deviation) coding, as you can read here: https://www.statsmodels.org/devel/contrasts.html#sum-deviation-coding I've validated the results of the N-way ANOVA against JASP: https://github.com/raphaelvallat/pingouin/blob/424a65e0dd0f4c4b64aa1c98012747adceff2975/pingouin/tests/test_parametric.py#L140 Hope this helps!
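To make the sum (deviation) coding mentioned above concrete, here is a small illustration of the contrast matrix it implies for a 3-level factor (a sketch with made-up level names, not Patsy's internal code):

```python
# Sum (deviation) coding compares each level to the grand mean.
# For k levels there are k-1 coded columns; the "reference" level
# gets -1 in every column, so each column sums to zero across levels.
import numpy as np

levels = ["low", "mid", "high"]
contrast = {
    "low":  [1, 0],
    "mid":  [0, 1],
    "high": [-1, -1],   # reference level absorbs the constraint
}
codes = np.array([contrast[lv] for lv in levels])
print(codes.sum(axis=0))  # [0 0]: coefficients are deviations from the grand mean
```

This zero-sum property is what makes the Type II/III sums of squares from statsmodels match packages like JASP, which use the same parameterization for ANOVA.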
Hi @raphaelvallat! Thanks for the hint, it seems to be the correct way to do it.

I wanted to see the lm.summary() of the fit result, but I can't edit parametric.py successfully, as the import fails with:

```
Traceback (most recent call last):
File "C:\Anaconda\envs\ThesisWireBond\lib\site-packages\IPython\core\interactiveshell.py", line 3325, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-1-b1ff2590f725>", line 12, in <module>
import pingouin as pg
File "C:\Anaconda\envs\ThesisWireBond\lib\site-packages\pingouin\__init__.py", line 7, in <module>
from .equivalence import *
File "C:\Anaconda\envs\ThesisWireBond\lib\site-packages\pingouin\equivalence.py", line 5, in <module>
from .parametric import ttest
File "C:\Anaconda\envs\ThesisWireBond\lib\site-packages\pingouin\parametric.py", line 1155
print(lm.summary())
^
TabError: inconsistent use of tabs and spaces in indentation
```

Hi! I had to compute some repeated anova functions for my work and I used your library. It looks really clean and it's a pleasure to use.

During one of my computations (rm_anova), I got a negative F value. A bit surprised, I would like to know if it's a bug or something normal? (In this case, any idea why?)

Hi @Aodarium! Thanks for the feedback! This is most likely caused by an error in your dataframe, specifically in the `Subject` column, which does not match the repeated measures design. Please have a look at previous messages in this thread (around Jul 10) for more details on this. Thanks!
Hi! just wanted to say first of all, what a pleasure it is to work with Pingouin - really straightforward and sensible API that allows you to do what you want to do. Also appreciate all the attention and care that has gone into the documentation - the detailed explanations are great and super-useful.

I just have a question about pairwise_ttests(). I have a dataset with repeated measures on the same group of subjects in different conditions (condition is in the 'condid' column). The data failed normality tests, so I want a non-parametric test. And the design is unbalanced - has missing values for some subjects in some conditions. I've tried:

`posthocs = pg.pairwise_ttests(dv='value', within='condid', subject='subid', data=df2, parametric=False, padjust='fdr_bh')`

expecting that this would call mwu() as a non-parametric test.

However, this fails, with an error raised from wilcoxon() instead of mwu().

Looking at the source for pairwise_ttests, I see you have the line (executed when parametric=False):

`paired = True if contrast == 'simple_within' else False`

And hence it tries to call wilcoxon(), then fails (I presume because of the unequal samples).

So it's true that this is a simple within - there is only the 'condid' factor within subjects, and no between factor. But there are missing values, so a paired test can't be used.

Just wondering why it works like this? I could just repeatedly call mwu and then do multicomp myself, but I guess I'm missing something more fundamental.

Hi @RichardLeibbrandt! Thanks for the positive feedback, I really appreciate it! Since your data is paired (repeated measures), Pingouin will call the `wilcoxon` function and not `mwu`. If you want to use the non-paired non-parametric test, then you can just use `pg.pairwise_ttests(dv='value', between='condid', data=df2, parametric=False, padjust='fdr_bh')`. Can you please let me know exactly what error you get in the `wilcoxon` function? In case of unbalanced repeated measures data, Pingouin will apply a listwise deletion, meaning it removes all the rows for which there is at least one missing value. You might want to check your input data / the number of non-missing rows before running the function. Thanks!
Thanks for that quick reply ! I should just clarify that I probably misstated what my data looks like: it's not really missing values so much as missing rows. So if a subject didn't take part in a condition, there actually is no row in the dataframe for that combination at all (I only have one dependent variable). To get the Wilcoxon test working with the listwise deletion behaviour you described, should I set up the data frame differently?

The error trace I get is

```
File "/Users/leib0006/anaconda2/envs/mseqpaper/lib/python3.7/site-packages/pingouin/pairwise.py", line 272, in pairwise_ttests
df_ttest = wilcoxon(x, y, tail=tail)
File "/Users/leib0006/anaconda2/envs/mseqpaper/lib/python3.7/site-packages/pingouin/nonparametric.py", line 407, in wilcoxon
correction=True, alternative=tail)
File "/Users/leib0006/anaconda2/envs/mseqpaper/lib/python3.7/site-packages/scipy/stats/morestats.py", line 2848, in wilcoxon
raise ValueError('The samples x and y must have the same length.')
ValueError: The samples x and y must have the same length.
```

And just to make sure I understand what you meant with the listwise deletion: Suppose we had

Condition A: Sub 1, 2, 3, 4

Condition B: Sub 1, 3, 4

Condition C: Sub 1, 2, 4

Does it do the list deletion on a per-comparison basis? i.e. compare A vs B with Subs 1, 3, 4; A vs C with 1, 2, 4, etc?

Or would it see that only Subs 1 and 4 have all conditions, and only use those two for all comparisons?

OK, I have actually managed to answer my own questions: I've added rows with NaNs in them for all the missing values, and now `wilcoxon` works.

And I've stepped through the source code and seen that it does the second form of listwise deletion that I described.

So now I have a follow-up question: would it actually be valid to do it in the first, per-comparison way I described, where for each pairwise comparison it finds the subjects that are in common for those two conditions only (rather than the subjects that occur in all conditions), and does that particular Wilcoxon test on the data from those subjects? (The next pairwise comparison might then be on a different set of subjects.) And then at the end do the correction for multiple comparisons with `multicomp`? Obviously it's possible to do it with the software, as `multicomp` doesn't care where p-values come from, but I'm wondering whether it's actually legitimate to do this.

Thanks @RichardLeibbrandt! What you are describing is the pairwise deletion method. I think it would indeed be great to let users choose between a strict listwise (a.k.a. casewise) deletion and a more liberal pairwise deletion. I just opened an issue for that on GitHub: raphaelvallat/pingouin#56 I will try to implement it in a future release. Regarding the p-values, I would tend to think that multiple comparison correction after pairwise deletion is still valid, but you should try to find some relevant papers on this to make sure. Thanks!
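The two deletion strategies are easy to contrast on the Condition A/B/C example from a few messages up (a sketch of the set logic only, not Pingouin's code):

```python
# Listwise deletion keeps only subjects present in EVERY condition;
# pairwise deletion keeps the subjects shared by each pair of conditions.
from itertools import combinations

conditions = {
    "A": {1, 2, 3, 4},
    "B": {1, 3, 4},
    "C": {1, 2, 4},
}

listwise = set.intersection(*conditions.values())
print("listwise:", sorted(listwise))              # only subjects 1 and 4

for a, b in combinations(conditions, 2):
    shared = conditions[a] & conditions[b]
    print(f"pairwise {a} vs {b}:", sorted(shared))
```

As the discussion above notes, pairwise deletion retains more data per comparison, at the cost of each test being run on a (possibly) different set of subjects.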