Robert DeFilippi
@rrfd_twitter
@mmolet And because you're making an executable, it might be looking for that file.
On another note, I'm going through this tutorial written by @raphaelvallat (https://raphaelvallat.com/pingouin.html), but the ANOVA methods no longer work. I'm getting an error when the function tries to access a pandas function: pandas/_libs/lib.pyx in pandas._libs.lib.map_infer() TypeError: 'NoneType' object is not callable. Any suggestions on what to do here?
Arthur Paulino
@arthurpaulino
Hello @rrfd_twitter!
I think it's better to report the bug on a new issue. Try to describe the exact steps to reproduce it, since the automated tests don't seem to be covering your case.
mmolet
@mmolet
Hi all, I got the following Warning (from warnings module):
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pingouin/parametric.py", line 177
warnings.warn("x and y have unequal sizes. Switching to "
UserWarning: x and y have unequal sizes. Switching to paired == False.
Could you please tell me what the implications are? Is this automatically corrected?
Raphael Vallat
@raphaelvallat
Hi @mmolet! It seems that you are trying to run a paired T-test but with different sample sizes for x and y. Pingouin therefore automatically switched to an independent T-test (= no assumption that the data are repeated measures from the same individuals). Paired T-tests have more statistical power than independent T-tests (and therefore you should expect higher T-values if there is indeed an effect), but the results from the two tests should go in the same direction and be consistent. I'd recommend checking your input data: is it paired or not?
mmolet
@mmolet
Hello @raphaelvallat, thank you for your response. Yes, it is paired. It is possible that I will get missing values when I run the study, and I wanted to know how Pingouin would behave by entering the data myself according to different scenarios (including missing values). Obviously, I would like to keep a paired T-test. Is there a way to deal with different sample sizes?
Raphael Vallat
@raphaelvallat
@mmolet If you pass two arrays / lists of the same length to the paired T-test function, with one or both having missing values, Pingouin will choose a conservative approach and apply a listwise deletion of the missing values. In other words, Pingouin deletes the whole row that contains at least one missing value. However, in your case, this is a different problem. The problem is that x and y do not have the same size from the beginning, and therefore the data cannot be interpreted as paired. Do you see what I mean? I would check all the steps of your code to make sure that you are not removing some values at any point (careful, putting a missing value in a list is NOT the same as removing this value!). If you still cannot resolve your issue, please post your data and code here so that I can have a look. Thanks
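To make the distinction above concrete, here is a minimal NumPy sketch of listwise deletion on paired data. The arrays are made-up values, and this only illustrates the idea; it is not Pingouin's internal code.

```python
import numpy as np

# Hypothetical paired measurements: same length, NaN marks a missing value.
x = np.array([5.1, 4.8, np.nan, 6.0, 5.5])
y = np.array([5.9, 5.0, 5.2, np.nan, 6.1])

# Listwise deletion: keep only the pairs where both values are present.
keep = ~np.isnan(x) & ~np.isnan(y)
x_clean, y_clean = x[keep], y[keep]
print(len(x_clean), len(y_clean))  # the two arrays still have equal length

# Removing the missing entry from x alone shortens x but not y,
# so the two arrays can no longer be read as pairs.
x_dropped = x[~np.isnan(x)]
print(len(x_dropped), len(y))
```

Keeping the NaN in place preserves the row alignment between x and y, which is what allows the incomplete pairs to be dropped together.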
daschoe
@daschoe
Hello everyone, I've tried the different examples for repeated measures ANOVA and the two-way repeated-measures ANOVA example from the documentation (https://pingouin-stats.org/generated/pingouin.rm_anova.html) doesn't work for me. I wonder if only I have this problem. I use Python 3.6.6, numpy 1.16.4, scipy 1.3.0, pandas 0.23.4, matplotlib 3.1.1 and seaborn 0.9.0.
Raphael Vallat
@raphaelvallat
Hi @daschoe ! Could you please attach a screenshot of the error? Also, please make sure to update your Pingouin version to 0.2.7 since I recently fixed a dependency issue with Scipy 1.3.0 (and statsmodels). Thanks
daschoe
@daschoe
@raphaelvallat I use Pingouin 0.2.7, sorry I forgot to mention.
error message
Raphael Vallat
@raphaelvallat
Hi @daschoe! Now that's really weird, it works like a charm on my computer. The only difference is that I use Pandas 0.24.2. Can you please update Pandas and let me know if you still get the error? Also, do the other examples of the rm_anova function work? Thanks
daschoe
@daschoe
@raphaelvallat Updating Pandas helped. Now everything works, thank you! The one-way examples worked before; only the two-way example was failing.
Raphael Vallat
@raphaelvallat
Great, I'll update the doc to make sure that users have pandas >= 0.24. Thank you!
Dan Nemrodov
@dannemro_twitter
Hello guys, first of all - this is an awesome package. Thanks. I have been looking for something like that for a long time. However, I have tried mixed ANOVA and it produces results very different from SPSS and R's ezanova, which are consistent between themselves. The F values are negative. How is it possible? I have one between variable, one within variable and different numbers of subjects in each category of the between variable. There are no missing values. For checks I used Type 2 sum of squares.
Raphael Vallat
@raphaelvallat
Hi @dannemro_twitter ! This is surprising since I have tested and validated the mixed_anova function against JASP and ezANOVA in several scenarios (including missing values, unbalanced data, etc, see here: https://github.com/raphaelvallat/pingouin/blob/11fa8e2379eacf89184f67d471c5477ac94f44c2/pingouin/tests/test_parametric.py#L194). But it would be impossible to test all possible situations, so perhaps there is something different and unique about the data that you are using. If that's an option, you can send me your data as well as a jupyter notebook and R script so that I can try to understand what's going on. My email is raphaelvallat9 at gmail dot com.
LegrandNico
@LegrandNico
Hi @raphaelvallat, indeed I have a similar problem with rm_anova() (crazy result and negative F values). You can download the data here, and the code to reproduce the error is:
import pandas as pd
import pingouin as pg

intrusions = pd.read_csv('Intrusions.csv', sep=';')

pg.rm_anova(data=intrusions[intrusions.Condition == 'No-Think'],
            subject='Subject',
            within=['Emotion', 'Sessions'],
            dv='Intrusions_per')
aov_ez() output is:
Anova Table (Type 3 tests)

Response: Intrusions_per
            Effect           df    MSE         F  ges p.value
1          Emotion  1.60, 43.18 593.70    3.82 * .007     .04
2         Sessions  3.21, 86.62 724.90 21.00 ***  .08  <.0001
3 Emotion:Sessions 7.41, 200.10 276.18      0.78 .003     .62
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘+’ 0.1 ‘ ’ 1

Sphericity correction method: GG
LegrandNico
@LegrandNico
For 'Emotion' Pingouin reports F = -2.67 and p = 1.0
Raphael Vallat
@raphaelvallat
Hi @LegrandNico! So the issue for Dan was that the "Subject" column was not properly coded to fit a mixed-design model. Therefore, I would suggest that you take a look at your "Subject" column and make sure that it really fits the design that you're trying to test (i.e. different subjects for between-factors but same subjects for within-factors). Also, try removing any non-finite (missing, inf) values that you may have, in both the dependent and independent variables. I don't have the time to look at the data in more detail today, but I'm happy to have a look later this week if you don't find the cause by then! Thanks
LegrandNico
@LegrandNico
Alright, I found the error (my own)
LegrandNico
@LegrandNico
Sorry for this false alarm
Linden Parkes
@LindenParkes_twitter
Hi, I'm using the partial_corr function and am wondering if there is a way to incorporate multiple columns for y? I have a column in my dataframe, X, as well as a covariate, x1, and I want to calculate the relationship between X and several other columns in my dataframe (Y1...Yn), each time partialling out the effect of x1. Is there a way to do this without resorting to just looping over Y?
Raphael Vallat
@raphaelvallat
Hi @LindenParkes_twitter ! Yes, you need to use the https://pingouin-stats.org/generated/pingouin.pairwise_corr.html function: pg.pairwise_corr(data=df, columns=[['X'], ['Y1', 'Y2', ..., 'Yn']], covar=['x1'], method='pearson')
Let me know if that solved your issue, thanks!
Linden Parkes
@LindenParkes_twitter
Awesome, thank you! - is there any way to speed it up if I only care about the correlation value?
Raphael Vallat
@raphaelvallat
Hi @LindenParkes_twitter! Not yet. This function is designed to handle a lot of cases, but for this very reason it is not the most efficient. I am planning to implement a "fast_pearson" mode in a future release. If performance is really a concern, I suggest you look at the Pingouin pcorr() function instead, which is less flexible but definitely faster.
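If only the coefficient is needed, partial correlation can also be computed directly from regression residuals. The sketch below is a NumPy-only illustration of that idea, not Pingouin's implementation; the helper name partial_corr_many and the simulated data are assumptions made for the example.

```python
import numpy as np

def partial_corr_many(x, Y, covar):
    """Partial Pearson r between x and each column of Y, controlling for covar.

    Hypothetical helper for illustration: x and covar are regressed out of
    nothing else than an intercept plus the covariate(s).
    """
    x = np.asarray(x, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Design matrix: intercept + covariate(s).
    Z = np.column_stack([np.ones(len(x)), covar])

    def residuals(a):
        # Least-squares fit of a on Z, then subtract the fitted values.
        beta, *_ = np.linalg.lstsq(Z, a, rcond=None)
        return a - Z @ beta

    rx = residuals(x)   # shape (n,)
    rY = residuals(Y)   # shape (n, k)
    # Pearson r between rx and every column of rY in one vectorized step.
    rx_c = rx - rx.mean()
    rY_c = rY - rY.mean(axis=0)
    return (rx_c @ rY_c) / np.sqrt((rx_c @ rx_c) * (rY_c ** 2).sum(axis=0))

# Simulated example data (assumption): x1 is the covariate.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
X = x1 + rng.normal(size=200)
Y = np.column_stack([X + rng.normal(size=200), x1 + rng.normal(size=200)])
r = partial_corr_many(X, Y, x1)
print(r)  # one partial correlation per column of Y
```

This returns coefficients only, with no p-values or confidence intervals, which is the trade-off for avoiding a Python-level loop over the Y columns.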
mmolet
@mmolet
Hello, I got the following message after running an 8*12 two-way ANOVA with repeated measures on both factors: Warning (from warnings module):
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pingouin/distribution.py", line 464
warnings.warn("Epsilon values might be innaccurate in "
UserWarning: Epsilon values might be innaccurate in two-way repeated measures design where each factor has more than 2 levels. Please double-check your results.
Any idea(s) about the implication(s)?
Raphael Vallat
@raphaelvallat
Hi @mmolet! Yes, it means that you should be extra careful about the eps value of the interaction term in the output dataframe and, by extension, about the Greenhouse-Geisser corrected p-value of the interaction term (p-GG-corr). Results are slightly different from R, and I still haven't figured out exactly why. That said, given the number of levels in both of your factors (8 and 12), I don't think an ANOVA is the most appropriate analysis here. Instead, you might want to check linear mixed modeling, as implemented in the lme4 R package (no implementation yet in Python). Hope this helps! Thanks!
mmolet
@mmolet
Thanks for the information
Tama Handika
@blazetamareborn_gitlab

Hi @raphaelvallat & team,

First of all, thanks for creating this helpful library. I faced an error while using the ANCOVA function.

This is the error:
--> 168 set_use_numexpr(get_option("compute.use_numexpr"))
OptionError: "No such keys(s): 'compute.use_numexpr'"

My pandas is version '0.24.2'

Need your kind suggestion on this. Thanks!

Raphael Vallat
@raphaelvallat
Hi @blazetamareborn_gitlab! How many covariates are you using in the ANCOVA? If more than one, Pingouin will make an internal call to statsmodels, in which case I am not sure I can be of great help. But can you please start by 1) updating the following packages: pip install --upgrade numpy statsmodels pandas pingouin, 2) checking that your input dataframe looks "right", and 3) running the ANCOVA with only one covariate? Thanks
Tama Handika
@blazetamareborn_gitlab
Thanks for the help @raphaelvallat .
  1. Updated the packages you listed above, but the error still persists.
  2. These are the columns of the dataframe: customer_id group trx_cnt_pre trx_cnt_post
    I also made sure that each group has same number of customers.
  3. I run the function with only 1 covariate, which is trx_cnt_pre from the dataframe above
Tama Handika
@blazetamareborn_gitlab
Just to be clear:
  1. customer ID is mutually exclusive
  2. group is either treatment or control
  3. trx_cnt_pre is the # of trx a customer made, this will be the covariate
  4. trx_cnt_post is our DV
Raphael Vallat
@raphaelvallat
Hi @blazetamareborn_gitlab ! Can you try running pip install --upgrade numexpr then? If that still doesn't work can you please send me your data and code at raphaelvallat9 at gmail dot com, I'll try to have a look at it. Thanks
Jonathan Graesser
@Flydroid

Hi guys,
I just have to say thanks for wrapping statsmodels into such a great interface for N-way ANOVA.
I have the data from a full factorial 3-factor, 3-level experiment in a df and it runs just fine.
But I'm wondering if it does the same thing as

model = ols('Shearforce ~ C(US_Power)*C(BondForce)*C(BOndTime)', df_shear).fit()
sm.stats.anova_lm(model, typ= 2)

in the background?

Raphael Vallat
@raphaelvallat
Hi @Flydroid! Yes, that's exactly what it does! You can check the full code here: https://github.com/raphaelvallat/pingouin/blob/424a65e0dd0f4c4b64aa1c98012747adceff2975/pingouin/parametric.py#L1147 Thanks for your positive feedback, really appreciate it :)
Btw, you should be able to check the final formula of your output dataframe using print(aov.formula_) (formula_ is a hidden attribute of the dataframe and should be accessible for as long as you don't directly modify your output anova dataframe). Thanks!
Jonathan Graesser
@Flydroid
@raphaelvallat Thanks for your answer! That's a nice hint with the formula_. Maybe this would be good to add to the function documentation? The result is as expected, except that each factor has a second argument, Sum, added. Does that mean that all levels are checked?
Raphael Vallat
@raphaelvallat
Hi @Flydroid! I added the Sum to be consistent with the statsmodels documentation (https://www.statsmodels.org/stable/anova.html). My understanding is that it changes the way the Patsy design matrix is created, by forcing the use of sum (deviation) coding, as you can read here: https://www.statsmodels.org/devel/contrasts.html#sum-deviation-coding. I've validated the results of the N-way ANOVA against JASP: https://github.com/raphaelvallat/pingouin/blob/424a65e0dd0f4c4b64aa1c98012747adceff2975/pingouin/tests/test_parametric.py#L140 Hope this helps!
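As a rough illustration of what sum (deviation) coding does to a design matrix, here is a hand-built NumPy sketch for a single 3-level factor. The data are made up, the coding table is written out manually (with the last level as the omitted one, which is an assumption of this example), and nothing here calls Patsy or statsmodels.

```python
import numpy as np

# Hypothetical balanced one-way design: 3 levels, 2 observations each.
levels = np.array(["a", "a", "b", "b", "c", "c"])
y = np.array([1.0, 3.0, 4.0, 6.0, 10.0, 12.0])

# Sum (deviation) coding for k = 3 levels uses k - 1 columns;
# the omitted level (here "c") is coded -1 in every column.
codes = {"a": [1, 0], "b": [0, 1], "c": [-1, -1]}
X = np.column_stack([np.ones(len(y)), [codes[lv] for lv in levels]])

# Ordinary least squares fit.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
group_means = np.array([y[levels == lv].mean() for lv in "abc"])

print(beta[0], group_means.mean())        # intercept vs. mean of the group means
print(beta[1:], group_means[:2] - beta[0])  # coefficients vs. deviations from it
```

With this coding the intercept is the mean of the group means and each coefficient is a level's deviation from it, rather than a difference from a reference level as with the default treatment coding.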
Jonathan Graesser
@Flydroid
Hi @raphaelvallat! Thanks for the hint, it seems to be the correct way to do it.
I wanted to see the lm.summary() of the fit result but I can't edit parametric.py successfully, as the import fails with
Traceback (most recent call last):
  File "C:\Anaconda\envs\ThesisWireBond\lib\site-packages\IPython\core\interactiveshell.py", line 3325, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-1-b1ff2590f725>", line 12, in <module>
    import pingouin as pg
  File "C:\Anaconda\envs\ThesisWireBond\lib\site-packages\pingouin\__init__.py", line 7, in <module>
    from .equivalence import *
  File "C:\Anaconda\envs\ThesisWireBond\lib\site-packages\pingouin\equivalence.py", line 5, in <module>
    from .parametric import ttest

  File "C:\Anaconda\envs\ThesisWireBond\lib\site-packages\pingouin\parametric.py", line 1155
    print(lm.summary())
                       ^
TabError: inconsistent use of tabs and spaces in indentation
Raphael Vallat
@raphaelvallat
Hi @Flydroid! Did you fork and install the develop version of Pingouin? What software did you use to make the changes? I recommend using Atom, which automatically homogenizes spaces and tabs on save. Thanks!
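A TabError like the one above usually means a single edited line was indented with a tab in a file that otherwise uses spaces. The following standard-library sketch shows one way to locate such lines; the function name lines_with_tab_indent and the sample string are hypothetical, and Python's built-in tabnanny module (python -m tabnanny file.py) performs a similar check.

```python
def lines_with_tab_indent(source):
    """Return 1-based numbers of lines whose leading whitespace contains a tab."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        # Leading whitespace is everything stripped off by lstrip().
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            hits.append(number)
    return hits

# Hypothetical snippet: line 3 is indented with a tab, line 2 with spaces.
sample = "def f():\n    x = 1\n\tprint(x)\n"
print(lines_with_tab_indent(sample))
```

Replacing the tab on each reported line with the same number of spaces as the neighbouring lines should let the module import again.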
Aodarium
@Aodarium
Hi! I had to compute some repeated anova functions for my work and I used your library. It looks really clean and it's a pleasure to use.
During one of my computations (rm_anova), I got a negative F value. I was a bit surprised, and I would like to know whether it's a bug or something normal (and in that case, any idea why?).
Raphael Vallat
@raphaelvallat
Hi @Aodarium ! Thanks for the feedback! This is most likely caused by an error in your dataframe, specifically on the Subject column that does not match the repeated measures design. Please have a look at previous messages in this thread (around Jul 10) for more details on this. Thanks!
Aodarium
@Aodarium
Hi @raphaelvallat, thank you for your answer, I'll take a look in this direction. Thanks!
RichardLeibbrandt
@RichardLeibbrandt

Hi! just wanted to say first of all, what a pleasure it is to work with Pingouin - really straightforward and sensible API that allows you to do what you want to do. Also appreciate all the attention and care that has gone into the documentation - the detailed explanations are great and super-useful.
I just have a question about pairwise_ttests(). I have a dataset with repeated measures on the same group of subjects in different conditions (condition is in the 'condid' column). The data failed normality tests, so I want a non-parametric test. And the design is unbalanced - has missing values for some subjects in some conditions. I've tried:
posthocs = pg.pairwise_ttests(dv='value', within='condid', subject='subid', data=df2, parametric=False, padjust='fdr_bh'),
expecting that this would call mwu() as a non-parametric test.
However, this fails, with an error raised from wilcoxon() instead of mwu().
Looking at the source for pairwise_ttests, I see you have the line (executed when parametric=False):
paired = True if contrast == 'simple_within' else False

And hence it tries to call wilcoxon(), then fails (I presume because of the unequal samples).
So it's true that this is a simple within - there is only the 'condid' factor within subjects, and no between factor. But there are missing values, so a paired test can't be used.
Just wondering why it works like this? I could just repeatedly call mwu and then do multicomp myself, but I guess I'm missing something more fundamental.

Raphael Vallat
@raphaelvallat
Hi @RichardLeibbrandt! Thanks for the positive feedback, I really appreciate it! Since your data is paired (repeated measures), Pingouin will call the wilcoxon and not the mwu function. If you want to use the non-paired non-parametric test, then you can just use pairwise_ttests(dv='value', between='condid', data=df2, parametric=False, padjust='fdr_bh'). Can you please let me know what exactly is the error that you get in the wilcoxon function? In case of unbalanced repeated measures data, Pingouin will apply a listwise deletion, meaning that it removes all the rows for which there is at least one missing value. You might want to check your input data / the number of non-missing rows before running the function. Thanks!
RichardLeibbrandt
@RichardLeibbrandt

Thanks for that quick reply ! I should just clarify that I probably misstated what my data looks like: it's not really missing values so much as missing rows. So if a subject didn't take part in a condition, there actually is no row in the dataframe for that combination at all (I only have one dependent variable). To get the Wilcoxon test working with the listwise deletion behaviour you described, should I set up the data frame differently?
The error trace I get is

  File "/Users/leib0006/anaconda2/envs/mseqpaper/lib/python3.7/site-packages/pingouin/pairwise.py", line 272, in pairwise_ttests
    df_ttest = wilcoxon(x, y, tail=tail)
  File "/Users/leib0006/anaconda2/envs/mseqpaper/lib/python3.7/site-packages/pingouin/nonparametric.py", line 407, in wilcoxon
    correction=True, alternative=tail)
  File "/Users/leib0006/anaconda2/envs/mseqpaper/lib/python3.7/site-packages/scipy/stats/morestats.py", line 2848, in wilcoxon
    raise ValueError('The samples x and y must have the same length.')
ValueError: The samples x and y must have the same length.

And just to make sure I understand what you meant with the listwise deletion: Suppose we had
Condition A: Sub 1, 2, 3, 4
Condition B: Sub 1, 3, 4
Condition C: Sub 1, 2, 4
Does it do the list deletion on a per-comparison basis? i.e. compare A vs B with Subs 1, 3, 4; A vs C with 1, 2, 4, etc?
Or would it see that only Subs 1 and 4 have all conditions, and only use those two for all comparisons?

RichardLeibbrandt
@RichardLeibbrandt

OK, I have actually managed to answer my own questions - I've added rows with NaNs in them for all the missing values, and now wilcoxon works.
And I've stepped through the source code and I've seen that it does the second form of listwise deletion that I described.

So now I have a follow-up question: would it actually be valid to do it in the first, per-comparison way I described, where for each pairwise comparison, it finds the subjects that are in common for those two conditions only (rather than the subjects that occur in all conditions), and does that particular Wilcoxon test on the data from those subjects? (And then the next pairwise comparison might be on a different set of subjects.) And then at the end do the correction for multiple comparisons with multicomp? Obviously it's possible to do it with the software, as multicomp doesn't care where p-values come from, but I'm wondering whether it's actually legitimate to do this?
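The workaround mentioned above (adding NaN rows for every missing subject/condition combination) can be sketched with pandas as follows. The column names subid, condid and value follow the earlier message; the data values are invented for the example.

```python
import pandas as pd

# Hypothetical long-format data: subject 2 has no row at all for condition B.
df2 = pd.DataFrame({
    "subid": [1, 1, 2, 3, 3],
    "condid": ["A", "B", "A", "A", "B"],
    "value": [0.5, 0.7, 0.4, 0.9, 1.1],
})

# Build every subject x condition combination, then reindex onto it.
# Combinations absent from the original data get NaN in 'value'.
full_index = pd.MultiIndex.from_product(
    [df2["subid"].unique(), df2["condid"].unique()],
    names=["subid", "condid"],
)
df_full = (
    df2.set_index(["subid", "condid"])
       .reindex(full_index)
       .reset_index()
)
print(df_full)
```

After this step every subject has one row per condition, so the missing observations are explicit NaNs rather than absent rows.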
Raphael Vallat
@raphaelvallat
Thanks @RichardLeibbrandt! What you are describing is the pairwise deletion method. I think it would indeed be great to let users choose between a strict listwise (a.k.a. casewise) deletion and a more liberal pairwise deletion. I just opened an issue for that on GitHub: raphaelvallat/pingouin#56. I will try to implement that in a future release. Regarding the p-values, I would tend to think that multiple comparison correction after pairwise deletion is still valid, but you should try to find some relevant papers on this to make sure. Thanks!
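The difference between the two deletion strategies can be shown on the Condition A/B/C example from the earlier message, using only the standard library. This is a sketch of which subjects each strategy would retain, not of any Pingouin option.

```python
from itertools import combinations

# Subjects available in each condition (from the example above).
subjects = {"A": {1, 2, 3, 4}, "B": {1, 3, 4}, "C": {1, 2, 4}}

# Listwise (casewise) deletion: only subjects present in every condition.
listwise = set.intersection(*subjects.values())

# Pairwise deletion: for each comparison, subjects present in both conditions.
pairwise = {
    (a, b): subjects[a] & subjects[b]
    for a, b in combinations(sorted(subjects), 2)
}

print("listwise:", sorted(listwise))
for pair, subs in pairwise.items():
    print(pair, sorted(subs))
```

Listwise deletion uses the same two subjects for every comparison, while pairwise deletion keeps three subjects for A vs B and A vs C and two for B vs C.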