Bhawnaawasthi96
@Bhawnaawasthi96

@raphaelvallat Thanks for your solution. This is my code, but I'm still getting the same error:

pip install pandas
pip install pingouin
pip install xlrd

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import pingouin as pg

file = r'C:\Users\rock\Desktop\lance\Initial Data.xls'

df = pd.read_excel(file)
print(df)

data = pg.read_dataset('df')
icc = pg.intraclass_corr(data=data, targets='Question', raters='test_rater_1',
                         ratings='correct_rating')

print(icc)

Note: in my dataset I have questions, raters, ratings and correct_ratings.

Bhawnaawasthi96
@Bhawnaawasthi96
image.png
micemagic
@micemagic

Hello everyone. I am really happy to find such a great and straightforward package for statistics; if you contributed to it, I'd like to thank you.

I have a question regarding the post-hoc tests. I am performing an analysis using a mixed ANOVA (between: treatment/placebo, within: first_day/second_day, ...). I wish to conduct pairwise comparisons with Bonferroni correction, but I fail to do so in Pingouin. Since I am using a strict adjustment, I would like to minimize the number of post-hocs, so for five days I want to perform 5 comparisons:
placebo, first day VS experimental, first day
placebo, second day VS experimental, second day
...
placebo, fifth day VS experimental, fifth day

When I use the defaults in .pairwise_ttests, it performs all possible combinations of comparisons and most likely adjusts the p-value for their number...

pingouin version: 0.3.7
micemagic
@micemagic
oh, and my data is in a tidy format
Raphael Vallat
@raphaelvallat
Hi @micemagic! Glad you like Pingouin! When you use a mixed ANOVA design in pairwise T-test, Pingouin will first calculate the pairwise comparison of the within-subject factor, then the between-subject factor, and finally the within x between interaction, or between x within interaction if within_first=False. Importantly, the p-values are corrected separately for each of these three effects: within, between and interaction. Now, if you want to apply the p-values correction only on a subset of comparisons, I'd suggest you get the uncorrected p-values from the output dataframe, select only the ones that you want, and then use the pingouin.multicomp function to correct for multiple comparisons. Hope that makes sense. Thanks!
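To make that workflow concrete, here is a minimal sketch with invented p-values: keep only the planned comparisons from the `p-unc` column of the output, then correct them. `pingouin.multicomp(pvals, method='bonf')` does this in one call; the Bonferroni step is written out by hand below so the arithmetic is visible.

```python
import numpy as np

# Hypothetical uncorrected p-values for the five planned
# placebo-vs-experimental comparisons (one per day), e.g. taken
# from the 'p-unc' column of the pairwise_ttests output.
p_unc = np.array([0.004, 0.031, 0.012, 0.220, 0.048])

# Bonferroni: multiply by the number of tests actually performed
# (5 here, not the full set of all possible comparisons), capped at 1.
p_corr = np.minimum(p_unc * len(p_unc), 1.0)
reject = p_corr < 0.05

# Equivalent one-liner with Pingouin:
# reject, p_corr = pg.multicomp(p_unc, method='bonf')
```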
micemagic
@micemagic
@raphaelvallat thank you for the rapid response and for the effort you've put into Pingouin. I'll follow your advice. Keep up the great work!
bilehtin
@bilehtin

Hi everyone, I'm using the pairwise T-test function of the form pg.pairwise_ttests(dv='dv', within=['iv1', 'iv2'], subject='id', data=df), and I get the error: TypeError: 'int' object is not iterable. Are two within-subject factors not supported? Thank you in advance!

I'm facing the same problem. pg.mixed_anova() runs on the same dataframe and arguments so I'd guess the dataframe is fine. I was wondering if there was a solution to this, and if so, could you share it with me?

Raphael Vallat
@raphaelvallat
Hi @bilehtin, two within-factors are normally supported. Could you take a screenshot of the exact error? I'm guessing there's something wrong with the input dataframe, but hard to say without seeing the data. Thanks
bilehtin
@bilehtin
image.png
@raphaelvallat Hi and thanks for the response! I posted the error above - I hope it came through. I'm also happy to share the code and dataframe
Raphael Vallat
@raphaelvallat
Hi @bilehtin, can you please send me your data and code in direct message? Thanks
merjekrepo
@merjekrepo
import pandas as pd
import pingouin as pg

print("Reading the excel file")
df = pd.read_excel(file_name)
print()
print("Applying pairwise t-tests to the two-way repeated measures ANOVA")
res = pg.pairwise_ttests(data=df, dv='Score', within=['Length', 'Width'],
                         subject='UserID', padjust='sidak')
print(res)
print()
Hello @raphaelvallat. When I run the two-way repeated measures ANOVA code above, I get different results than SPSS.
two_way_repeated_measures_anova_pairwise_comparison.JPG
I meant the interaction results are coming up different.
Here is the SPSS output.
two_way_repeated_measures_anova_pairwise_comparison_spss.JPG
And here is the data file.
You can find the data files in either long or wide format.
Raphael Vallat
@raphaelvallat
Hi @merjekrepo, can you please try without any p-value correction and send the screenshot again? The difference here is that SPSS calculates all the permutations of pairwise T-tests (ttest(a, b) is considered different from ttest(b, a)), while Pingouin only calculates the combinations (only ttest(a, b) is calculated). I do think that Pingouin's behavior is more adequate, especially when calculating corrected p-values, because the p-value does not change between ttest(a, b) and ttest(b, a), so you end up with a lot of duplicate p-values in SPSS.
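The permutations-vs-combinations distinction is easy to see with the standard library (the factor levels 'a', 'b', 'c' are just for illustration):

```python
from itertools import combinations, permutations

levels = ['a', 'b', 'c']

# Pingouin tests each unordered pair exactly once:
combs = list(combinations(levels, 2))   # [('a','b'), ('a','c'), ('b','c')]

# SPSS reports every ordered pair, duplicating each test:
perms = list(permutations(levels, 2))   # twice as many rows
```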
manfred hammerl
@manfredh_twitter
Hello, I've done this out of interest in SPSS just now. The problem here is not "so you end up with a lot of duplicate p-values in SPSS" but that Pingouin and SPSS end up with (very) different corrected p-values. (The uncorrected p-values in SPSS, when doing a lot of pairwise t-tests, are exactly the same as in Pingouin!) It seems that Pingouin and SPSS have different implementations of the Šidák correction.
Raphael Vallat
@raphaelvallat
This is likely because Pingouin and SPSS do not have the same strategy for selecting which pairwise comparisons to run, and since the correction is dependent on the number of tests, you end up with different corrected values
manfred hammerl
@manfredh_twitter
How many tests does Pingouin assume? In the current example SPSS assumes 3 tests (for each level of Length). I wasn't able to figure out how many tests Pingouin assumes; to me it looks kind of random...
Raphael Vallat
@raphaelvallat
Could you provide a concrete example? Sorry, but I'm not sure I understand what you mean by "SPSS assumes 3 tests". In Pingouin, the correction of p-values is applied separately to each of the levels defined in "Contrast", i.e. to all the p-values where contrast == "Length", then all the p-values where contrast == "Width", and finally all the p-values of the interaction.
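As an illustration of that per-effect grouping, the p-values and column layout below are invented stand-ins for a pairwise_ttests output, and Bonferroni is used for simplicity (Pingouin supports several padjust methods):

```python
import pandas as pd

# Invented stand-in for a pairwise_ttests result table.
res = pd.DataFrame({
    'Contrast': ['Length', 'Length', 'Width', 'Length * Width', 'Length * Width'],
    'p-unc': [0.010, 0.040, 0.030, 0.020, 0.200],
})

# Correct the p-values separately within each Contrast level,
# mirroring how Pingouin groups the correction (Bonferroni shown).
res['p-corr'] = res.groupby('Contrast')['p-unc'].transform(
    lambda p: (p * len(p)).clip(upper=1.0))
```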
merjekrepo
@merjekrepo

@raphaelvallat I think what you mean by "without p-value correction" is LSD (Least Significant Difference) in SPSS. I am sending the .htm file which includes those results along with the other output.

merjekrepo
@merjekrepo
By the way, this difference appears only for within-subjects ANOVAs (and only in the interaction), i.e. one-way repeated measures ANOVA, two-way repeated measures ANOVA and mixed ANOVA. Pingouin and SPSS give the same results for one-way and two-way between-subjects ANOVAs.
karhohs
@karhohs
Hello, I'm looking for a non-parametric equivalent to the one-sample t-test. Googling around has led me to believe that such a test is the one-sample Wilcoxon signed rank test. However, I wasn't able to find this within pingouin. https://pingouin-stats.org/generated/pingouin.wilcoxon.html The Wilcoxon test exists, but from the documentation I couldn't figure out how to use this in a one-sample scenario. Will someone please help me understand how to use this function correctly for a one-sample test? Thanks!
Raphael Vallat
@raphaelvallat
hi @karhohs, you can use scipy.stats.wilcoxon where x is the difference between your values and the true mean. Pingouin's wilcoxon function does not implement the one-sample test (yet).
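A minimal sketch of that workaround, with made-up data and a made-up hypothesized value (mu0): subtract the hypothesized value from your sample and pass the differences to scipy.stats.wilcoxon.

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(42)
sample = rng.normal(loc=5.4, scale=0.8, size=30)  # illustrative measurements
mu0 = 5.0  # hypothesized population median

# One-sample Wilcoxon signed-rank test: are the differences
# from mu0 symmetric around zero?
stat, pval = wilcoxon(sample - mu0)
```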
karhohs
@karhohs
Thank you!
Davison Moyo
@WeZhira_gono_twitter
Hi, I am new to Pingouin. I am trying to install the library but the system returns an error: ModuleNotFoundError: No module named 'pandas.tslib'. How can I fix that problem?
Jan Valošek
@valosekj
Hello everyone. How is it possible to perform post-hoc tests for ANCOVA (https://pingouin-stats.org/generated/pingouin.ancova.html#pingouin.ancova)? Tukey (https://pingouin-stats.org/generated/pingouin.pairwise_tukey.html#pingouin.pairwise_tukey) looks to be optimized for ANOVA. Is it possible to use it also for ANCOVA? Thanks!
Raphael Vallat
@raphaelvallat
@WeZhira_gono_twitter please make sure you're using Python 3.6+ and update the pandas package with pip install -U pandas
@valosekj post-hoc tests for ANCOVA (i.e. on adjusted means) are not yet implemented in Pingouin, sorry!
Jan Valošek
@valosekj
@raphaelvallat Thanks for the response! Do you know of any other package where post-hoc tests for ANCOVA are implemented?
Raphael Vallat
@raphaelvallat
Not in Python, I would suggest using JASP (jasp-stats.org). Thanks
okliviaf
@okliviaf
Hello, I've run the Wilcoxon signed-rank test in Pingouin and I wondered which output reports the Z-score?
Raphael Vallat
@raphaelvallat
Hi @okliviaf , Pingouin is based on scipy implementation of the wilcoxon test (scipy.stats.wilcoxon), which unfortunately only returns the W and p-values. You can see here (https://github.com/scipy/scipy/blob/a2ae3e64c6858ba4899e5e21fa4c0a65c2661cfa/scipy/stats/morestats.py#L3032) that Scipy calculates the z-score internally but does not return it.
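If you need the z-score, you can recover the normal-approximation version yourself from W (a sketch with made-up paired differences; it assumes no ties, no zero differences, and no continuity correction, so it will differ slightly from SciPy's exact or corrected p-value):

```python
import numpy as np
from scipy.stats import norm, wilcoxon

rng = np.random.default_rng(3)
diff = rng.normal(loc=0.4, scale=1.0, size=40)  # illustrative paired differences

w, p = wilcoxon(diff)
n = len(diff)  # assumes no zero differences were dropped

# Mean and SD of the signed-rank statistic under the null hypothesis.
mean_w = n * (n + 1) / 4
sd_w = np.sqrt(n * (n + 1) * (2 * n + 1) / 24)
z = (w - mean_w) / sd_w
p_approx = 2 * norm.sf(abs(z))
```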
okliviaf
@okliviaf
Thanks so much for such a quick reply @raphaelvallat. P.S. Having converted to Python data analysis over the summer (having previously used SPSS), I am loving Pingouin and will definitely be citing it if my papers get published!
Gerard Encina-Llamas
@GerardEncina

Hi. I am getting issues with the ICC function. It works nicely with the example dataset data = pg.read_dataset('icc'), but when I use my data I get the error: AssertionError: Data must have at least 5 non-missing values. I cannot see any difference in the input data type, though. What am I doing wrong?

Code example:

import numpy as np
import pandas as pd
import pingouin as pg

n_val = 100
val_a = np.random.rand(1, n_val)
val_b = val_a + 0.6 * np.random.rand(1, n_val)

data_icc = pd.DataFrame({'test_run': ['test_a'] * n_val + ['test_b'] * n_val,
                         'efr_value': np.concatenate((val_a[0], val_b[0]), axis=0),
                         'rater': ['A'] * 2 * n_val})

icc = pg.intraclass_corr(data=data_icc, targets='test_run', raters='rater',
                         ratings='efr_value').round(3)

You can see an example of the data next:

corr.png
Raphael Vallat
@raphaelvallat
Hi @GerardEncina ! After a quick look at the generated data, I believe the issue is caused by the fact that you have only one unique value in the 'rater' column (= A).
Gerard Encina-Llamas
@GerardEncina
Hi @raphaelvallat I am performing a test-retest reliability analysis. I have then only two sets of measurements, one for day 1 (test, or val_a) and another one for day 2 (retest, or val_b). So, in reality, there are no raters, but just two sets of measurements. Is there any workaround in your function for raters k = 1? Or do you have any idea how to run your ICC for just 2 arrays?
Raphael Vallat
@raphaelvallat
Hi @GerardEncina, I think you could just label your "raters" in incrementing order: 1, 2, 3, 4, etc. Assuming, of course, that the observations are paired and that, let's say, the third rating of test_a and test_b was made by the same "participant". Does that make sense? Now, since you only have two tests, I think you can also simply use a regular Pearson correlation (see also here: https://www.statisticshowto.com/test-retest-reliability/). Hope this helps!
Gerard Encina-Llamas
@GerardEncina
Hi @raphaelvallat Thanks a lot for your suggestion. What you proposed was not entirely correct, but it pointed me towards the correct solution. Actually, what needs to be in incrementing order is "targets", not "raters". What I understand now is that, in the ICC, you are comparing the agreement (or consistency) between different raters that are evaluating the same thing. Applied to a test-retest analysis, the raters are the different recording sessions, as you want to compare the agreement (or consistency) between recording sessions.
See below (for everyone) how I modified my example code:
import numpy as np
import pandas as pd
import pingouin as pg

n_val = 100
val_a = np.random.rand(1, n_val)
val_b = val_a + 0.6 * np.random.rand(1, n_val)

data_icc = pd.DataFrame({'targets': np.concatenate((np.arange(1, n_val + 1), np.arange(1, n_val + 1)), axis=0),
                         'efr_value': np.concatenate((val_a[0], val_b[0]), axis=0),
                         'test_session': ['test_a'] * n_val + ['test_b'] * n_val})

# Pearson
print('Pearson')
print(pg.corr(x=val_a[0], y=val_b[0], method='pearson'))

# ICC
print('\n\nICC')
print(pg.intraclass_corr(data=data_icc, targets='targets', raters='test_session',
                         ratings='efr_value').round(3))
Raphael Vallat
@raphaelvallat
That's great to hear, and thanks for sharing your code!