okliviaf
@okliviaf
Thanks so much for such a quick reply, @raphaelvallat. P.S. Having spent all summer switching to Python for data analysis (after previously using SPSS), I am loving Pingouin and will definitely cite it if my papers get published!
Gerard Encina-Llamas
@GerardEncina

Hi. I am having issues with the ICC function. It works nicely with the example dataset data = pg.read_dataset('icc'), but when I use my data I get the error: AssertionError: Data must have at least 5 non-missing values. I cannot see any difference in the input data type, though. What am I doing wrong?

Code example:

import numpy as np
import pandas as pd
import pingouin as pg

n_val = 100
val_a = np.random.rand(1, n_val)
val_b = val_a + 0.6*np.random.rand(1, n_val)

# Every row is assigned to the same rater ('A'); 'test_run' has only two values
data_icc = pd.DataFrame({'test_run': ['test_a']*n_val + ['test_b']*n_val,
                         'efr_value': np.concatenate((val_a[0], val_b[0]), axis=0),
                         'rater': ['A']*2*n_val})

icc = pg.intraclass_corr(data=data_icc, targets='test_run', raters='rater', ratings='efr_value').round(3)

You can see an example of the data next:

[Image: corr.png]
Raphael Vallat
@raphaelvallat
Hi @GerardEncina ! After a quick look at the generated data, I believe the issue is caused by the fact that you have only one unique value in the 'rater' column (= A).
Gerard Encina-Llamas
@GerardEncina
Hi @raphaelvallat, I am performing a test-retest reliability analysis, so I only have two sets of measurements: one for day 1 (test, or val_a) and another for day 2 (retest, or val_b). In reality there are no raters, just two sets of measurements. Is there any workaround in your function for a single rater (k = 1)? Or do you have any idea how to run your ICC on just two arrays?
Raphael Vallat
@raphaelvallat
Hi @GerardEncina, I think you could just label your "raters" in incrementing order: 1, 2, 3, 4, etc. This assumes, of course, that the observations are paired, i.e. that the third rating of test_a and of test_b were made by the same "participant". Does that make sense? Also, since you only have two tests, I think you can simply use a regular Pearson correlation (see also here: https://www.statisticshowto.com/test-retest-reliability/). Hope this helps!
Gerard Encina-Llamas
@GerardEncina
Hi @raphaelvallat, thanks a lot for your suggestion. What you proposed was not entirely correct, but it pointed me towards the correct solution. What actually needs to be in incrementing order is "targets", not "raters". What I now understand is that the ICC compares the agreement (or consistency) between different raters evaluating the same thing. Applied to a test-retest analysis, the raters are the different recording sessions, since you want to compare the agreement (or consistency) between sessions.
See below (for everyone) how I modified my example code:
import numpy as np
import pandas as pd
import pingouin as pg

n_val = 100
val_a = np.random.rand(1, n_val)
val_b = val_a + 0.6*np.random.rand(1, n_val)

# Long format: each target (1..n_val) appears once per test session
data_icc = pd.DataFrame({'targets': np.tile(np.arange(1, n_val + 1), 2),
                         'efr_value': np.concatenate((val_a[0], val_b[0]), axis=0),
                         'test_session': ['test_a']*n_val + ['test_b']*n_val})

# Pearson
print('Pearson')
print(pg.corr(x=val_a[0], y=val_b[0], method='pearson'))

# ICC: the test sessions play the role of the raters
print('\n\nICC')
print(pg.intraclass_corr(data=data_icc, targets='targets', raters='test_session', ratings='efr_value').round(3))
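To make the test-retest discussion a bit more concrete, here is a minimal sketch (not Pingouin's implementation) of the single-measure ICC forms from Shrout & Fleiss, computed from the two-way ANOVA mean squares of a targets × sessions array. It assumes a complete array with no missing values; `icc_single` is a hypothetical helper, and its three values should correspond to the single-rater ICC1/ICC2/ICC3 rows that pg.intraclass_corr reports. It also shows one caveat of using Pearson's r for test-retest: a constant offset between sessions leaves r (and the consistency form ICC3) at 1, while the absolute-agreement form ICC2 drops.

```python
import numpy as np


def icc_single(ratings):
    """Single-measure ICC1, ICC2, ICC3 (Shrout & Fleiss) for an
    n_targets x k_raters array with no missing values."""
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    # Sums of squares for targets (rows), raters (columns) and total
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()
    ss_total = ((x - grand) ** 2).sum()
    ss_err = ss_total - ss_rows - ss_cols
    # Mean squares
    msr = ss_rows / (n - 1)                     # between targets
    msc = ss_cols / (k - 1)                     # between raters/sessions
    mse = ss_err / ((n - 1) * (k - 1))          # residual
    msw = (ss_total - ss_rows) / (n * (k - 1))  # within targets
    return {
        "ICC1": (msr - msw) / (msr + (k - 1) * msw),
        "ICC2": (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n),
        "ICC3": (msr - mse) / (msr + (k - 1) * mse),
    }


# Made-up test-retest data with a constant shift: session B = session A + 0.3
rng = np.random.default_rng(0)
session_a = rng.random(100)
session_b = session_a + 0.3
wide = np.column_stack([session_a, session_b])  # rows = targets, columns = sessions

r = np.corrcoef(session_a, session_b)[0, 1]
icc = icc_single(wide)
print(f"Pearson r = {r:.3f}")
print({name: round(value, 3) for name, value in icc.items()})
```

Whether the offset matters depends on the question: if a systematic shift between sessions counts as unreliability, an absolute-agreement ICC is the more appropriate summary than r.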
Raphael Vallat
@raphaelvallat
That's great to hear, and thanks for sharing your code!
Aria4201
@Aria4201
Hi, I am trying to install pingouin using "pip install pingouin" but there is an error. I tried to install it in PyCharm as well but it fails and says "Try to run this command from the system terminal. Make sure that you use the correct version of 'pip' installed for your Python interpreter located at 'C:\Users\ariad\PycharmProjects\pythonProject\venv\Scripts\python.exe'." could you please help?
Florin Andrei
@FlorinAndrei

I'm not sure I understand pairwise_ttests(). See notebook:

https://github.com/FlorinAndrei/soda_pop_coke/blob/main/soda_pop_coke.ipynb

Function call is in cell #27. Dataframe (long form) is in cell #26 (and values are squished with sqrt()). The original wide form is in cell #24. In the wide form, each entry in the 'points' column is a unique pair of lat/long coordinates; the other three columns are distances from each point to three different centroids.

I want to show that, e.g. for the "soda" points, the shortest mean distance is to the "East/West Coasts" centroids.

Why do I get Paired=False from pairwise_ttests()? Does that mean it only compares population means as a group?

My understanding is that, with subject='points', it would somehow figure out that each lat/long pair is repeated 3 times in the 'points' column, and would use that information to do a more in-depth comparison, as opposed to plain group means comparison.

In cell #23 I did confidence intervals for means, each group separately, using compute_bootci(), and that's fine, but I want a more refined test than that.

Maybe I don't truly understand the theory behind pairwise t-tests.

If you clone the repo, it has all the files you need to run the notebook.

https://github.com/FlorinAndrei/soda_pop_coke

Thanks!

Raphael Vallat
@raphaelvallat
Hi @FlorinAndrei , you want to use within='variable' instead of between='variable' to get a paired T-test instead of an independent T-test
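The difference between the two options can be sketched with SciPy rather than Pingouin: between= corresponds to an independent-samples test, while within= (with subject=) pairs the rows by subject, which amounts to a one-sample test on the per-subject differences. The data below are made up for illustration.

```python
import numpy as np
from scipy import stats

# Made-up data: the same 8 subjects measured under two conditions
rng = np.random.default_rng(42)
subject_effect = rng.normal(10, 3, size=8)                    # large between-subject spread
cond_a = subject_effect + rng.normal(0, 0.5, size=8)
cond_b = subject_effect + 1.0 + rng.normal(0, 0.5, size=8)    # consistent +1 shift

# Treat the two columns as independent groups (like between=...)
indep = stats.ttest_ind(cond_a, cond_b)
# Pair observations by subject (like within=... with subject=...)
paired = stats.ttest_rel(cond_a, cond_b)
# A paired t-test is a one-sample t-test on the per-subject differences
diff_test = stats.ttest_1samp(cond_a - cond_b, 0.0)

print(f"independent: t = {indep.statistic:.2f}, p = {indep.pvalue:.3g}")
print(f"paired:      t = {paired.statistic:.2f}, p = {paired.pvalue:.3g}")
```

Because pairing removes the subject-to-subject spread, the paired test is usually far more sensitive when each subject contributes a value to every condition.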
Florin Andrei
@FlorinAndrei
That works, thanks Raphael!
Florin Andrei
@FlorinAndrei

It's not clear from the docs how pg.ttest() works for a single sample test. It always requires a second variable y.
I just want to imitate the t.test() function from R, which can be used with a single sample very easily:

t.test(var)

How to get the exact same results from pg?

manfred hammerl
@manfredh_twitter
The second variable, y, is the test value for the one-sample t-test.
Raphael Vallat
@raphaelvallat
@FlorinAndrei pg.ttest(var, y=0) should replicate the R results
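For reference, the one-sample statistic that R's t.test(x) reports with its default mu = 0 can also be computed by hand. The sketch below uses made-up sample values and checks the manual calculation against scipy.stats.ttest_1samp; it is an illustration of the formula, not Pingouin's code.

```python
import numpy as np
from scipy import stats

x = np.array([5.1, 4.9, 6.2, 5.8, 6.0, 5.5, 4.7, 5.9])  # made-up sample

mu0 = 0.0                                      # R's t.test(x) default: mu = 0
n = x.size
se = x.std(ddof=1) / np.sqrt(n)                # standard error of the mean
t_stat = (x.mean() - mu0) / se
p_val = 2 * stats.t.sf(abs(t_stat), df=n - 1)  # two-sided p-value

ref = stats.ttest_1samp(x, popmean=mu0)
print(f"t = {t_stat:.4f}, df = {n - 1}, p = {p_val:.3g}")
```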
Florin Andrei
@FlorinAndrei
Again about pg.ttest(), but perhaps this is more general: the confidence level seems to be fixed at 0.95. Is there a way to change it?
Raphael Vallat
@raphaelvallat
@FlorinAndrei there is no way to change the confidence level of the confidence intervals in the pingouin ttest function
Florin Andrei
@FlorinAndrei
Ok, I can live with that. But, just curious, any plans to allow a custom CL in future versions?
Raphael Vallat
@raphaelvallat
Not in a near future (I'm super busy these days) but that should be quite straightforward to implement, so feel free to open an issue on GitHub so we can keep track!
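In the meantime, a t-based confidence interval for a one-sample mean at any level can be computed directly. This is a minimal sketch with SciPy, not a Pingouin feature; mean_ci is a hypothetical helper, and the sample values are made up.

```python
import numpy as np
from scipy import stats


def mean_ci(x, confidence=0.95):
    """Two-sided t-based confidence interval for the mean of a 1-D sample."""
    x = np.asarray(x, dtype=float)
    n = x.size
    se = x.std(ddof=1) / np.sqrt(n)
    # Critical value of Student's t for the requested confidence level
    t_crit = stats.t.ppf((1 + confidence) / 2, df=n - 1)
    return x.mean() - t_crit * se, x.mean() + t_crit * se


x = [5.1, 4.9, 6.2, 5.8, 6.0, 5.5, 4.7, 5.9]  # made-up sample
for level in (0.90, 0.95, 0.99):
    low, high = mean_ci(x, level)
    print(f"{level:.0%} CI: [{low:.3f}, {high:.3f}]")
```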
Jan
@jan0508_gitlab
Hi, I am new to statistics and currently wondering why pg.pairwise_ttests with two within factors and p-value correction returns NaN for two of the p-corr and p-adjust values. Is there maybe a simple reason for this?
Raphael Vallat
@raphaelvallat
Hi @jan0508_gitlab, you will get a NaN if there are only 2 levels in your within-factor, and therefore only one test computed for that specific factor. Indeed, the correction for multiple comparisons is performed separately for each factor.
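The arithmetic behind this: a factor with k levels yields k(k-1)/2 pairwise tests, so a two-level factor yields a single test and there is nothing to correct across. The sketch below counts comparisons per factor and applies a plain Bonferroni adjustment; n_pairwise and bonferroni are hypothetical helpers for illustration, not Pingouin's code, and the factor names are made up.

```python
from itertools import combinations


def n_pairwise(levels):
    """Number of pairwise comparisons among the levels of one factor."""
    return len(list(combinations(levels, 2)))


def bonferroni(pvals):
    """Bonferroni-adjusted p-values, capped at 1 (illustrative helper)."""
    m = len(pvals)
    return [min(p * m, 1.0) for p in pvals]


# Made-up within-subject factors
factors = {"time": ["pre", "post"], "condition": ["A", "B", "C"]}
for name, levels in factors.items():
    print(name, n_pairwise(levels))   # time: 1 comparison, condition: 3

# With a single test, the correction leaves the p-value unchanged
print(bonferroni([0.03]))             # [0.03]
```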
Florin Andrei
@FlorinAndrei
Is there a pg equivalent to the R function prop.test()?
Raphael Vallat
@raphaelvallat
hhefter
@hhefter
Hi, I installed pingouin, but I am getting an error: No module named 'pingouin'
Raphael Vallat
@raphaelvallat
Hi, 1) make sure to restart your Jupyter kernel if you're working with Jupyter notebooks, and 2) make sure that you have installed Pingouin in the correct environment if you're using conda and have several environments.
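A quick way to check point 2 from inside the notebook or IDE is to print the interpreter that is actually running and ask whether it can locate the package. This sketch uses only the standard library; describe_environment is a hypothetical helper. If found is False, installing from that same interpreter (for example running the printed executable with -m pip install pingouin, or the %pip install pingouin magic in Jupyter) should target the right environment.

```python
import importlib.util
import sys


def describe_environment(module_name="pingouin"):
    """Report which interpreter is running and whether a module is importable from it."""
    spec = importlib.util.find_spec(module_name)
    return {
        "python": sys.executable,   # interpreter used by this kernel/IDE
        "module": module_name,
        "found": spec is not None,
        "location": spec.origin if spec is not None else None,
    }


print(describe_environment())
```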
hhefter
@hhefter
I am using Spyder; the problem was that I had installed Pingouin in Miniconda instead of in Spyder's environment. Now it's working.
Florin Andrei
@FlorinAndrei

It's not clear to me how to do the one-sample Z test for a proportion. In R, that would be:

prop.test(successes, total, p=ratio)

statsmodels provides something similar:

proportions_ztest(successes, total, value=ratio)

But the output is poor compared to the R function - there's no confidence interval.

(to be clear, the variables in the examples above are scalar, not vectors or matrices)
> prop.test(1200, 2500, p=0.44, correct = F)

    1-sample proportions test without continuity correction

data:  1200 out of 2500, null probability 0.44
X-squared = 16.234, df = 1, p-value = 5.599e-05
alternative hypothesis: true p is not equal to 0.44
95 percent confidence interval:
 0.4604617 0.4995996
sample estimates:
   p 
0.48
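For the record, R's numbers above can be reproduced with SciPy: without continuity correction, prop.test's X-squared is the square of the usual z statistic, and its interval is the Wilson score interval. A minimal sketch; prop_test_1sample is a hypothetical helper, not a Pingouin or statsmodels function. If you already use statsmodels, statsmodels.stats.proportion.proportion_confint(1200, 2500, method='wilson') should give the same interval.

```python
import math

from scipy import stats


def prop_test_1sample(successes, total, p0, confidence=0.95):
    """One-sample proportion test without continuity correction,
    with a Wilson score confidence interval."""
    p_hat = successes / total
    # z statistic under H0: p == p0; its square is R's "X-squared"
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / total)
    chi2 = z ** 2
    p_value = stats.chi2.sf(chi2, df=1)
    # Wilson score interval
    zc = stats.norm.ppf((1 + confidence) / 2)
    denom = 1 + zc ** 2 / total
    center = (p_hat + zc ** 2 / (2 * total)) / denom
    half = zc * math.sqrt(p_hat * (1 - p_hat) / total + zc ** 2 / (4 * total ** 2)) / denom
    return {"estimate": p_hat, "X-squared": chi2, "p-value": p_value,
            "ci": (center - half, center + half)}


res = prop_test_1sample(1200, 2500, p0=0.44)
print(res)
```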
Florin Andrei
@FlorinAndrei
Same question for the 2-sample z-test for proportions - how to get the estimated difference (as a confidence interval) in population proportions. In R this is:
> exam = matrix(c(94, 113, 31, 62), nrow = 2)
> stats::prop.test(exam, correct = F)

    2-sample test for equality of proportions without continuity correction

data:  exam
X-squared = 3.8509, df = 1, p-value = 0.04972
alternative hypothesis: two.sided
95 percent confidence interval:
 0.002588801 0.209982628
sample estimates:
   prop 1    prop 2 
0.7520000 0.6457143
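Likewise for the two-sample case: without continuity correction, R's X-squared uses the pooled proportion, while its confidence interval is the unpooled Wald interval for p1 - p2. A sketch outside Pingouin; prop_test_2sample is a hypothetical helper.

```python
import math

from scipy import stats


def prop_test_2sample(x1, n1, x2, n2, confidence=0.95):
    """Two-sample test for equality of proportions without continuity
    correction, with a Wald interval for the difference p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    # Pooled proportion under H0: p1 == p2
    pooled = (x1 + x2) / (n1 + n2)
    z = (p1 - p2) / math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    chi2 = z ** 2
    p_value = stats.chi2.sf(chi2, df=1)
    # Unpooled standard error for the confidence interval
    zc = stats.norm.ppf((1 + confidence) / 2)
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return {"prop 1": p1, "prop 2": p2, "X-squared": chi2,
            "p-value": p_value, "ci": (diff - zc * se, diff + zc * se)}


# R's matrix(c(94, 113, 31, 62), nrow = 2) is filled column-wise:
# row 1 = 94 successes, 31 failures; row 2 = 113 successes, 62 failures
res = prop_test_2sample(94, 94 + 31, 113, 113 + 62)
print(res)
```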
Raphael Vallat
@raphaelvallat
Hi @FlorinAndrei, this is not implemented in Pingouin. Given that statsmodels already has the proportions_ztest function, I would suggest opening an issue or PR directly on the GitHub of statsmodels. Eventually, we could consider re-implementing the prop.test function in Pingouin, but I don't think this is the highest priority.
Florin Andrei
@FlorinAndrei
I've opened an issue regarding the confidence interval for proportions on the statsmodels github, thanks! statsmodels/statsmodels#7275
Another question: pg.linear_regression() seems to return predicted values only when as_dataframe=False. What is the reason?
Raphael Vallat
@raphaelvallat
When as_dataframe=True (default), Pingouin only returns a Pandas dataframe with the coefficients, CI, p-values, etc. It is easier to return values of different formats/sizes using a dictionary than a Pandas dataframe, and that's precisely why I've included the as_dataframe=False option for users who may need a more exhaustive output.
Note that it is actually possible to return the predicted values as a hidden attribute of the dataframe, but I think it's less confusing to tell Pingouin's users that if they want a fuller output they should just use as_dataframe=False to get a dictionary instead of a "summary" dataframe.
Florin Andrei
@FlorinAndrei
with lm = pg.linear_regression() is there a simple predict() method like with the scikit models?
Raphael Vallat
@raphaelvallat
No, because the linear regression is implemented as a function and not as a Python class (like in scikit-learn).
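Without a predict() method, predicting on new data comes down to applying the fitted equation yourself. A minimal NumPy sketch, assuming the coefficients are ordered intercept first; fit_ols and predict are hypothetical helpers standing in for the fitted model. With Pingouin, the same idea would be to take the coef column of the output and multiply it by the new predictors, after checking the names column to confirm which row is the intercept.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit with an intercept; returns coefficients [b0, b1, ...]."""
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])   # prepend intercept column
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return coef


def predict(coef, X_new):
    """Hypothetical predict helper: intercept + X_new @ slopes."""
    X_new = np.atleast_2d(np.asarray(X_new, dtype=float))
    return coef[0] + X_new @ coef[1:]


# Made-up, noise-free data: y = 1 + 2*x1 - 0.5*x2
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 1 + 2 * X[:, 0] - 0.5 * X[:, 1]

coef = fit_ols(X, y)
print(np.round(coef, 6))
print(predict(coef, [[1.0, 2.0]]))
```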
Norbert Wilkens
@BikeNW
Hi Raphael, I just wanted to add my question to the current topic. I want to compute interaction and moderation effects with pingouin.linear_regression. This still seems not to be possible. Is it intended to be implemented? Thanks, Norbert
Raphael Vallat
@raphaelvallat
Hi @BikeNW, while this is not directly possible, I believe that you can create additional columns in your dataframe that would represent the contrast that you want to try (e.g. interaction between two terms). Otherwise, I'd recommend checking statsmodels if you'd like to have R-style formulas. Thanks!
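The extra-column approach looks like this: the product x1 * x2 becomes an ordinary predictor, and its coefficient estimates the interaction (moderation) effect. The sketch fits it with NumPy least squares on made-up, noise-free data; with Pingouin, the same columns could be passed as pg.linear_regression(df[['x1', 'x2', 'x1_x2']], df['y']).

```python
import numpy as np
import pandas as pd

# Made-up data with a known interaction: y = 1 + 0.5*x1 + 2*x2 + 1.5*x1*x2
rng = np.random.default_rng(1)
df = pd.DataFrame({"x1": rng.normal(size=200), "x2": rng.normal(size=200)})
df["y"] = 1 + 0.5 * df["x1"] + 2 * df["x2"] + 1.5 * df["x1"] * df["x2"]

# Add the interaction (moderation) term as a regular predictor column
df["x1_x2"] = df["x1"] * df["x2"]

predictors = ["x1", "x2", "x1_x2"]
design = np.column_stack([np.ones(len(df)), df[predictors].to_numpy()])
coef, *_ = np.linalg.lstsq(design, df["y"].to_numpy(), rcond=None)
print(dict(zip(["Intercept"] + predictors, np.round(coef, 6))))
```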
MacAskill Lab
@macaskill-lab
hello! first, thanks for the fantastic package! i have come across an issue, and it may be because of my lack of skills, but i can't work it out: it seems that for a mixed anova design you can't have multiple within-subject factors. e.g.: pg.mixed_anova(dv='value', within=['variable_a', 'variable_b'], between = 'group', subject='subject', data=df, effsize="ng2") returns a dtype error. is this me being an idiot or is it a limitation? i couldn't find a direct mention of it anywhere, so i thought i'd check before deep diving
Raphael Vallat
@raphaelvallat
Hi @macaskill-lab! You're not an idiot: the mixed ANOVA only allows for one between + one within, or two within, or two between factors. It doesn't work with two within + one between, or two between + one within! For more complex ANOVA designs, I'd highly recommend using the free JASP software!
AtK42
@AtK42
hi all. I'm having some problems getting started with pingouin. I've installed the module via the cmd line and also via conda but when I try to import it in my jupyter notebook it says it cannot find the module, even though it has been successfully installed. any ideas as to why that's the case?
Raphael Vallat
@raphaelvallat
Hi @AtK42, are you sure that you have installed Pingouin in the same Python environment as your Jupyter Notebook? (the default is "base" in Anaconda)
AtK42
@AtK42
Hi @raphaelvallat , thanks for your reply. I did check and it was the case. In fact, it works now after I restarted my laptop (even though it still didn't work after restarting my jupyter kernel as you have suggested in your message from Jan. 19). So I have no idea how or why but it's all good now :)
Peach Cobbler 🍑
@johnmustin1_twitter
hi! i was hoping to get help with pingouin, sorry if this is a novice question but i can't seem to find the answer anywhere online
does anyone know how to return a p-value? not the t-test function, i specifically want the p-value
Raphael Vallat
@raphaelvallat
Hi @johnmustin1_twitter, you want the p-value of a T-test? You can get it with this function: https://pingouin-stats.org/generated/pingouin.ttest.html#pingouin.ttest (the p-val column)