Hi. I am getting issues with the ICC function. It works nicely with the example dataset data = pg.read_dataset('icc'),
but when I use my own data I get the error: AssertionError: Data must have at least 5 non-missing values.
I can't see any difference in the input data type, though. What am I doing wrong?
Code example:
import numpy as np
import pandas as pd
import pingouin as pg

n_val = 100
val_a = np.random.rand(1, n_val)
val_b = val_a + 0.6 * np.random.rand(1, n_val)
# only one rater ('A') and only two target levels ('test_a', 'test_b')
data_icc = pd.DataFrame({'test_run': ['test_a'] * n_val + ['test_b'] * n_val,
                         'efr_value': np.concatenate((val_a[0], val_b[0]), axis=0),
                         'rater': ['A'] * 2 * n_val})
icc = pg.intraclass_corr(data=data_icc, targets='test_run', raters='rater',
                         ratings='efr_value').round(3)
You can see an example of the data next:
I have one set of measurements for day 1 (test, or val_a) and another one for day 2 (retest, or val_b). So, in reality, there are no raters, but just two sets of measurements. Is there any workaround in your function for raters k = 1? Or do you have any idea how to run your ICC for just 2 arrays?
import numpy as np
import pandas as pd
import pingouin as pg

n_val = 100
val_a = np.random.rand(1, n_val)
val_b = val_a + 0.6 * np.random.rand(1, n_val)
# each measurement index is a target; the test session plays the role of the rater
data_icc = pd.DataFrame({'targets': np.concatenate((np.arange(1, n_val + 1, 1),
                                                    np.arange(1, n_val + 1, 1)), axis=0),
                         'efr_value': np.concatenate((val_a[0], val_b[0]), axis=0),
                         'test_session': ['test_a'] * n_val + ['test_b'] * n_val})

# Pearson
print('Pearson')
print(pg.corr(x=val_a[0], y=val_b[0], method='pearson'))

# ICC
print('\n\nICC')
print(pg.intraclass_corr(data=data_icc, targets='targets', raters='test_session',
                         ratings='efr_value').round(3))
I'm not sure I understand pairwise_ttests(). See notebook:
https://github.com/FlorinAndrei/soda_pop_coke/blob/main/soda_pop_coke.ipynb
Function call is in cell #27. Dataframe (long form) is in cell #26 (and values are squished with sqrt()). The original wide form is in cell #24. In the wide form, each entry in the 'points' column is a unique pair of lat/long coordinates; the other three columns are distances from each point to three different centroids.
I want to show that, e.g. for the "soda" points, the shortest mean distance is to the "East/West Coasts" centroids.
Why do I get Paired=False from pairwise_ttests()? Does that mean it only compares population means as a group?
My understanding is that, with subject='points', it would somehow figure out that each lat/long pair is repeated 3 times in the 'points' column, and would use that information to do a more in-depth comparison, as opposed to a plain comparison of group means.
In cell #23 I computed confidence intervals for the means of each group separately, using compute_bootci(), and that's fine, but I want a more refined test than that.
Maybe I don't truly understand the theory behind pairwise t-tests.
If you clone the repo, it has all the files you need to run the notebook.
https://github.com/FlorinAndrei/soda_pop_coke
Thanks!
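Regarding the Paired=False question above, here is a minimal, hedged sketch (all column names are made up, not taken from the notebook, and the printed columns assume the usual pairwise_ttests output): passing the repeated factor as within= together with subject= is what should produce paired comparisons, whereas between= gives independent-samples comparisons.

import numpy as np
import pandas as pd
import pingouin as pg

rng = np.random.default_rng(0)
n_points = 30
# hypothetical long-form data: each point appears once per centroid
df_long = pd.DataFrame({'points': np.repeat(np.arange(n_points), 3),
                        'centroid': ['east_west', 'south', 'midwest'] * n_points,
                        'distance': rng.random(3 * n_points)})

# repeated factor as within= plus subject= -> paired comparisons (Paired=True)
paired_res = pg.pairwise_ttests(dv='distance', within='centroid',
                                subject='points', data=df_long)
print(paired_res[['A', 'B', 'Paired', 'T', 'p-unc']])

# same factor as between= -> independent-samples comparisons (Paired=False)
unpaired_res = pg.pairwise_ttests(dv='distance', between='centroid', data=df_long)
print(unpaired_res[['A', 'B', 'Paired', 'T', 'p-unc']])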
It's not clear from the docs how pg.ttest() works for a one-sample test. It always requires a second variable y.
I just want to imitate the t.test() function from R, which can be used with a single sample very easily:
t.test(var)
How to get the exact same results from pg?
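If it helps, my understanding (worth double-checking against the pg.ttest docstring) is that passing a scalar as y runs a one-sample t-test, so the R call could be mimicked roughly like this:

import numpy as np
import pingouin as pg

var = np.random.randn(30) + 0.5
print(pg.ttest(var, 0))      # one-sample t-test against mu = 0, like t.test(var)
print(pg.ttest(var, 0.5))    # against another null value, like t.test(var, mu = 0.5)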
It's not clear to me how to do the one-sample Z test for a proportion. In R, that would be:
prop.test(successes, total, p=ratio)
statsmodels provides something similar:
proportions_ztest(successes, total, value=ratio)
But the output is limited compared to the R function: there is no confidence interval.
> prop.test(1200, 2500, p=0.44, correct = F)
1-sample proportions test without continuity correction
data: 1200 out of 2500, null probability 0.44
X-squared = 16.234, df = 1, p-value = 5.599e-05
alternative hypothesis: true p is not equal to 0.44
95 percent confidence interval:
0.4604617 0.4995996
sample estimates:
p
0.48
> exam = matrix(c(94, 113, 31, 62), nrow = 2)
> stats::prop.test(exam, correct = F)
2-sample test for equality of proportions without continuity correction
data: exam
X-squared = 3.8509, df = 1, p-value = 0.04972
alternative hypothesis: two.sided
95 percent confidence interval:
0.002588801 0.209982628
sample estimates:
prop 1 prop 2
0.7520000 0.6457143
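As far as I know this isn't in Pingouin, but here is a hedged sketch of how one might get close to the prop.test numbers with statsmodels, including a confidence interval (assuming prop.test without continuity correction uses the score statistic and a Wilson interval):

from statsmodels.stats.proportion import proportions_ztest, proportion_confint

# one-sample: 1200 successes out of 2500, H0: p = 0.44
# prop_var=0.44 uses the null proportion in the variance, so z**2 should match X-squared
z, p = proportions_ztest(1200, 2500, value=0.44, prop_var=0.44)
print(z**2, p)                                            # ~16.23, ~5.6e-05
print(proportion_confint(1200, 2500, method='wilson'))    # ~ (0.4605, 0.4996)

# two-sample: 94/125 vs 113/175 (pooled variance under H0)
z2, p2 = proportions_ztest([94, 113], [125, 175])
print(z2**2, p2)                                          # ~3.85, ~0.0497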
pg.linear_regression() seems to return predicted values only when as_dataframe=False. What is the reason?
With as_dataframe=True (the default), Pingouin only returns a Pandas dataframe with the coefficients, CIs, p-values, etc. It is easier to return values of different formats and sizes in a dictionary than in a Pandas dataframe, and that's precisely why I've included the as_dataframe=False option for users who may need a more exhaustive output.
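A quick sketch of the difference for anyone landing here (the exact dictionary keys are from memory, so check them against the docstring):

import numpy as np
import pingouin as pg

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 2))
y = X @ np.array([1.5, -0.8]) + rng.normal(size=100)

lm_df = pg.linear_regression(X, y)                        # DataFrame: coefficients, se, T, p-values, CI, r2, ...
lm_dict = pg.linear_regression(X, y, as_dataframe=False)  # dict: same stats plus per-observation output
print(lm_df)
print(lm_dict.keys())   # the predicted values and residuals live here, not in the dataframe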
pg.mixed_anova(dv='value', within=['variable_a', 'variable_b'], between='group', subject='subject', data=df, effsize="ng2")
returns a dtype error. Is this me being an idiot, or is it a limitation? I couldn't find a direct mention of it anywhere, so I thought I'd check before deep-diving.
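For comparison, here is a hedged sketch with a single within factor that should run (column names are made up, and whether a list of within factors is supported at all is exactly the open question above):

import numpy as np
import pandas as pd
import pingouin as pg

rng = np.random.default_rng(1)
n_sub = 20
df = pd.DataFrame({'subject': np.repeat(np.arange(n_sub), 2),
                   'variable_a': ['pre', 'post'] * n_sub,          # single within-subject factor
                   'group': np.repeat(['ctrl', 'treat'], n_sub),   # between-subject factor
                   'value': rng.normal(size=2 * n_sub)})

# effsize="ng2" from the call above is omitted here; its availability depends on the Pingouin version
aov = pg.mixed_anova(dv='value', within='variable_a', between='group',
                     subject='subject', data=df)
print(aov)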
p-val column)