Raphael Vallat
@raphaelvallat
Hi @jdweaver ! Thanks for the feedback I appreciate it! So, you should not use pandas.dropna() in such a context. The issue here is that you have missing values in your repeated measurements (= within variable), and therefore Pingouin will internally remove all the subjects with missing values using this function. This is a quite conservative method, but it's sadly the most appropriate when dealing with missing values in repeated measures ANOVA. I'd also suggest that you take a look at the last part of this jupyter notebook, which explains the different missing values removal strategies. Hope this helps! Thanks!
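To make the listwise strategy described above concrete, here is a minimal pure-Python sketch. It is not Pingouin's internal code: the column names (`Subject`, `Time`, `Scores`) and the helper `drop_incomplete_subjects` are hypothetical and only illustrate the idea that a subject with a missing value at any time point is dropped entirely.

```python
def is_missing(value):
    """Treat None and NaN (the only value not equal to itself) as missing."""
    return value is None or value != value


def drop_incomplete_subjects(rows, subject="Subject", dv="Scores"):
    """Listwise deletion: drop every row of any subject with a missing dv."""
    incomplete = {r[subject] for r in rows if is_missing(r[dv])}
    return [r for r in rows if r[subject] not in incomplete]


# Hypothetical long-format data: subject 2 has no "post" score.
rows = [
    {"Subject": 1, "Time": "pre", "Scores": 5.1},
    {"Subject": 1, "Time": "post", "Scores": 6.0},
    {"Subject": 2, "Time": "pre", "Scores": 4.8},
    {"Subject": 2, "Time": "post", "Scores": None},
    {"Subject": 3, "Time": "pre", "Scores": 5.5},
    {"Subject": 3, "Time": "post", "Scores": 5.9},
]

complete = drop_incomplete_subjects(rows)
kept_subjects = sorted({r["Subject"] for r in complete})
```

A plain row-wise drop would remove only the single missing row of subject 2 and leave that subject with one time point, which is why it is not a substitute for removing the whole subject.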
Joshua
@jdweaver
@raphaelvallat thanks for the quick response and for the link to the notebook. This makes more sense now. Thanks!
Joshua
@jdweaver

Hi All, curious if anyone else has run into this. I'm trying to assess an interaction for a mixed model with 2 between factors and 1 within factor. Based on the pairwise_ttests documentation I should be able to do something like this:

import pingouin as pin
pin.pairwise_ttests(dv='Scores', within='Time', between=['Group', 'Time'], data=df)

This code with the sample data executes fine, but fails for my data, resulting in an IndexError: list index out of range.

Anyone encountered this before?

Full traceback below, version of pingouin I am running: 0.2.9
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-63-081c62e8ce0a> in <module>
      4                     , data=mix_melt_usage_cat_t6m_feb
      5                    ,padjust='holm', return_desc=True
----> 6                    ,nan_policy='listwise'
      7                   )

C:\Anaconda\lib\site-packages\pingouin\pairwise.py in pairwise_ttests(data, dv, between, within, subject, parametric, alpha, tail, padjust, effsize, nan_policy, return_desc, interaction, export_filename)
    360             stats = stats.append(pairwise_ttests(dv=dv,
    361                                                  between=fbt[i],
--> 362                                                  within=fwt[i],
    363                                                  subject=subject,
    364                                                  data=data,

IndexError: list index out of range
Raphael Vallat
@raphaelvallat
Hi @jdweaver ! Two things: 1) you need to pass subject as well whenever testing for a within-subject effect, 2) if your 'Time' column is a within-subject factor, do not put it in the between argument. The following should work: pin.pairwise_ttests(dv='Scores', within='Time', between='Group', subject='YOURSUBJECTVARIABLE', data=df)
Joshua
@jdweaver
@raphaelvallat thanks for clarifying. I've updated the syntax based on your suggestion. I'm not seeing the "within" factor show up in the contrasts. Based on the notes in the documentation it should show up as: within + between + within * between. Is showing only the between factors, when a within effect is specified, expected behavior?
Sample code below:
pin.pairwise_ttests(dv='outcome'
                    ,within='variable'
                    ,between=['between1', 'between2']
                    ,subject='subjectvariable'
                    , data=df
                  )
Raphael Vallat
@raphaelvallat
@jdweaver You can only specify ONE within and ONE between variable:
pin.pairwise_ttests(dv='outcome'
                    ,within='variable'
                    ,between='between1'
                    ,subject='subjectvariable'
                    , data=df
                  )
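The constraints mentioned in this exchange (one within factor, one between factor, a subject column, and no column used in both roles) can be checked before calling the function. The helper below, `check_mixed_design`, is hypothetical and not part of Pingouin. It only encodes the rules stated in this thread for the version discussed here, and other versions may behave differently.

```python
def check_mixed_design(within, between, subject):
    """Return a list of problems with the arguments of a mixed-design call."""
    problems = []
    if not isinstance(within, str):
        problems.append("within must be a single column name")
    if not isinstance(between, str):
        problems.append("between must be a single column name")
    if subject is None:
        problems.append("subject is required when within is given")
    # Normalise both arguments to lists so the overlap check works either way.
    within_cols = [within] if isinstance(within, str) else list(within)
    between_cols = [between] if isinstance(between, str) else list(between)
    if set(within_cols) & set(between_cols):
        problems.append("a column cannot be both within and between")
    return problems


# The first attempt from this thread: list for between, 'Time' twice, no subject.
first_try = check_mixed_design("Time", ["Group", "Time"], None)

# The corrected call: one within, one between, subject given.
fixed = check_mixed_design("Time", "Group", "Subject")
```

Here `first_try` lists three problems and `fixed` is empty.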
manfred hammerl
@manfredh_twitter
grafik.png
this is a figure obtained with the new shift plot version (0.3.0). it's different from the figure on the website (https://pingouin-stats.org/generated/pingouin.plot_shift.html#pingouin.plot_shift). e.g. if Y values are smaller than X values, quantiles are positive (should be negative), because it's Y minus X.
Raphael Vallat
@raphaelvallat
Hi @manfredh_twitter ! Thanks for the feedback. Could you share the code & data? The quantiles are actually X minus Y, and not Y minus X. Thanks!
manfred hammerl
@manfredh_twitter
ok, because in the y-label it says "Y - X quantiles"
so the figure on your website is not correct?
import numpy as np
import pingouin as pg
np.random.seed(42)
x = np.random.normal(5.5, 2, 50)
y = np.random.normal(6, 1.5, 50)
fig = pg.plot_shift(x, y)
grafik.png
so this is the example from your website. as you can see, the plot is different from yours :-)
manfred hammerl
@manfredh_twitter
in case you need the information: i'm using python 3.7.5, spyder 3.3.6, pingouin 0.3.0, seaborn 0.9.0, numpy 1.17.3
and i tried the second example from your website too:
import numpy as np
import pingouin as pg
np.random.seed(42)
x = np.random.normal(5.5, 2, 30)
y = np.random.normal(6, 1.5, 30)
fig = pg.plot_shift(x, y, paired=True, n_boot=2000,
percentiles=[25, 50, 75],
show_median=False, seed=456, violin=False)
grafik.png
different from your figure
Raphael Vallat
@raphaelvallat
Thanks for your sharp eyes @manfredh_twitter ! So indeed I think there are two separate issues here:
1) The y-label is not correct, it should be X - Y
2) For some reason, the figures on Pingouin's website were not updated when I released the latest version.
I'll try to fix that soon!
manfred hammerl
@manfredh_twitter
i understand. but it was Y - X up to version 0.2.9. so X - Y is new in version 0.3.0?
so it's only another depiction. or is there some deeper meaning?
Raphael Vallat
@raphaelvallat
X - Y is indeed new. I was just trying to get as close as possible, for testing purposes, to the Matlab version of this function by Guillaume Rousselet. The quantiles are also estimated using a robust method (Harrell-Davis) in v0.3.0.
manfred hammerl
@manfredh_twitter
yes, i noticed the estimation of quantiles (in the old version the real quantiles were shown in the figure). this is something i have to get used to...
Linden Parkes
@LindenParkes_twitter
Hi pingouin users! I'm working with the mediation_analysis function and am wondering if there is any way to incorporate repeated measures data? Perhaps via mixed effects?
Currently, I have X_t1, M_t1, and Y_t2 as my variables (where _t1 and _t2 represents the variable assessed at timepoint 1 or timepoint 2, respectively). If I only use these three variables, then the model is fine because no two variables have dependency due to repeated measurement (i.e., I'm only using Y_t2, not Y_t1 and Y_t2 in model; similarly, I only use X_t1, not X_t1 and X_t2). But I would like to include Y_t1 as a covariate, which introduces correlated data into the model.
Raphael Vallat
@raphaelvallat
Hi @LindenParkes_twitter! Unfortunately, this is not possible right now in Pingouin, and I doubt that it will be in the near future. From my understanding, there does not seem to be a gold standard to do mediation analyses with repeated measures, even though some methods have been developed in recent years (e.g. http://afhayes.com/public/aps2015mh.pdf). Alternatively, you can perhaps regress your covariate Y_t2 to all the other variables separately, get the residuals, and then apply a mediation on the residuals...?
Linden Parkes
@LindenParkes_twitter
Hi @raphaelvallat . no problem, thanks for letting me know. Yes, my plan B was just to residualize Y_t2 with respect to Y_t1 prior to running the model. I presume you meant to say Y_t1 instead of Y_t2 above?
Raphael Vallat
@raphaelvallat
Indeed!
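The residualizing step discussed above (regress Y_t2 on Y_t1, keep the residuals) could be sketched as follows. This is a plain least-squares fit in pure Python on simulated data, not a Pingouin function. The name `residualize` and the simulated values are assumptions for illustration, and the thread does not settle whether this approach is statistically appropriate for a given design.

```python
import random
from statistics import mean


def residualize(y, x):
    """Residuals of a simple least-squares regression of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


# Simulated scores: Y_t2 depends partly on Y_t1 (made-up numbers).
rng = random.Random(0)
y_t1 = [rng.gauss(10, 2) for _ in range(40)]
y_t2 = [0.6 * v + rng.gauss(0, 1) for v in y_t1]

# The part of Y_t2 that is not linearly explained by Y_t1.
y_t2_resid = residualize(y_t2, y_t1)
```

The residuals have zero mean and are uncorrelated with Y_t1 by construction. They would then take the place of Y_t2 as the outcome in the mediation model.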
Raphael Vallat
@raphaelvallat
Hi @manfredh_twitter ! I just pushed a commit on the develop branch to fix the plot_shift function, which now returns the Y - X quantiles (in other words, I did not change the label to X - Y, but instead I changed the code to calculate the Y - X quantile, as it was before v0.3.0, but still keeping the robust estimator implemented in v0.3.0). This will be released in the next stable version of Pingouin, but if by any chance you can try the new function on your own data with the develop version of Pingouin (cloned from GitHub, then switch to the develop branch then python setup.py develop), that would be fantastic! Thank you! Commit: raphaelvallat/pingouin@774a5c4
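To illustrate the Y - X sign convention described above, here is a small standard-library sketch. It uses ordinary sample quantiles from `statistics.quantiles`, not the Harrell-Davis estimator and not Pingouin's code. The helper `quartile_shift` is hypothetical.

```python
import random
import statistics


def quartile_shift(x, y):
    """Y - X differences at the 25th, 50th and 75th percentiles."""
    qx = statistics.quantiles(x, n=4)
    qy = statistics.quantiles(y, n=4)
    return [b - a for a, b in zip(qx, qy)]


rng = random.Random(42)
x = [rng.gauss(5.5, 2) for _ in range(50)]
# y is x shifted down by exactly 1, so every quantile of y is 1 lower.
y = [v - 1.0 for v in x]

shift = quartile_shift(x, y)
```

With the Y - X convention, the three differences are negative (about -1 each) when the Y values are smaller than the X values, which matches the behaviour expected earlier in this thread.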
manfred hammerl
@manfredh_twitter
grafik.png
it works! btw: that's my own implementation of shift plot (for personal use), so that i can add labels and change colors...
np.random.seed(42)
x = np.random.normal(5.5, 2, 50)
y = np.random.normal(6, 1.5, 50)
fig = sp.shift_plot(x, y, n_boot=5000, percentiles=[25, 50, 75],
ci=0.95, show_median=True, show_mean=False, violin=True,
xlab="Item X", ylab="Item Y", score="Preferenceranking",
size=2)
Raphael Vallat
@raphaelvallat
Amazing, thanks @manfredh_twitter ! Could you share your custom script with me? Or even, if you think these are valuable changes, please feel free to make a PR to Pingouin!
manfred hammerl
@manfredh_twitter
i'll email you the script
manfred hammerl
@manfredh_twitter
done
garincle
@garincle
Hi, using the latest version of pingouin I have a problem with plot_rm_corr. If I use y='23b' I get:
23 b
   ^
SyntaxError: invalid syntax
This name worked perfectly in rm_corr.
I also get this error for other strings. I don't understand why some are ok and others are not?
Thanks for your help!
garincle
@garincle

region = '23b'

pg.plot_rm_corr(data=result, x='age', y=region, subject='ID', legend=True,
kwargs_facetgrid=dict(height=4.5, aspect=1.5,
palette='Spectral'))

garincle
@garincle
Screenshot at 11-01-42.png
Raphael Vallat
@raphaelvallat
Hi @garincle, this is surprising. First, the error is not caused by Pingouin but by an internal call to statsmodels and patsy. I'd therefore recommend you upgrade these two libraries. Second, a "b" before a string means that the string is encoded as bytes. However, in this case the "b" is at the end of the string so I really don't see why you would get an error here. Could you send a list of strings that work and strings that do not work? Anyway, for now I think the quickest fix is to rename your variable (e.g. y="Region"). Thanks
garincle
@garincle
good idea, thank you

Don't work:

8Bm
23c
14r
11l
24a_prime

works

STGr
CA1
Clear
MGad
MGmc
MGz
MGpd
OT
Iapl
Ri

it seems that starting with a number causes the problem
all packages are already up to date
Raphael Vallat
@raphaelvallat
Ok I'll add a warning in future releases of Pingouin then! I'm guessing that Patsy or Statsmodels is trying to convert the string to a number when it starts with a digit, hence the error. Thanks for pointing this out!
garincle
@garincle
np, thanks for answering!
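The failing and working names listed above split cleanly on one property: the failing ones are not valid Python identifiers, because they start with a digit. A plausible reading, consistent with the SyntaxError shown earlier but not confirmed in this thread, is that the formula machinery parses column names as Python code. Under that assumption, a quick standard-library check like the sketch below can flag columns to rename before plotting. The helpers `needs_rename` and `safe_name` and the `r_` prefix are hypothetical.

```python
def needs_rename(name):
    """True if a column name is not a valid Python identifier."""
    return not name.isidentifier()


def safe_name(name, prefix="r_"):
    """Prefix names that are not identifiers; leave the others unchanged."""
    return prefix + name if needs_rename(name) else name


# A subset of the region names reported in this thread.
names = ["8Bm", "23c", "14r", "11l", "24a_prime", "STGr", "CA1", "MGad", "OT"]

failing = [n for n in names if needs_rename(n)]
renamed = [safe_name(n) for n in names]
```

Here `failing` contains exactly the five names that start with a digit, and every entry of `renamed` is a valid identifier.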