These are chat archives for dereneaton/ipyrad
Hello, I was just wondering how the p values for D-statistics are calculated. I have calculated the p values myself and mine seem to be less conservative for tests with borderline significance levels. On that note, am I correct in assuming that the significance is denoted by the orange or turquoise/green coloration of the D-statistic distribution? See attached screen grab of the plot I mean.
I have taken the output tables produced in the jupyter notebook such as this:
n  dstat  bootmean  bootstd  Z      ABBA     BABA     nloci
0  0.054  0.057     0.057    0.941  149.283  134.056  2690
1  0.147  0.146     0.053    2.783  151.011  112.235  2834
I have tried to calculate the p values following the Pedicularis paper: converting the Z score to a two-tailed p value, correcting with Holm–Bonferroni, and applying a 0.01 cutoff. I did this in R (sorry, I'm still learning Python...) using the following code, in case it's helpful:
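# Read the tab-separated D-statistic results table
d <- read.table(file = "D-stats-results.txt", header = TRUE, sep = "\t")
# Two-tailed p value from each Z score
p <- 2*(pnorm(-abs(d$Z)))
# Holm-Bonferroni correction across all tests in the table
cor.p <- p.adjust(p, method = "holm")
# Append the corrected p values and write the table back out
d <- data.frame(d, cor.p)
write.table(d, file = "D-stats-results-with-p-value.txt", sep = "\t", col.names = TRUE, row.names = FALSE)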
There's probably an easier way of doing this; I was just wondering why I am getting less conservative results.
Hope this makes sense.
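For reference, the same calculation can be sketched in Python using only the standard library. This is a minimal sketch and not ipyrad's own method: `two_tailed_p` and `holm_adjust` are hypothetical helper names, and the Z scores are copied from the two rows of the table above. Because the Holm correction scales with the number of tests, values computed from a full results table would differ.

```python
import math


def two_tailed_p(z):
    """Two-tailed p value for a standard-normal Z score: 2 * Phi(-|z|)."""
    # erfc(|z| / sqrt(2)) is algebraically equal to 2 * Phi(-|z|)
    return math.erfc(abs(z) / math.sqrt(2))


def holm_adjust(pvals):
    """Holm-Bonferroni adjusted p values, returned in the input order."""
    n = len(pvals)
    # Indices of the p values, smallest first
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_max = 0.0
    for rank, i in enumerate(order):
        # Multiply the (rank+1)-th smallest p by (n - rank), cap at 1,
        # and enforce that adjusted values never decrease along the ordering
        running_max = max(running_max, min(1.0, (n - rank) * pvals[i]))
        adjusted[i] = running_max
    return adjusted


# Z scores copied from the two rows of the table above (illustration only)
z_scores = [0.941, 2.783]
p_raw = [two_tailed_p(z) for z in z_scores]
p_holm = holm_adjust(p_raw)
significant = [p < 0.01 for p in p_holm]  # 0.01 cutoff, as in the message

for z, p, q, s in zip(z_scores, p_raw, p_holm, significant):
    print(f"Z={z:.3f}  p={p:.4f}  holm={q:.4f}  significant={s}")
```

If statsmodels is installed, `statsmodels.stats.multitest.multipletests(p_raw, method="holm")` should give the same adjusted values and can replace the hand-written `holm_adjust`.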