These are chat archives for dereneaton/ipyrad
Hi everyone. This is a general question about VCFs and RAD-seq (and perhaps belongs elsewhere), but since our group uses ipyrad, I thought I would ask here. I am looking at a VCF generated by an ipyrad run (ipyrad v.0.6.27), and I noticed that within some loci, the sample read depth (DP) values change from one SNP to another. Here's an example (just showing the relevant parts of the VCF):
#CHROM     POS  REF  ALT  FORMAT      sample1           sample2
locus_192  1    A    G,T  GT:DP:CATG  0/0:32:4,26,0,2   1/0:17:0,12,0,5
locus_192  2    G    A    GT:DP:CATG  0/0:32:0,3,0,29   0/0:17:0,2,2,13
locus_192  3    A    G,C  GT:DP:CATG  1/0:9:0,3,3,3     1/0:17:0,6,0,11
locus_192  4    A    C,G  GT:DP:CATG  0/0:31:0,31,0,0   2/2:17:0,2,0,15
locus_192  5    A    C    GT:DP:CATG  0/0:32:0,30,2,0   0/0:17:0,15,0,2
locus_192  9    G    T    GT:DP:CATG  0/0:29:2,0,3,24   0/0:2:0,0,0,2
So in this case, within locus 192, sample1 has four different DP values across the six SNPs: 32, 9, 31, and 29 (in order: 32, 32, 9, 31, 32, 29). I am trying to figure out why different numbers of reads support different SNPs within the same locus. Could it be because of base quality, as in for the SNP at position 3, only 9 reads (out of ... 32?) have reliable base calls for sample1? I don't profess expertise in the RAD-seq pipeline/process. Any explanations or references are much appreciated!
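For anyone who wants to check the same thing in their own VCF, here is a minimal Python sketch that splits sample fields like the ones above and compares DP with the sum of the four CATG counts. It assumes the FORMAT string is exactly GT:DP:CATG as shown, and that the CATG numbers are per-base counts in the order C, A, T, G (inferred from the field name and consistent with the genotypes above, but not confirmed here). The helper name `parse_sample` is made up for illustration and is not part of ipyrad.

```python
# sample1 fields for locus_192, copied from the VCF excerpt above
sample1_rows = [
    (1, "0/0:32:4,26,0,2"),
    (2, "0/0:32:0,3,0,29"),
    (3, "1/0:9:0,3,3,3"),
    (4, "0/0:31:0,31,0,0"),
    (5, "0/0:32:0,30,2,0"),
    (9, "0/0:29:2,0,3,24"),
]


def parse_sample(field):
    """Split a 'GT:DP:CATG' sample field (hypothetical helper, not ipyrad API)."""
    gt, dp, catg = field.split(":")
    counts = [int(x) for x in catg.split(",")]  # assumed order: C, A, T, G
    return gt, int(dp), counts


depths = []
all_match = True
for pos, field in sample1_rows:
    gt, dp, counts = parse_sample(field)
    depths.append(dp)
    match = dp == sum(counts)
    all_match = all_match and match
    print(pos, gt, dp, sum(counts), match)

# Distinct depth values seen for sample1 within this locus
distinct_depths = sorted(set(depths))
print(distinct_depths)
```

In the rows above, DP equals the sum of the CATG counts at every site, so whatever lowers DP at position 3 also removes those reads from the per-base counts at that site.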