These are chat archives for dereneaton/ipyrad

Jul 2017
Nitish Narula
Jul 16 2017 23:04

Hi everyone. This is a general question about VCFs and RAD-seq (and perhaps belongs elsewhere), but since our group uses ipyrad, I thought I would ask here. I am looking at a VCF generated by an ipyrad run (ipyrad v.0.6.27), and I noticed that within some loci, a sample's read depth (DP) changes from one SNP to another. Here's an example (just showing the relevant parts of the VCF):

#CHROM    POS REF ALT FORMAT     sample1         sample2
locus_192 1   A   G,T GT:DP:CATG 0/0:32:4,26,0,2 1/0:17:0,12,0,5
locus_192 2   G   A   GT:DP:CATG 0/0:32:0,3,0,29 0/0:17:0,2,2,13
locus_192 3   A   G,C GT:DP:CATG 1/0:9:0,3,3,3   1/0:17:0,6,0,11
locus_192 4   A   C,G GT:DP:CATG 0/0:31:0,31,0,0 2/2:17:0,2,0,15
locus_192 5   A   C   GT:DP:CATG 0/0:32:0,30,2,0 0/0:17:0,15,0,2
locus_192 9   G   T   GT:DP:CATG 0/0:29:2,0,3,24 0/0:2:0,0,0,2

So in this case, within locus 192, sample1 has 4 different DP values across the six SNPs: 32, 9, 31, and 29. I am trying to figure out why different numbers of reads make up different SNPs within the same locus. Could it be because of base quality, as in, for the SNP at position 3, only 9 reads (out of ... 32?) have reliable base calls for sample1? I don't profess expertise in the RAD-seq pipeline/process. Any explanations or references are much appreciated!
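
One quick way to look at the excerpt above is to compare each DP value with the sum of the four CATG counts in the same field. The Python sketch below does that for the lines as pasted here. It assumes whitespace-separated columns and the GT:DP:CATG layout shown in the excerpt (a full VCF is tab-delimited and has more columns), and the names `VCF_EXCERPT` and `parse_sample_depths` are made up for illustration, not part of ipyrad.

```python
# Excerpt copied from the message above. Assumption: columns are
# whitespace-separated and limited to the ones shown here.
VCF_EXCERPT = """\
#CHROM    POS REF ALT FORMAT     sample1         sample2
locus_192 1   A   G,T GT:DP:CATG 0/0:32:4,26,0,2 1/0:17:0,12,0,5
locus_192 2   G   A   GT:DP:CATG 0/0:32:0,3,0,29 0/0:17:0,2,2,13
locus_192 3   A   G,C GT:DP:CATG 1/0:9:0,3,3,3   1/0:17:0,6,0,11
locus_192 4   A   C,G GT:DP:CATG 0/0:31:0,31,0,0 2/2:17:0,2,0,15
locus_192 5   A   C   GT:DP:CATG 0/0:32:0,30,2,0 0/0:17:0,15,0,2
locus_192 9   G   T   GT:DP:CATG 0/0:29:2,0,3,24 0/0:2:0,0,0,2
"""


def parse_sample_depths(text):
    """Return a list of (chrom, pos, sample, dp, catg_counts) tuples.

    Hypothetical helper for this excerpt only, not an ipyrad function.
    """
    rows = []
    samples = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "#CHROM":
            # Sample names follow the five fixed columns in this excerpt.
            samples = fields[5:]
            continue
        chrom, pos, _ref, _alt, fmt = fields[:5]
        keys = fmt.split(":")
        for name, value in zip(samples, fields[5:]):
            entry = dict(zip(keys, value.split(":")))
            counts = [int(c) for c in entry["CATG"].split(",")]
            rows.append((chrom, int(pos), name, int(entry["DP"]), counts))
    return rows


rows = parse_sample_depths(VCF_EXCERPT)
for chrom, pos, name, dp, counts in rows:
    # Print DP next to the summed CATG counts and whether they agree.
    print(chrom, pos, name, dp, sum(counts), dp == sum(counts))
```

In this excerpt, DP matches the summed CATG counts for both samples at every site, so DP here looks like a count of the bases recorded at that position rather than a count of reads in the whole locus. That is only an observation from these six lines; it does not by itself say why fewer bases are recorded at some positions.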