These are chat archives for dereneaton/ipyrad

24th
Jul 2017
Wind-ant
@Wind-ant
Jul 24 2017 01:19
Hello guys! I have a question about the VCF file output by ipyrad: why are all the quality values the same? Mine are all 13, which seems too low. How can I improve it? Thank you.
Isaac Overcast
@isaacovercast
Jul 24 2017 15:40
@all The "GLIBC_2.23 not found" issue is resolved.
Deren Eaton
@dereneaton
Jul 24 2017 16:14
Hi @Wind-ant, the VCF file lists the minimum quality score as 13 because that is the default minimum that ipyrad uses to make base calls, meaning that there is >95% confidence in the base call based on a binomial probability. Most base calls are actually made with much higher confidence than this (>99.99%), but we simply don't save the quality scores for the VCF file output. This is for a few reasons: mainly, we don't use base quality scores in calculating the consensus base calls, but instead store the actual base counts (shown as CATG in the VCF file). It would be possible to recalculate the base scores from those, but we've never found it necessary. What analysis tool are you using that says 13 is too low a quality value?
Wind-ant
@Wind-ant
Jul 24 2017 16:24
What I know is Q = -10 log10(P): if Q = 10, then the error rate is 0.1. Generally, I use bcftools to call SNPs; the Q of each locus is almost always different, and I then filter out loci with Q < 40, which means we want an error rate below 0.0001. So you mean you don't show the quality scores because you don't find them that useful, and some loci's Q is actually not 13 but higher?
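For reference, the Phred relationship quoted here can be sketched in a few lines of Python. This is only an illustration of the formula Q = -10 log10(P); the function names are made up for this example and are not part of ipyrad or bcftools.

```python
import math

def phred_to_error(q):
    """Error probability P for a Phred score Q, from Q = -10 * log10(P)."""
    return 10 ** (-q / 10)

def error_to_phred(p):
    """Phred score Q for an error probability P."""
    return -10 * math.log10(p)

# Q = 10 is an error rate of 0.1, Q = 13 is roughly 0.05,
# and Q = 40 is an error rate of 0.0001.
for q in (10, 13, 40):
    print(q, phred_to_error(q))
```

Read this way, a reported Q of 13 corresponds to roughly a 5% error probability, which is the ">95% confidence" floor described earlier in the conversation.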
Deren Eaton
@dereneaton
Jul 24 2017 16:32
Yes, for example:
locus_0 42      .       C       T       13      PASS    NS=12;DP=231    GT:DP:CATG      0/0:19:19,0,0,0 0/0:17:17,0,0,0 0/0:21:21,0,0,0 0/0:20:20,0
locus_0 56      .       A       T       13      PASS    NS=12;DP=231    GT:DP:CATG      0/0:19:0,19,0,0 0/0:17:0,17,0,0 0/0:21:0,21,0,0 1/1:20:0,0,
locus_0 79      .       C       A       13      PASS    NS=12;DP=231    GT:DP:CATG      0/0:19:19,0,0,0 0/0:17:17,0,0,0 0/0:21:21,0,0,0 0/0:20:20,0
locus_1 1       .       A       C       13      PASS    NS=12;DP=254    GT:DP:CATG      0/0:22:0,22,0,0 0/0:24:0,24,0,0 0/0:23:0,23,0,0 0/0:23:0,23
locus_1 25      .       C       G       13      PASS    NS=12;DP=254    GT:DP:CATG      0/0:22:22,0,0,0 0/0:24:24,0,0,0 0/0:23:23,0,0,0 0/0:23:23,0
locus_1 57      .       A       C       13      PASS    NS=12;DP=254    GT:DP:CATG      0/0:22:0,22,0,0 0/0:24:0,24,0,0 0/0:23:0,23,0,0 0/0:23:0,23
In the VCF above (from empirical data), the quality says 13 for the first SNP. However, each sample had only one base observed, a C or a T, and the samples shown each had about 17-21 observed bases. So if we calculated the quality score for 19 Cs and zero of anything else, given the estimated error rate, the probability of a wrong call would be extremely small.
The consensus base calls are made for each sample individually, and not on a population basis (i.e., not from the full 231 observed bases at this site across all samples), which is why a single quality score for the SNP is not really informative for the output.
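As a rough illustration of that point, here is a minimal Python sketch that reads one sample field from the rows above and turns its CATG counts into a Phred-like score. It is not ipyrad's actual algorithm: the two-model comparison (homozygous vs. heterozygous, equal priors), the `error_rate=0.001` default, and both function names are assumptions made up for this example. It needs Python 3.8+ for `math.comb`.

```python
import math

def parse_catg(sample_field):
    """Split a 'GT:DP:CATG' sample field such as '0/0:19:19,0,0,0'."""
    gt, dp, catg = sample_field.split(":")
    counts = [int(x) for x in catg.split(",")]  # order is C, A, T, G
    return gt, int(dp), counts

def homozygous_call_phred(counts, error_rate=0.001):
    """Phred-like score for calling the most common base as homozygous.

    Simplified model (an assumption, not ipyrad's method): compare a
    homozygous model, where the second base is sequencing error, with a
    heterozygous model, where both bases are sampled with probability 0.5.
    """
    ordered = sorted(counts, reverse=True)
    k1, k2 = ordered[0], ordered[1]
    n = k1 + k2
    lik_hom = math.comb(n, k1) * (1 - error_rate) ** k1 * error_rate ** k2
    lik_het = math.comb(n, k1) * 0.5 ** n
    p_wrong = lik_het / (lik_hom + lik_het)  # equal priors assumed
    return -10 * math.log10(p_wrong)

gt, dp, counts = parse_catg("0/0:19:19,0,0,0")
print(gt, dp, counts, round(homozygous_call_phred(counts), 1))
```

Under these assumptions, 19 Cs and nothing else scores well above 50, far higher than the placeholder 13 in the QUAL column, while an even split such as 10 and 9 scores near zero for a homozygous call.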
Wind-ant
@Wind-ant
Jul 24 2017 16:42
I got you, thanks a lot. One more question, please: if a single quality score is not really informative, then which parameters do you think best reflect the quality of a SNP or of the whole VCF file? Something like sample coverage per locus?
Wind-ant
@Wind-ant
Jul 24 2017 16:49
@isaacovercast So if I just update ipyrad via conda to its newest version, the GLIBC_2.23 problem is resolved?
Isaac Overcast
@isaacovercast
Jul 24 2017 17:05
@Wind-ant Yes, this issue is now resolved. You have to actually update pysam to get the fix.
@all The GLIBC_x.xx problem is fixed. This was actually a problem with our pysam install, so to resolve it you have to update both ipyrad and pysam:
conda update -c ipyrad ipyrad pysam