These are chat archives for dereneaton/ipyrad

22nd
Feb 2017
s-arvidsson-lgc
@s-arvidsson-lgc
Feb 22 2017 16:13
I'm running v.0.6.4 and am getting some invalid VCF output: I get genotypes called with alleles that are not listed in the ALT field (e.g. a sample genotyped as 3/1 when there are only two alternative alleles listed). Any idea what is going on?
Deren Eaton
@dereneaton
Feb 22 2017 16:16
@s-arvidsson-lgc are you assembling with a reference or denovo?
s-arvidsson-lgc
@s-arvidsson-lgc
Feb 22 2017 16:16
denovo
locus_3925 142 . G A,T 13 PASS NS=4;DP=33 GT:DP:CATG 0/2:9:1,1,3,4 0/1:7:7,0,0,0 0/2:9:0,1,3,5 ./.:0:0,0,0,0 ./.:0:0,0,0,0 ./.:0:0,0,0,0 ./.:0:0,0,0,0 ./.:0:0,0,0,0 ./.:0:0,0,0,0 ./.:0:0,0,0,0 ./.:0:0,0,0,0 3/1:8:3,4,0,1
I also get malformed CATG fields (missing count): locus_17982 155 . A C 13 PASS NS=12;DP=12794 GT:DP:CATG 1/0:946:154,791,1,0 0/0:1091:141,950,0,0 0/0:1127:152,974,1,0 0/0:1114:138,975,1,0 1/0:1110:761,348,0,1 0/0:841:0,0,1,840 1/0:679:475,203,1,0 1/0:977:664,312,1,0 0/0:1367:0,0,3,1364 0/0:1055:0,0,0,1055 0/0:1264:140,1123,1, 0/0:1223:160,1060,3,
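Both symptoms reported above can be checked mechanically. The following is a minimal sketch, not part of ipyrad: `check_vcf_record` is a hypothetical helper that flags GT allele indices larger than the number of ALT alleles and CATG fields that do not hold exactly four integer counts. It assumes whitespace-separated columns (real VCF files use tabs; the records pasted here show spaces) and that CATG is meant to carry four counts, one per base.

```python
def check_vcf_record(line):
    """Return a list of (sample_index, problem) tuples for one VCF data line."""
    fields = line.split()  # splits on tabs or spaces
    alt = fields[4]
    n_alt = 0 if alt == "." else len(alt.split(","))
    keys = fields[8].split(":")  # FORMAT column, e.g. GT:DP:CATG
    problems = []
    for i, sample in enumerate(fields[9:]):
        values = dict(zip(keys, sample.split(":")))
        # Allele index 0 is REF; indices 1..n_alt refer to the ALT alleles.
        gt = values.get("GT", "")
        for allele in gt.replace("|", "/").split("/"):
            if allele.isdigit() and int(allele) > n_alt:
                problems.append((i, "GT %s exceeds %d ALT allele(s)" % (gt, n_alt)))
                break
        # Assumption: CATG should hold exactly four integer counts.
        if "CATG" in values:
            counts = values["CATG"].split(",")
            if len(counts) != 4 or not all(c.isdigit() for c in counts):
                problems.append((i, "malformed CATG %r" % values["CATG"]))
    return problems
```

Run on the locus_3925 record above, this would flag the final sample (3/1 with only A,T in ALT); on the locus_17982 record it would flag the last two samples, whose CATG fields end in a trailing comma.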
Deren Eaton
@dereneaton
Feb 22 2017 16:22
I see, something is wonky. I'll look into it ASAP.
Is this single or paired-end data?
s-arvidsson-lgc
@s-arvidsson-lgc
Feb 22 2017 16:23
paired end
Deren Eaton
@dereneaton
Feb 22 2017 16:27
Everything looks to be fine in a single-end VCF file I'm looking at, so I'm guessing it's a problem in the code having to do with second reads. I'll check it out.
s-arvidsson-lgc
@s-arvidsson-lgc
Feb 22 2017 16:28

Thanks for looking into this!

I also have another issue: I give the pipeline already demultiplexed data, e.g. named "sample1R1.fastq, sample1R2.fastq,sample2R1.fastq,sample2R2.fastq". These get correctly recognized as paired files - however, the samples come out as "sample1,sample2" with the underscore remaining, which is annoying.

Deren Eaton
@dereneaton
Feb 22 2017 16:29
Try typing with backticks around the elements to preserve the underscores.
s-arvidsson-lgc
@s-arvidsson-lgc
Feb 22 2017 16:29
Sorry about that
sample1_R1_.fastq, sample1_R2_.fastq,sample2_R1_.fastq,sample2_R2_.fastq get named sample1_,sample2_
Deren Eaton
@dereneaton
Feb 22 2017 16:31
which version of ipyrad are you using?
s-arvidsson-lgc
@s-arvidsson-lgc
Feb 22 2017 16:32
0.6.4
Deren Eaton
@dereneaton
Feb 22 2017 16:33
hmm, I'm not able to replicate the underscores being left on the names when I just ran it now.
I'll test some more things to see
Oh, I see, it happens if the files are not gzipped (names do not end in .gz). That will be easy to fix.
s-arvidsson-lgc
@s-arvidsson-lgc
Feb 22 2017 16:37
Great, thanks!
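For illustration, the file-naming behaviour discussed above (a trailing underscore left on sample names when the files do not end in .gz) can be avoided by treating the `.gz` extension as optional when stripping the read suffix. This is only a sketch under assumed naming conventions, not ipyrad's actual code: `sample_name` is a hypothetical helper, and the pattern assumes names of the form `<sample>_R1_.fastq` or `<sample>_R2_.fastq`, optionally gzipped.

```python
import re

# Read tag (_R1 or _R2), optional trailing underscore, .fastq or .fq,
# then an optional .gz so gzipped and plain files are handled alike.
_READ_SUFFIX = re.compile(r"_R[12]_?\.(?:fastq|fq)(?:\.gz)?$")


def sample_name(filename):
    """Strip the read tag and extension(s) from a demultiplexed file name."""
    # Names that do not match the pattern are returned unchanged.
    return _READ_SUFFIX.sub("", filename)
```

With this pattern, `sample1_R1_.fastq` and `sample1_R2_.fastq.gz` both map to `sample1`, so the two reads of a pair share one sample name whether or not the files are compressed.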