These are chat archives for dereneaton/ipyrad

28th
Aug 2017
Deren Eaton
@dereneaton
Aug 28 2017 14:13
@toczydlowski I'm not sure exactly. Did you look at the .loci file? There you can see the full data that SNPs are being extracted from. Maybe some sites are being called with Ns even though you set max_Ns to 0? If so, we'll need to fix that.
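One quick way to check this is to scan the .loci file for any alignment column that contains an N. The sketch below is not part of ipyrad. It assumes each sample line is a sample name, whitespace, then the aligned sequence, and that each locus is closed by a separator line starting with `//`. The function name `n_columns_per_locus` is hypothetical.

```python
def n_columns_per_locus(loci_text):
    """Return {locus_index: sorted column indices where any sample has an N}.

    Assumed .loci layout (hypothetical, check against your own file):
      <sample_name> <whitespace> <aligned sequence>
      ...
      // <SNP markers> |<locus number>|
    """
    results = {}
    current = []  # sequences of the locus being read
    idx = 0
    for line in loci_text.splitlines():
        if not line.strip():
            continue
        if line.startswith("//"):
            # Separator line: summarize the locus we just finished reading.
            cols = {i for seq in current for i, base in enumerate(seq)
                    if base.upper() == "N"}
            results[idx] = sorted(cols)
            current = []
            idx += 1
        else:
            # The last whitespace-separated field is the aligned sequence.
            current.append(line.split()[-1])
    return results
```

With `max_Ns` set to 0, any locus that maps to a non-empty list here would be worth a closer look.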
toczydlowski
@toczydlowski
Aug 28 2017 16:35
@dereneaton Here's a pic of one locus in .loci. The tails are at ApeKI cut sites, so the site got cut in some individuals but not in others, i.e., it wasn't a 100% complete digest. Do these weird SNPs have something to do with these tails? Another weird thing: this locus is 103 bp long, but all of the reads I fed into ipyrad were exactly 89 bp long. What's the explanation?
Screen Shot 2017-08-28 at 11.07.54 AM.png
toczydlowski
@toczydlowski
Aug 28 2017 16:40
@dereneaton Here's a better illustration from another locus.
Screen Shot 2017-08-28 at 11.34.03 AM.png
Deren Eaton
@dereneaton
Aug 28 2017 17:16
Hi @toczydlowski, in this case the fragments were short enough (i.e., the space between ApeKI sites was small) that they were sequenced from both ends and the resulting reads overlapped. Depending on the datatype, we allow the reads to form contigs if they overlap by some minimum proportion. For 'gbs' we use 50%. So in this case they form contigs, which would explain why some SNPs occur in regions where no data was present for some samples, likely because those samples did not have any reads, or enough reads, in that orientation.
In some data sets this is super common, in others it doesn't happen at all. It depends on the fidelity of your size selection during library prep. If we did not do the reverse complement matching of reads then you would essentially have this locus represented twice in your data set.
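The arithmetic also accounts for the 103 bp locus: two 89 bp reads that merge into a 103 bp contig must overlap by 89 + 89 - 103 = 75 bp, and 75/89 ≈ 0.84 is well above the 50% threshold for 'gbs'. The toy sketch below illustrates the idea with exact string matching only. It is not ipyrad's implementation (ipyrad clusters reads with an external aligner and tolerates mismatches), and the names `revcomp` and `merge_if_overlap` are hypothetical.

```python
import math

# Complement table for A/C/G/T, with N left as N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_if_overlap(fwd, rev, min_frac=0.5):
    """Merge fwd with the reverse complement of rev if they overlap exactly.

    Tries the longest suffix(fwd) == prefix(revcomp(rev)) overlap first and
    requires at least min_frac of the shorter read to overlap. Returns the
    merged contig, or None when no sufficient overlap exists.
    """
    rc = revcomp(rev)
    shortest = min(len(fwd), len(rc))
    min_overlap = max(1, math.ceil(shortest * min_frac))
    for olap in range(shortest, min_overlap - 1, -1):
        if fwd[-olap:] == rc[:olap]:
            return fwd + rc[olap:]
    return None
```

Without the `revcomp` step, the two reads from the same fragment would never match each other, which is why the locus would otherwise show up twice.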
toczydlowski
@toczydlowski
Aug 28 2017 18:59
@dereneaton Great! I came to the same conclusion myself in the meantime. That makes a lot of sense, and I'm happy to hear my rationalization was correct. Thanks for the clear explanation! As for the weird low-representation SNPs, we're looking at things in Geneious now and are close to figuring out the pattern behind the SNPs that seem like they shouldn't have passed filtering. Will report back soon. It seems the filtering code may be doing something unintended that should be fixed, as you hinted.