These are chat archives for dereneaton/ipyrad

Nov 2017
Ninh Vu
Nov 28 2017 01:49
I'm performing a GBS analysis with a reference of 69 loci (70 to 150 bp each) on 1,700 samples, each with a small amount of data (~23K reads/sample). I've been stuck at the dereplication step for the last 24 hours. I've run it on a smaller sample size, and it took overnight to complete this step. Is there a way to improve the run time? I have 56 cores and 128 GB of RAM to work with. The total size of all samples is ~11 GB, so not terribly big.
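For context, dereplication means collapsing identical reads within a sample into one unique sequence plus a count. The snippet below is a minimal illustration of that idea using Python's `collections.Counter`. It is an assumption-labeled sketch, not ipyrad's actual implementation; the function name `dereplicate` and the toy reads are hypothetical.

```python
from collections import Counter

def dereplicate(seqs):
    """Hypothetical helper: collapse identical reads into (sequence, count)
    pairs, ordered from most to least abundant."""
    counts = Counter(seqs)
    return counts.most_common()

# toy reads for one sample (illustrative only)
reads = ["ACGT", "ACGT", "TTGA", "ACGT", "TTGA", "GGCC"]
print(dereplicate(reads))
# [('ACGT', 3), ('TTGA', 2), ('GGCC', 1)]
```

Because each sample is dereplicated independently, the per-sample work is small at ~23K reads; total wall time is driven mainly by how many samples are processed at once.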
Jean-Rémi Trotta
Nov 28 2017 09:02
Hi! I hope you don't mind if I ask again. I need clarification about the final stats file in the outfiles folder. I ran ipyrad as denovo-reference (chloroplast reference), and when I look at those metrics I see that, for one sample, I have 805,763 raw reads, 801,554 passing filters, and 4,841 mapping to the reference ("refseq_mapped_reads"), but only 553,120 unmapped. So my question is: why don't mapped reads + unmapped reads equal the reads passing filters? From what I understand of this assembly method, I would expect it to discard the 4,841 reads mapping to the chloroplast sequence and use the remaining 796,713 (801,554 - 4,841) reads for the denovo assembly.
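The mismatch described above can be quantified with the numbers from the stats file. This short sketch only does the arithmetic; it does not explain where the missing reads go. The variable names are illustrative.

```python
# counts reported for one sample in the final stats file
passed_filters = 801_554
refseq_mapped = 4_841
unmapped = 553_120

expected_unmapped = passed_filters - refseq_mapped      # reads expected for denovo
accounted_for = refseq_mapped + unmapped                # reads the stats actually report
gap = passed_filters - accounted_for                    # reads not in either category

print(expected_unmapped, accounted_for, gap)
# 796713 557961 243593
```

So roughly 243,593 filtered reads (about 30%) are counted as neither mapped nor unmapped.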
Ollie White
Nov 28 2017 10:31
That's just what I needed, cheers @dereneaton