These are chat archives for dereneaton/ipyrad

11th
Jan 2018
Gemma Clucas
@DrGemClucas_twitter
Jan 11 2018 15:41
Hi @dereneaton @isaacovercast , quick question I hope: is the clustering threshold ignored when using reference mapping? I.e. if reads are aligned to the same genomic location then will they be identified as homologs regardless of similarity? And are only uniquely mapped reads used? Thanks for the great support for the program!
Isaac Overcast
@isaacovercast
Jan 11 2018 16:51
@DrGemClucas_twitter The clustering threshold is ignored for reference-mapped reads. If you use 'denovo+reference' or 'denovo-reference', then obviously the clustering threshold will still be used for the denovo fraction of the reads. Any reads that align to the same genomic position will be considered alleles from the same locus. And yes, that's correct: only uniquely mapped reads are used. I think we just throw out secondary alignments, but I haven't looked at that part of the code in a while, so don't hold me to it.
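To make the idea above concrete, here is a minimal sketch of what "keep only uniquely mapped reads and treat reads at the same position as one locus" could look like on plain SAM text. This is not ipyrad's code: the function names, the `min_mapq` cutoff as a stand-in for "uniquely mapped", and grouping by exact start position are all assumptions made for illustration. Only the SAM FLAG bits (0x4 unmapped, 0x100 secondary, 0x800 supplementary) come from the SAM format itself.

```python
from collections import defaultdict

# FLAG bits defined by the SAM format.
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def is_primary_mapped(flag):
    """True if the read is mapped and is neither a secondary nor a supplementary alignment."""
    return not (flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY))


def group_by_position(sam_lines, min_mapq=1):
    """Group read names by (reference, start position).

    Hypothetical helper: `min_mapq` is an assumed proxy for "uniquely
    mapped" (many aligners report MAPQ 0 for multi-mapping reads).
    """
    loci = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        name, flag, ref = fields[0], int(fields[1]), fields[2]
        pos, mapq = int(fields[3]), int(fields[4])
        if not is_primary_mapped(flag):
            continue  # drop unmapped, secondary, and supplementary records
        if mapq < min_mapq:
            continue  # drop reads that look ambiguously placed
        loci[(ref, pos)].append(name)
    return dict(loci)


example = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGA\tIIII",
    "r3\t256\tchr1\t100\t0\t4M\t*\t0\t0\tACGT\tIIII",   # secondary alignment
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",           # unmapped
    "r5\t0\tchr2\t500\t0\t4M\t*\t0\t0\tACGT\tIIII",     # MAPQ 0
]
loci = group_by_position(example)
```

Note that r1 and r2 end up in the same group even though their sequences differ, which mirrors the point that similarity is not what defines a locus in reference mode.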
@markusruhsam You ran out of RAM: https://docs.python.org/3/library/exceptions.html#exceptions.MemoryError. How much RAM have you allocated for this step? Step 6 can be very RAM-intensive. The amount of memory required depends on the size of the dataset, but generally I recommend "as much as you can allocate".
Deren Eaton
@dereneaton
Jan 11 2018 17:03
@markusruhsam @isaacovercast Yeah, hidden in that output it says MemoryError, which is the real culprit. Definitely try to allocate more RAM, but if you can't get access to more, then using fewer cores will also reduce the amount of RAM used in the "indexing" step. We've tried to design that step to use at most ~4GB/core, but it is hard to get exactly right. Using fewer cores will not actually slow down the "clustering" step, which is your longest step anyway, since that step is limited to running on one node (10 cores in your case).
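As a rough back-of-the-envelope check on the ~4GB/core figure, the sketch below estimates how many cores fit in a given RAM allocation. The helper name and its parameters are hypothetical, and since the 4GB/core target is described above as approximate, treat the result as a starting point rather than a guarantee.

```python
def max_cores_for_ram(available_gb, gb_per_core=4.0, max_cores=None):
    """Estimate how many cores to request for a given RAM allocation.

    Hypothetical helper: `gb_per_core` defaults to the ~4GB/core target
    mentioned in the chat, which is only an approximation.
    """
    if available_gb <= 0 or gb_per_core <= 0:
        raise ValueError("available_gb and gb_per_core must be positive")
    # Always allow at least one core, even if RAM is below the per-core target.
    cores = max(1, int(available_gb // gb_per_core))
    if max_cores is not None:
        cores = min(cores, max_cores)  # never exceed the cores on the node
    return cores


# Example: a 10-core node with 24 GB allocated.
suggested = max_cores_for_ram(24, max_cores=10)
```

With 24 GB on a 10-core node this suggests 6 cores; with 64 GB the node's 10 cores become the limit instead.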