These are chat archives for dereneaton/ipyrad

2nd Oct 2016
Shea Lambert
@SheaML
Oct 02 2016 00:12

Hi again @dereneaton and @isaacovercast, a couple of questions.

1) Is there an easy way to exclude the merged reads from PEAR instead of including them downstream? Also, are the PEAR results stored somewhere?

2) Let's say I have a quality filtering scheme (sliding window with Trimmomatic) that I'm happy with. Is there an easy way to skip ipyrad's quality filtering? Not that double-filtering should be problematic, but it would help to keep things simple and comparable for my purposes.
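(For readers following along, a sliding-window Trimmomatic step like the one mentioned above might look roughly like the sketch below, run from Python; the file names and thresholds are purely illustrative.)

```python
# Hypothetical example: a Trimmomatic sliding-window filtering step run from
# Python via subprocess. File names and settings are placeholders, and this
# assumes a conda-style "trimmomatic" wrapper is on the PATH.
import subprocess

cmd = [
    "trimmomatic", "PE",                                   # paired-end mode
    "sample_R1.fastq.gz", "sample_R2.fastq.gz",            # raw read pairs (placeholders)
    "sample_R1.trim.fq.gz", "sample_R1.unpaired.fq.gz",    # trimmed / orphaned R1
    "sample_R2.trim.fq.gz", "sample_R2.unpaired.fq.gz",    # trimmed / orphaned R2
    "SLIDINGWINDOW:4:20",   # cut once mean quality in a 4 bp window drops below Q20
    "MINLEN:36",            # discard reads shorter than 36 bp after trimming
]
subprocess.run(cmd, check=True)
```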

Deren Eaton
@dereneaton
Oct 02 2016 04:20

@SheaML

  1. Currently we don't have an option to exclude merged reads, but it would be simple to add one. Also, the reads aren't actually merged by PEAR but by vsearch; the accuracy is comparable.

  2. Here is what step 2 does now, depending on whether filter_adapters is set to 0, 1, or 2.

If 0: it trims bases from the edges (edit_cutsites), filters out reads with too many Ns (max_low_qual_bases), and filters out reads that are too short to cluster effectively (filter_min_trim_len).

If 1: same as 0, plus it trims reads on the left and right at the first base with a Q score < 20, which can be adjusted via the quality offset (phred_Qscore_offset).

If 2: same as 1, plus it trims the Illumina adapter (or adapter+barcode for R2) if present in the reads.
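(A minimal sketch of setting these step 2 filters through the ipyrad Python API; the assembly name, input path, and values below are illustrative, and parameter names may differ between releases.)

```python
# Illustrative only: set the step 2 filtering parameters described above and
# run the loading and filtering steps. Names and values are placeholders.
import ipyrad as ip

data = ip.Assembly("demo")
data.set_params("sorted_fastq_path", "demultiplexed/*.fastq.gz")  # placeholder input path
data.set_params("filter_adapters", 2)         # 0, 1, or 2 as described above
data.set_params("filter_min_trim_len", 35)    # drop reads trimmed shorter than this
data.set_params("max_low_qual_bases", 5)      # drop reads with more than this many Ns
data.set_params("phred_Qscore_offset", 33)    # offset used for the Q20 quality trimming
data.run("12")                                # load reads, then run the edit/filter step
```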

Thanks for asking; I need to update the docs for all of these latest changes.

Edgardo M. Ortiz
@edgardomortiz
Oct 02 2016 18:11
Hi @dereneaton @isaacovercast, I have a dataset in which I am losing half the loci to sample duplication in the alignments from clustering across samples. Until the [excluded_loci] file is added to the outfiles, is there any other way to check those loci?
Deren Eaton
@dereneaton
Oct 02 2016 19:08
@edgardomortiz, I'll look into it. The dups info is not easily accessible at the moment, but it could be made easier to access. Half your loci seems like a lot of duplicates. Are you using a really high clustering threshold (e.g., .99)? Or a different threshold at step 3 versus step 6? Lots of duplicate matches can happen if you are clustering at a high value but your reads contain lots of Ns, errors, or other variation, so that they cluster poorly in step 3.
Edgardo M. Ortiz
@edgardomortiz
Oct 02 2016 19:12
@dereneaton I am using 0.90 as the clustering threshold on PE 2x140 bp reads. I thought of that too; I am trying 0.85 as well.
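(A minimal sketch of retrying the lower threshold through the ipyrad Python API, assuming the parameter is named clust_threshold; the assembly and file names are placeholders.)

```python
# Illustrative only: lower the within-sample clustering threshold and re-run
# step 3. Assumes an existing assembly saved under this (placeholder) name.
import ipyrad as ip

data = ip.load_json("pe_2x140.json")          # reload a previously saved assembly
data.set_params("clust_threshold", 0.85)      # drop from 0.90 to 0.85
data.run("3", force=True)                     # redo within-sample clustering
```

In practice one might branch the assembly first so the results at 0.90 and 0.85 can be compared side by side.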
Deren Eaton
@dereneaton
Oct 02 2016 19:16
I just recently changed the code that detects duplicates; I'll double-check that it's working correctly.