These are chat archives for dereneaton/ipyrad

26th
Sep 2016
R2C2.lab
@R2C2_Lab_twitter
Sep 26 2016 09:09
ipyrad.tiff
Hi again, is this normal after 14 hours?
Isaac Overcast
@isaacovercast
Sep 26 2016 12:15
14 hours is quite a long time for clustering, but it really depends on the datatype. I have seen users with 300bp PE ddrad where this kind of time is normal. If you have really long reads clustering takes a long time.
Deren Eaton
@dereneaton
Sep 26 2016 14:45
I would say the biggest factor determining clustering times is often the number of unique clusters. Ideally most users doing phylogenetics or phylogeography are looking to get maybe 30-60K loci at high coverage (maybe mean depth=10), but if you used a very common set of cutters you may get something more like 1e6 clusters at mean depth=1. The latter would take much longer to cluster, but may be desirable for something like an association mapping study. Sometimes filtering can be important if there are many errors in your reads, and more stringent filters in step2 can lead to much faster clustering in step 3.
R2C2.lab
@R2C2_Lab_twitter
Sep 26 2016 15:01
@isaacovercast my reads are 300 bp PE ezRAD and @dereneaton what parameters should I change in step 2 to have more stringent filters?
joqb
@joqb
Sep 26 2016 15:02
Hello, I was trying to set the parameter file for demultiplexed, overhang trimmed reads like in pyrad 3 with "@./sorted_reads.fastq.gz" but it seems that @ is interpreted differently (path to working directory?). I wasn't able to find in the documentation how to describe this.
James Clugston
@Cycadales_twitter
Sep 26 2016 16:04
@R2C2_Lab_twitter Hi, I am also using ipyrad with PE ezRAD data and have managed to get some decent results now (even with organisms that have huge genomes). Did you do much quality filtering with your data before using ipyrad? Also, using the settings you used, how many reads did you get from step two? I looked at the params file you posted and the settings look similar to mine.
Isaac Overcast
@isaacovercast
Sep 26 2016 16:26
@R2C2_Lab_twitter Yeah, that'll do it. Generally what you see with 300bp ezRAD is pretty decent quality scores in R1, but significant quality decay in the second half of R2. The net result (as Deren mentions) is that the number of unique (spurious) clusters skyrockets. I would recommend looking at a few of your samples in something like FastQC, and then trimming off the bad part of R2. I think James has used trimmomatic with seemingly good results.
R2C2.lab
@R2C2_Lab_twitter
Sep 26 2016 17:05
@isaacovercast no, I didn’t do any quality filtering before ipyrad, but I used TrimGalore when using another pipeline and indeed R2 showed a significant quality decay according to FastQC; I’ll use those trimmed reads in ipyrad
Deren Eaton
@dereneaton
Sep 26 2016 17:10
@R2C2_Lab_twitter Yeah, the current ipyrad quality filtering is not super sophisticated, so if your data is super messy it would be good to use something like trimmomatic/trimgalore, for now. But the next big update to ipyrad (hopefully ready by tomorrow or the next day) will have a new quality and adapter filtering implementation using code from cutadapt, which is a really good filtering software.
R2C2.lab
@R2C2_Lab_twitter
Sep 26 2016 17:10
@Cycadales_twitter these were the stats from step 2

Summary stats of Assembly pooled

       state  reads_raw  reads_filtered
MC-O1      2    2539397          797668
MC-R1      2    2199706          676014
Deren Eaton
@dereneaton
Sep 26 2016 17:11
The full s2 stats file will give a little more detail about which filters are being applied.
R2C2.lab
@R2C2_Lab_twitter
Sep 26 2016 17:12
       reads_raw  filtered_by_qscore  filtered_by_adapter  reads_passed
MC-O1  2539397.0           1741729.0                  0.0      797668.0
MC-R1  2199706.0           1523692.0                  0.0      676014.0
s2_rawedit_stats.txt (END)
Deren Eaton
@dereneaton
Sep 26 2016 17:13
so it looks like only the quality filter is being applied currently, not searching for adapter sequences.
R2C2.lab
@R2C2_Lab_twitter
Sep 26 2016 17:14
I’ll remove the adapters with TrimGalore
Deren Eaton
@dereneaton
Sep 26 2016 17:14
if you set filter_adapters to 2 then it will trim reads to a shorter length when they contain illumina adapters
ok, that works too.
R2C2.lab
@R2C2_Lab_twitter
Sep 26 2016 17:14
great, thanks for all the advice
Edgardo M. Ortiz
@edgardomortiz
Sep 26 2016 18:31
Another option outside ipyrad for quality filtering is a suite of tools called bbmap (https://sourceforge.net/projects/bbmap/). It basically replaced my need for different pieces of software for sequence manipulation, and every tool is parallelized, so it is super fast too. I remove adapters and filter PhiX with the tool bbduk, which also performs quality and length trimming. Here is a little intro: http://seqanswers.com/forums/showthread.php?t=41057
James Clugston
@Cycadales_twitter
Sep 26 2016 18:42
@R2C2_Lab_twitter looking at your step two filtering, you're losing way too much data. You need to get that up to around 75-90% being retained, and you will need to use Trimmomatic etc. for that. Have you checked the reads using Prinseq/FastQC to see where the quality drop-off is? Also try using a sliding window; that would help too. As a tip, I got better results using PE trimming.
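Editor's note: the sliding-window idea mentioned above can be sketched in a few lines of Python. This is a simplified illustration, not the actual algorithm used by Trimmomatic, TrimGalore, or ipyrad. The function names, the window size of 4, and the mean quality threshold of 20 are assumptions chosen for the example, and quality strings are assumed to be Phred+33 encoded.

```python
def phred33(qual_string):
    """Decode a Phred+33 encoded quality string into integer scores."""
    return [ord(char) - 33 for char in qual_string]


def sliding_window_keep(qualities, window=4, min_mean_q=20):
    """Return how many bases to keep from the 5' end of a read.

    The window slides one base at a time from the 5' end. The read is cut
    at the start of the first window whose mean quality falls below
    min_mean_q. Reads shorter than the window are kept whole.
    """
    for start in range(len(qualities) - window + 1):
        mean_q = sum(qualities[start:start + window]) / window
        if mean_q < min_mean_q:
            return start
    return len(qualities)


def trim_read(seq, qual_string, window=4, min_mean_q=20):
    """Trim a sequence and its quality string with the sliding window."""
    keep = sliding_window_keep(phred33(qual_string), window, min_mean_q)
    return seq[:keep], qual_string[:keep]


# Hypothetical read: 10 high-quality bases ('I' = Q40) followed by
# 6 low-quality bases ('+' = Q10), mimicking decay toward the 3' end.
seq, qual = trim_read("ACGTACGTACGTACGT", "I" * 10 + "+" * 6)
print(len(seq), seq)
```

For paired-end data, the same cut would be applied to each mate separately, and pairs where either mate becomes too short are usually dropped together so that R1 and R2 stay in sync.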
R2C2.lab
@R2C2_Lab_twitter
Sep 26 2016 19:04
fastqc.tiff
@Cycadales_twitter the quality drop-off is around 270 bp; after using TrimGalore it improves significantly.
R2C2.lab
@R2C2_Lab_twitter
Sep 26 2016 19:15
@Cycadales_twitter step 2 after TrimGalore
          reads_raw  filtered_by_qscore  filtered_by_adapter  reads_passed
MC-O1val  2475250.0            824405.0                  0.0     1650845.0
MC-R1val  2149888.0            876892.0                  0.0     1272996.0
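Editor's note: the two sets of step 2 stats posted in this conversation can be compared by computing the fraction of reads retained per sample. A minimal Python sketch follows. The numbers are copied from the reads_raw and reads_passed columns above; the dictionary layout and the function name are just for illustration.

```python
# (reads_raw, reads_passed) per sample, copied from the stats posted above.
step2_stats = {
    "MC-O1": (2539397, 797668),        # before TrimGalore
    "MC-R1": (2199706, 676014),        # before TrimGalore
    "MC-O1val": (2475250, 1650845),    # after TrimGalore
    "MC-R1val": (2149888, 1272996),    # after TrimGalore
}


def fraction_retained(reads_raw, reads_passed):
    """Fraction of raw reads that passed the step 2 filters."""
    return reads_passed / reads_raw


for sample, (raw, passed) in step2_stats.items():
    print(f"{sample:<10} {fraction_retained(raw, passed):.1%} retained")
```

Retention goes from roughly 31% per sample before trimming to roughly 59-67% after TrimGalore.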
James Clugston
@Cycadales_twitter
Sep 26 2016 19:40
@R2C2_Lab_twitter Have you tried PrinSeq? I found that it was a little easier to read. To me it looks like you're getting a drop around the 240 mark. Those are better results, but you're still losing almost a million reads.
@R2C2_Lab_twitter I used 150 bp PE reads and I had to crop mine to around 120 bp due to the reverse reads. Try to get as much data past that filter as you can, as that way you will get more high-depth clusters. @isaacovercast can advise you better here. But looking at the settings in your params file, I would not drop them any lower.