These are chat archives for dereneaton/ipyrad

30th
Aug 2017
Deren Eaton
@dereneaton
Aug 30 2017 14:06
@mdrphd hmm, thanks, I'll look into it.
@mdrphd So you have 11 libraries, each of which might contain technical replicates, and you are demuxing each separately, and then you'll combine them with merging after step 1, right? But you're finding that technical replicates are not being properly identified within each library yet, so you haven't gotten to merging yet.
mdrphd
@mdrphd
Aug 30 2017 15:00
@dereneaton Yes. That is exactly right.
James Clugston
@Cycadales_twitter
Aug 30 2017 19:47
@dereneaton @isaacovercast Hi guys, I have a problem with system usage on a server running an ipyrad job of two samples. I have two samples with really large datasets and I have branched them following Deren's help, but it's using way more resources than I have allocated to it. This is the command I used:
ipyrad -p params-Carm2.txt -c 48 -t 24 -s 3 -r >& Carm2 &
Screen Shot 2017-08-30 at 20.48.27.png
Deren Eaton
@dereneaton
Aug 30 2017 19:58
@Cycadales_twitter I was just talking with someone else about this. Step 6 is currently set to use all available cores during vsearch clustering, so it ignores the -c and -t options. The next release will fix this so you can limit it with -c or -t, technically by min(c, max(t, 10)). So it will use 10 threads unless -t is set to a higher value, but t can't be greater than c. A bit complicated, but it seemed like the best option.
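For reference, the thread cap described in the message above can be sketched in a few lines of Python. This is only an illustration of the min(c, max(t, 10)) rule as stated in the chat, not ipyrad's actual code; the function name `step6_threads` and the choice to treat an unset -t as 0 are assumptions.

```python
def step6_threads(c, t=None, floor=10):
    """Illustrative thread cap: min(c, max(t, 10)).

    c     -- total cores given with -c
    t     -- threads given with -t (None if not set; treated as 0 here,
             which is an assumption made for this sketch)
    floor -- the default of 10 threads mentioned in the chat
    """
    t = t or 0
    return min(c, max(t, floor))


if __name__ == "__main__":
    print(step6_threads(48, 24))  # -t above 10, so -t is used
    print(step6_threads(48))      # -t unset, falls back to 10
    print(step6_threads(4, 2))    # never more than -c
```

With -c 48 -t 24 this gives 24 threads, with -c 48 alone it gives 10, and with -c 4 it can never exceed 4.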
James Clugston
@Cycadales_twitter
Aug 30 2017 19:59
@dereneaton ahh ok I see! Thank you that helps loads
Deren Eaton
@dereneaton
Aug 30 2017 20:09
Oh, but are you running step 6? Step 3 should be properly threaded, so in your case the vsearch jobs should be running 2 jobs with 24 threads each. But it looks like there are many more vsearch jobs running concurrently...
James Clugston
@Cycadales_twitter
Aug 30 2017 20:10
@dereneaton No I am running step 3.
Deren Eaton
@dereneaton
Aug 30 2017 20:53
If the assembly only has two samples in it then it shouldn't be possible to have more than two vsearch jobs running... maybe I'm not understanding something...
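The expectation in the last two messages (two samples, -c 48, -t 24, so at most two vsearch jobs) is simple arithmetic. A minimal sketch, assuming one clustering job per sample and `threads` cores reserved per job; `expected_vsearch_jobs` is a hypothetical helper for checking what you see in top, not how ipyrad schedules work internally.

```python
def expected_vsearch_jobs(n_samples, cores, threads):
    """Upper bound on concurrent clustering jobs in this simplified model.

    Assumes one job per sample and `threads` cores per job; this is an
    illustration, not ipyrad's scheduler.
    """
    if n_samples < 0 or cores < 1 or threads < 1:
        raise ValueError("need n_samples >= 0, cores >= 1 and threads >= 1")
    slots = max(1, cores // threads)  # jobs that fit in the core budget
    return min(n_samples, slots)      # cannot exceed the number of samples


if __name__ == "__main__":
    # The case from the chat: 2 samples, -c 48, -t 24
    print(expected_vsearch_jobs(2, 48, 24))
```

If the number of vsearch processes on the server is well above this bound, something other than the -c/-t settings is driving the usage.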
nclowell
@nclowell
Aug 30 2017 20:54
Hi @dereneaton @isaacovercast, I'm trying to create a catalog of loci to use as a "reference genome" for a non-model species, and I'd like to include loci that do not have SNPs. So far I've been using the x.alleles.loci file to create one, but this one only includes sequences with SNPs. Are there any intermediate or final output files that I have overlooked that I could use to create such a catalog? Is there an option to get more intermediate files during the ipyrad steps? Thanks!
James Clugston
@Cycadales_twitter
Aug 30 2017 21:19
@dereneaton there were only two samples, so I have had to stop the job as I am unsure what the system is doing, as it's using the whole cluster
server sorry
Isaac Overcast
@isaacovercast
Aug 30 2017 21:30
@nclowell The *.alleles.loci file should work fine for this, it does include invariable loci. Here's an example from an assembly I just did yesterday:
Boehm_Pipefish_HENY20A_sequence_1_1        GTAAGGGGCTCGGAATCTTTATAGGTCAAGATATTTCCCCATTTTATTAAAGCAATAACTTTGATGACCAGAAATGGTCAAGAACT
Boehm_Pipefish_HESGulf02B_sequence_1_0     GTAAGGGGCTCGGAATCTTTATAGGTCAAGATATTTCCCCATTTTATTAAAGCAATAACTTTGATGACCAGAAATGGTCAAGAACT
Boehm_Pipefish_HESGulf02B_sequence_1_1     GTAAGGGGCTCGGAATCTTTATAGGTCAAGATATTTCCCCATTTTATTAAAGCAATAACTTTGATGACCAGAAATGGTCAAGAACT
Boehm_Pipefish_HETB01A_sequence_1_0        GTAAGGGGCTCGGAATCTTTATAGGTCAAGATATTTCCCCATTTTATTAAAGCAATAACTTTGATGACCAGAAATGGTCAAGAACT
Boehm_Pipefish_HETB01A_sequence_1_1        GTAAGGGGCTCGGAATCTTTATAGGTCAAGATATTTCCCCATTTTATTAAAGCAATAACTTTGATGACCAGAAATGGTCAAGAACT
Boehm_Pipefish_Ir03-DHE_sequence_1_0       GTAAGGGGCTCGGAATCTTTATAGGTCAAGATATTTCCCCATTTTATTAAAGCAATAACTTTGATGACCAGAAATGGTCAAGAACT
Boehm_Pipefish_Ir03-DHE_sequence_1_1       GTAAGGGGCTCGGAATCTTTATAGGTCAAGATATTTCCCCATTTTATTAAAGCAATAACTTTGATGACCAGAAATGGTCAAGAACT
//                                                                                                                               |16|
@nclowell To answer your other question, if you re-run any step with -d it will print more debug output and also it won't clean up most intermediate files....
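For the catalog question above, here is a minimal sketch of turning a .loci-style file into a FASTA "reference" with one sequence per locus. It assumes the layout shown in the example: one `name sequence` line per allele, then a separator line starting with `//` that ends in the locus id between pipes. It also assumes variable sites are flagged on the separator line with `-` or `*`, so a separator with neither is treated as an invariant locus. The function names, the choice of the first sequence as the representative, and the FASTA header format are all hypothetical.

```python
def read_loci(lines):
    """Yield (locus_id, [(name, seq), ...], is_invariant) for each locus."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        if line.startswith("//"):
            # Separator line: site markers, then the locus id between pipes.
            markers, _, tail = line[2:].partition("|")
            locus_id = tail.strip("|").strip()
            # Assumption: '-' and '*' mark variable sites on this line.
            is_invariant = not any(ch in "-*" for ch in markers)
            if records:
                yield locus_id, records, is_invariant
            records = []
        else:
            name, seq = line.split()
            records.append((name, seq))


def catalog_fasta(lines):
    """Build a FASTA string using the first sequence of every locus."""
    entries = []
    for locus_id, records, is_invariant in read_loci(lines):
        _, seq = records[0]
        header = ">locus_{} invariant={}".format(locus_id, is_invariant)
        entries.append(header + "\n" + seq.replace("-", ""))  # drop gap characters
    return "\n".join(entries)


if __name__ == "__main__":
    example = [
        "sampleA_0    ACGTACGT",
        "sampleA_1    ACGTACGT",
        "//                    |0|",
        "sampleA_0    ACGTACGT",
        "sampleB_0    ACTTACGT",
        "//             -      |1|",
    ]
    print(catalog_fasta(example))
```

To use it on a real file, something like `catalog_fasta(open("x.alleles.loci"))` should work, with the file name being whatever your assembly produced. Taking the first allele as the representative is the simplest choice; a consensus across samples may be a better fit for mapping.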