These are chat archives for dereneaton/ipyrad

26th
Aug 2016
Edgardo M. Ortiz
@edgardomortiz
Aug 26 2016 14:43
Hello again. The samples I am analyzing have a very uneven distribution of reads: the one with the most reads has 27 million pairs, followed by samples with 3.8 million, and so on. The problem is that the sample with 27 million reads never completes the clustering step. I have reduced from 48 cores to -c 8 and even -c 4, but it doesn't seem to help. Any advice on how to process it?
Deren Eaton
@dereneaton
Aug 26 2016 15:50
hmm, 27M is a big sample, and clustering takes longer for paired data than for single-end data, so I would expect it to take a while. That said, we currently enforce single-threaded clustering by vsearch during step 3 to avoid competition among samples running in parallel, and to avoid memory crashes for really big samples. This is a good approach if all of the samples are about the same size, but not ideal if their sizes differ. It seemed like the best approach, however, given that vsearch performance doesn't scale linearly with more threads. It has been on the to-do list to give more threads to larger samples so they finish in about the same time as smaller samples. Until we implement something like that, you would probably be best off just letting the one sample run longer until it finishes. Unfortunately, there is no checkpointing to restart a vsearch job that does not finish.
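The to-do item above (giving more threads to larger samples) can be sketched as a proportional split of cores by read count. This is a hypothetical helper, not ipyrad's code: the function name `allocate_threads` and the floor-then-top-up rule are assumptions made for illustration, and it assumes there are at least as many cores as samples (otherwise the one-thread minimum oversubscribes).

```python
def allocate_threads(read_counts, total_cores):
    """Hypothetical helper (not part of ipyrad): split cores among samples
    in proportion to their read counts.

    Each sample gets at least one thread. If there are more samples than
    cores, that minimum can oversubscribe the machine.
    """
    total_reads = sum(read_counts.values())
    alloc = {}
    for name, n_reads in read_counts.items():
        # Floor of the proportional share, but never zero threads.
        share = int(total_cores * n_reads / total_reads)
        alloc[name] = max(1, share)

    # Flooring can leave a few cores unused; hand them to the largest
    # samples first (a negative leftover simply skips this loop).
    leftover = total_cores - sum(alloc.values())
    for name in sorted(read_counts, key=read_counts.get, reverse=True):
        if leftover <= 0:
            break
        alloc[name] += 1
        leftover -= 1
    return alloc
```

For example, `allocate_threads({"big": 27_000_000, "small": 3_800_000}, 8)` gives the 27M-read sample 7 threads and the 3.8M-read sample 1. Because vsearch does not scale linearly with threads, a real scheduler might cap the per-sample count rather than use a strict proportion.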
Edgardo M. Ortiz
@edgardomortiz
Aug 26 2016 15:56
How do I let it run longer? It seems the program just skips it:
 -------------------------------------------------------------
  ipyrad [v.0.3.34]
  Interactive assembly and analysis of RAD-seq data
 -------------------------------------------------------------
  loading Assembly: ast90
  from saved path: /scratch/02728/emo347/ddRAD/diplo/f_fulldata_ipyrad/ast90.json
  local compute node: [48 cores] on nid00061

  Step 3: Clustering/Mapping reads
  [####################] 100%  dereplicating         | 0:04:29
  [####################] 100%  clustering            | 1:50:07
  Samples failed this step:DIP-Dmeye
  Samples failed this step - ['DIP-Dmeye']
  [####################] 100%  chunking              | 3:16:05
  Samples failed this step:DIP-Dmeye
  Samples failed this step - ['DIP-Dmeye']
  [####################] 100%  aligning              | 0:00:00
  Samples failed this step - ['DIP-Dmeye']
  [####################] 100%  concatenating         | 0:02:41
  Samples failed this step:DIP-Dmeye
  Samples failed this step - ['DIP-Dmeye']

  Step 4: Joint estimation of error rate and heterozygosity
    skipping DIP-Dmeye; not clustered yet. Run step3() first.
  [####################] 100%  inferring [H, E]      | 0:17:08

  Step 5: Consensus base calling
    Skipping Sample DIP-Dmeye; not yet finished step4
  Mean error  [0.00124 sd=0.00057]
  Mean hetero [0.00848 sd=0.00209]
  [####################] 100%  consensus calling     | 0:16:32
Deren Eaton
@dereneaton
Aug 26 2016 16:00
Oh, then something else is probably going on. That sample is failing the clustering step for some reason. Can you check the files for that sample in the edits/ and clust/ directories and see if anything looks wonky compared to the other samples? E.g., a file is empty, corrupted, etc.
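The check suggested above (look for empty or corrupted files in the edits/ and clust/ directories) can be scripted. The sketch below is a hypothetical helper, not an ipyrad function: it flags zero-byte files and `.gz` files that fail to decompress. The name `find_suspect_files` and its glob `pattern` parameter are assumptions for illustration.

```python
import gzip
from pathlib import Path


def find_suspect_files(directory, pattern="*"):
    """Hypothetical helper: return (filename, reason) pairs for files in
    `directory` matching `pattern` that are empty or are unreadable gzip."""
    suspects = []
    for path in sorted(Path(directory).glob(pattern)):
        if not path.is_file():
            continue
        if path.stat().st_size == 0:
            suspects.append((path.name, "empty"))
            continue
        if path.suffix == ".gz":
            try:
                # Stream through the whole archive in 1 MB chunks so that
                # truncated or non-gzip files raise an error.
                with gzip.open(path, "rb") as handle:
                    while handle.read(1 << 20):
                        pass
            except (OSError, EOFError):
                suspects.append((path.name, "corrupted gzip"))
    return suspects
```

Run against the clust directory listed below (e.g. `find_suspect_files("ast90_clust_0.9", "*meye*")`), this would report `DIP-Dmeye.clust.gz` as empty, since that file is 0 bytes. Fully decompressing large `.gz` files is slow, so it is best limited with a pattern to the samples in question.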
Edgardo M. Ortiz
@edgardomortiz
Aug 26 2016 16:04

The files in the edits folder look OK:

ls5 emo347@login2: /scratch/02728/emo347/ddRAD/diplo/f_fulldata_ipyrad/ast_edits $ lax *meye*
-rw------- 1 emo347 3.1G Aug 26 03:02 DIP-Dmeye_derep.fastq
-rw------- 1 emo347  11G Aug 26 03:01 DIP-Dmeye_merged_.fastq
-rw------- 1 emo347 5.5G Aug 26 02:51 DIP-Dmeye_R1_.fastq
-rw------- 1 emo347 5.7G Aug 26 02:52 DIP-Dmeye_R2_.fastq

However in the clust folder, the clust.gz is empty:

ls5 emo347@login2: /scratch/02728/emo347/ddRAD/diplo/f_fulldata_ipyrad/ast90_clust_0.9 $ lax *meye*
-rw------- 1 emo347    0 Aug 26 04:51 DIP-Dmeye.clust.gz
-rw------- 1 emo347 9.4M Aug 26 04:51 DIP-Dmeye.htemp
-rw------- 1 emo347 702M Aug 26 04:51 DIP-Dmeye.utemp
Deren Eaton
@dereneaton
Aug 26 2016 16:06
Is there any message in the log file?
Edgardo M. Ortiz
@edgardomortiz
Aug 26 2016 16:07
Nothing related to that sample in particular:
2016-08-26 08:13:57,966            pid=2634           [jointestimate.py]         ERROR      Entering stackarray - PIO-Dglut
2016-08-26 08:13:58,037            pid=2635           [jointestimate.py]         ERROR      Entering stackarray - PIO-Drhom
2016-08-26 08:13:58,141            pid=2639           [jointestimate.py]         ERROR      Entering stackarray - PIO-Dcaya
2016-08-26 08:13:58,251            pid=2641           [jointestimate.py]         ERROR      Entering stackarray - DIP-Dcall
2016-08-26 08:13:58,276            pid=2637           [jointestimate.py]         ERROR      Entering stackarray - AS2-Btric
2016-08-26 08:13:58,329            pid=2640           [jointestimate.py]         ERROR      Entering stackarray - DIP-Deric
2016-08-26 08:13:58,404            pid=2636           [jointestimate.py]         ERROR      Entering stackarray - PIO-Dtenu
2016-08-26 08:13:58,432            pid=2638           [jointestimate.py]         ERROR      Entering stackarray - PIO-Dapic
2016-08-26 08:15:00,794            pid=2635           [jointestimate.py]         ERROR      Entering stackarray - DIP-Dgood
2016-08-26 08:15:02,401            pid=2636           [jointestimate.py]         ERROR      Entering stackarray - PIO-Derio
2016-08-26 08:15:35,056            pid=2634           [jointestimate.py]         ERROR      Entering stackarray - PIO-Dalve
2016-08-26 08:17:08,411            pid=2641           [jointestimate.py]         ERROR      Entering stackarray - AS2-Fhyps
2016-08-26 08:17:18,176            pid=2637           [jointestimate.py]         ERROR      Entering stackarray - AS2-Pquad
2016-08-26 08:17:41,499            pid=2636           [jointestimate.py]         ERROR      Entering stackarray - AS2-Bgeni
2016-08-26 08:18:18,681            pid=2641           [jointestimate.py]         ERROR      Entering stackarray - DIP-Dglan
2016-08-26 08:18:41,906            pid=2638           [jointestimate.py]         ERROR      Entering stackarray - PIO-Dschu
2016-08-26 08:19:14,192            pid=2639           [jointestimate.py]         ERROR      Entering stackarray - PIO-Dcolo
2016-08-26 08:19:14,522            pid=2640           [jointestimate.py]         ERROR      Entering stackarray - HYB-Dspc2
2016-08-26 08:19:22,663            pid=2641           [jointestimate.py]         ERROR      Entering stackarray - PIO-Drevo
2016-08-26 08:19:57,504            pid=2636           [jointestimate.py]         ERROR      Entering stackarray - AS1-Aaspe
2016-08-26 08:20:50,746            pid=2639           [jointestimate.py]         ERROR      Entering stackarray - DIP-Dgyno
2016-08-26 08:21:08,045            pid=2634           [jointestimate.py]         ERROR      Entering stackarray - PIO-Dphyl
2016-08-26 08:21:21,471            pid=2636           [jointestimate.py]         ERROR      Entering stackarray - PIO-Drosm
2016-08-26 08:21:44,585            pid=2638           [jointestimate.py]         ERROR      Entering stackarray - AS2-Halie
2016-08-26 08:21:59,157            pid=2639           [jointestimate.py]         ERROR      Entering stackarray - HYB-Dcine
2016-08-26 08:23:03,306            pid=2639           [jointestimate.py]         ERROR      Entering stackarray - PIO-Drupe
2016-08-26 08:23:41,550            pid=2636           [jointestimate.py]         ERROR      Entering stackarray - DIP-Dspj1
2016-08-26 08:23:52,429            pid=2639           [jointestimate.py]         ERROR      Entering stackarray - OUT-Operu
2016-08-26 08:23:53,940            pid=2635           [jointestimate.py]         ERROR      Entering stackarray - PIO-Danti
2016-08-26 08:24:50,281            pid=2635           [jointestimate.py]         ERROR      Entering stackarray - DIP-Dspj3
2016-08-26 08:25:15,379            pid=2640           [jointestimate.py]         ERROR      Entering stackarray - AS2-Amatu
2016-08-26 08:26:02,526            pid=2639           [jointestimate.py]         ERROR      Entering stackarray - DIP-Doxap
2016-08-26 08:26:02,791            pid=2634           [jointestimate.py]         ERROR      Entering stackarray - DIP-Dempe
2016-08-26 08:26:53,604            pid=2637           [jointestimate.py]         ERROR      Entering stackarray - DIP-Dhaen
2016-08-26 08:27:13,303            pid=2636           [jointestimate.py]         ERROR      Entering stackarray - DIP-Dbarc
2016-08-26 08:27:51,841            pid=2637           [jointestimate.py]         ERROR      Entering stackarray - DIP-Dspin
Deren Eaton
@dereneaton
Aug 26 2016 16:08
Are there any lines where it says [cluster_within.py] instead of [jointestimate.py]?
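Answering this question by eye is tedious with a long log. Since each log line above carries its source module in square brackets (e.g. `[jointestimate.py]`), a short script can count entries per module and pull out the ones from a given module. This is a hypothetical sketch: the function names are illustrative, and it assumes only the bracketed `[name.py]` field seen in the pasted lines.

```python
import re
from collections import Counter

# Matches the bracketed module field, e.g. "[jointestimate.py]".
MODULE_RE = re.compile(r"\[([\w.]+\.py)\]")


def summarize_log(lines):
    """Count log lines per source module."""
    counts = Counter()
    for line in lines:
        match = MODULE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def lines_from(lines, module):
    """Return the log lines emitted by one module, without trailing newlines."""
    tag = f"[{module}]"
    return [line.rstrip("\n") for line in lines if tag in line]
```

Passing the open log file as `lines` (the file name is not given in this thread) and calling `lines_from(handle, "cluster_within.py")` would list any step 3 messages. For the log pasted above, `summarize_log` would report only `jointestimate.py` entries.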
Edgardo M. Ortiz
@edgardomortiz
Aug 26 2016 16:09
No, all are jointestimate.py
Deren Eaton
@dereneaton
Aug 26 2016 16:13
Can you email me a link where I could download the file "DIP-Dmeye_derep.fastq". Maybe I can figure it out from there.
Edgardo M. Ortiz
@edgardomortiz
Aug 26 2016 16:40
I am uploading it to Google Drive now...
Edgardo M. Ortiz
@edgardomortiz
Aug 26 2016 17:05
Link sent
Deren Eaton
@dereneaton
Aug 26 2016 18:23
Got it. It's clustering at 21% right now. It might have failed at the step of building clusters from the clustering output. I'll let you know what I find.
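To illustrate what "building clusters from the output" involves, the sketch below groups vsearch hits by the seed they matched. It is a generic, hypothetical example rather than ipyrad's code. It assumes each line of a hits table such as the `.utemp` file listed earlier is tab-separated with the query read name in the first column and the matched seed (target) in the second; the actual column layout used by ipyrad is an assumption here.

```python
from collections import defaultdict


def group_hits_by_seed(lines):
    """Hypothetical sketch: map each seed (target) to the reads (queries)
    that matched it, assuming tab-separated 'query<TAB>target<TAB>...' lines."""
    clusters = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        query, target = fields[0], fields[1]
        clusters[target].append(query)
    return dict(clusters)
```

With a 702M hits file for a 27M-pair sample, holding every hit in memory like this is workable but large, which is one place where a big sample could plausibly run into trouble after vsearch itself has finished.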