These are chat archives for dereneaton/ipyrad

19th
Sep 2016
Shea Lambert
@SheaML
Sep 19 2016 13:57
Hi @dereneaton, it could be the former; I tried to save some time by starting at step 3. I'm starting over now and will send the .json if I get another error. Thanks.
Deren Eaton
@dereneaton
Sep 19 2016 14:10
@SheaML Thanks, you won't usually have to start over when we make updates, but this problem in particular I suspect may be caused by that.
Edgardo M. Ortiz
@edgardomortiz
Sep 19 2016 14:44
Hello, I will split the huge dataset of 666 samples into 6 subsets (to merge in step 4, right?). Do I need a popfile.txt for each subset, or can I reuse the same one for all 6 params files?
Isaac Overcast
@isaacovercast
Sep 19 2016 14:48
@edgardomortiz Hey Edgardo, you should be able to use the same populations file for all six subsets. ipyrad will complain about sample names in the pops file that are not in the assembly, but it'll do the right thing and carry on.
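To see ahead of time which names in a shared populations file will trigger that complaint for a given subset, something like the sketch below could help. It assumes a pops file with one whitespace-separated "sample population" pair per line and skips lines starting with "#"; the function name, sample names, and population names are all hypothetical, and this is not part of ipyrad itself.

```python
def split_pop_assignments(pop_lines, subset_samples):
    """Split pops-file entries into those present in / absent from a subset.

    Assumes each data line is "<sample> <population>"; blank lines and
    lines starting with "#" are skipped.
    """
    present, absent = {}, {}
    for line in pop_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sample, pop = line.split()[:2]
        target = present if sample in subset_samples else absent
        target[sample] = pop
    return present, absent


# Hypothetical pops file contents and one subset of sample names.
pop_lines = ["sampleA pop1", "sampleB pop1", "sampleC pop2", "# pop1:1 pop2:1"]
subset = {"sampleA", "sampleC"}

present, absent = split_pop_assignments(pop_lines, subset)
print("in this subset:", present)
print("not in this subset (expect a complaint):", absent)
```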
Edgardo M. Ortiz
@edgardomortiz
Sep 19 2016 14:49
Thanks Isaac, I'll do that!
cldebban
@cldebban
Sep 19 2016 14:56
Hi @dereneaton, I've been having trouble getting step 2 to run on my dataset. Step 1 seems to run fine, but then step 2 sits like this, never getting past "processing reads", for as long as I let it run:

-------------------------------------------------------------
  ipyrad [v.0.3.41]
  Interactive assembly and analysis of RAD-seq data
 -------------------------------------------------------------
  loading Assembly: GA
  from saved path: /sfs/lustre/scratch/cld8jj/ddRAD/iPyrad/GA.json
  local compute node: [20 cores] on udc-ba33-10i

  Step 2: Filtering reads
  [                    ]   0%  processing reads      | 0:07:43

I'm not sure where to start with troubleshooting this. I'm using paired end ddRAD data that has already been demultiplexed.

Isaac Overcast
@isaacovercast
Sep 19 2016 15:02
Hey @cldebban, a couple of things to try. First, there's a newer version of ipyrad on conda: conda install -c ipyrad ipyrad will pull down version 0.3.42.
What does the output of ipyrad -p <your_paramsfile> -r look like?
One more thing to try is rerunning step 2 with the -d flag. This will turn on debug logging and create the ipyrad_log.txt file with lots of info in it. If you want to re-run step 2 for a while and send me this file, I'll check it out.
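For anyone scripting these checks, here is a rough sketch of assembling the two command lines. The -p, -r, and -d flags are the ones mentioned above; using -s 2 to select step 2 is an assumption not spelled out in this conversation, and the helper name and params file name are hypothetical. The sketch only builds and prints the commands; it does not run ipyrad.

```python
import shlex


def ipyrad_command(params_file, steps=None, debug=False, results=False):
    """Build an ipyrad argument list (hypothetical helper, builds only)."""
    cmd = ["ipyrad", "-p", params_file]
    if steps:
        cmd += ["-s", steps]   # assumed: -s selects which step(s) to run
    if debug:
        cmd.append("-d")       # debug logging, writes ipyrad_log.txt
    if results:
        cmd.append("-r")       # print the summary of the assembly
    return cmd


params = "params-mydata.txt"   # placeholder params file name

show_results = ipyrad_command(params, results=True)
rerun_step2 = ipyrad_command(params, steps="2", debug=True)

print(shlex.join(show_results))
print(shlex.join(rerun_step2))
```

Either list could be handed to subprocess.run once the commands look right.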
cldebban
@cldebban
Sep 19 2016 15:07
Great, thanks! I'll update the software and add the -d flag. The output looks like this:
-bash-4.1$ ipyrad -p params-GA.txt -r

Summary stats of Assembly GA
------------------------------------------------
         state  reads_raw
GA22-1       1     752134
GA22-10      1     591482
GA22-11      1     312369
GA22-12      1    1057709
GA22-2       1     650782
GA22-3       1     462873
GA22-4       1    6336951
GA22-5       1     547363
GA22-6       1     828374
GA22-7       1    1078798
GA22-8       1    1335181
GA22-9       1     836128


Full stats files
------------------------------------------------
step 1: None
step 2: None
step 3: None
step 4: None
step 5: None
step 6: None
step 7: None
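As a side note, the sample rows of a summary like the one above are easy to parse for a quick look at how evenly reads are spread across samples. The sketch below uses the reads_raw values pasted above; the 3x-median cutoff is an arbitrary illustration, the function names are hypothetical, and none of this diagnoses the step 2 stall.

```python
from statistics import median

# Sample rows copied from the summary above.
stats_text = """\
         state  reads_raw
GA22-1       1     752134
GA22-10      1     591482
GA22-11      1     312369
GA22-12      1    1057709
GA22-2       1     650782
GA22-3       1     462873
GA22-4       1    6336951
GA22-5       1     547363
GA22-6       1     828374
GA22-7       1    1078798
GA22-8       1    1335181
GA22-9       1     836128
"""


def parse_reads(text):
    """Return {sample: reads_raw} from rows shaped 'name state reads'."""
    reads = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[1].isdigit() and parts[2].isdigit():
            reads[parts[0]] = int(parts[2])
    return reads


def flag_large(reads, factor=3.0):
    """List samples whose read count exceeds factor x the median (arbitrary cutoff)."""
    mid = median(reads.values())
    return sorted(name for name, n in reads.items() if n > factor * mid)


reads = parse_reads(stats_text)
print("median reads:", median(reads.values()))
print("unusually large samples:", flag_large(reads))
```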
Deren Eaton
@dereneaton
Sep 19 2016 15:12
@cldebban, what is your datatype (e.g., rad, gbs, etc.)?
Oh sorry, you already said.
Deren Eaton
@dereneaton
Sep 19 2016 15:18
@cldebban did you stop it, or have you left it running for a bit longer? I think the progress bar currently doesn't track accurately at the very start of step 2, but it should catch up once it has finished chunking the files into the smaller pieces it is going to process.
cldebban
@cldebban
Sep 19 2016 15:22
This try has only been running for a few minutes, but I left my full dataset (403 samples) running over the weekend for almost three days and it was the same the whole time:
-------------------------------------------------------------
  ipyrad [v.0.3.41]
  Interactive assembly and analysis of RAD-seq data
 -------------------------------------------------------------
  loading Assembly: b2
  from saved path: /scratch/cld8jj/ddRAD/iPyrad/b2.json
  local compute node: [20 cores] on udc-ba35-16a

  Step 2: Filtering reads
  [                    ]   0%  processing reads      | 2 days, 21:11:39
Deren Eaton
@dereneaton
Sep 19 2016 15:24
Oh, then that's not right; this step should be very fast.
Isaac Overcast
@isaacovercast
Sep 19 2016 15:30
Yeah, should be super fast, less than an hour. Let it run for a little bit, then kill it and email me the ipyrad_log.txt at iovercast@gc.cuny.edu
cldebban
@cldebban
Sep 19 2016 15:31
Will do, thanks!
Emily Warschefsky
@ewarschefsky_twitter
Sep 19 2016 16:02
Hey @dereneaton - what causes samples to fail the dereplication part of step 3? I ran step 3 (on the same branched dataset) for three different clustering thresholds, and for each threshold a different individual failed.
Edgardo M. Ortiz
@edgardomortiz
Sep 19 2016 16:05
That usually happens when vsearch cannot allocate memory. I solved it by reducing the number of parallel processes (the nodes I work on have 48 cores, but for step 3 I use -c 8).
Deren Eaton
@dereneaton
Sep 19 2016 16:12
@edgardomortiz good to know. This is hard to manage easily, since it depends a lot on how well the data dereplicates, which affects how big the memory overhead will be. The new threading options will allow running fewer samples at a time but with more processors assigned to each sample, which will reduce memory load to an extent. This is currently implemented, but we have it hardcoded to threads=2. We will make this available as an option to users.
@ewarschefsky_twitter are you running the latest version? The memory limit that Ed's data crashed on should be fixed in the latest version. But it is possible you got a different error. Does it happen to your sample with the most data?
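The trade-off described above (fewer samples at a time, more threads per sample) can be put into rough numbers. The sketch below is a deliberately simplified model, not how ipyrad schedules work internally: it assumes peak memory scales with the number of samples processed at once, and the per-sample memory figure is invented for illustration. Function and parameter names are hypothetical.

```python
def concurrent_samples(cores, threads_per_sample):
    """Number of samples processed at once under a simple cores // threads model."""
    if cores < 1 or threads_per_sample < 1:
        raise ValueError("cores and threads_per_sample must be >= 1")
    return max(1, cores // threads_per_sample)


def peak_memory_gb(cores, threads_per_sample, gb_per_sample):
    """Rough peak memory, assuming each concurrent sample needs gb_per_sample."""
    return concurrent_samples(cores, threads_per_sample) * gb_per_sample


GB_PER_SAMPLE = 4.0  # made-up figure; real usage depends on how well the data dereplicates

for cores, threads in [(48, 1), (48, 2), (8, 1)]:
    n = concurrent_samples(cores, threads)
    mem = peak_memory_gb(cores, threads, GB_PER_SAMPLE)
    print(f"cores={cores} threads={threads}: {n} samples at once, ~{mem:.0f} GB")
```

Under this model, both lowering -c and raising the threads per sample reduce how many samples compete for memory at the same time.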
Emily Warschefsky
@ewarschefsky_twitter
Sep 19 2016 16:26
@dereneaton - for one of the clustering thresholds it gives the error for the two samples with the most data, but for the other clustering thresholds it gives the errors on samples that do not have the most data and manages the larger samples fine. I will update ipyrad just in case