These are chat archives for dereneaton/ipyrad

16 Sep 2016
Edgardo M. Ortiz
@edgardomortiz
Sep 16 2016 14:29

Hello, I have been using the following ipcluster settings on our HPC system to work on a single node:

ipcluster start --n 48 --profile=ipyrad --daemonize
sleep 60
ipyrad -p params-data.txt -s 12 --ipcluster &>s12.log

If I request 4 nodes, do I need to change anything else in my command besides the number of cores?

ipcluster start --n 192 --profile=ipyrad --daemonize
Deren Eaton
@dereneaton
Sep 16 2016 15:22
Hey @edgardomortiz You will need to call ipcluster with the following args for ipyrad to find all nodes:
ipcluster start --n 192 --profile=ipyrad --ip=* --MPI --daemonize
sleep 60
ipyrad -p params-data.txt -s 12 --ipcluster --MPI
Then ipyrad should print out the ncores/host info when it connects.
Deren Eaton
@dereneaton
Sep 16 2016 15:32
btw, for everyone else: the above code should work for anyone, but @edgardomortiz has a particular setup where it is currently necessary to call ipcluster directly. Most users should be able to connect to all nodes with the simpler command:
ipyrad -p params-data.txt -s12 --MPI -c 192
Edgardo M. Ortiz
@edgardomortiz
Sep 16 2016 16:08
Perfect! Thanks Deren...
Ivan Prates
@ivanprates
Sep 16 2016 18:03
Hi @dereneaton and @isaacovercast, I tried installing version 0.3.42 using "conda update -c ipyrad ipyrad", but the version I get is 0.3.41. When specifying the latest release ("conda install -c ipyrad ipyrad=0.3.42"), I get an error: "PackageNotFoundError: Package not found: '' Package missing in current osx-64 channels: - ipyrad 0.3.42*". Any ideas? Thanks!
Deren Eaton
@dereneaton
Sep 16 2016 18:36
oh, whoops, the Mac version isn't up yet, @ivanprates. Will work on putting it up ASAP.
Ivan Prates
@ivanprates
Sep 16 2016 19:04
I see, thanks!
Edgardo M. Ortiz
@edgardomortiz
Sep 16 2016 19:51
Sorry, it didn't work, I got the following error:
2016-09-16 12:05:21.330 [IPClusterStart] CRITICAL | Bad config encountered during initialization:
2016-09-16 12:05:21.331 [IPClusterStart] CRITICAL | Unrecognized flag: '--MPI'
Deren Eaton
@dereneaton
Sep 16 2016 19:51
doh, it should be --engines=MPI for ipcluster start and --MPI for ipyrad.
Deren Eaton
@dereneaton
Sep 16 2016 20:03
The expected result is a printout like this:
 -------------------------------------------------------------
  ipyrad [v.0.3.42]
  Interactive assembly and analysis of RAD-seq data
 -------------------------------------------------------------
  loading Assembly: test1
  from saved path: /fastscratch/de243/Lant/test1.json
  establishing MPI connection to remote hosts...
  host compute node: [8 cores] on compute-22-11.local
  host compute node: [8 cores] on compute-22-3.local

  Step 5: Consensus base calling 
  Mean error  [0.00277 sd=0.00107]
  Mean hetero [0.01660 sd=0.00554]
  [####################] 100%  consensus calling     | 0:13:54
Isaac Overcast
@isaacovercast
Sep 16 2016 20:14
@ivanprates v.0.3.42 for mac is now available.
Edgardo M. Ortiz
@edgardomortiz
Sep 16 2016 20:40
So far, so good. Thanks again.
 -------------------------------------------------------------
  ipyrad [v.0.3.42]
  Interactive assembly and analysis of RAD-seq data
 -------------------------------------------------------------
  New Assembly: panicum
  establishing MPI connection to remote hosts...
  host compute node: [31 cores] on nid00014
  host compute node: [25 cores] on nid00016
  host compute node: [21 cores] on nid00010
  host compute node: [48 cores] on nid00009
  host compute node: [31 cores] on nid00012

  Step 1: Linking sorted fastq data to Samples
    Linking to demultiplexed fastq files in: /scratch/01982/jdpalaci/ddrad/e_denovo_ipyrad/panicum_fastqs/*gz
    666 new Samples created in 'panicum'.
    1332 fastq files linked to 666 new Samples.

  Step 2: Filtering reads
  [                    ]   0%  processing reads      | 0:00:00
Deren Eaton
@dereneaton
Sep 16 2016 20:40
wow, badass.
Edgardo M. Ortiz
@edgardomortiz
Sep 16 2016 20:41
I requested 8 nodes, each with 48 processors, though.
Deren Eaton
@dereneaton
Sep 16 2016 20:41
it almost surely connected to them all.
it takes a little while for some of the engines to spin up,
and instead of waiting for them all, ipyrad distributes jobs to them as they start up
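The behavior Deren describes can be sketched as a generic polling loop. This is only an illustration, not ipyrad's actual code; `wait_for_engines` and its `get_count` callable are hypothetical stand-ins for something like querying an ipyparallel client for registered engine ids:

```python
import time

def wait_for_engines(get_count, target, timeout=60.0, interval=1.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll get_count() until it reports `target` engines or until
    `timeout` seconds pass; return the last count seen either way.
    Jobs can then be dispatched to whichever engines are already up."""
    deadline = clock() + timeout
    count = get_count()
    while count < target and clock() < deadline:
        sleep(interval)
        count = get_count()
    return count
```

With a live cluster, `get_count` could be something like `lambda: len(client.ids)` on an ipyparallel `Client`; the injectable `clock`/`sleep` just keep this sketch self-contained.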
Edgardo M. Ortiz
@edgardomortiz
Sep 16 2016 20:42
Oh OK, cool. Yeah, this dataset is huge...
Deren Eaton
@dereneaton
Sep 16 2016 20:42
the compute-node printout kinda just waits until it thinks all the engines have connected and then prints, but I guess it didn't wait quite long enough.
yeah, I haven't tested anything that big before, I'll be interested to know how it goes
Edgardo M. Ortiz
@edgardomortiz
Sep 16 2016 20:45
Sure, I can post the stats as it gets processed.
Deren Eaton
@dereneaton
Sep 16 2016 20:45
cool
Isaac Overcast
@isaacovercast
Sep 16 2016 22:16
@edgardomortiz That dataset is a monster! I love it! I'll be super curious to know about the results and also the runtimes.
Edgardo M. Ortiz
@edgardomortiz
Sep 16 2016 22:22
Resubmitting the job; it timed out at 2 hrs. This time I will try 12 hrs to be sure.
Deren Eaton
@dereneaton
Sep 16 2016 22:23
@edgardomortiz Try adding the arg -c 192 to ipyrad.
I'll have to check, but I'm not positive that step 2 will chunk the data as efficiently as it should right now when using the --ipcluster flag, unless you tell it explicitly how many cores to expect.
oh, maybe you were already doing that.
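A rough sketch of why the core count matters here (hypothetical function and numbers, not ipyrad's actual chunking code): if the assembler only detects the local node's 48 cores instead of all 336, it plans far fewer chunks, and most remote engines sit idle.

```python
def plan_chunks(n_reads, n_cores, min_chunk=10_000):
    """Split n_reads into roughly one chunk per core so every core
    gets work; tiny datasets still get a sensible minimum chunk size."""
    chunk = max(min_chunk, -(-n_reads // n_cores))  # ceil division
    n_chunks = -(-n_reads // chunk)
    return chunk, n_chunks

# Detecting only the local node vs. the whole allocation:
print(plan_chunks(10_000_000, 48))   # plans 48 large chunks
print(plan_chunks(10_000_000, 336))  # plans 336 smaller chunks
```

This is why passing `-c 336` explicitly helps: the planner no longer has to guess the cluster size from the local machine.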
Edgardo M. Ortiz
@edgardomortiz
Sep 16 2016 22:25
Ohh I see, aborting and resubmitting now with 7 nodes...
ipcluster start --n 336 --profile=ipyrad --ip=* --engines=MPI --daemonize
sleep 60
ipyrad -p params-panicum.txt -s 12 --ipcluster -c 336 --MPI &>s12.log
Deren Eaton
@dereneaton
Sep 16 2016 22:31
:thumbsup:
Edgardo M. Ortiz
@edgardomortiz
Sep 16 2016 23:07
Slightly different printout this time, 48 cores used on each node (4 of 7 shown)
 -------------------------------------------------------------
  ipyrad [v.0.3.42]
  Interactive assembly and analysis of RAD-seq data
 -------------------------------------------------------------
  New Assembly: panicum
  establishing MPI connection to remote hosts...
  host compute node: [48 cores] on nid01232
  host compute node: [48 cores] on nid01231
  host compute node: [48 cores] on nid01236
  host compute node: [48 cores] on nid01234

  Step 1: Linking sorted fastq data to Samples
    Linking to demultiplexed fastq files in: /scratch/01982/jdpalaci/ddrad/e_denovo_ipyrad/panicum_fastqs/*gz
    666 new Samples created in 'panicum'.
    1332 fastq files linked to 666 new Samples.

  Step 2: Filtering reads
  [                    ]   0%  processing reads      | 0:00:00
Edgardo M. Ortiz
@edgardomortiz
Sep 16 2016 23:27
Just for reference, I ran this dataset before on a single node (48 cores) and steps 1 and 2 finished in 6:30hrs...
Deren Eaton
@dereneaton
Sep 16 2016 23:28
so we're hoping for ~1 hour this time then?
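The ~1 hour figure is just back-of-envelope arithmetic: 6.5 hrs on 48 cores, scaled to 336 cores under perfect linear scaling. Real runs will come in slower because of engine startup and I/O overhead:

```python
def ideal_runtime(base_hours, base_cores, new_cores):
    """Runtime under perfect (linear) scaling with core count."""
    return base_hours * base_cores / new_cores

# ~6.5 hrs for steps 1-2 on one 48-core node, scaled to 7 nodes (336 cores):
print(round(ideal_runtime(6.5, 48, 336), 2))  # → 0.93 (hrs)
```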
I'm trying to figure out why it would only find 4 nodes this time. Needs some refining, I guess.
Edgardo M. Ortiz
@edgardomortiz
Sep 16 2016 23:30
Well, it has been running for 1 hour already... let's see. The log I am saving hasn't changed so far.
Deren Eaton
@dereneaton
Sep 16 2016 23:34
It should be producing tmpfiles in the edits/ dir
/scratch/01982/jdpalaci/ddrad/e_denovo_ipyrad/panicum_edits/
Edgardo M. Ortiz
@edgardomortiz
Sep 16 2016 23:36
Yup, if I sort them by time, it seems that 48 are being changed simultaneously...
Deren Eaton
@dereneaton
Sep 16 2016 23:37
hmm, as in you think only 48 cores are working?
Edgardo M. Ortiz
@edgardomortiz
Sep 16 2016 23:38
I think so, only 48 show the latest time in their file properties
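Edgardo's check, eyeballing which tmpfiles were touched most recently, can be scripted. A minimal sketch (hypothetical helper; the 60-second window is an arbitrary choice):

```python
import os
import time

def recently_modified(dirpath, window=60.0, now=None):
    """Count files in dirpath whose mtime falls within the last `window`
    seconds -- a rough proxy for how many chunks are being written
    concurrently, i.e. how many engines are actually doing work."""
    now = time.time() if now is None else now
    count = 0
    for name in os.listdir(dirpath):
        path = os.path.join(dirpath, name)
        if os.path.isfile(path) and now - os.path.getmtime(path) <= window:
            count += 1
    return count
```

Pointing it at the edits/ tmpfile directory during a run would give a rough count of concurrently active workers.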
Deren Eaton
@dereneaton
Sep 16 2016 23:43
ok, I think I now see what the problem is and why it's only distributing N jobs. It's getting N in this part of the code based on just the local machine. Will work on a fix. Thanks!
sorry about that, you might want to cancel the job for now. Some steps will make use of all the processors, others might not. I'll go through and check.
Edgardo M. Ortiz
@edgardomortiz
Sep 16 2016 23:46
No problem, glad to help discover bugs.