These are chat archives for dereneaton/ipyrad

19th
May 2017
Jenny Archibald
@jenarch
May 19 2017 18:28
Thanks again for your advice! I started another attempt, forcing step 6 to restart, and with a longer walltime. However, checking on the job today, it appears that (like last time), it is using hardly any CPU:
"Utilized Resources Per Task: MEM: 83M SWAP: 139M
Avg Util Resources Per Task: PROCS: 0.00
Max Util Resources Per Task: PROCS: 0.01 MEM: 83M SWAP: 139M
Average Utilized Memory: 82.10 MB
Average Utilized Procs: 0.06"
That seems like a problem! Any ideas on what's happening?
Deren Eaton
@dereneaton
May 19 2017 18:41
Hi @jenarch , it depends, not every part is fully parallelizable. But it should be running at more than 100% for the majority of the time. Are you using the MPI flag? If MPI does not initiate properly then it will go much slower than normal. If you use MPI then you should also load the OpenMPI module on your system, like in the multi-node setup instructions I linked to above. If you're running on a single node without MPI then it is probably some other problem. Do you know which part of step6 it is on?
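A quick way to check whether MPI initiates at all, before committing to a long job, is a minimal test like the sketch below. It assumes the compiler/gcc/6.3 and openmpi/2.0 module names from the job file pasted later in this thread; if MPI is working, mpirun should print one hostname per process.

## minimal MPI sanity check (module names assumed from the job file below)
module purge
module load compiler/gcc/6.3
module load openmpi/2.0

## should print one hostname line per rank; if this hangs or errors out,
## MPI is not initiating properly on this system
mpirun -np 4 hostname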
Jenny Archibald
@jenarch
May 19 2017 18:52
I am using the MPI flag and talked with our cluster people on how to get the OpenMPI module loaded. It seemed to load ok. I don't know what part of step 6 it is on, except that I started it from the beginning of 6 yesterday and it's been running for over 22 hrs at this point. If there is a way for me to check where it's at, I could do that.
I just checked a CPU statistics graph for the job that will let me look over 24 hrs. Dedicated CPUs are around 20 and utilized are close to 0 throughout that time.
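One way to see what the job is actually doing, assuming the cluster allows ssh into compute nodes assigned to your job, is to log into the node and take a snapshot of your processes; the node name below is a placeholder.

## log into a node assigned to the job (placeholder name; get the real one
## from the scheduler's job status output)
ssh <node-name>

## one-shot snapshot of your processes sorted by CPU use; during the heavy parts of
## step 6 the ipyrad/ipcluster engine processes should each be near 100% CPU
ps -u $USER -o pid,pcpu,pmem,etime,comm --sort=-pcpu | head -n 25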
Deren Eaton
@dereneaton
May 19 2017 19:27
Is there a log file being written to that you could check to see the progress so far? Would you mind sending me or pasting your job submission file?
Jenny Archibald
@jenarch
May 19 2017 19:35

Here's the job file, and it does create a log file but it seems I cannot see it until the run finishes.

#MSUB -N ipyCH4may18
#MSUB -l procs=20,pmem=30gb,walltime=1440:00:00
#MSUB -M jkarch@ku.edu
#MSUB -m abe
#MSUB -d /panfs/pfs.local/scratch/bi/jkarch/cam/ch4
#MSUB -e /panfs/pfs.local/scratch/bi/jkarch/cam/ch4
#MSUB -o /panfs/pfs.local/scratch/bi/jkarch/cam/ch4
#MSUB -j oe

module purge
export PATH=/home/jkarch/miniconda2/bin:$PATH
module load compiler/gcc/6.3
module load openmpi/2.0
ipyrad -p params-m04c90.txt -f -s 67 -c 20 --MPI
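A few optional sanity-check lines before the ipyrad call can help confirm that the conda install and OpenMPI are both visible inside the job; this is only a sketch, and the echo labels are arbitrary.

## optional sanity checks, written to the job's output file
echo "ipyrad binary: $(which ipyrad)"
echo "mpirun binary: $(which mpirun)"
mpirun --version
python -c "import ipyrad; print('ipyrad version:', ipyrad.__version__)"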

LinaValencia85
@LinaValencia85
May 19 2017 19:50
@isaacovercast I was wondering if you had an idea of when you would be uploading the new ipyrad version with the reference genome alignment bug fixed. THANKS!
Isaac Overcast
@isaacovercast
May 19 2017 19:56
@LinaValencia85 Working on it right now. I'm doing some final testing. Certainly this afternoon.
Deren Eaton
@dereneaton
May 19 2017 20:24
@jenarch I've never used msub (MOAB) specifically, but it looks similar to TORQUE (qsub)...
yeah some systems write the output while it runs and some only write it after.
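On systems that do write output while the job runs, one way to watch progress is to follow the scheduler's output file; the filename pattern below is only a guess based on the job name, and <jobid> is a placeholder.

## with #MSUB -j oe the error stream is merged into the output file;
## the <jobname>.o<jobid> pattern is an assumption about this scheduler
cd /panfs/pfs.local/scratch/bi/jkarch/cam/ch4
tail -f ipyCH4may18.o<jobid>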
Deren Eaton
@dereneaton
May 19 2017 20:39
@jenarch, I'm not sure how big the nodes are on the queue/partition that you're using, but you can likely get a single 20-core node, in which case the script could be very simple, like this (no MPI):
#MSUB -l nodes=1:ppn=20,pmem=4g,walltime=72:00:00
#MSUB -M jkarch@ku.edu
#MSUB -m abe
#MSUB -d /panfs/pfs.local/scratch/bi/jkarch/cam/ch4
#MSUB -e /panfs/pfs.local/scratch/bi/jkarch/cam/ch4
#MSUB -o /panfs/pfs.local/scratch/bi/jkarch/cam/ch4
#MSUB -j oe

## ensure your conda software is loaded
source $HOME/.bashrc

## cd to where your params file is
cd $HOME/rad/analysis/

## run ipyrad by calling the params file and args
ipyrad -p params-m04c90.txt -f -s 67 -c 20
by the way, use three backticks on a line before and after a block of code to format it in the chat box like above.
Alternatively, here is a way to run it with MPI that should work on any system, we hope, though like I said we haven't actually used an msub system before.
Deren Eaton
@dereneaton
May 19 2017 20:51
#MSUB -l nodes=2:ppn=20,pmem=4g,walltime=72:00:00
#MSUB -M jkarch@ku.edu
#MSUB -m abe
#MSUB -d /panfs/pfs.local/scratch/bi/jkarch/cam/ch4
#MSUB -e /panfs/pfs.local/scratch/bi/jkarch/cam/ch4
#MSUB -o /panfs/pfs.local/scratch/bi/jkarch/cam/ch4
#MSUB -j oe

## system wide software
module purge
module load compiler/gcc/6.3
module load openmpi/2.0

## ensure your local conda software is loaded
source $HOME/.bashrc

## cd to where your params file is
cd $HOME/rad/analysis/

## start an ipcluster instance
ipcluster start --n=40 --engines=MPI --ip=* --daemonize
sleep 45

## run ipyrad by calling the params file and args
ipyrad -p params-m04c90.txt -f -s 67 -c 40 --ipcluster
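One addition that may be worth making at the end of a script like this (it is not in the original, so treat it as an assumption) is shutting down the ipcluster instance once ipyrad finishes, so the MPI engines don't sit idle until the walltime runs out.

## shut down the ipcluster engines started above once ipyrad has finished
ipcluster stop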
Jenny Archibald
@jenarch
May 19 2017 21:00
@dereneaton Thanks! I will give that alternate MPI method a try. Meanwhile, I was testing a few things in interactive mode. When I tried running your tutorial dataset using MPI on our system (with the other method) it got stuck at 0% even for step 1, whereas another run (without MPI) of my own data seemed to be chugging along fine. So, I decided to stop the real job that didn't seem to be doing anything, and I restarted step 6 again just using one node. We don't have big enough nodes for 20 cores, but I could get 16. It does seem to be doing something now based on CPU usage! Our system used to be qsub as well, not sure what the differences are with msub but it seems pretty similar so far. I'll try your suggestions on the tutorial data to see if some of my future runs can be sped up.
Deren Eaton
@dereneaton
May 19 2017 21:04
If you tell it to look for
Jenny Archibald
@jenarch
May 19 2017 22:00
I tried the ipcluster method using your tutorial data, and it does seem to work fine on our system. Thanks again!