These are chat archives for dereneaton/ipyrad

Jul 2017
Deren Eaton
Jul 05 2017 15:04
Hi @joqb, since you are connecting to a single node the --MPI flag is not needed, although it shouldn't cause any problems. The clustering method is still the same one that pyrad used, so it should run at about the same speed, or actually faster since the vsearch algorithm has improved over time. 19 days is certainly excessive, so I'm guessing something is going wrong. I have run data sets with several hundred samples and millions of loci that finished step 6 in only a few days, so I'm sure we can get it to run faster...
@joqb Just to make sure things are working as we would expect, try running a quick analysis through the final two steps with just a subsample of your samples. To do this, create a new branch with a few samples selected and run the last two steps. If this works fine, then try it again with the full data, and without the MPI flag. If it is possible to request more memory, then make sure you request it. We generally suggest ~4GB per core, so if you are using 32 cores I would hope that you have ~120GB RAM. If you only have 64GB RAM then I would recommend running step 6 with only ~20 cores.
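The ~4GB-per-core rule of thumb above can be turned into a quick calculation. This is only a sketch: `max_cores` is a hypothetical helper and not part of ipyrad, and the 4GB default is just the guideline from the message. Strict division gives 16 cores for 64GB, which is a little more conservative than the ~20 suggested above.

```shell
# Rough estimate of how many cores to request for a given amount of RAM,
# assuming the ~4 GB per core guideline mentioned in the chat.
# max_cores is a hypothetical helper, not an ipyrad command.
max_cores() {
    ram_gb=$1               # total RAM available on the node, in GB
    gb_per_core=${2:-4}     # RAM budget per core, defaults to 4 GB
    echo $(( ram_gb / gb_per_core ))
}

max_cores 64     # 64 GB at 4 GB per core
max_cores 128    # 128 GB at 4 GB per core
```

The result could then be passed to the `-c` option, for example `-c 16` on a 64GB node.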
Deren Eaton
Jul 05 2017 15:12
## create a new branch with a subsample of taxa for testing
> ipyrad -p params-all_ind_2017.txt -b new sampleA sampleB sampleC sampleD sampleE

## run test on new branch that should finish very quickly
> ipyrad -p params-new.txt -s 67 -f -c 32

## If all goes well, try again with the full data set requesting fewer resources this time
> ipyrad -p params-all_ind_2017.txt -s 67 -c 20