These are chat archives for dereneaton/ipyrad

22nd Aug 2016
Deren Eaton
@dereneaton
Aug 22 2016 15:03
@Cycadales I haven't had much luck with faststructure or with admixture. I've taken to using the original structure and just running replicates in parallel across many cores and letting them run for about a week. But hopefully one of these newer methods works. I haven't tried sNMF or fineradstructure yet. For the latter, if the input format is not available from ipyrad, maybe you could raise an issue ticket on the ipyrad GitHub page and provide an example input file.
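Roughly, the replicate parallelization looks like this (a minimal sketch, not my actual script; it assumes a structure binary on your PATH and uses the standard structure CLI flags, with placeholder mainparams/extraparams file names):

```python
# Minimal sketch: launch independent STRUCTURE replicates in parallel.
# Assumes a `structure` binary on $PATH; the mainparams/extraparams
# file names and the K/replicate ranges here are placeholders.
import subprocess
from concurrent.futures import ProcessPoolExecutor

def run_structure(K, rep):
    """Run one single-threaded STRUCTURE replicate with its own seed."""
    out = "structure_K{}_rep{}".format(K, rep)
    subprocess.check_call([
        "structure",
        "-m", "mainparams",      # placeholder params file
        "-e", "extraparams",
        "-K", str(K),
        "-D", str(12345 + rep),  # distinct random seed per replicate
        "-o", out,
    ])
    return out

if __name__ == "__main__":
    # e.g., 10 replicates at each value of K being tested
    jobs = [(K, rep) for K in range(1, 5) for rep in range(10)]
    with ProcessPoolExecutor(max_workers=40) as pool:
        futures = [pool.submit(run_structure, K, rep) for K, rep in jobs]
        for f in futures:
            print("finished", f.result())
```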
Deren Eaton
@dereneaton
Aug 22 2016 15:11
@katiacapel Yes, like Isaac said, you will have to set the trim_overhang parameters to be at least as low as the min samples parameter, which in your test case is 3. This is kind of a bug that we plan to fix.
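For reference, the same change via the Python API would look something like this (a sketch only; the assembly name is made up, and I'm assuming the 4-value trim_overhang format from the params file):

```python
# Hedged sketch using the ipyrad Python API. The assembly name is
# hypothetical; the point is that the trim_overhang values should be
# no higher than min_samples_locus (here, 3).
import ipyrad as ip

data = ip.Assembly("test")                      # hypothetical assembly
data.set_params("min_samples_locus", 3)
data.set_params("trim_overhang", (3, 3, 3, 3))  # <= min_samples_locus
data.run("7")                                   # re-run the final step
```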
Cycadales
@Cycadales
Aug 22 2016 16:11
@dereneaton thanks for the reply. How many cores are you running structure on, if you don't mind me asking? With regards to the support ticket, I think that is a very good idea! I can do that and it could also be useful for others. Also, I have a quick question with regard to step 3 on an HPC: when we run step 2 it seems to use all 6 nodes, but when running step 3 it only seems to use one or two. Could this be a vsearch thing?
Deren Eaton
@dereneaton
Aug 22 2016 16:16
@Cycadales I've been running structure with just one thread per job, but I run 40 jobs in parallel, usually 10 replicates at each value of K that I test. I've been meaning to put an example in the 'cookbook' section of the ipyrad docs. My workflow is a variant on what I did here (http://nbviewer.jupyter.org/github/dereneaton/virentes/blob/master/nb4_virentes_populations.ipynb) though that code is a bit outdated now. It uses the ipyparallel Python library to run jobs in parallel.
NB: HPC instructions are now updated: http://ipyrad.readthedocs.io/HPC_script.html
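The basic ipyparallel pattern is something like this (a bare-bones sketch, not the notebook code itself; it assumes an ipcluster is already running, and run_structure is a stand-in for whatever single-threaded worker function you want to farm out):

```python
# Minimal ipyparallel sketch: farm single-threaded jobs out to engines.
# Assumes engines are already running (e.g. `ipcluster start -n 40`,
# or the HPC equivalent); run_structure is a placeholder worker.
import ipyparallel as ipp

rc = ipp.Client()                 # connect to the running cluster
lview = rc.load_balanced_view()   # queue jobs to whichever engine is free

def run_structure(K, rep):
    """Placeholder: one single-threaded STRUCTURE replicate."""
    import subprocess
    out = "K{}_rep{}".format(K, rep)
    subprocess.check_call(["structure", "-m", "mainparams",
                           "-e", "extraparams", "-K", str(K),
                           "-D", str(123 + rep), "-o", out])
    return out

# 10 replicates at each K from 1 to 4, distributed across engines
asyncs = [lview.apply_async(run_structure, K, rep)
          for K in range(1, 5) for rep in range(10)]
for job in asyncs:
    print("finished", job.get())
```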
Deren Eaton
@dereneaton
Aug 22 2016 16:21
@Cycadales If you are running your HPC job over multiple nodes it is actually a bit difficult to monitor how ipyrad has distributed the jobs among cores. For example, if you run the command top you will only be able to see what's running on the node you're connected to, not on the other nodes. This is why the latest version of ipyrad now prints out explicitly how many cores it found across how many host nodes. If you are running everything on one node, however, then you should be able to watch it run just as if it were run locally. Step 3 is parallelized a bit differently from step 2, and it could be that some of the jobs have finished while others have not, which is why you see only 2 cores running. It's hard to know for sure without more details, though.
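If you want to check the distribution yourself from Python, one option (a quick sketch, assuming your ipyparallel engines are up) is to ask each engine for its hostname:

```python
# Hedged sketch: ask every ipyparallel engine which host it is on,
# to verify how cores are spread across nodes. Assumes engines from
# the HPC setup are already running and reachable.
import socket
from collections import Counter
import ipyparallel as ipp

rc = ipp.Client()
hostnames = rc[:].apply_sync(socket.gethostname)
print("{} engines across {} hosts".format(len(hostnames),
                                          len(set(hostnames))))
print(Counter(hostnames))
```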
Cycadales
@Cycadales
Aug 22 2016 16:27
@dereneaton excellent, I will take a look at that and figure it out. I just checked out your updated HPC script, and it looks like it would help a lot in getting it all up and working. OK, we will check this and see what we can figure out. We are just doing a quick test first across 64 cores with 10 samples to see how long it takes to complete (without errors) before running a bigger job.
Katia Capel
@katiacapel
Aug 22 2016 20:49
Thanks @dereneaton, I changed the trim_overhang parameter and it's working now!