These are chat archives for dereneaton/ipyrad

13th
Apr 2018
Peter B. Pearman
@pbpearman
Apr 13 2018 14:54
@isaacovercast, it looks like the problem had to do with the mounted AFP file system. I transferred the data to a local drive and step 1 is running as it should. Here is a second question: I would like to branch the assembly and then explore the effects of different clust_threshold values simultaneously using our cluster. Once the assembly is branched and the appropriate params files exist, can I submit multiple qsub jobs so that simultaneous cluster instances run steps 3-7 with the different params files and branched assemblies, all using the original working directory? Or is this going to corrupt things, maybe because of shared tmp files?
Todd Pierson
@twpierson
Apr 13 2018 16:47
@dereneaton and @isaacovercast: I love the abba-baba cookbook. Is there any straightforward way to export the toyplot figure (i.e., the one showing the phylogeny, D-statistic, and Z-scores) as a PDF?
Eaton Lab
@eaton-lab
Apr 13 2018 18:33
Thanks @twpierson, you'll want to store the returned canvas object from the plotting call and then save it using toyplot. You can find more details on saving plots in the toyplot documentation. But this should work:
import toyplot.pdf
toyplot.pdf.render(canvas, "plot.pdf")
@pbpearman Right now this would cause a problem only in step 3 (something we've been thinking about changing). The only files that would be reused, and thus conflict between assemblies, are the dereplicated fastq files created at the beginning of step 3. Each process would try to create and store these in the same location. We did this to save disk space, but could probably find a better fix. Steps 4-7 can be run on different branches in parallel just fine though.
Isaac Overcast
@isaacovercast
Apr 13 2018 18:46
@pbpearman Deren is right, I've seen people try running multiple branches on step 3 for different clustering values and it does not behave. You could work around this issue by running step 1, then creating multiple new assemblies that all import from the demux'd samples. This would of course create multiple _edits directories so it would consume lots of disk space, but if you have disk to spare and you're more interested in running lots of assemblies at once this would be the way to do it.
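For anyone following along, here is a minimal sketch of how the per-threshold params files for that workaround might be generated. It assumes each line of an ipyrad params file has the form `<value>   ## [<index>] [<name>]: <description>`, and that "importing from the demux'd samples" means pointing sorted_fastq_path at the fastq files written by step 1. The helper names, the `_c90`-style assembly names, and the example paths are hypothetical, not part of ipyrad.

```python
import re


def set_param(params_text, name, value):
    """Return params_text with the value of parameter `name` replaced.

    Assumes one line per parameter, shaped like
    '<value>   ## [<index>] [<name>]: <description>'.
    """
    pattern = re.compile(
        r"^.*?[ \t]+(## \[\d+\] \[" + re.escape(name) + r"\]:.*)$",
        re.MULTILINE,
    )
    # Keep the trailing '## [N] [name]: ...' comment and swap only the value.
    new_text, count = pattern.subn(
        lambda m: f"{value!s:<30} {m.group(1)}", params_text
    )
    if count != 1:
        raise ValueError(f"expected one line for {name!r}, found {count}")
    return new_text


def make_variant(params_text, base_name, threshold, fastq_glob):
    """Build the params text for one new assembly with its own threshold."""
    name = f"{base_name}_c{int(round(threshold * 100))}"  # hypothetical naming
    text = set_param(params_text, "assembly_name", name)
    # Assumption: new assemblies read the demultiplexed files from step 1.
    text = set_param(text, "sorted_fastq_path", fastq_glob)
    text = set_param(text, "clust_threshold", threshold)
    return name, text


def write_variants(template_path, base_name, thresholds, fastq_glob):
    """Write one params-<name>.txt per threshold; return the file paths."""
    with open(template_path) as handle:
        template = handle.read()
    paths = []
    for threshold in thresholds:
        name, text = make_variant(template, base_name, threshold, fastq_glob)
        out_path = f"params-{name}.txt"
        with open(out_path, "w") as handle:
            handle.write(text)
        paths.append(out_path)
    return paths
```

Something like `write_variants("params-test.txt", "test", [0.85, 0.90, 0.95], "./test_fastqs/*.gz")` would then produce one params file per threshold (file and directory names here are placeholders). Each file could be submitted as its own qsub job, and since every assembly has a unique name, each gets its own _edits and clustering directories, at the cost of the extra disk space mentioned above.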
Peter B. Pearman
@pbpearman
Apr 13 2018 21:47
@isaacovercast @eaton-lab So, let's see if I got this right. You are suggesting that after step one, I replicate the directory holding the edits directory and then in each of these directories, branch the assembly and use the appropriately modified params file with unique values of clust_threshold in qsub calls, essentially dividing the work into separate directories for steps 2-7? Or is that not it?
Todd Pierson
@twpierson
Apr 13 2018 22:13
@dereneaton Great! That's what I had attempted originally, but I just figured out the source of my problem. The output of the abba-baba plotting functions is a tuple, but saving just the first element as a canvas object (e.g., canvas2 = canvas[0]) and then using the command you suggested seems to work. Just a heads-up for other folks attempting this!
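A short sketch of that fix, assuming (as described above) that the plotting call returns a tuple whose first element is the toyplot canvas. The helper names are made up for illustration; only `toyplot.pdf.render` comes from toyplot itself.

```python
def first_canvas(plot_result):
    """Return the canvas from a plotting result that may be a tuple."""
    # Assumption: when a tuple is returned, the canvas is its first element.
    if isinstance(plot_result, tuple):
        return plot_result[0]
    return plot_result


def save_canvas_pdf(plot_result, path):
    """Render the canvas held in plot_result to a PDF file at path."""
    # Imported here so first_canvas can be used without toyplot installed.
    import toyplot.pdf

    toyplot.pdf.render(first_canvas(plot_result), path)
```

With this, `save_canvas_pdf(canvas, "plot.pdf")` should work whether `canvas` holds the bare canvas or the full tuple returned by the plotting function.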