May 2018
Rebecca Tarvin
May 14 2018 00:06
@isaacovercast Yes I am able to get more wall time, but now it's 11hr into the databasing step and only 4% done so it may not finish in even 96hr. It's tough to convince the supercomputer people to give us extra time and difficult to troubleshoot the step for that reason. It looks like there is check-pointing for the clustering, though (thanks!), so maybe there is hope for this to finish in the next round.
@csjalbert I had this issue on TACC a few weeks back - it's not ipyrad that's the problem. It was a few problematic nodes in the largemem that weren't working.
Jean-Rémi Trotta
Hi @isaacovercast, do you have any tips for speeding up the database-building process at step 6?
I launched this step with 24 CPUs, but it seems that for this particular part of the process ipyrad cannot take advantage of all of them; only a few resources are used.

[                    ]   1%  building database     | 10 days, 18:00:36

Have you already tried assemblies with this many samples (600)? Or am I pushing the use of ipyrad too far?

Isaac Overcast
May 14 2018 15:41
@jrtrottablanc This step is single-threaded and it is currently the biggest bottleneck. We are aware of this and have an idea for a fix, but it's not in the works yet. There is no workaround, you just have to wait it out.
Deren Eaton
May 14 2018 15:48
@isaacovercast @jrtrottablanc, technically, it is in the works, but I don't want to promise anything until we have it working, which may be a little while.
Bruno de Medeiros
May 14 2018 20:09
@isaacovercast One more question on this topic. I manually checked some of the SNPs in the VCF file against the BAM files of reads mapped to the genome and noticed that the VCF coordinates are off by a few base pairs. This shift seems to be consistent within a locus, but it varies between loci. Is it possible that some step is breaking the connection between the reported coordinates and the original mapping? Maybe removal of the restriction overhang, or alignment? My plan is to combine the ddRAD VCF file with other VCFs, so mapping to the exact site would be important.
Deren Eaton
May 14 2018 22:36
@brunoasm This is something we're working on. At the moment we include an alignment step which aligns all of the samples against each other (this makes for a better alignment than aligning just against the reference); however, this can introduce indels relative to the reference that are not currently accounted for in the VCF coordinates.
We're probably going to provide methods both with and without the all-by-all alignment, so that users can have either purely reference-aligned data or data where the reference is only used to provide spatial information about the loci while the loci themselves are aligned against each other.
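
To illustrate the kind of shift described above, here is a minimal sketch of how gaps inserted into the reference row of a multiple alignment can offset coordinates that are computed by simply counting alignment columns. This is not ipyrad's code: the function names, the example sequence, and the start position are all hypothetical and chosen only for illustration.

```python
def alignment_to_reference_positions(ref_aligned, ref_start):
    """Map each alignment column to a 1-based reference position.

    ref_aligned: the reference row of the alignment, with '-' for gaps
                 (hypothetical input, not an ipyrad data structure).
    ref_start:   reference coordinate of the first non-gap reference base.
    Columns where the reference has a gap (an insertion relative to the
    reference) have no reference coordinate and are reported as None.
    """
    positions = []
    ref_pos = ref_start
    for base in ref_aligned:
        if base == "-":
            positions.append(None)
        else:
            positions.append(ref_pos)
            ref_pos += 1
    return positions


def naive_positions(ref_aligned, ref_start):
    """Coordinates obtained by counting columns and ignoring gaps."""
    return [ref_start + i for i in range(len(ref_aligned))]


# Hypothetical locus: aligning the samples opened a 2-column gap in the
# reference row after the third base.
ref_aligned = "ACG--TTAGC"
ref_start = 1000

corrected = alignment_to_reference_positions(ref_aligned, ref_start)
naive = naive_positions(ref_aligned, ref_start)

# A SNP in alignment column 7 (0-based) is reported 2 bp too far downstream
# if gap columns are counted as reference bases.
snp_column = 7
print(naive[snp_column], corrected[snp_column])
```

Under these assumptions the offset is constant downstream of a given gap within one locus but differs between loci, since each locus has its own gaps, which matches the pattern reported above.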