These are chat archives for nextflow-io/nextflow

May 2018
Phil Ewels
May 12 2018 05:20
Interesting discussion about many vs single images!
We've discussed the same thing a few times, and I think that my take-home message is that "it depends". Both approaches have pros and cons.
For our pipelines, I prefer having a single container for all processes. I think it's better for reproducibility - it's easier to have a 1:1 relationship between container and pipeline, with tagged versions. The more containers you have, the more difficult it is to guarantee that (though of course you can specify hashes for each container etc., so it's not that difficult).
For maximum portability we also like to support both conda and images, and a single Docker image built from a conda environment file is beautifully simple.
Also, for users running offline, it's much easier to pull and transfer a single Singularity image. Having to manually do that 20 times per pipeline would be a bit of a pain (without a helper tool anyway).
Finally, personally I find it easier to write pipelines this way - e.g. I can add a samtools index command onto a process without having to think about dependencies.
The downsides, as far as I can tell, are incompatibilities between software packages and not being able to use pre-existing images. Our pipelines are rarely complex enough to have problems with the former (with a few exceptions, e.g. Sarek), and the latter isn't really an issue for us time-wise when using conda.
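For illustration, the two approaches could look roughly like this in a `nextflow.config`. This is only a sketch: the image names and tags are hypothetical placeholders, and the `withName` selector is the syntax of more recent Nextflow releases, so it may not match what was available at the time of this chat.

```groovy
// nextflow.config (sketch; all image names below are made-up placeholders)

docker.enabled = true

// Option A: one container for every process, pinned to a tagged release
// so that one pipeline version maps to one image version.
process.container = 'example/mypipeline:1.0'

// Option B: one container per process, each pinned separately.
// Uncomment this block (and drop Option A) to try it.
// process {
//     withName: 'fastqc'   { container = 'example/fastqc:0.11.7' }
//     withName: 'samtools' { container = 'example/samtools:1.8' }
// }
```

With Option A there is a single image reference to pin, pull and transfer; with Option B every process has its own reference to keep in sync.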
Phil Ewels
May 12 2018 05:25
Oh, and having a single container means that you can have one process to collect all software version numbers for a MultiQC report ;) But that's a minor thing.
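A version-collecting process of that kind might be sketched as below. It assumes samtools and MultiQC are both installed in the single pipeline container; the process name and output file name are hypothetical, and the `path` output qualifier is from current Nextflow rather than the 2018 syntax.

```groovy
// Sketch only: works because every tool lives in the same container,
// so one process can query all of them.
process get_software_versions {
    output:
    path 'software_versions.txt'

    script:
    """
    samtools --version | head -n 1 > software_versions.txt
    multiqc --version >> software_versions.txt
    """
}
```

With one container per process, each tool's version would instead have to be captured inside its own process and merged afterwards.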
Lavanya Veeravalli
May 12 2018 14:02
@ewels How about on a cloud setup like batch?
Phil Ewels
May 12 2018 22:56
I don’t have much experience with that, but I don’t see why the two options would be very different there.