These are chat archives for nextflow-io/nextflow

17th
Nov 2015
Johann Visagie
@wjv
Nov 17 2015 06:54
Hey @pditommaso. I have several use cases of the following nature: As part of a larger pipeline, I have three long-running processes that pass data in the normal STDOUT to STDIN fashion, i.e. in pseudo-shell notation a | b | c.
However, depending on certain parameters of the data, a can be executed with different parameters, i.e. a' | b | c.
And depending on a configuration variable, b is sometimes omitted, a | c.
Currently, the only way to do this in Nextflow is to create a composite Process for each possible combination, i.e. one for a | b | c, one for a' | b | c, one for a | c and one for a' | c.
And I have cases with much more variation, where the number of Processes I have to create quickly explodes combinatorially.
If I wrap a, b and c separately in processes, my runtime more than triples (and loads of unnecessary temporary files build up in .work).
Johann Visagie
@wjv
Nov 17 2015 06:59
A more fine-grained and less complex approach would be possible if (say) all local processes that pass data via stdin and stdout used a named pipe. Then, in the above example, I could wrap a, a', b and c separately and simply connect them together with channels.
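For readers unfamiliar with the mechanism mentioned here, a named pipe (FIFO) at the operating-system level can be sketched as below. This is only an illustration of the shell primitive, not a Nextflow feature, and `produce` and `consume` are hypothetical stand-ins for two real tools.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for two long-running tools.
produce() { printf 'one\ntwo\n'; }
consume() { tr 'a-z' 'A-Z'; }

# Create the FIFO inside a private temporary directory.
fifo_dir=$(mktemp -d)
fifo="$fifo_dir/a_to_b"
mkfifo "$fifo"

# The writer runs in the background; opening the FIFO blocks
# until a reader opens the other end.
produce > "$fifo" &

# The reader streams from the FIFO; no intermediate file is written.
result=$(consume < "$fifo")

wait                  # wait for the background writer to finish
rm -rf "$fifo_dir"    # remove the FIFO and its directory

printf '%s\n' "$result"
```

Both ends have to run on the same host at the same time, which is one reason this is harder to reconcile with tasks that a workflow engine schedules independently.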
Johann Visagie
@wjv
Nov 17 2015 07:05
(I do understand that this would potentially break the neat abstraction of processing that you currently have, though.)
Paolo Di Tommaso
@pditommaso
Nov 17 2015 07:13
Since you are saying that these three long-running processes are part of a larger pipeline, IMHO it would make sense to manage them as a single process at the Nextflow level.
You could wrap them in a bash script and get the benefits of using local pipes, while managing the remaining parts of the pipeline with Nextflow, which would allow you to distribute the execution across a cluster (if needed).
It is not mandatory to isolate each task with a nextflow process, indeed the framework allows you to choose the granularity that best fits the needs of your workflow.
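A minimal sketch of such a wrapper script follows, assuming the variants can be selected with two arguments. The functions `a`, `a_prime`, `b` and `c` are hypothetical stand-ins for the real tools, and the argument names are made up for illustration. The point is that a single script can cover a | b | c, a' | b | c, a | c and a' | c without a separate composite for each combination.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for the real tools; replace with the actual commands.
a()       { sed 's/^/a:/'; }
a_prime() { sed 's/^/a-prime:/'; }
b()       { sed 's/$/:b/'; }
c()       { sed 's/$/:c/'; }

# run_pipeline VARIANT USE_B
#   VARIANT: "prime" selects a', anything else selects a
#   USE_B:   "yes" includes b, anything else skips it
run_pipeline() {
    local variant="$1" use_b="$2"
    local first=a
    local second=cat   # cat acts as a pass-through when b is omitted

    if [ "$variant" = "prime" ]; then first=a_prime; fi
    if [ "$use_b" = "yes" ]; then second=b; fi

    # All stages are connected by ordinary local pipes.
    "$first" | "$second" | c
}

printf 'x\n' | run_pipeline prime no    # prints "a-prime:x:c"
```

A script like this could then be called from the script section of a single Nextflow process, with the two arguments filled in from the data or the configuration.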
Paolo Di Tommaso
@pditommaso
Nov 17 2015 07:24
(improved the English grammar, just woke up .. you know ;))
Paolo Di Tommaso
@pditommaso
Nov 17 2015 07:31
@wjv Do the a | b | c processes need to be executed multiple times with different data in your use case?