These are chat archives for nextflow-io/nextflow

19th
Nov 2015
Johann Visagie
@wjv
Nov 19 2015 08:16
Hi @pditommaso, sorry for the late reply; yesterday was a holiday here so I stayed off the internet. :) Yes, my processes need to be executed multiple times as part of a larger process. I understand what you’re saying about wrapping multiple processes connected by local pipes in a single Process; that’s what I’m currently doing. However, since I have many variations on the subprocesses, I end up with a combinatorial explosion in the number of Processes. It would be much simpler to be able to wrap each one in an individual Process.
Paolo Di Tommaso
@pditommaso
Nov 19 2015 08:42
@wjv No problem. I see your point, but I'm not sure it would be easier the way you are proposing.
You could manage the combinations at the nextflow level and reduce them to a set of parameters for the target script wrapper/template.
It's even better from the point of view of separation of concerns.
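A minimal sketch of that suggestion, written in Python rather than Nextflow purely for illustration. The variant axes (`ALIGNERS`, `SORTERS`), the wrapper name `run_pipe.sh` and its `--aligner`/`--sorter` flags are all hypothetical, since the chat does not name the actual tools. The workflow layer expands every combination, and the wrapper script only ever receives one flat parameter set.

```python
from itertools import product

# Hypothetical variant axes; the real tools are not named in the chat.
ALIGNERS = ["aligner_a", "aligner_b"]
SORTERS = ["sort_x", "sort_y"]


def parameter_sets(aligners, sorters):
    """Expand every aligner/sorter combination into a flat dict of parameters."""
    return [{"aligner": a, "sorter": s} for a, s in product(aligners, sorters)]


def wrapper_command(params):
    """Render one call to a hypothetical wrapper script for a single combination."""
    return f"run_pipe.sh --aligner {params['aligner']} --sorter {params['sorter']}"


if __name__ == "__main__":
    # One wrapper invocation per combination; the wrapper itself stays simple.
    for p in parameter_sets(ALIGNERS, SORTERS):
        print(wrapper_command(p))
```

Keeping the combinatorics in the workflow layer means adding a new variant is one more list entry, not a new wrapper branch.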
Johann Visagie
@wjv
Nov 19 2015 08:47
@pditommaso Yeah, that’s what I’m currently doing: I’ve written a script that wraps all these binaries, executing the correct pipeline based on parameters / configuration. But as this script grew in complexity, it occurred to me that I’m starting to create a fairly flexible workflow system to run within a workflow system. :)
Paolo Di Tommaso
@pditommaso
Nov 19 2015 08:48
ok, you know what is best for you :)
feel free to ask for any advice regarding nextflow if you need it
Johann Visagie
@wjv
Nov 19 2015 08:51
@pditommaso Not to worry, my current workflow is actually in Snakemake, and I considered switching to Nextflow because I assumed the mention of “asynchronous FIFO” in the documentation meant that this sort of pipeline would be executed asynchronously. But I’ve also taken it up with the author of Snakemake and he totally gets it, and has started to theorise how to adapt Snakemake to accommodate async pipelines as compound processes which are handled as a unit when parcelling out jobs to a cluster.
(Personally I would even have been happy if only forced-local processes used real FIFOs.)
Paolo Di Tommaso
@pditommaso
Nov 19 2015 08:53
roughly how many jobs is your pipeline supposed to run?
hundreds or thousands?
Johann Visagie
@wjv
Nov 19 2015 09:50
@pditommaso It’s a next-gen sequencing analysis pipeline, so we’re dealing with a single-digit number of actual pipeline executions per week, but the individual tasks comprising the pipeline are mostly long-running
Paolo Di Tommaso
@pditommaso
Nov 19 2015 10:09
@wjv I see, so you could also run it on a single node.
Johann Visagie
@wjv
Nov 19 2015 10:10
@pditommaso Currently it’s only the alignment step that we parallelise, yes.
Paolo Di Tommaso
@pditommaso
Nov 19 2015 10:11
To complete the discussion on “asynchronous FIFO”: the point is that nextflow streams are different from POSIX ones
in POSIX they are streams of bytes, so you can start to process them as soon as there's a single byte
with nextflow they are streams of data tokens (files, objects, lists, etc.), so each token has to be fully produced before a downstream process can be executed on it
however, with nextflow you can have multiple instances of the same process executed in parallel, which is not possible (by default) with POSIX pipes
so it's a different model, in which you need to find a trade-off to get the best of both
hope this helps
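The difference can be illustrated with a small Python analogy (not Nextflow code). `byte_stream_demo` uses a real POSIX pipe, where the reader consumes each byte as soon as the writer emits it. `token_stream_demo` treats each complete item as a token and hands the tokens to a pool of parallel workers, loosely mirroring how several instances of one process can run at once on separate channel items. The function names and the `process_token` step are hypothetical, chosen only for the illustration.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor


def byte_stream_demo(payload=b"ACGT"):
    """POSIX pipe: the reader can act on each byte as soon as it is written."""
    r, w = os.pipe()
    seen = []

    def reader():
        with os.fdopen(r, "rb", buffering=0) as f:
            while True:
                chunk = f.read(1)  # returns as soon as a single byte is available
                if not chunk:      # b"" means the writer closed its end (EOF)
                    break
                seen.append(chunk)

    t = threading.Thread(target=reader)
    t.start()
    with os.fdopen(w, "wb", buffering=0) as f:
        for b in payload:
            f.write(bytes([b]))  # each byte is visible downstream immediately
    t.join()
    return b"".join(seen)


def process_token(token):
    """Hypothetical per-item task; it only ever sees a complete token."""
    return token.upper()


def token_stream_demo(tokens, workers=3):
    """Token model: whole items are processed, several tasks in parallel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() keeps the input order in its results even though tasks overlap.
        return list(pool.map(process_token, tokens))


if __name__ == "__main__":
    print(byte_stream_demo())
    print(token_stream_demo(["sample1.bam", "sample2.bam"]))
```

The byte pipe gives low latency but a single consumer; the token model waits for whole items but can fan out across workers, which is the trade-off described above.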