These are chat archives for nextflow-io/nextflow

15th
May 2018
Radoslaw Suchecki
@bioinforad_twitter
May 15 2018 06:15

Given

output: 
    stdout blah 

script:
"""
    produce_something_on_stdout
"""

Is the stdout of the process always supposed to stay in .command.out and .command.log regardless of another process having

input:
    stdin blah

and consuming the input?

Paolo Di Tommaso
@pditommaso
May 15 2018 06:30
output: 
    stdout blah
should capture the task stdout independently of the downstream task, therefore it should *not* be in the log or the command output
Radoslaw Suchecki
@bioinforad_twitter
May 15 2018 06:34
I'll be more specific then, the two processes could be for example:
process hisat2Align {
  tag {name+" vs "+dbname}
  input:
    set val(name), file(r1),file(r2) from hisat2reads
    set val(dbname), file("hisat2db.*.ht2") from hisat2dbs

  output:
    val tag into samname
    stdout sam

  script:
    tag = name+"_vs_"+dbname
    """
    hisat2 -x hisat2db -f -1 ${r1} -2 ${r2} 
    """
}
process sam2bam {
  tag {nametag}
  input: 
    stdin sam
    val nametag from samname

  output:
    set val(nametag), file("*bam") into BAMs

    """
    samtools view -bS -F 4 -F 8 -F 256 > ${nametag}.bam
    """
}
Paolo Di Tommaso
@pditommaso
May 15 2018 06:36
and?
Radoslaw Suchecki
@bioinforad_twitter
May 15 2018 06:36
I know I could simply redirect stdout from hisat into samtools within one NF process, but wanted to keep things separate (proof of concept pipeline) and expected the stdin / stdout to be turned into a FIFO
instead, the BAM gets generated but all the SAM contents end up in .command.out and .command.log as well
and given more realistic input (millions rather than thousands of reads) I get an OutOfMemory error from Java/nextflow
Paolo Di Tommaso
@pditommaso
May 15 2018 06:39
stdin / stdout is just syntax for redirecting stdout to a file
Radoslaw Suchecki
@bioinforad_twitter
May 15 2018 06:39
ok
Paolo Di Tommaso
@pditommaso
May 15 2018 06:40
there isn't (at least for now) a byte stream from one process to another
yes in the above case you should redirect hisat output to a file
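A minimal sketch of the file-based variant suggested here, reusing the channel names from the two processes above. The `sams` channel and the `${tag}.sam` file name are assumptions made for illustration, and the `-` argument makes explicit where `samtools view` reads its input from.

```nextflow
process hisat2Align {
  tag { name + " vs " + dbname }

  input:
    set val(name), file(r1), file(r2) from hisat2reads
    set val(dbname), file("hisat2db.*.ht2") from hisat2dbs

  output:
    // Emit the tag and the SAM file as one tuple so they stay paired downstream
    set val(tag), file("${tag}.sam") into sams

  script:
    tag = name + "_vs_" + dbname
    """
    # Write the alignments to a file instead of leaving them on stdout
    hisat2 -x hisat2db -f -1 ${r1} -2 ${r2} > ${tag}.sam
    """
}

process sam2bam {
  tag { nametag }

  input:
    set val(nametag), file(sam) from sams

  output:
    set val(nametag), file("*bam") into BAMs

  script:
    """
    samtools view -bS -F 4 -F 8 -F 256 ${sam} > ${nametag}.bam
    """
}
```

This keeps one tool per process at the cost of an intermediate SAM file in the work directory.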
Radoslaw Suchecki
@bioinforad_twitter
May 15 2018 06:42
I expect the logic around chaining processes together in different compute envs would get rather complex when trying to avoid intermediary files
Paolo Di Tommaso
@pditommaso
May 15 2018 06:43
exactly, most of the time impossible
at the end of the day, if you want to stream those tools, put them together in the same process!
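A minimal sketch of that single-process variant, assuming the same input channels as in the example above. The process name `hisat2AlignToBam` is hypothetical, and the trailing `-` tells `samtools view` to read from stdin explicitly.

```nextflow
process hisat2AlignToBam {
  tag { name + " vs " + dbname }

  input:
    set val(name), file(r1), file(r2) from hisat2reads
    set val(dbname), file("hisat2db.*.ht2") from hisat2dbs

  output:
    set val(tag), file("${tag}.bam") into BAMs

  script:
    tag = name + "_vs_" + dbname
    """
    # Stream the SAM records straight into samtools; no intermediate SAM file
    # is written and nothing large is left on the task stdout
    hisat2 -x hisat2db -f -1 ${r1} -2 ${r2} |
      samtools view -bS -F 4 -F 8 -F 256 - > ${tag}.bam
    """
}
```

A container used for this process would need both hisat2 and samtools available.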
Radoslaw Suchecki
@bioinforad_twitter
May 15 2018 06:44
Ok, so in my case a trade-off between modularity and separation of logic vs generation of excess files. Will stick them together.
Paolo Di Tommaso
@pditommaso
May 15 2018 06:46
yes, but the main advantage of an NF process is managing parallel task executions
when there is a direct dependency between two tools, it is just fine to have them in the same process
Radoslaw Suchecki
@bioinforad_twitter
May 15 2018 06:48
makes sense, I was perhaps over-optimistic to try to keep things to a single module/container per process. I guess this is the difference between the toy pipeline I have at the moment and more real-world scenarios.
No complaints about parallel execution so far - in a number of compute envs too :-)
Paolo Di Tommaso
@pditommaso
May 15 2018 06:49
people go crazy with this model, I really don't see any real benefit
Radoslaw Suchecki
@bioinforad_twitter
May 15 2018 06:50
Perhaps you make it too easy with NF ;-)
Paolo Di Tommaso
@pditommaso
May 15 2018 06:51
it's a mix of pragmatism and laziness .. but it works! :wink:
Vladimir Kiselev
@wikiselev
May 15 2018 12:22
hi Paolo, how to use a pipeline from a specific github branch?
not master
found, thanks!