These are chat archives for nextflow-io/nextflow

6th
Nov 2016
Rob Patro
@rob-p
Nov 06 2016 18:14
Hi, I'm testing out nextflow for a continuous integration setup we're building. I was curious what the syntax should be if I want to create a process that collects the output of numerous other processes. For example, if I have a quantification tool, and I create a generic rule that quantifies a single sample, how would I create a process that depends on all of the quantified samples?
Paolo Di Tommaso
@pditommaso
Nov 06 2016 18:17
Hi, have a look at this example
Rob Patro
@rob-p
Nov 06 2016 18:24
That's helpful, but is there a way to do it if my output files from the other process don't match a simple naming convention like that? For example, I have an experiment with 2 conditions and 3 replicates each. My output is named like quant/A_1/quant.sf ... quant/B_3/quant.sf, etc.
my previous rule should be putting all of the names of these files into a channel
but I can't seem to force the subsequent rule to take everything from the channel at once as a list of files (or wait until it is complete)
Paolo Di Tommaso
@pditommaso
Nov 06 2016 18:26
I guess you want to maintain the original output names, right?
Rob Patro
@rob-p
Nov 06 2016 18:26
ideally
b/c the downstream analysis script assumes that the input is split like that
Paolo Di Tommaso
@pditommaso
Nov 06 2016 18:27
if so you can simply use
input: 
    file '*' from channel.toList()
or
input: 
    file allFilesHandle from channel.toList()
Rob Patro
@rob-p
Nov 06 2016 18:28
ok, that looks elegant :); I'll give it a try
Paolo Di Tommaso
@pditommaso
Nov 06 2016 18:28
then use $allFilesHandle to reference all files in your script
:+1:
though I think the subdirectory structure is not maintained ..
Rob Patro
@rob-p
Nov 06 2016 18:31
yes, it seems that just gives me a list of 6 names that are all quant.sf
Paolo Di Tommaso
@pditommaso
Nov 06 2016 18:32
I see, if you use
input: 
    file 'quant_*.sf' from channel.toList()
they will become quant_1.sf, quant_2.sf, etc
or what you can do is just rename those files in the upstream process by adding this to your script
mv quant/A_1/quant.sf quant_A_1.sf
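The rename suggested above can be generalized to every sample with a short loop. This is only a sketch: the `flatten_quant` helper, the `flat` output directory, and the demo layout are hypothetical names made up for illustration, and it assumes every result sits at `quant/<sample>/quant.sf` as in the example from the conversation.

```shell
set -eu

# Hypothetical helper: move quant/<sample>/quant.sf to <out_dir>/quant_<sample>.sf
# so that every file gets a unique name that still encodes the sample.
flatten_quant() {
    quant_root="$1"
    out_dir="$2"
    mkdir -p "$out_dir"
    for f in "$quant_root"/*/quant.sf; do
        [ -e "$f" ] || continue                    # skip if the glob matched nothing
        sample="$(basename "$(dirname "$f")")"     # e.g. A_1
        mv "$f" "$out_dir/quant_${sample}.sf"
    done
}

# Demo layout (assumed): 2 conditions x 3 replicates, as described in the chat.
demo_dir="$(mktemp -d)"
for s in A_1 A_2 A_3 B_1 B_2 B_3; do
    mkdir -p "$demo_dir/quant/$s"
    echo "sample $s" > "$demo_dir/quant/$s/quant.sf"
done

flatten_quant "$demo_dir/quant" "$demo_dir/flat"
ls "$demo_dir/flat"
```

Because the sample name ends up in the file name, the files no longer collide when they are staged by name in the downstream task's work directory.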
Rob Patro
@rob-p
Nov 06 2016 18:36
so, is it the case that what gets put in the channel is not the path to the file (i.e. it just assigns the file a unique identifier for later in the workflow)?
Paolo Di Tommaso
@pditommaso
Nov 06 2016 18:38
not sure I understand the question, the channel contains the file paths
the problem is that they are staged in the downstream task workdir just using the file name
Rob Patro
@rob-p
Nov 06 2016 18:39
so then the question is, if that is what I put into the channel, why is that not what comes out of the channel?
ahh, I see
so, in the work dir, the name is a sufficiently-unique identifier
Paolo Di Tommaso
@pditommaso
Nov 06 2016 18:39
yep
Rob Patro
@rob-p
Nov 06 2016 18:40
but there is not a way to get the original path back out? Alternatively, would it be possible to zip each file up with a (Groovy) variable (string, structure, whatever) that the downstream process could use to identify information about the file
i.e. instead of passing just the file out, pass a tuple like ('A', 1, 'quant.sf')
Paolo Di Tommaso
@pditommaso
Nov 06 2016 18:43
> but there is not a way to get the original path back out?
not automatically (at least for now)
> Alternatively, would it be possible to zip each file up with a (Groovy) variable (string, structure, whatever) that the downstream process could use to identify information about the file
that's the suggested way to go
see here
and also the doc for input set (read tuple)
and here
Rob Patro
@rob-p
Nov 06 2016 18:46
great; thanks for the quick feedback :). It also looks like if I put the directory containing each quant.sf file into the stream, I can get that back out with the original .toList() solution. For this particular use case, that also gives me enough information about the sample to run the downstream scripts
I'll take a look at these docs too
Paolo Di Tommaso
@pditommaso
Nov 06 2016 18:46
> I put the directory containing each quant.sf file into the stream, I can get that back out with the original .toList() solution
true! I was forgetting that
it's an easy workaround
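When the sample directories themselves are passed downstream, the downstream script can recover the sample information from each directory name. The following is a rough sketch, not something taken from the conversation: the `sample_info` helper is hypothetical, and it assumes directories are named `<condition>_<replicate>` (such as `A_1`), as in the example paths above.

```shell
set -eu

# Hypothetical helper: print "<condition> <replicate>" for a directory
# named like A_1 (assumed naming scheme: <condition>_<replicate>).
sample_info() {
    name="$(basename "$1")"      # strip leading path and any trailing slash
    condition="${name%_*}"       # text before the last underscore
    replicate="${name##*_}"      # text after the last underscore
    printf '%s %s\n' "$condition" "$replicate"
}

# Demo layout (assumed): two staged sample directories, each holding quant.sf.
demo_dir="$(mktemp -d)"
for s in A_1 B_3; do
    mkdir -p "$demo_dir/$s"
    echo "sample $s" > "$demo_dir/$s/quant.sf"
done

# Print condition, replicate, and the path of each quant.sf file.
for d in "$demo_dir"/*/; do
    echo "$(sample_info "$d") ${d}quant.sf"
done
```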