These are chat archives for nextflow-io/nextflow

29th
Mar 2016
Jason Byars
@jbyars
Mar 29 2016 18:48 UTC
are there any examples applying Channel transform operators to a process output in any of the repos? So far I haven't found one. I'm looking for a version of this group_read_pairs.nf, except generating the read_pairs Channel as output from a previous process.
The use case is a fastq-dump of an SRA download. I would like a tuple of SRA run name, the first fastq, and the second fastq if it exists.
Paolo Di Tommaso
@pditommaso
Mar 29 2016 19:05 UTC
not sure, but I don't think there are other examples regarding that
however your use case sounds very similar
Jason Byars
@jbyars
Mar 29 2016 19:13 UTC
ok, then I'll just have to figure out the syntax.
Paolo Di Tommaso
@pditommaso
Mar 29 2016 19:14 UTC
what is confusing you?
Jason Byars
@jbyars
Mar 29 2016 19:16 UTC
Let's say this is the upstream process
process fastqdump {
   input:
   file x from targets 

   when:
   !file(FilenameUtils.removeExtension("${params.wd}/fastq/$x") + "_1.fastq.gz").exists()

   output:
   file "*_1.fastq.gz" into read1
   file "*_2.fastq.gz" into read2

   script:
   """
   umask 022
   fastq-dump --split-3 "$x"
   pigz -p 4 *.fastq
   #chown -R jenkins:nesslab "$x" 
   """
}
Paolo Di Tommaso
@pditommaso
Mar 29 2016 19:19 UTC
ok
Jason Byars
@jbyars
Mar 29 2016 19:20 UTC
instead of channels read1, read2 for output, I would like to output the set map/tuple { sraname, "*_1.fastq.gz", "*_2.fastq.gz" } to a channel
how do I do that?
Paolo Di Tommaso
@pditommaso
Mar 29 2016 19:20 UTC
Is x your SRA run name?
Jason Byars
@jbyars
Mar 29 2016 19:20 UTC
yes
Paolo Di Tommaso
@pditommaso
Mar 29 2016 19:20 UTC
it's easy if so! :)
Jason Byars
@jbyars
Mar 29 2016 19:21 UTC
well it's the sra file name, but it can be easily mangled into just the sra name
Paolo Di Tommaso
@pditommaso
Mar 29 2016 19:21 UTC
process fastqdump {
   input:
   file x from targets 

   when:
   !file(FilenameUtils.removeExtension("${params.wd}/fastq/$x") + "_1.fastq.gz").exists()

   output:
   set val(x), file("*_1.fastq.gz"), file("*_2.fastq.gz") into read_pairs

   :
what about this?
you can use a set declaration to output a tuple of items
Jason Byars
@jbyars
Mar 29 2016 19:23 UTC
that's sort of what I'm trying. That is the part I'm a little unclear on.
is a tuple produced for each file from targets or did I just dump 3 vectors into read_pairs?
Paolo Di Tommaso
@pditommaso
Mar 29 2016 19:24 UTC
that will create a channel in which each emitted item is a tuple of values, one tuple per file from targets
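To make "one tuple per item" concrete, here is a minimal sketch that applies a `map` operator to such a channel. The `Channel.from` call is only a stand-in for the real `read_pairs` output, the run and file names are made up, and stripping the `.sra` suffix is just one assumed way to turn the SRA file name into the run name.

```groovy
// Stand-in for the read_pairs channel produced by fastqdump.
// The run and file names below are hypothetical.
read_pairs = Channel.from(
    ['SRR000001.sra', 'SRR000001_1.fastq.gz', 'SRR000001_2.fastq.gz'],
    ['SRR000002.sra', 'SRR000002_1.fastq.gz', 'SRR000002_2.fastq.gz']
)

// Each item arrives as one tuple, so the closure can unpack its three fields.
// The first field is reduced to the bare run name; the reads pass through.
read_pairs
    .map { sra, r1, r2 -> [ sra.toString() - '.sra', r1, r2 ] }
    .subscribe { println it }
```

The closure is called once per emitted tuple, never once per field, which is why the three values stay grouped by run.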
Jason Byars
@jbyars
Mar 29 2016 19:25 UTC
ok, so then in the next process where I use that as input can I just do:
input:
set val(x), file(read1), file(read2) from read_pairs
Paolo Di Tommaso
@pditommaso
Mar 29 2016 19:26 UTC
yep
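Putting both halves together, a sketch of the producer and a consumer might look like the following. It uses the same `set` syntax as the snippets in this conversation; `params.sra`, the `show_pairs` process, and its `echo` command are hypothetical placeholders, and the `when:` guard from the original process is left out for brevity.

```groovy
// Hypothetical input location; point it at wherever the .sra files live.
params.sra = 'sra/*.sra'
targets = Channel.fromPath(params.sra)

process fastqdump {
    input:
    file x from targets

    output:
    // One tuple per task: (SRA file, first mate, second mate)
    set val(x), file("*_1.fastq.gz"), file("*_2.fastq.gz") into read_pairs

    script:
    """
    fastq-dump --split-3 "$x"
    pigz -p 4 *.fastq
    """
}

// Hypothetical downstream step: runs once per tuple emitted by fastqdump.
process show_pairs {
    echo true

    input:
    set val(x), file(read1), file(read2) from read_pairs

    script:
    """
    echo "$x: $read1 $read2"
    """
}
```

The names in the input `set` are local to the consuming process, so they do not have to match the ones used in the producer.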
Jason Byars
@jbyars
Mar 29 2016 19:26 UTC
great, then I am on the right track. I'm just especially gifted at typos today.
Paolo Di Tommaso
@pditommaso
Mar 29 2016 19:26 UTC
:)
Jason Byars
@jbyars
Mar 29 2016 19:28 UTC
oh, something SRA related you might find of interest: the prefetch tool in the SRA Toolkit automatically uses ascp for transfers if it is present on the system.
Jason Byars
@jbyars
Mar 29 2016 20:06 UTC
great, it works. If there is no *_2.fastq.gz output, does the first process throw an error?
Paolo Di Tommaso
@pditommaso
Mar 29 2016 20:07 UTC
um, I think so. There must be at least one matching file
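One possible way around a missing second file, sketched under assumptions rather than tested here: declare a single glob for all compressed reads, so the output is satisfied as long as at least one file is written. This assumes `fastq-dump --split-3` writes a plain `<run>.fastq` for single-end runs; `params.sra` and the `reads` channel name are hypothetical.

```groovy
// Hypothetical input location; point it at wherever the .sra files live.
params.sra = 'sra/*.sra'
targets = Channel.fromPath(params.sra)

process fastqdump {
    input:
    file x from targets

    output:
    // A single glob covers <run>.fastq.gz (single-end) as well as
    // <run>_1.fastq.gz and <run>_2.fastq.gz (paired-end).
    set val(x), file("*.fastq.gz") into reads

    script:
    """
    fastq-dump --split-3 "$x"
    pigz -p 4 *.fastq
    """
}
```

The trade-off is that the second element of the tuple may then be one file or several, so the consuming process has to cope with both cases instead of relying on fixed `read1` and `read2` slots.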
Jason Byars
@jbyars
Mar 29 2016 20:08 UTC
I'll work out a test case on that one.
Paolo Di Tommaso
@pditommaso
Mar 29 2016 20:09 UTC
interesting, the use of ascp. I need to take a look at it