These are chat archives for nextflow-io/nextflow

16th
Sep 2015
Robert Syme
@robsyme
Sep 16 2015 15:02

I'm often working with sets of data (reads, for example) where each set of reads has an associated sample_id. I'd like to split the reads into smaller pieces, but retain the sample_id for each piece. I'd like to go from a channel that has:

output:
set val(sample_id), file('huge.fastq.gz') into unchunked

and then later have a process that takes:

input:
set val(sample_id), file('smaller.fastq.gz') from chunked

Where there are n >= 1 smaller.fastq.gz files for each huge.fastq.gz

Is there a way to use splitFastq with sets?

Paolo Di Tommaso
@pditommaso
Sep 16 2015 15:17
of course :)
let me see if I find an example
basically there's nothing special
you can apply splitFastq to a channel emitting tuples, it automatically chunks the first element that is a file
otherwise you can specify what element to chunk in the tuple using the elem attribute
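
A minimal sketch of the idea described above (not the example originally linked in this chat), assuming the channel-style syntax used elsewhere in this conversation. The sample ids, file paths, chunk size and the process name count_reads are all made up for illustration, and the inputs are uncompressed to keep the sketch simple:

```groovy
// Hypothetical input: (sample_id, FASTQ file) tuples. Ids and paths are placeholders.
Channel
    .from(
        ['sampleA', file('data/sampleA.fastq')],
        ['sampleB', file('data/sampleB.fastq')]
    )
    .set { unchunked }

// Split the file in each tuple into chunks of up to 10000 reads.
// elem: 1 names the tuple element to split (zero-based index); it could be
// omitted here, since the first file element is picked by default.
// file: true writes each chunk to a file instead of emitting its text.
unchunked
    .splitFastq(by: 10000, elem: 1, file: true)
    .set { chunked }

// Every chunk arrives still paired with its sample_id.
process count_reads {
    input:
    set val(sample_id), file(chunk) from chunked

    output:
    stdout into counts

    """
    echo "${sample_id}: \$(wc -l < ${chunk}) lines"
    """
}
```

Each huge file then yields one or more (sample_id, chunk) tuples, so the downstream process runs once per chunk.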
Robert Syme
@robsyme
Sep 16 2015 15:21
Aha, and you use elem for more complicated cases
Aha, you beat me!
Paolo Di Tommaso
@pditommaso
Sep 16 2015 15:21
yep
Robert Syme
@robsyme
Sep 16 2015 15:21
Fantastic!
Thanks again.
Paolo Di Tommaso
@pditommaso
Sep 16 2015 15:22
also, I would suggest using file: true to save the chunks to files
Robert Syme
@robsyme
Sep 16 2015 15:24
Will do. Is the remainder submitted to the channel as the last element?
Paolo Di Tommaso
@pditommaso
Sep 16 2015 15:24
yes
Robert Syme
@robsyme
Sep 16 2015 15:24
great
Robert Syme
@robsyme
Sep 16 2015 15:32
That readPrefix method from your example will come in handy, I expect.
Paolo Di Tommaso
@pditommaso
Sep 16 2015 15:32
yep
there's an improved version on the master branch
actually, looking at a pipeline of yours, I've also simplified the read-pair matching code
Matthieu Foll
@mfoll
Sep 16 2015 15:36

Hi Paolo,
I have a process emitting two channels:

output:
     file "${region_tag}.vcf" into vcf
     file '*.pdf' into PDF

I use errorStrategy 'ignore' in this process because I sometimes expect one of the two outputs (the PDF one) to be missing. But when this happens, even if the other output is present, it's not emitted into the vcf channel. I can understand the logic behind this behaviour (in some cases it might be the right thing to do), but do you see a way to allow for this?

Paolo Di Tommaso
@pditommaso
Sep 16 2015 15:38
hi
the easiest way to handle this is to create an empty pdf file in the Bash script
and then filter out the empty file from the PDF channel
unfortunately there isn't a better way
Matthieu Foll
@mfoll
Sep 16 2015 15:42
ok thanks for the suggestion
how would you filter out the empty file?
Paolo Di Tommaso
@pditommaso
Sep 16 2015 15:42
something like this
PDF.filter { it.size() > 0 }.set { pdf_2 }
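
Putting the two steps together, a rough sketch under some assumptions: the process name call_variants, the input channel regions, the command my_caller and the placeholder name empty.pdf are hypothetical, and each task is assumed to produce at most one PDF:

```groovy
process call_variants {
    input:
    val region_tag from regions

    output:
    file "${region_tag}.vcf" into vcf
    file '*.pdf' into PDF

    """
    # my_caller is a placeholder for the real command
    my_caller ${region_tag} > ${region_tag}.vcf

    # If no PDF was produced, create an empty one so the output declaration
    # is still satisfied and the vcf file is emitted as usual.
    ls *.pdf > /dev/null 2>&1 || touch empty.pdf
    """
}

// Drop the zero-byte placeholders before using the PDFs downstream.
PDF.filter { it.size() > 0 }.set { pdf_2 }
```

With the placeholder in place, a missing PDF no longer counts as a missing output, so errorStrategy 'ignore' would only be needed for genuine failures.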
Matthieu Foll
@mfoll
Sep 16 2015 15:44
ok I see
thank you