These are chat archives for nextflow-io/nextflow

12th
May 2017
Phil Ewels
@ewels
May 12 2017 12:19
Quick question about fromFilePairs before I start experimenting.. Has anyone got a nice snippet to check whether supplied input files were grouped or not?
At the moment we have single = reads instanceof Path inside a process, which works nicely
But I was thinking that it would be nice to check this before the pipeline launches, as it's a common mistake that we're running into
I'm thinking that I can add something to the initial channel creation that sets a variable and throws an error if there is a mixture of single-end and paired-end files?
Current channel creation code is as follows:
Channel
    .fromFilePairs( params.reads, size: -1 )
    .ifEmpty { exit 1, "Cannot find any reads matching: ${params.reads}" }
    .into { read_files_fastqc; read_files_trimming }
Paolo Di Tommaso
@pditommaso
May 12 2017 12:24
um, if you use size: 2 it guarantees that the channel will always emit groups of exactly 2 files
Phil Ewels
@ewels
May 12 2017 12:25
yup, but I don't want it to do that - the pipeline handles both single-end and paired-end inputs
Paolo Di Tommaso
@pditommaso
May 12 2017 12:26
if so, what about making size parametric?
Phil Ewels
@ewels
May 12 2017 12:27
how do you mean sorry?
Paolo Di Tommaso
@pditommaso
May 12 2017 12:28
Channel
    .fromFilePairs( params.reads, size: params.singleOrPairEnded )
having params.singleOrPairEnded either 1 or 2 ..
Phil Ewels
@ewels
May 12 2017 12:29
ah I see - yeah I guess but then that's an extra thing that the user has to specify
at the moment you just specify your file pattern and it either runs single-end or paired-end automatically, which is kind of nice
Paolo Di Tommaso
@pditommaso
May 12 2017 12:29
can't it be inferred from other params you have already defined?
for example from the file pattern?
Phil Ewels
@ewels
May 12 2017 12:30
what - count squiggly brackets or wildcards or something?
Paolo Di Tommaso
@pditommaso
May 12 2017 12:31
for example
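Paolo's idea of inferring the layout from the file pattern could look like the small helper below, which checks whether params.reads contains a brace group offering two or more alternatives (as in *_R{1,2}.fastq.gz). This is a minimal plain-Groovy sketch: guessPairedEnd is a hypothetical name, not part of Nextflow, and the rule it applies is an assumed heuristic that only inspects the glob string, never the files on disk.

```groovy
// Hypothetical helper, not part of Nextflow: guess from the glob alone whether
// params.reads describes paired-end data, i.e. whether it contains a brace group
// with two or more alternatives such as {1,2}. It never looks at the files on disk.
boolean guessPairedEnd(String pattern) {
    // Collect the contents of every {...} group in the pattern
    def groups = pattern.findAll(/\{([^}]*)\}/) { full, inner -> inner }
    // Treat the pattern as paired-end if any group offers at least two alternatives
    return groups.any { it.split(',').size() >= 2 }
}

println guessPairedEnd('data/*_R{1,2}.fastq.gz')   // prints true
println guessPairedEnd('data/*.fastq.gz')          // prints false
```

Because it works on the pattern only, it cannot catch a glob that happens to match a mix of single files and pairs; that still needs a check on the grouped files themselves.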
Phil Ewels
@ewels
May 12 2017 12:32
..I guess 🤔
I was thinking more of just doing some kind of .map{} thingy where I count the resulting objects - would that not work as well?
or would that consume the channel by observing it?
Paolo Di Tommaso
@pditommaso
May 12 2017 12:35
you can, but how would you know whether it has to be single- or paired-end?
Phil Ewels
@ewels
May 12 2017 12:36
so we do that already with single = reads instanceof Path inside a process (which I think you recommended to us a long time ago..?)
Paolo Di Tommaso
@pditommaso
May 12 2017 12:36
(implementing it at the fromFilePairs level would have the same problem)
Phil Ewels
@ewels
May 12 2017 12:36
it returns true if there's only one file, false if there is more than one
sorry, to clarify - I don't care if it's SE or PE, just that it's not a mixture. And I'd like to show it as a log message
The processes query the single variable and run different commands accordingly
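The homogeneity check Phil describes (fail on a mixture, otherwise report SE or PE once) can be sketched in plain Groovy over a list shaped like the [ sampleId, files ] tuples that fromFilePairs emits. Assumptions: checkReadLayout is a hypothetical helper, files are java.nio.file.Path objects, and a bare Path is counted as one file, mirroring the single = reads instanceof Path test. Hooking it up to a real channel (for example after collecting all items) is not shown, because the verdict is only known once every sample has been seen.

```groovy
import java.nio.file.Path
import java.nio.file.Paths

// Hypothetical helper: takes [sampleId, files] tuples like those emitted by
// fromFilePairs and returns 'SE' or 'PE', or throws if the layouts are mixed.
String checkReadLayout(List samples) {
    // Map each sample id to its layout
    Map<String, String> layoutById = samples.collectEntries { sample ->
        def files = sample[1]
        // A bare Path counts as a single file, like `reads instanceof Path` in a process
        int n = (files instanceof Path) ? 1 : (files as Collection).size()
        [(sample[0] as String): (n == 1 ? 'SE' : 'PE')]
    }
    // Copy to a list before de-duplicating, so the map itself is left untouched
    def layouts = layoutById.values().toList().unique()
    if( layouts.size() > 1 )
        throw new IllegalArgumentException("Inputs mix single-end and paired-end reads: ${layoutById}")
    // Returns null when there are no samples at all
    return layouts ? layouts[0] : null
}

def pe = [['A', [Paths.get('A_R1.fastq.gz'), Paths.get('A_R2.fastq.gz')]],
          ['B', [Paths.get('B_R1.fastq.gz'), Paths.get('B_R2.fastq.gz')]]]
println checkReadLayout(pe)   // prints PE
```

The explicit instanceof guard matters: Groovy's NIO extensions give Path a size() method that returns the file size in bytes, not a file count.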
Paolo Di Tommaso
@pditommaso
May 12 2017 12:37
I see, you want to be sure that the content is homogeneous ..
Phil Ewels
@ewels
May 12 2017 12:37
exactly
Paolo Di Tommaso
@pditommaso
May 12 2017 12:38
but still sub-optimal
Phil Ewels
@ewels
May 12 2017 12:38
at the moment we set that variable in the first channel per-input file, but then just query the variable again later on. So whatever set it last will be used for the whole pipeline
Paolo Di Tommaso
@pditommaso
May 12 2017 12:38
imagine that you are expecting a channel of PE
and the first element is SE
Phil Ewels
@ewels
May 12 2017 12:39
in fairness, the more I think about it, maybe your suggestion of params.singleEnd is better, as most of the time we run this on PE data. Then it would throw an error if the input isn't PE, and the user would have to explicitly specify that it is SE
Then that would prevent us from messing up the input file pattern and running as SE by accident when it should be PE
(which is another thing that happens quite a bit)
Paolo Di Tommaso
@pditommaso
May 12 2017 12:39
exactly
Phil Ewels
@ewels
May 12 2017 12:40
@Hammarn - any thoughts? Would this be annoying?
@pditommaso - is it possible to customise the error message if fromFilePairs gets the wrong number of input files? So that we can tell the user to specify --singleEnd?
Paolo Di Tommaso
@pditommaso
May 12 2017 12:42
which error message?
Phil Ewels
@ewels
May 12 2017 12:42
from .fromFilePairs( params.reads, size: params.singleEnd ? 1 : 2 )
haven't tried it yet - but I guess if params.singleEnd is false and I supply a pattern for SE files it'll throw an error?
Rickard Hammarén
@Hammarn
May 12 2017 12:43
hmm, you might be right. I don't think it would be that annoying. Especially since we mostly run PE anyway..
Paolo Di Tommaso
@pditommaso
May 12 2017 12:44
nope, it will just guarantee one or two items
(sorry I need to leave now)
Phil Ewels
@ewels
May 12 2017 12:44
ERROR ~ Cannot find any reads matching: ./test_data/ngi-rna_test_set/*.fastq.gz
It would be nicer to add "Please specify --single if running with single-end files"
Paolo Di Tommaso
@pditommaso
May 12 2017 12:44
oops, I need to check if so
Phil Ewels
@ewels
May 12 2017 12:45
I'll make an issue 👍 😉
ok great - thanks for the help!
Phil Ewels
@ewels
May 12 2017 12:51
..sorry, ignore me - that error message is the one that we set with ifEmpty. So easy to change! 😆
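The thread settles on an explicit flag for single-end data plus a clearer ifEmpty message. With size: 2, a single-end glob yields an empty channel (which is what produced the "Cannot find any reads matching" error above), so ifEmpty is a natural place for the hint. A minimal sketch in the same DSL1 style as the channel code at the start of the thread; the params.singleEnd = false default and the wording of the hint are assumptions, and the snippet only runs inside a Nextflow script, not plain Groovy.

```groovy
// Default to paired-end; users pass --singleEnd on the command line for SE data
params.singleEnd = false

Channel
    // Expect 1 file per sample for SE runs, 2 for PE runs
    .fromFilePairs( params.reads, size: params.singleEnd ? 1 : 2 )
    // Empty here means no files matched, or none formed groups of the expected size
    .ifEmpty { exit 1, "Cannot find any reads matching: ${params.reads}\nIf this is single-end data, please specify --singleEnd on the command line." }
    .into { read_files_fastqc; read_files_trimming }
```

Making paired-end the default means a mistyped PE pattern fails loudly instead of silently running as single-end, which was the other problem raised in the thread.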