These are chat archives for nextflow-io/nextflow

18th
Oct 2016
Félix C. Morency
@fmorency
Oct 18 2016 14:26
i have a running pipeline for 1 subject. what's the proper way to manage multiple subjects from multiple folders containing all the required input files?
Paolo Di Tommaso
@pditommaso
Oct 18 2016 14:27
what do you mean by subject ?
experiment ?
Félix C. Morency
@fmorency
Oct 18 2016 14:28
one set of input
Input1, Input2 in FolderA == one subject (one dataset)
say I have Folder{A...Z}
is there a proper way of launching the NF pipeline to take that into account?
Paolo Di Tommaso
@pditommaso
Oct 18 2016 14:30
do you mean how to handle multiple input files?
Félix C. Morency
@fmorency
Oct 18 2016 14:30
more like multiple datasets. multiple experiments.
Paolo Di Tommaso
@pditommaso
Oct 18 2016 14:31
it depends on the logic of your workflow
still not getting exactly what your doubt is..
Félix C. Morency
@fmorency
Oct 18 2016 14:32
at the moment, Workflow takes Input1 and Input2 as params
Paolo Di Tommaso
@pditommaso
Oct 18 2016 14:32
ok
Félix C. Morency
@fmorency
Oct 18 2016 14:32
I have Folder{A...Z} containing Input1 and Input2 in each of them
Paolo Di Tommaso
@pditommaso
Oct 18 2016 14:33
ok
Félix C. Morency
@fmorency
Oct 18 2016 14:33
i would like to know the proper way of launching the NF pipeline on all the Folder{A...Z}
Paolo Di Tommaso
@pditommaso
Oct 18 2016 14:33
what about fromFilePairs ?
Félix C. Morency
@fmorency
Oct 18 2016 14:34
oh cool
Paolo Di Tommaso
@pditommaso
Oct 18 2016 14:34
:)
it seems that in neuroimaging you have very similar use cases to genomics ;)
Félix C. Morency
@fmorency
Oct 18 2016 14:36
it's all experiment-based
is there a way to mix using fromFilePairs when processing multiple experiments and --input1 foo --input2 bar when processing a single experiment, all in the same workflow?
Paolo Di Tommaso
@pditommaso
Oct 18 2016 14:39
um, depends on what you mean by mix, however it should be possible by combining some of those operators
Félix C. Morency
@fmorency
Oct 18 2016 14:40
ok, i'll take a look at it.
thanks
Paolo Di Tommaso
@pditommaso
Oct 18 2016 14:40
welcome
Félix C. Morency
@fmorency
Oct 18 2016 15:34
do you have a NF dev roadmap somewhere?
Paolo Di Tommaso
@pditommaso
Oct 18 2016 15:36
um no, mostly because it's a research project, thus we cannot commit ourselves to a rigid development plan
Félix C. Morency
@fmorency
Oct 18 2016 15:40
i see. any non-rigid feature list somewhere?
Paolo Di Tommaso
@pditommaso
Oct 18 2016 15:41
you can find some proposals on GH https://github.com/nextflow-io/nextflow/issues
a planned feature for the next year is the ability to handle sub-workflows
also we would like to support more cloud providers
Félix C. Morency
@fmorency
Oct 18 2016 15:42
cool
Félix C. Morency
@fmorency
Oct 18 2016 20:18
is there a cleaner way of doing ^
Paolo Di Tommaso
@pditommaso
Oct 18 2016 20:35
I guess that 1_some_input and 1_other_input must match 2_some_input and 2_other_input
Félix C. Morency
@fmorency
Oct 18 2016 20:35
correct
Paolo Di Tommaso
@pditommaso
Oct 18 2016 20:36
the way you have implemented that is not guaranteed to work
Félix C. Morency
@fmorency
Oct 18 2016 20:36
because garbage files could be located in the input tree?
Paolo Di Tommaso
@pditommaso
Oct 18 2016 20:38
because I don't think the file system guarantees that a directory tree is traversed in a specified order
you have three separate channels (i.e. directory traversals), so it may happen that they have a different order
and the pairs won't match
Félix C. Morency
@fmorency
Oct 18 2016 20:40
what would be the proper way of doing this in NF? sorting?
Paolo Di Tommaso
@pditommaso
Oct 18 2016 20:40
this looks like an advanced use of fromFilePairs
let me have a look
Félix C. Morency
@fmorency
Oct 18 2016 20:41
i tried some stuff using fromFilePairs but didn't have any luck
Paolo Di Tommaso
@pditommaso
Oct 18 2016 20:52
so the multiplicity is given by the number of folders that is variable, right?
Félix C. Morency
@fmorency
Oct 18 2016 20:52
yes
Paolo Di Tommaso
@pditommaso
Oct 18 2016 20:53
ok give me some time
Félix C. Morency
@fmorency
Oct 18 2016 20:53
i could make sure all input files have the same name in all folders
only folder names could differ
Paolo Di Tommaso
@pditommaso
Oct 18 2016 20:59
still a bit confused, each folder contains three files
and you want to run a task for each folder in the input directory ?
Félix C. Morency
@fmorency
Oct 18 2016 21:00
s/task/workflow
Paolo Di Tommaso
@pditommaso
Oct 18 2016 21:01
what does it mean? :)
Félix C. Morency
@fmorency
Oct 18 2016 21:02
in the example both tasks are run on each folder in the input directory
in the example, the set of both tasks == one workflow
maybe my terminology is not on point
Félix C. Morency
@fmorency
Oct 18 2016 21:11
be back later, need food. thanks for your time
Paolo Di Tommaso
@pditommaso
Oct 18 2016 21:11
I think this should work
params.prefix = './input'
prefix = file(params.prefix)


Channel
    .fromFilePairs("$prefix/**/*_{some,other}_input", flat: true) { it.parent.name }
    .set { test1_ch }


process Test1 {
    tag { sid_for_test1 }
    publishDir "./results_$sid_for_test1/$task.process"

    input:
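    // with flat: true each item is (key, file, file); the files are sorted
    // lexicographically, so *_other_input comes before *_some_input
    set val(sid_for_test1), file(other), file(some) from test1_ch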

    output:
    file 'dummy1'

    """
    touch dummy1
    """
}
Test2 is symmetric
I need to sleep .. ;)
bye!
Félix C. Morency
@fmorency
Oct 18 2016 21:12
ttyl! thanks!
Paolo Di Tommaso
@pditommaso
Oct 18 2016 21:12
the trick is { it.parent.name }
which groups the files by the directory name
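The grouping idea above can also be sketched outside Nextflow. The following is plain bash, not Nextflow code, and it is not meant to show how fromFilePairs works internally; it only illustrates pairing files by their parent directory name instead of relying on the order of separate directory traversals. The function name group_by_parent and the FolderA/FolderB demo tree are assumptions made up for illustration.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print "<folder> <other file> <some file>" for every
# sub-folder of the given prefix, using the folder name as the grouping key.
# Assumes each folder holds exactly one *_other_input and one *_some_input.
group_by_parent() {
    local prefix=$1 dir files
    for dir in "$prefix"/*/; do
        # Both files are looked up inside the same folder, so a pair
        # cannot get mixed up with files from another folder.
        files=( "$dir"*_other_input "$dir"*_some_input )
        printf '%s %s %s\n' "$(basename "$dir")" \
            "$(basename "${files[0]}")" "$(basename "${files[1]}")"
    done
}

# Demo on a throwaway tree (assumed layout: one folder per subject).
demo=$(mktemp -d)
mkdir -p "$demo/input/FolderA" "$demo/input/FolderB"
touch "$demo/input/FolderA/1_some_input" "$demo/input/FolderA/1_other_input"
touch "$demo/input/FolderB/2_some_input" "$demo/input/FolderB/2_other_input"

group_by_parent "$demo/input"
rm -rf "$demo"
```

Each output line carries the folder name followed by that folder's two files, which is roughly the shape of the items the Test1 process reads from test1_ch.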
amacbride
@amacbride
Oct 18 2016 21:33
@pditommaso Is NF using actual bash to execute the code blocks, or a Groovy re-implementation? I'm seeing a bunch of "Failed to parse template script" errors in things that are legal bash constructs.
Paolo Di Tommaso
@pditommaso
Oct 18 2016 21:33
bash
but they also need to be parsed by groovy to interpolate variables
amacbride
@amacbride
Oct 18 2016 21:45
ah, which may explain why it's choking on parameter expansion and comments.
DS=4
DS_RATE_INFO=(${ds_percent//:/ })
DS_RATE=${DS_RATE_INFO[0]}
DS_RATE2=${DS_RATE_INFO[1]}

echo $DS
echo $DS_RATE
echo $DS_RATE2
Paolo Di Tommaso
@pditommaso
Oct 18 2016 21:46
are those supposed to be bash vars?
amacbride
@amacbride
Oct 18 2016 21:46
Yes. I'm assuming I'd need to backslash-escape them?
Paolo Di Tommaso
@pditommaso
Oct 18 2016 21:47
is this a template file?
amacbride
@amacbride
Oct 18 2016 21:47
Yes.
Paolo Di Tommaso
@pditommaso
Oct 18 2016 21:48
if you invoke the template from a shell block you won't need to escape them
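For reference, this is what the snippet from the discussion does when run as plain bash, outside of any Nextflow template, where no Groovy interpolation gets in the way of the `${...}` syntax. The value assigned to ds_percent is a made-up assumption (a colon-separated pair); in the original template it presumably comes from elsewhere.

```bash
#!/usr/bin/env bash
# Assumption: ds_percent holds a colon-separated pair such as "10:20".
ds_percent="10:20"

DS=4
# Replace every ':' with a space, then let word splitting build the array.
DS_RATE_INFO=(${ds_percent//:/ })
DS_RATE=${DS_RATE_INFO[0]}   # first field
DS_RATE2=${DS_RATE_INFO[1]}  # second field

echo "$DS"
echo "$DS_RATE"
echo "$DS_RATE2"
```

With the assumed value this prints 4, 10 and 20 on separate lines. Inside a Nextflow script block the same `$` expressions would first go through Groovy interpolation, which is why they need escaping there, while a shell block leaves `$` to bash.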