These are chat archives for nextflow-io/nextflow

4th
Jun 2018
Maxime Garcia
@MaxUlysse
Jun 04 2018 07:35
what is your complete process?
micans
@micans
Jun 04 2018 10:23
I'm trying to find information about libraries and functions after seeing @MaxUlysse's message, i.e. the static def FNAME() {} syntax. It looks like code that can be invoked in a script section. I've found these instructions: https://groups.google.com/forum/#!topic/nextflow/thEocxiO6as . Is that relevant, and is there more information that I somehow managed to avoid finding?
Maxime Garcia
@MaxUlysse
Jun 04 2018 10:25
@micans What are you trying to do?
I got all my inspiration for making script functions from BioNextflow
micans
@micans
Jun 04 2018 10:29
This is just my quest to understand the possibilities of NF. This looks like something to promote re-use. Is that correct? Or are templates the way to do that? If so, my question is: in which cases is static def F() {} used? Not looking for an exhaustive answer, just a pointer.
Oh, thx, will check out BioNextflow
Maxime Garcia
@MaxUlysse
Jun 04 2018 10:31
I had the same question about templates vs functions; I'll be interested in what you figure out and choose in the end
It's always good to get help from other adventurers on such a quest :-D
Luca Cozzuto
@lucacozzuto
Jun 04 2018 10:39
Actually, it would be fantastic if, instead of functions, we could use proper components (i.e. processes)
it could be time to start a conversation with @pditommaso on this :)
micans
@micans
Jun 04 2018 10:43
Well, I'll just be listening. Still very much in the information-gathering stage. There's not a lot of information about functions that I could find, except for that link above.
Maxime Garcia
@MaxUlysse
Jun 04 2018 10:45
IMHO the best source of information is here ;-)
micans
@micans
Jun 04 2018 10:47
I do try to follow this closely!
Paolo Di Tommaso
@pditommaso
Jun 04 2018 11:45
@MaxUlysse use functions if you prefer Java/Groovy code (and tests), and templates if you prefer Bash code and tests
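A minimal sketch of the two approaches, assuming a hypothetical names_ch channel of sample names (the process names and template filename are made up too):

// main.nf
Channel.from('alpha', 'beta').into { names_ch; names_ch2 }

// option 1: a Groovy function, callable from the script section
def greeting(name) {
    return "Hello, ${name}!"
}

process with_function {
    input:
    val name from names_ch
    script:
    """
    echo '${greeting(name)}'
    """
}

// option 2: the same command kept in a Bash template
process with_template {
    input:
    val name from names_ch2
    script:
    template 'say_hello.sh'
}

// templates/say_hello.sh contains just:
//   echo 'Hello, ${name}!'
// where ${name} is resolved by Nextflow before the script runs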
onuryukselen
@onuryukselen
Jun 04 2018 13:18
@MaxUlysse My complete process is the following:
params.mate = "single" //or pair
params.pairs = "/home/fastqfiles/*{1,2}.fastq"
Channel
    .fromFilePairs( params.pairs , size: (params.mate != "pair") ? 1 : 2, flat:true)
    .set { reads }

process split {
    input:
    set pair_id, file(read1), file(read2) from reads.splitFastq(by: 10000, pe:(params.mate != "pair") ? false : true, file:true)
    output:
    set pair_id, file("$read1"), file("$read2") into splitReads
}
Maxime Garcia
@MaxUlysse
Jun 04 2018 13:38
@pditommaso Thanks for the answer
@onuryukselen where is your script statement?
I don't believe you can have a process without a script
otherwise, you can just transform your channels
onuryukselen
@onuryukselen
Jun 04 2018 14:08
@MaxUlysse I separated the split and the mapping processes for simplicity. So the split process is followed by the mapping process, and split just transforms the channel
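For reference, a sketch of doing that first step as a plain channel transformation, with no dedicated process (this just reuses the options from the snippet above):

Channel
    .fromFilePairs(params.pairs, size: (params.mate != "pair") ? 1 : 2, flat: true)
    .splitFastq(by: 10000, pe: (params.mate == "pair"), file: true)
    .set { splitReads }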
Rohan Shah
@rohanshah
Jun 04 2018 15:49
when executing processes in AWS Batch, how does one set the nxf-scratch-dir to be used in the Batch container? I tried setting NXF_TEMP to /docker_scratch but I still get a nxf-scratch-dir equal to /tmp/nxf.XXXXmAlKbf
Paolo Di Tommaso
@pditommaso
Jun 04 2018 15:51
try process.scratch = '/something'
Rohan Shah
@rohanshah
Jun 04 2018 15:53
ah ok thank you will do
what's the difference between the two?
Paolo Di Tommaso
@pditommaso
Jun 04 2018 15:54
NXF_TEMP is for the Nextflow driver application
the latter is for the executed processes
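A sketch of where each setting lives (the paths are just examples):

// nextflow.config -- scratch directory used by the executed tasks, e.g. inside the Batch container
process.scratch = '/docker_scratch'

# shell environment on the machine launching nextflow -- temp dir for the driver itself
export NXF_TEMP=/path/to/tmp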
Rohan Shah
@rohanshah
Jun 04 2018 15:54
ah ok ty
Paolo Di Tommaso
@pditommaso
Jun 04 2018 15:54
:+1:
onuryukselen
@onuryukselen
Jun 04 2018 15:57
@MaxUlysse is there a way to turn a function parameter on/off based on a condition? I want to completely remove the "pe" parameter from splitFastq when params.mate = "single"
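One possible sketch: build the options as a Groovy map and only add pe when needed (this assumes splitFastq takes its named options as a plain map, which is how Groovy passes named arguments anyway):

def splitOpts = [by: 10000, file: true]
if (params.mate == "pair") {
    splitOpts.pe = true    // only pass 'pe' for paired-end data
}

reads
    .splitFastq(splitOpts)
    .set { splitReads }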
Maxime Borry
@maxibor
Jun 04 2018 16:35
Hello everyone,
I'm trying to gather the results of different processes on different samples to generate one summary file per sample.
However, the way I'm doing it right now, samples get mixed up in the summarize process (below):
What is the best way to do it?
process summarize_results {
    tag "$name"

    publishDir "${params.results}/summary", mode: 'copy'

    input:
        set val(name), file(a) from result_a
        set val(name), file(b) from result_b
        set val(name), file(c) from result_c
    output:
        set val(name), file("*.csv") into summary_result

    script:
        outfile = name+".summary.csv"
        """
        summarize_all_methods -a $a -b $b -c $c -o $outfile
        """
}
Luca Cozzuto
@lucacozzuto
Jun 04 2018 16:37
I think you should join result_a, result_b and result_c
  set val(name), file(a), file(b), file(c) from result_a.join(result_b, result_c)
Maxime Borry
@maxibor
Jun 04 2018 16:42
Clever! So here the matching key would be name, if I understand the doc correctly?
Luca Cozzuto
@lucacozzuto
Jun 04 2018 16:52
From the doc "The join operator creates a channel that joins together the items emitted by two channels for which exits a matching key. The key is defined, by default, as the first element in each item emitted."
Maxime Borry
@maxibor
Jun 04 2018 17:08

Thanks a lot !

I'm trying to apply it, but I'm running into the following error:
ERROR ~ No signature of method: groovyx.gpars.dataflow.DataflowQueue.join() is applicable for argument types: (groovyx.gpars.dataflow.DataflowQueue, groovyx.gpars.dataflow.DataflowQueue) values: [DataflowQueue(queue=[]), DataflowQueue(queue=[])]

My upstream processes look like this:

process process_a {
    tag "$name"

    input:
        set val(name), file(a) from input_a
    output:
        set val(name), file("*.a") into result_a
    script:
        outfile = name+".a"
        """
        echo $a > $outfile
        """
}

process process_b {
    tag "$name"

    input:
        set val(name), file(b) from input_b
    output:
        set val(name), file("*.b") into result_b
    script:
        outfile = name+".b"
        """
        echo $b > $outfile
        """
}

process process_c {
    tag "$name"

    input:
        set val(name), file(c) from input_c
    output:
        set val(name), file("*.c") into result_c
    script:
        outfile = name+".c"
        """
        echo $c > $outfile
        """
}


process summarize_results {
    tag "$name"

    publishDir "${params.results}/summary", mode: 'copy'

    input:
        set val(name), file(a), file(b), file(c) from result_a.join(result_b, result_c)
    output:
        set val(name), file("*.csv") into summary_result

    script:
        outfile = name+".summary.csv"
        """
        summarize_all_methods -a $a -b $b -c $c -o $outfile
        """
}
Luca Cozzuto
@lucacozzuto
Jun 04 2018 17:30
try just with
set val(name), file(a), file(b), file(c) from result_a.join(result_b).join(result_c)
Luca Cozzuto
@lucacozzuto
Jun 04 2018 17:35
this is the toy example
left = Channel.from(['X', 1], ['Y', 2], ['Z', 3], ['P', 7])
right= Channel.from(['Z', 6], ['Y', 5], ['X', 4])
center=Channel.from(['Z', 4], ['Y', 4], ['X', 7])
left.join(center).join(right).println()
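For reference, that should print tuples roughly like these (emission order may vary; ['P', 7] is dropped because it has no matching key in the other two channels):

[X, 1, 7, 4]
[Y, 2, 4, 5]
[Z, 3, 4, 6]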
Maxime Garcia
@MaxUlysse
Jun 04 2018 17:55
@onuryukselen I think so, but I'm just not sure about your process
Maxime Borry
@maxibor
Jun 04 2018 19:11
Thanks @lucacozzuto !
I'm also trying with
set val(name), file(a), file(b), file(c)  from result_a.mix(result_b, result_c).groupTuple()
Paolo Di Tommaso
@pditommaso
Jun 04 2018 19:15
this is a good idea, but the definition should be
input:
set val(name), file(all_files) from result_a.mix(result_b, result_c).groupTuple()
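A toy sketch of what that mix + groupTuple combination emits (the channel contents are made up):

a = Channel.from(['X', 'x.a'], ['Y', 'y.a'])
b = Channel.from(['X', 'x.b'], ['Y', 'y.b'])
c = Channel.from(['X', 'x.c'], ['Y', 'y.c'])
a.mix(b, c).groupTuple().println()
// emits something like: [X, [x.a, x.b, x.c]] and [Y, [y.a, y.b, y.c]]
// groupTuple(size: 3) would let each group be emitted as soon as its three items have arrived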
Maxime Borry
@maxibor
Jun 04 2018 19:23
Thanks @pditommaso, does the Nextflow team ever stop working? What a hard life at CRG ;)
Paolo Di Tommaso
@pditommaso
Jun 04 2018 19:25
working? I don't remember working in the last 5 years ;)