These are chat archives for nextflow-io/nextflow

27th
Sep 2018
spaceturtle
@spaceturtle
Sep 27 2018 07:12
A quick question. When I use 'scratch', will Nextflow automatically remove the temporary directory on the local disk after finishing the process?
Paolo Di Tommaso
@pditommaso
Sep 27 2018 07:25
yes
spaceturtle
@spaceturtle
Sep 27 2018 07:30
@pditommaso Thanks for quick answer.
Paolo Di Tommaso
@pditommaso
Sep 27 2018 07:31
welcome
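For reference, here is a minimal config sketch that enables scratch for every process. It assumes the settings live in the pipeline's nextflow.config. The same directive can also be set on a single process with `scratch true`.

```groovy
// nextflow.config
// Each task runs in a temporary directory local to the execution node;
// per the answer above, Nextflow removes it once the task has finished.
process {
    scratch = true
}
```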
micans
@micans
Sep 27 2018 13:09
:+1:
Félix C. Morency
@fmorency
Sep 27 2018 13:29
Gratz!
Bioninbo
@Bioninbo
Sep 27 2018 13:40
Hi there.
Is there a simple way to filter a channel only if a global variable (e.g. params.var) is true?
I am thinking of something like this:
if(params.var == 'true') { channel1.filter{it[1] == 'control'}.set{channel1} }
I succeeded by splitting the channel in two depending on the global condition, filtering each one differently and then merging them into a third channel, but it is ~10 lines of code and quite ugly, so I am wondering if there is a simpler way.
Paolo Di Tommaso
@pditommaso
Sep 27 2018 14:35
the proper way is
channel1.filter{ params.var=='true' || it[1] == 'control'  }.set{ channel1 }
since params.var should be a boolean rather than a string, that could be written as
channel1.filter{ params.var || it[1] == 'control'  }.set{ channel1 }
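Below is a minimal plain-Groovy sketch of the filter above. It assumes a List stands in for channel1, because findAll takes the same kind of closure as filter, and that a boolean `flag` stands in for params.var. With `flag || ...`, a true flag keeps every item and a false flag applies the filter. Negating it (`!flag || ...`) applies the filter only when the flag is true, which is what the question asked for.

```groovy
// A List stands in for channel1; findAll{} takes the same kind of closure as filter{}.
def items = [['s1', 'control'], ['s2', 'treated'], ['s3', 'control']]

// `flag || ...`: a true flag short-circuits, so every item is kept;
// a false flag keeps only the 'control' items.
def keepAllWhenTrue = { boolean flag -> items.findAll { flag || it[1] == 'control' } }

// `!flag || ...`: the filter is applied only when the flag is true.
def filterWhenTrue = { boolean flag -> items.findAll { !flag || it[1] == 'control' } }

println keepAllWhenTrue(false)   // [[s1, control], [s3, control]]
println filterWhenTrue(true)     // [[s1, control], [s3, control]]
```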
Bioninbo
@Bioninbo
Sep 27 2018 14:38
I see thanks Paolo!
Paolo Di Tommaso
@pditommaso
Sep 27 2018 14:39
:ok_hand:
Bioninbo
@Bioninbo
Sep 27 2018 14:48

And to filter based on a variable?
I tried:

filterVar = "it[1] =~ /RNA/"
channel1.filter{ Eval.me( filterVar ) }

but got the error "ERROR ~ No such variable: it"

cwytko
@cwytko
Sep 27 2018 15:15

Good morning!
I noticed that when I run really large inputs through Nextflow, all the tasks of a particular step must complete before the workflow advances to the next process.
Is there a way to chunk my inputs so Nextflow grabs them in parts?

Here is the workflow: https://github.com/SystemsGenetics/GEMmaker
It usually waits, and gets stuck entirely, on downloading the SRRs, SRXs, and SRSs from NCBI.

Evan Floden
@evanfloden
Sep 27 2018 15:16
@Bioninbo this snippet combines both and hopefully solves the issue. Tested with the nextflow console:
var_bool = false
var_regex = '/RNA/'

Channel
  .from(['1','RNA'],['2','DNA'],['3','RNA'])
  .set { channel1 }

channel1.filter{ var_bool || it[1] =~ Eval.me(var_regex) }
        .set{ channel1 }

channel1.view()
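The earlier Eval.me attempt reported "No such variable: it" because Eval.me compiles its string as a separate script, and the closure's implicit `it` does not exist there. The plain-Groovy sketch below assumes a List stands in for the channel. It relies on `Eval.x(value, expr)`, which binds `value` to the name `x`, so the hypothetical `filterExpr` refers to `x` rather than `it`.

```groovy
def rows = [['1', 'RNA'], ['2', 'DNA'], ['3', 'RNA']]

// Hypothetical condition string: it must use `x`, the name Eval.x binds, not `it`.
def filterExpr = 'x[1] =~ /RNA/'

// `as boolean` turns the returned Matcher into true/false (true when a match is found).
def kept = rows.findAll { row -> Eval.x(row, filterExpr) as boolean }

println kept   // [[1, RNA], [3, RNA]]
```

Eval compiles a new script on every call, so for large channels keeping only the pattern in a variable, or passing a closure, is the cheaper option.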
Evan Floden
@evanfloden
Sep 27 2018 15:22
@cwytko Could you check whether the groupTuple operator here is waiting for all samples before emitting?
See the tip here
cwytko
@cwytko
Sep 27 2018 15:26
@evanfloden Ah so is that the groupTuple blocking?
Evan Floden
@evanfloden
Sep 27 2018 15:27
That would be my best bet
cwytko
@cwytko
Sep 27 2018 15:27
Ok, I'll tweak that part and see if there are any changes
Thank you!
micans
@micans
Sep 27 2018 15:28
@cwytko you can make groupTuple non-blocking if you can use it with groupKey. This is in the latest release.
cwytko
@cwytko
Sep 27 2018 15:29
@micans Oh neat, thank you again
micans
@micans
Sep 27 2018 15:30
:+1: see also nextflow-io/nextflow#796 (look at end of thread for Paolo's example)
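The idea behind groupKey can be modelled in plain Groovy as follows. This is not Nextflow's implementation. If every item carries the expected size of its group, a group can be emitted as soon as it is complete, instead of after the whole input has been read. In Nextflow itself this would look roughly like `.map { id, n, bam -> [groupKey(id, n), bam] }.groupTuple()`, assuming each item carries its sample's item count `n`.

```groovy
// Groups emitted so far, and items still waiting for the rest of their group.
def emitted = []
def pending = [:].withDefault { [] }

// Each item carries its key and the expected group size, as groupKey(key, size) does.
def onItem = { String key, int size, value ->
    pending[key] << value
    if (pending[key].size() == size) {
        // The group is complete: emit it now rather than at the end of the input.
        emitted << [key, pending.remove(key)]
    }
}

// Items arrive interleaved; group B (size 1) completes before group A.
onItem('A', 2, 'a1.bam')
onItem('B', 1, 'b1.bam')
onItem('A', 2, 'a2.bam')

println emitted   // [[B, [b1.bam]], [A, [a1.bam, a2.bam]]]
```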
Evan Floden
@evanfloden
Sep 27 2018 15:31
@micans I missed this issue thread somehow. Love the sketches!
micans
@micans
Sep 27 2018 15:32
hehe thanks!
cwytko
@cwytko
Sep 27 2018 15:45
btw @evanfloden how did you do the highlighting on the code with the link?
did you add that #L126-131 at the end of the URL manually?
Paolo Di Tommaso
@pditommaso
Sep 27 2018 16:26
check the green box in the groupTuple docs
Paolo Di Tommaso
@pditommaso
Sep 27 2018 16:34
@Bioninbo you should make only the pattern a variable, not the whole condition evaluation, i.e.
channel1.filter{ params.var || it[1] =~ /$your_var/  }.set{ channel1 }
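Here is a plain-Groovy sketch of this suggestion. It assumes a List stands in for channel1, `flag` stands in for params.var, and a hypothetical `pattern` stands in for your_var. Only the regex text is a variable. The condition stays ordinary code, so `it` resolves normally.

```groovy
def rows = [['1', 'RNA'], ['2', 'DNA'], ['3', 'scRNA']]
def flag = false       // stands in for params.var
def pattern = 'RNA'    // hypothetical; holds only the regex text

// The slashy string /$pattern/ interpolates the variable into the regex;
// =~ looks for a match anywhere in it[1].
def kept = rows.findAll { flag || it[1] =~ /$pattern/ }

println kept   // [[1, RNA], [3, scRNA]]
```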
Bioninbo
@Bioninbo
Sep 27 2018 16:49
I see, thanks! However, my aim was to make a long filter that I could reuse multiple times throughout my script, i.e. "it[1] == 'a' && it[2] == 2 && it[3] == 'b'". In the end, I changed the structure of my script, but I would still be curious to know whether that is possible. Maybe through a closure/function?
Paolo Di Tommaso
@pditommaso
Sep 27 2018 16:53
yes, but Eval.me is not needed; you can define the condition as a closure and reference it as a variable
def your_condition = { it[1] == 'a' && it[2] == 2 && it[3] == 'b' }

channel1.filter(your_condition).set{ channel1 }
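The plain-Groovy sketch below shows the same closure reused as a predicate in several places. It uses hypothetical Lists in place of channels. Since the question also mentioned functions, it adds a method passed with Groovy's `.&` method-pointer operator.

```groovy
// The condition is defined once, as a closure.
def your_condition = { it[1] == 'a' && it[2] == 2 && it[3] == 'b' }

// Hypothetical data standing in for two channels.
def first  = [['r1', 'a', 2, 'b'], ['r2', 'a', 3, 'b']]
def second = [['r3', 'x', 2, 'b'], ['r4', 'a', 2, 'b']]

// The same closure is passed wherever a predicate is expected.
println first.findAll(your_condition)    // [[r1, a, 2, b]]
println second.findAll(your_condition)   // [[r4, a, 2, b]]

// A method works too; .& turns it into a closure that findAll accepts.
boolean yourCondition(List row) { row[1] == 'a' && row[2] == 2 && row[3] == 'b' }
println first.findAll(this.&yourCondition)   // [[r1, a, 2, b]]
```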
Bioninbo
@Bioninbo
Sep 27 2018 16:55
Awesome! Thanks @pditommaso !
Bioninbo
@Bioninbo
Sep 27 2018 20:34
@evanfloden just saw your message. Thanks for the help!
Tobias "Tobi" Schraink
@tobsecret
Sep 27 2018 23:55

I have a case where I am not entirely sure how to handle it.
My pipeline includes downloading runs by ENA ID, so each sample can have one or more lanes. Concretely, that means each sample has one or more pairs of files, like the pairs described for fromFilePairs.

files_produced_by_download_process  =  [
['id1_1.fq', 'id1_2.fq'], 
['id2_1.fq', 'id2_2.fq', 'id3_1.fq', 'id3_2.fq']
]  //imagine this was a channel and not an array.

I would like to process the file pairs individually, but once that's done, I would like to end up with the alignments grouped into per-sample arrays again.

aligned_reads  =  [
['id1.bam'],
['id2.bam', 'id3.bam']
]

My mockup of the process is below:

process align_reads_bwa {
    publishDir 'alignments', mode: 'symlink'
    cpus 2
    time '4h'

    input:
    set file(reference), file(ref_index_files) from indexed_reference.first()
    set file(read1), file(read2) from Channel.fromFilePairs(files_produced_by_download_process) // ???

    output:
    set file(alignment), file(index) into alignments

    script:
    alignment = "${read1.baseName}.bam"
    index = "${read1.baseName}.bam.bai"
    """
    bwa mem -t $task.cpus $reference $read1 $read2 | samtools view -Sb - | samtools sort -o $alignment -
    samtools index $alignment
    """
}
To preempt questions: I want to merge the lanes once I get to the VCF. I don't want to merge them in the same process that creates the BAMs, because the alignment step takes a lot of time; if I can parallelize it, that translates into significant time savings for me.
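One way to get from the download output to per-sample BAM lists is sketched below in plain Groovy. Lists stand in for channels, and a renamed `.bam` string stands in for the bwa process. It assumes each inner list is one sample and that files are named `<id>_1.fq`/`<id>_2.fq`. In Nextflow, the flatten step would roughly correspond to a flatMap emitting one item per lane pair. The regroup step would be a groupTuple keyed with groupKey(sample, laneCount), so each sample is emitted as soon as all of its lanes are aligned.

```groovy
def downloads = [
    ['id1_1.fq', 'id1_2.fq'],
    ['id2_1.fq', 'id2_2.fq', 'id3_1.fq', 'id3_2.fq']
]

// 1. Flatten: one item per lane pair, tagged with its sample index and lane count.
def pairs = downloads.withIndex().collectMany { files, sample ->
    def lanes = files.collate(2)
    lanes.collect { pair -> [sample, lanes.size(), pair] }
}

// 2. "Align" each pair independently (a stand-in for the bwa process).
def bams = pairs.collect { sample, n, pair ->
    [sample, n, pair[0].replaceAll(/_1\.fq$/, '.bam')]
}

// 3. Regroup by sample; in Nextflow the lane count n would feed groupKey(sample, n).
def aligned_reads = bams.groupBy { it[0] }.collect { sample, rows -> rows.collect { it[2] } }

println aligned_reads   // [[id1.bam], [id2.bam, id3.bam]]
```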