These are chat archives for nextflow-io/nextflow

18th
Jul 2017
Tobias Neumann
@t-neumann
Jul 18 2017 13:11
hi! really into nextflow now - great system! I couldn't find it in the documentation - is there a way to get the task ID from a process that runs in parallel?
I would like to write that along with the file name that was handled into a text file for reference, once the workflow has finished
Paolo Di Tommaso
@pditommaso
Jul 18 2017 13:13
Thanks
won't the trace file work for your use case?
Tobias Neumann
@t-neumann
Jul 18 2017 13:15
no, in my case I also want to pass the task ID to a command that runs in that process, so I'd need it inside the process already. Otherwise the trace file would probably do, if I use the file name as the tag I guess
Paolo Di Tommaso
@pditommaso
Jul 18 2017 13:19
I see, the task (hash) ID is not available but you can still access the task (unique) index
by using task.index
Tobias Neumann
@t-neumann
Jul 18 2017 13:19
awesome - it's an integer unique within a workflow run I assume?
Paolo Di Tommaso
@pditommaso
Jul 18 2017 13:21
ah, good question
let me check
Tobias Neumann
@t-neumann
Jul 18 2017 13:22
Let me put it more simply: does it correspond to the first column of the trace file?
Paolo Di Tommaso
@pditommaso
Jul 18 2017 13:24
nope
it's the unique index within the same process
Tobias Neumann
@t-neumann
Jul 18 2017 13:25
ah ok - so it's not possible to get that? I thought the hash ID is the second column
Paolo Di Tommaso
@pditommaso
Jul 18 2017 13:25
for example, if a process spawns 10 tasks, the index will be 1..10
unfortunately no
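The numbering described above can be mimicked in plain Groovy to show what a per-task reference table (index, tab, file name) would look like. This is a sketch under assumptions: the file names are hypothetical, and inside a real process the number would come from task.index rather than from a list position. Because the index starts again at 1 in each process and each run, it is not a stable identifier across executions.

```groovy
// Hypothetical input files handled by one process, in the order its tasks are created
def inputs = ['sampleA.fastq.gz', 'sampleB.fastq.gz', 'sampleC.fastq.gz']

// Mimic the per-process numbering: 1..N (a stand-in for task.index)
def byIndex = inputs.indexed(1)

// One "index<TAB>file name" line per task, usable as a reference table
def lines = byIndex.collect { idx, name -> "${idx}\t${name}".toString() }
println lines.join('\n')
```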
Tobias Neumann
@t-neumann
Jul 18 2017 13:28
ok, I can still use that, but then I need to write it to a separate file and cannot use the trace file.
what's the best way to write to a collective file for all tasks - so one line per task with task.id\tfilename?
Paolo Di Tommaso
@pditommaso
Jul 18 2017 13:29
I still have the feeling you are trying to re-implement something already done by NF
you can create your own execution log with the nextflow log command
check nextflow log -h
nextflow log -f task_id,script last
you can even specify a template file to customise your report
Tobias Neumann
@t-neumann
Jul 18 2017 13:34
I usually would agree with you if I used my own stuff. We have this rather inconvenient alignment system that does not keep track of the used files but demands you give it your own sample ID (integers).
I figured that since the sample ID is random anyway, I could just use whatever task ID Nextflow gives and log the task ID -> sample association in a text file to keep track
Paolo Di Tommaso
@pditommaso
Jul 18 2017 13:37
yes, the task index is not guaranteed to be unique across executions; only the hash ID is consistent
Tobias Neumann
@t-neumann
Jul 18 2017 13:38
ok good - I get your point
on another note: can you use negation in glob patterns?
I want to get all paired end read files ending with _1 and _2 which is straightforward
all my SE files will then be everything that does NOT end in _1 or _2 - that's where I fail
PE: _{1,2}.fastq.gz
SE: *[!_][!12].fastq.gz ?
Paolo Di Tommaso
@pditommaso
Jul 18 2017 13:41
uh, I don't think globs support negation
does that work in bash?
Tobias Neumann
@t-neumann
Jul 18 2017 13:41
if I leave the [!_] out, it gives me only files ending with anything but "1" or "2" - so there the negation seems to work
but not when I also check for the second last character to be not "_"
Paolo Di Tommaso
@pditommaso
Jul 18 2017 13:42
doesn't seem it's reported here
you may try
Tobias Neumann
@t-neumann
Jul 18 2017 13:43
yeah no I checked it's not there - ok then I'll play around myself a bit more. thanks for your help and again - GREAT software!
Paolo Di Tommaso
@pditommaso
Jul 18 2017 13:43
otherwise you should be able to chain a filter operator
THANKS! :)
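The failing pattern is a logic problem as much as a glob problem: *[!_][!12].fastq.gz requires the second-to-last character to be something other than _ AND the last character to be something other than 1 or 2, whereas "not _1 or _2" only excludes names where both conditions hold. So the pattern also drops names that fail just one condition, such as c_R.fastq.gz or d1.fastq.gz. The sketch below uses Java's built-in glob matcher (assumed here to accept the same glob syntax Nextflow does) on hypothetical file names, and compares that pattern with the filter approach Paolo suggests: keep every FASTQ file that does not match the paired-end pattern.

```groovy
import java.nio.file.FileSystems
import java.nio.file.Paths

// Hypothetical file names, used only for illustration
def names = ['a_1.fastq.gz', 'a_2.fastq.gz', 'bb.fastq.gz', 'c_R.fastq.gz', 'd1.fastq.gz']

// Build a glob matcher from a pattern string
def matcher = { String glob -> FileSystems.getDefault().getPathMatcher('glob:' + glob) }
def pe    = matcher('*_{1,2}.fastq.gz')       // paired-end pattern from the chat
def naive = matcher('*[!_][!12].fastq.gz')    // attempted single-end pattern
def fq    = matcher('*.fastq.gz')             // any FASTQ file

def peFiles    = names.findAll { pe.matches(Paths.get(it)) }
def naiveFiles = names.findAll { naive.matches(Paths.get(it)) }
// "Negation" by filtering: every FASTQ file that is NOT paired-end
def seFiles    = names.findAll { fq.matches(Paths.get(it)) && !pe.matches(Paths.get(it)) }

println "PE:    $peFiles"
println "naive: $naiveFiles"
println "SE:    $seFiles"
```

In a pipeline the same idea would be a filter operator chained after the channel factory; the exact closure depends on how the files are named.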
Tobias Neumann
@t-neumann
Jul 18 2017 14:00
sorry, one more thing: with the fromFilePairs channel, how do I access the files individually in the command, like --readPair1 ${reads[0]} --readPair2 ${reads[1]}?
Paolo Di Tommaso
@pditommaso
Jul 18 2017 14:01
exactly like that, provided reads is declared as the input
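Roughly what happens with an input declaration such as set sample_id, file(reads) (the form used in the trimming example later in this log): each fromFilePairs item is a sample ID plus a list of the two mate files, so reads[0] and reads[1] can be interpolated into the command string. The sketch below reproduces that shape in plain Groovy; the file names and the aligner command are hypothetical.

```groovy
// Shape of one fromFilePairs item: [sampleId, [read1, read2]] (hypothetical names)
def item = ['sampleA', ['sampleA_1.fastq.gz', 'sampleA_2.fastq.gz']]

// Destructure it roughly the way the process input binds sample_id and reads
def (sampleId, reads) = item

// 'aligner' is a hypothetical command; reads[0] and reads[1] are the two mates
def cmd = "aligner --readPair1 ${reads[0]} --readPair2 ${reads[1]}".toString()
println cmd
```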
Tobias Neumann
@t-neumann
Jul 18 2017 14:01
life can be so easy ;) thanks
Paolo Di Tommaso
@pditommaso
Jul 18 2017 14:02
ahah, yes, I think that's why a lot of people prefer other solutions ;)
mitul-patel
@mitul-patel
Jul 18 2017 14:40
Hi all, I have a question about using channels. Step 1: trimming, step 2: mapping. If trimming is on, the output of the trimming process is used as the input for the mapping step. But if trimming is off, the reads channel is used as the input for the mapping step. How do I tell the mapping process which input channel to use?
Félix C. Morency
@fmorency
Jul 18 2017 14:42
PseudoCode
if(trimming) {
  in_mapping = trimming.out
} else {
  in_mapping = mapping.in
}
Would that work?
mitul-patel
@mitul-patel
Jul 18 2017 14:45
I thought about updating channels, like updating a variable with new values, but I don't know if that's possible in Nextflow...
Tobias Neumann
@t-neumann
Jul 18 2017 14:49
is it possible to join two channels in one process? i have a channel with SE reads and PE reads I want to handle simultaneously in a process with an if clause
both read as "fromFilePairs" like this
pairFiles = Channel.fromFilePairs(pairedEndRegex)
singleFiles = Channel.fromFilePairs(SERegex, size: 1){ file -> file.baseName.replaceAll(/\.fastq$/, "") }
Maxime Garcia
@MaxUlysse
Jul 18 2017 14:58
@t-neumann You should be able to do something even smarter
we were greatly inspired by other scripts in our pipeline: https://github.com/SciLifeLab/CAW/blob/master/main.nf#L208-L218
Paolo Di Tommaso
@pditommaso
Jul 18 2017 15:06
as long as both channels have the same structure you can do a much easier thing
Maxime Garcia
@MaxUlysse
Jul 18 2017 15:07
I'm interested too ;-)
Oh but yes, the script wasn't about that at all
I misunderstood
sorry
Paolo Di Tommaso
@pditommaso
Jul 18 2017 15:08
oops, my keyboard refuses to type the backtick .. :/
reads_ch = pairedEndRegex ? Channel.fromFilePairs(pairedEndRegex) : Channel.fromFilePairs(SERegex, size: 1){ file -> file.baseName.replaceAll(/\.fastq$/, "") }
` is back !
(or something like that)
Tobias Neumann
@t-neumann
Jul 18 2017 15:09
awesome! I'll try that right away
Tobias Neumann
@t-neumann
Jul 18 2017 15:15
ah ok - what that does is to check if you find PE files and if yes create a PE channel and if not, create a SE channel right?
but what if I have BOTH SE and PE files in a directory?
Paolo Di Tommaso
@pditommaso
Jul 18 2017 15:16
it should be fine as long as you have a glob pattern that does not mix them
@mitul-patel something like this should work
if ( trim_on ) {
  // trimming enabled: reads go to the trimming process only
  trim_reads_ch = <your reads channel>
  mapping_reads_ch = Channel.empty()
} else {
  // trimming disabled: reads go straight to mapping
  trim_reads_ch = Channel.empty()
  mapping_reads_ch = <your reads channel>
}

process trimming {
  input:
  set sample_id, file(reads) from trim_reads_ch
  output:
  set sample_id, file(<your output>) into trim_out_ch
}

process mapping {
  input:
  // only one of the two channels carries data, so mix simply forwards it
  set sample_id, file(reads) from trim_out_ch.mix(mapping_reads_ch)

  :
}
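The routing idea can be checked with plain lists standing in for channels: when trimming is on, the raw reads feed the trimming step and the direct-to-mapping source is empty; when it is off, the reverse. Combining the two always yields exactly one populated source. The sample names and the trimmed_ prefix are hypothetical; this is an analogy, not Nextflow channel code.

```groovy
// Hypothetical [sampleId, reads] items standing in for the reads channel
def reads = [['s1', 'raw_s1'], ['s2', 'raw_s2']]

def route = { boolean trimOn ->
    def trimIn     = trimOn ? reads : []   // like trim_reads_ch
    def mappingRaw = trimOn ? [] : reads   // like mapping_reads_ch (empty when trimming)
    // stand-in for the trimming process: one output per input
    def trimOut = trimIn.collect { item -> [item[0], 'trimmed_' + item[1]] }
    // mapping input combines both sources; one of them is always empty
    trimOut + mappingRaw
}

println route(true)
println route(false)
```

One difference to keep in mind: list concatenation preserves order, whereas the mix operator emits items in whatever order they arrive.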
Tobias Neumann
@t-neumann
Jul 18 2017 15:18
@pditommaso if I do it like you propose, it only starts up tasks for the PE data
Paolo Di Tommaso
@pditommaso
Jul 18 2017 15:18
well you should have a condition to switch between PE or SE, no?
Tobias Neumann
@t-neumann
Jul 18 2017 15:19
if (reads.size() > 1)
    """
    echo "PETask"
    """
else
    """
    echo "SETask"
    """
I do it like this
Paolo Di Tommaso
@pditommaso
Jul 18 2017 15:20
ah, you mean at the process level?
Tobias Neumann
@t-neumann
Jul 18 2017 15:20
exactly
otherwise it will first run all the SE processes and then all the PE processes - I want it all done at once
Paolo Di Tommaso
@pditommaso
Jul 18 2017 15:20
have a look here
Tobias Neumann
@t-neumann
Jul 18 2017 15:24
hm - I'm doing exactly what they do. what's the easiest way to see what is in the reads_ch channel?
pairedEndRegex = params.readDir + "/*_{1,2}.fastq.gz"
SERegex = params.readDir + "/*[!12].fastq.gz"

reads_ch = pairedEndRegex ? Channel.fromFilePairs(pairedEndRegex) : Channel.fromFilePairs(SERegex, size: 1){ file -> file.baseName.replaceAll(/\.fastq$/, "") }
Paolo Di Tommaso
@pditommaso
Jul 18 2017 15:27
(sorry need to leave)
Tobias Neumann
@t-neumann
Jul 18 2017 15:27
to be continued ;)
I still think the reads_ch statement just tests whether the pattern string is null and, if not, creates a paired-end channel, otherwise a single-end channel. There is no merged channel of both SE and PE files
it's also what the log tells me
Jul-18 17:29:33.132 [main] DEBUG nextflow.Channel - files for syntax: glob; folder: ../raw/results/; pattern: *_{1,2}.fastq.gz; options: null
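Tobias's reading is right: in Groovy any non-empty string is true, so a ternary on a pattern string always takes the paired-end branch and never combines the two kinds of input. A sketch of the alternative, assuming both channels emit [id, list of files] items: combine them (in Nextflow presumably something like pairFiles.mix(singleFiles), using the mix operator from the trimming example above) and let the reads.size() check inside the process pick the command. Lists stand in for channels, and the path and sample names are hypothetical.

```groovy
def pairedEndRegex = 'reads/*_{1,2}.fastq.gz'   // hypothetical pattern string

// Groovy truth: a non-empty string is true, so this always picks the first branch
def picked = pairedEndRegex ? 'paired-end only' : 'single-end only'

// Hypothetical items shaped like fromFilePairs output: [id, [files]]
def peItems = [['a', ['a_1.fastq.gz', 'a_2.fastq.gz']]]
def seItems = [['b', ['b.fastq.gz']]]

// Combine both sources, then choose the command per item as the process would
def tasks = (peItems + seItems).collect { item ->
    def (id, reads) = item
    reads.size() > 1 ? "PETask ${id}".toString() : "SETask ${id}".toString()
}

println picked
println tasks
```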
Félix C. Morency
@fmorency
Jul 18 2017 17:32
is there a way to pass a list argument on the command-line?
Paolo Di Tommaso
@pditommaso
Jul 18 2017 21:44
pass a comma separated string
then, split it
foo = params.foo?.tokenize(',')
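A quick plain-Groovy check of this suggestion, with a map standing in for Nextflow's params (assumption: the option was given on the command line as --foo a,b,c). The ?. operator yields null instead of failing when the option is absent, and tokenize drops empty tokens, unlike split.

```groovy
// Stand-in for Nextflow's params map; 'foo' would come from --foo a,b,c
def params = [foo: 'a,b,c']
def foo = params.foo?.tokenize(',')

// Option not given: safe navigation returns null instead of throwing
def none = [:].foo?.tokenize(',')

// tokenize skips empty tokens, split keeps them
def viaTokenize = 'a,,b'.tokenize(',')
def viaSplit = 'a,,b'.split(',') as List

println foo
println viaTokenize
println viaSplit
```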
mahdi-b
@mahdi-b
Jul 18 2017 22:46
Hi, I have a question about DataflowQueue and channels. What is the difference between the two? How can one apply channel operations (e.g. merge) to two DataflowQueues?
Paolo Di Tommaso
@pditommaso
Jul 18 2017 22:49
DataflowQueue and DataflowVariable are the concrete data structures which implement NF channels under the hood
mahdi-b
@mahdi-b
Jul 18 2017 22:50

Why is it that we cannot use channel operations with DataflowQueue? For example:
output:
file("*cor_unpaired_1.fq") into paired_1_fastq
file("*cor_unpaired_2.fq") into paired_2_fastq

paired_1_ch = paired_1_fastq.flatMap()
paired_2_ch = paired_2_fastq.flatMap()

this generates an error.
Paolo Di Tommaso
@pditommaso
Jul 18 2017 22:51
what error ?
mahdi-b
@mahdi-b
Jul 18 2017 22:52
ERROR ~ No signature of method: groovyx.gpars.dataflow.DataflowQueue.merge() is applicable for argument types: (groovyx.gpars.dataflow.DataflowQueue) values: [DataflowQueue(queue=[])]
Paolo Di Tommaso
@pditommaso
Jul 18 2017 22:53
merge requires a closure that specifies how to merge those channels
(better now ;))
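The point about the closure can be illustrated with plain lists: merging pairs the n-th item of one source with the n-th item of the other, and the closure decides what the combined item looks like. This is an analogy only; mergeWith is a hypothetical helper, not a Nextflow API, the file names are hypothetical, and the exact signature of the merge operator may differ between Nextflow versions.

```groovy
// Hypothetical file names standing in for what the two output channels emit
def paired1 = ['s1_cor_unpaired_1.fq', 's2_cor_unpaired_1.fq']
def paired2 = ['s1_cor_unpaired_2.fq', 's2_cor_unpaired_2.fq']

// Merge-like pairing: the i-th item of each source is handed to a closure
// that decides how the two values are combined
def mergeWith = { List a, List b, Closure combine ->
    [a, b].transpose().collect { pair -> combine.call(pair[0], pair[1]) }
}

def merged = mergeWith(paired1, paired2, { x, y -> [x, y] })
merged.each { println it }
```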
mahdi-b
@mahdi-b
Jul 18 2017 22:56
Oups! :(
I am going to sleep now, although it's just 1PM here.
By the way Paolo, do you ever sleep? :)
Paolo Di Tommaso
@pditommaso
Jul 18 2017 22:56
I should, it's 1 AM here ;)
where are you ?
mahdi-b
@mahdi-b
Jul 18 2017 22:57
Honolulu, HI.
HI = hawaii
Paolo Di Tommaso
@pditommaso
Jul 18 2017 22:57
wow cool!
though Barcelona is not so bad either :)
mahdi-b
@mahdi-b
Jul 18 2017 23:00
Oh, I am familiar with the CRG. The last time I was there was for the last RECOMB, and I will most likely be there for the next one (RECOMB-CG) in October
Best food in the world! :)
Paolo Di Tommaso
@pditommaso
Jul 18 2017 23:00
nice! what's your org ?
mahdi-b
@mahdi-b
Jul 18 2017 23:00
University of Hawaii
Paolo Di Tommaso
@pditommaso
Jul 18 2017 23:01
I see, I'm very happy NF has spread as far as Hawaii .. :)
mahdi-b
@mahdi-b
Jul 18 2017 23:02
I am including it in a course next summer. I have to say that it's amongst my top 5 software ever! :) no kidding
I meant next semester not summer
Paolo Di Tommaso
@pditommaso
Jul 18 2017 23:03
Ok, it sounds great. Thanks for your kind words
well, can I ask your name?
:)
mahdi-b
@mahdi-b
Jul 18 2017 23:05
Mahdi Belcaid
Paolo Di Tommaso
@pditommaso
Jul 18 2017 23:05
very pleased to know you, Mahdi
are you planning to go to the ISMB conference in Prague this weekend?
mahdi-b
@mahdi-b
Jul 18 2017 23:06
Paolo Di Tommaso
@pditommaso
Jul 18 2017 23:06
just in case ..
mahdi-b
@mahdi-b
Jul 18 2017 23:06
Unfortunately, I am not. but I will most likely see you at recomb CG
Paolo Di Tommaso
@pditommaso
Jul 18 2017 23:07
wow! fantastic place!
mahdi-b
@mahdi-b
Jul 18 2017 23:07
Nice to meet you as well, Paolo!
Paolo Di Tommaso
@pditommaso
Jul 18 2017 23:08
do you need a research engineer? I have good referees :)
(kidding)
mahdi-b
@mahdi-b
Jul 18 2017 23:09
LOL. Yeah, I would switch with you anytime. I love Barcelona.
Paolo Di Tommaso
@pditommaso
Jul 18 2017 23:09
ahah
nobody is ever happy
mahdi-b
@mahdi-b
Jul 18 2017 23:09
LOL Very well said! :)
Paolo Di Tommaso
@pditommaso
Jul 18 2017 23:18
:wave: :wave:
mahdi-b
@mahdi-b
Jul 18 2017 23:21
Ciao!
Paolo Di Tommaso
@pditommaso
Jul 18 2017 23:25
aloha ;)