These are chat archives for nextflow-io/nextflow

4th
Jul 2018
Maxime Garcia
@MaxUlysse
Jul 04 2018 07:07
Hey folks, Any French here at JOBIM2018?
Paolo Di Tommaso
@pditommaso
Jul 04 2018 07:25
@MaxUlysse if there are no French, try with Italians :joy:
Maxime Garcia
@MaxUlysse
Jul 04 2018 07:30
Any people from here at JOBIM2018 then ;-)
I was thinking that, being a conference in France (even if we have only had English talks so far this year), there would be only French people
Paolo Di Tommaso
@pditommaso
Jul 04 2018 07:32
are there workflow related talks ?
Maxime Garcia
@MaxUlysse
Jul 04 2018 07:32
But I already managed to meet the guy behind the OAR executor support PR
Sarek of course ;-)
Paolo Di Tommaso
@pditommaso
Jul 04 2018 07:32
cool
I already managed to meet the guy behind the OAR executor support PR
I was thinking he was discouraged by my pedantic review comments :joy:
Maxime Garcia
@MaxUlysse
Jul 04 2018 07:36
I don't think so
Paolo Di Tommaso
@pditommaso
Jul 04 2018 07:37
good !
Shellfishgene
@Shellfishgene
Jul 04 2018 08:30
Question: In the RNASeq example on the webpage, why does file index from genome_index for the mapping step work? The channel should contain 6 bt2 files, why does this not get them from the channel one by one?
Evan Floden
@evanfloden
Jul 04 2018 08:36
That command stages the files in the work directory
I think then genome.index is required to specify the prefix.
Shellfishgene
@Shellfishgene
Jul 04 2018 09:02
Yes, what I don't understand is why it stages all 6 files, without using .collect() or something like that
Maxime Vallée
@valleem
Jul 04 2018 09:03
@Shellfishgene can you link the page?
The example works, I just don't know why
Maxime Vallée
@valleem
Jul 04 2018 09:04
it is in the step before
output:
    file 'genome.index*' into genome_index
in the channel genome_index, you will have all that is required
no need to explicitly declare all bt2 files
Shellfishgene
@Shellfishgene
Jul 04 2018 09:05
So that does not add 6 bt2 files to the channel separately?
Maxime Vallée
@valleem
Jul 04 2018 09:06
all the output starting with genome.index are put in the channel genome_index at the previous step
Shellfishgene
@Shellfishgene
Jul 04 2018 09:07
If I have file '*.fasta' into fasta_files the next process will get them one by one, no? If I need all the fasta files at once, I need to .collect() them, that's how I understand it...
Maxime Vallée
@valleem
Jul 04 2018 09:09
If I have file '*.fasta' into fasta_files the next process will get them one by one, no?
it depends on how you'll declare your input in the next process
Shellfishgene
@Shellfishgene
Jul 04 2018 09:12
So the next process uses file(fasta) from fasta_files, and it gets them one by one, no?
Another question, can I capture stderr to a channel? Bowtie outputs some statistics to stderr that I would like to keep.
Maxime Vallée
@valleem
Jul 04 2018 09:17
An easy way I like to quickly check a channel is to comment out all the code from the process you want to run to the end. Then, on the channels I want to explore, I use the println() operator. So try genome_index.println(). I suspect the channel has only one element, consisting of a collection of all 6 bt2 files
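A minimal sketch of that check, assuming the DSL1 syntax used in this chat. The touch command is only a stand-in for the real index builder, and view() is used here in place of println(). As far as I know, a glob output that matches several files is emitted as one list item by default, which would explain why the mapping step receives all six files at once.

```groovy
// DSL1-style sketch; the process body is a placeholder, not the real RNASeq example
process buildIndex {
    output:
    file 'genome.index*' into genome_index

    script:
    """
    # stand-in for the real index builder: create six empty index files
    touch genome.index.1.bt2 genome.index.2.bt2 genome.index.3.bt2
    touch genome.index.4.bt2 genome.index.rev.1.bt2 genome.index.rev.2.bt2
    """
}

// Print what the channel emits: expected to be a single list with all six paths
genome_index.view()
```

To receive the files one by one instead, applying the flatten() operator to the channel (or declaring the output with mode flatten) should split the list into separate emissions.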
Another question, can I capture stderr to a channel? Bowtie outputs some statistics to stderr that I would like to keep.
declare it as an output into a bowtie_stderr_ch channel, on a new line
Shellfishgene
@Shellfishgene
Jul 04 2018 09:22

declare it as an output into a bowtie_stderr_ch channel, on a new line

How do I do that? The documentation has a stdout special file, but not stderr

Maxime Vallée
@valleem
Jul 04 2018 09:23
bowtie ... 1> file.sam 2> bowtie.log should do the trick?
Shellfishgene
@Shellfishgene
Jul 04 2018 09:26
Oh, ok. I thought there was a nextflow method without writing it to a file in the command.
Thanks!
Maxime Vallée
@valleem
Jul 04 2018 09:28
it will end up in the .command.log file, deep in your work directory, but you are right to keep it in a channel for future use
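A minimal sketch of that suggestion, assuming DSL1 syntax. The process name mapReads and the channel name sam_ch are hypothetical, and the subshell with two echo calls is only a stand-in for the real bowtie command, whose arguments are not given above.

```groovy
process mapReads {
    output:
    file 'file.sam' into sam_ch
    file 'bowtie.log' into bowtie_stderr_ch

    script:
    """
    # stand-in for the aligner: one line to stdout, one line to stderr
    ( echo "alignment record"; echo "reads processed: 1" >&2 ) 1> file.sam 2> bowtie.log
    """
}

// Print the captured stderr content from the channel
bowtie_stderr_ch.view { f -> f.text }
```

Redirecting inside the script keeps the statistics as a regular output file, so they can be published or passed to a later process like any other channel item.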
Maxime Vallée
@valleem
Jul 04 2018 09:36
I have a question of my own for the community. It is about a bug I have been experiencing, and it raises a question: how does NF decide whether to cache or submit a process when resuming? I have a pipeline that scatter-gathers data, and each time I re-run it during development, it randomly re-launches a previously successful process on different input files. If I kill the NF script and re-launch, it skips the files it wanted to re-launch for this process and wants to submit new ones (which were previously cached, and rightly so). Very rarely, all are cached (as they should be), but if I re-launch, I see the same behaviour. I have compared the generated .command.sh files (the successful first one and the re-submitted one) and they are exactly the same. So, how can I check why NF wants to re-launch a process that was actually fine?
Maxime Vallée
@valleem
Jul 04 2018 09:54
OK, in the last 20 minutes I found this issue: nextflow-io/nextflow#442
So I tried cache 'deep' and re-launched. The script froze (my files are a little big), so I killed it. I removed the cache line and re-ran it out of curiosity. Now it works flawlessly, and I am not able to reproduce the bug anymore.
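For reference, a sketch of where the cache directive goes, assuming DSL1 syntax. The input path chunks/*.txt, the process name, and the channel names are hypothetical. With 'deep', the cache key is built from the content of the input files rather than from their path, size, and timestamp, which is presumably why large files made the run appear to freeze.

```groovy
// hypothetical input location
chunks_ch = Channel.fromPath('chunks/*.txt')

process processChunk {
    // hash the content of input files instead of their metadata
    cache 'deep'

    input:
    file chunk from chunks_ch

    output:
    file 'chunk.out' into processed_ch

    script:
    """
    wc -l < ${chunk} > chunk.out
    """
}
```

Running the script a second time with -resume should then reuse any task whose input content is unchanged.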
Francesco Strozzi
@fstrozzi
Jul 04 2018 10:01
Hi, naive question. Is there a way to make the content of a channel immutable? If I want to use a channel twice with a map function to generate two channels, how can I do that without duplicating the initial channel?
Luca Cozzuto
@lucacozzuto
Jul 04 2018 10:42
@fstrozzi I think it goes against the nextflow paradigm
where a channel has to be "consumed"
Francesco Strozzi
@fstrozzi
Jul 04 2018 10:45
Yes, I just wondered if there was a way to create an Immutable list from a channel that could be reused
Probably with some Groovy code maybe...
Luca Cozzuto
@lucacozzuto
Jul 04 2018 10:46
maybe not a channel but a groovy array...
Francesco Strozzi
@fstrozzi
Jul 04 2018 10:49
Yeah
Paolo Di Tommaso
@pditommaso
Jul 04 2018 10:55
X = yourChannel.collect()
Then use X
That's it
Luca Cozzuto
@lucacozzuto
Jul 04 2018 10:58
[image: x-men-2-wallpaper-con-ian-mckellen-113215.jpg]
always use X
Francesco Strozzi
@fstrozzi
Jul 04 2018 12:09

X = yourChannel.collect()

That simple, thanks ! :smile:
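A small sketch of that pattern, assuming DSL1 syntax and a toy channel of numbers in place of yourChannel's real content. The names doubled and total are hypothetical. collect() gathers every item into one list and returns a value channel, which, unlike a queue channel, can be read more than once.

```groovy
yourChannel = Channel.from(1, 2, 3)

// One list holding all items, wrapped in a reusable value channel
X = yourChannel.collect()

// Two independent derivations from the same X
doubled = X.map { items -> items.collect { n -> n * 2 } }
total = X.map { items -> items.sum() }

doubled.view { v -> "doubled: $v" }
total.view { v -> "total: $v" }
```

The order in which the two view lines print is not guaranteed, since the operators run concurrently.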

Shellfishgene
@Shellfishgene
Jul 04 2018 12:22
How exactly would I use shell commands with the when declaration? I want to check if a file has more than one line for example.
Paolo Di Tommaso
@pditommaso
Jul 04 2018 12:22
you can't use shell commands there
Shellfishgene
@Shellfishgene
Jul 04 2018 12:23
So I'd have to write a line counting groovy function?
Paolo Di Tommaso
@pditommaso
Jul 04 2018 12:25
or check in the task script
[[ `grep -c '' file.txt` -eq 0 ]] && exit 0
and declare the output optional
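Put together, that could look like the sketch below, assuming DSL1 syntax. The input path inputs/*.txt, the process name, the channel names, and result.txt are hypothetical. The guard uses -le 1 so that files with at most one line are skipped, matching the "more than one line" condition in the question. grep -c '' counts every line in the file.

```groovy
// hypothetical input location
candidates_ch = Channel.fromPath('inputs/*.txt')

process keepMultiLine {
    input:
    file 'file.txt' from candidates_ch

    output:
    // optional: a task that exits early produces no result.txt and emits nothing
    file 'result.txt' optional true into results_ch

    script:
    """
    # stop successfully, without output, when the file has one line or fewer
    [[ `grep -c '' file.txt` -le 1 ]] && exit 0
    cp file.txt result.txt
    """
}

results_ch.view()
```

Backticks are used for the command substitution so that no dollar sign needs escaping inside the script string.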
Radoslaw Suchecki
@bioinforad_twitter
Jul 04 2018 13:27

Hi all, is there a way to conditionally repeat items in/from a channel so that the item (or its modified clone) could be emitted 1, 2 or more times? Here's a clunky way of doing that by subscribing to a channel and binding items to another:

X  = Channel.from([1,null],[2,["A","B"]],[3,["A","B","C"]]);
Y = Channel.create()
X.subscribe onNext: {
    println "BEFORE: " + it
    if (it[1] == null) {
        Y << it
    } else {
        for (i in it[1]) {
            clone = it.clone()
            clone[1] = i
            Y << clone
        }
    }
}, onComplete: { X.close(); Y.close() }
Y.subscribe { println "AFTER: "+ it}

output:

BEFORE: [1, null]
BEFORE: [2, [A, B]]
BEFORE: [3, [A, B, C]]
AFTER: [1, null]
AFTER: [2, A]
AFTER: [2, B]
AFTER: [3, A]
AFTER: [3, B]
AFTER: [3, C]

There must be a better way - some combination of operators and closures?

Paolo Di Tommaso
@pditommaso
Jul 04 2018 15:59
@bioinforad_twitter maybe this
 Channel
   .from([1,null],[2,["A","B"]],[3,["A","B","C"]])
   .transpose()
   .println()

[1, null]
[2, A]
[2, B]
[3, A]
[3, B]
[3, C]
Luca Cozzuto
@lucacozzuto
Jul 04 2018 16:06
Hello
I have a small problem with this
output:
    set "genome_index", file('genome_index*') into genomeIndexFiles
I got the following error:
ERROR ~ Error executing process > 'buildIndex (genome.fa.gz)'

Caused by:
  Missing output file(s) `genome_index` expected by process `buildIndex (genome.fa.gz)`
but files are there
genome_index.1.ebwt  genome_index.2.ebwt  genome_index.3.ebwt  genome_index.4.ebwt  genome_index.rev.1.ebwt  genome_index.rev.2.ebwt
Luca Cozzuto
@lucacozzuto
Jul 04 2018 16:16
solved in this way I hope:
set val("genome_index"), file("genome_index*") into genomeIndexFiles
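A sketch of how that declaration could be consumed downstream, assuming DSL1 syntax. The error message above suggests that a bare string inside set is treated as a file name to look for, while val(...) passes it along as a plain value. The touch command is a stand-in for the real index builder, and the useIndex process is hypothetical.

```groovy
process buildIndex {
    output:
    set val("genome_index"), file("genome_index*") into genomeIndexFiles

    script:
    """
    # stand-in for the real index builder
    touch genome_index.1.ebwt genome_index.2.ebwt
    """
}

process useIndex {
    // print the task's stdout to the console
    echo true

    input:
    set val(prefix), file(index_files) from genomeIndexFiles

    script:
    """
    echo "prefix: ${prefix}"
    ls ${prefix}*
    """
}
```

The prefix travels with the index files as one tuple, so the mapping step can name the index without hard-coding it.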