These are chat archives for nextflow-io/nextflow

19th Dec 2018
Oliver Schwengers
@oschwengers
Dec 19 2018 09:25
Hi Paolo, I have a question regarding the caching logic. I have a simple TSV file which I filter (rows and columns) and then pass the items on as a value set to a process which merely downloads some files via wget. Unfortunately, the caching doesn't work, and every time I resume the pipeline the download processes start from scratch. I tested it with both normal caching and lenient. Any ideas / advice? Am I doing something wrong? Thx!

Channel.fromPath( assembledGenomesPath )
    .splitCsv( skip: 2, sep: '\t' )
    .filter( { .... } )
    .map( { [ it[0], it[19] ] } )
    .into { chValidGenomes, ... }

process download {
    executor 'local'
    maxForks 3
    maxRetries 3
    errorStrategy 'retry'
    cache 'lenient'
    publishDir pattern: '*.fna', mode: 'copy'

    input:
    set val(acc), val(path) from chValidGenomes

    output:
    set val(acc), file( "${acc}.refseq.fna" ) into chReadAssembly

    script:
    """
    wget -O ${acc}.refseq.fna.gz ${ncbiPath}/${path}/${path.split('/').last()}_genomic.fna.gz
    gunzip ${acc}.refseq.fna.gz
    """
}

Paolo Di Tommaso
@pditommaso
Dec 19 2018 13:30
the download process is not cached? at first glance it looks OK
are you able to isolate your problem in a small test case?
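(For reference, a minimal test case along these lines might look like the sketch below; the params.tsv path and the echo stand-in for the real wget call are hypothetical, not taken from the pipeline above:)

// hypothetical reduced test case for the caching question, not the actual pipeline
params.tsv = 'genomes.tsv'    // assumed small tab-separated input table

Channel.fromPath( params.tsv )
    .splitCsv( skip: 2, sep: '\t' )
    .map( { [ it[0], it[1] ] } )    // keep only the accession and path columns
    .set { chValidGenomes }

process download {
    executor 'local'
    cache 'lenient'

    input:
    set val(acc), val(path) from chValidGenomes

    output:
    set val(acc), file( "${acc}.fna" ) into chOut

    script:
    """
    echo ${path} > ${acc}.fna    # stand-in for the real wget call
    """
}

Running it twice with nextflow run test.nf -resume should report the download tasks as cached on the second run.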
Oliver Schwengers
@oschwengers
Dec 19 2018 13:32
let me see... hang on
Oliver Schwengers
@oschwengers
Dec 19 2018 13:37
strange... I isolated a smaller test case from the example above, and this time it behaves as expected
Paolo Di Tommaso
@pditommaso
Dec 19 2018 13:38
So we found that the problem is somewhere else..
Oliver Schwengers
@oschwengers
Dec 19 2018 13:40
obviously.... I'll elaborate on it and come back if I find out something interesting...
Stephen Ficklin
@spficklin
Dec 19 2018 23:33
Hello. I have what I hope is a quick question. I'm trying to write a cleanup process that removes unwanted files. I can't remove the files in an afterScript section because the files need to be removed only after other processes have worked on them. My approach is to create a cleanup process that receives two input channels. The first channel comes from process X and provides a tuple: (index, file). The second channel comes from process Y, indicates it has finished with the file, and simply contains the index value. After reading the Nextflow documentation I see that a process with multiple inputs consumes the elements of its channels as they arrive; as an example, the docs use channels with hard-coded values. My question is: when the channels are queues and get filled in an arbitrary order, will Nextflow still pair the items up correctly? Or do I need some sort of groupTuple to ensure the channel output gets to my cleanup process in the order I want?
Perhaps that's not very clear....
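(For context: when a process has multiple queue-channel inputs, Nextflow consumes each channel independently and pairs items in arrival order, so a channel that fills out of order can mis-match them. One way to pair by index instead is the join operator, which matches items on a shared key. A minimal sketch, with hypothetical stand-in channels for the outputs of X and Y:)

// hypothetical stand-ins for the outputs of processes X and Y
chFromX = Channel.from( [1, 'out_1.txt'], [2, 'out_2.txt'] )   // (index, file) pairs from X
chFromY = Channel.from( 2, 1 ).map { [ it ] }                  // index-only signals from Y, arriving out of order

process cleanup {
    echo true

    input:
    set val(idx), val(path) from chFromX.join( chFromY )   // join matches on the index, not on arrival order

    script:
    """
    echo "would remove ${path} for index ${idx}"
    """
}

Here join emits one (index, file) tuple per matching key, regardless of the order in which either channel fills.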
Rad Suchecki
@rsuchecki
Dec 19 2018 23:54
Some thoughts @spficklin
  1. Are you so pressed for space that you have to do that?
  2. Not sure, but I think this may break caching, so even with -resume the processes will get re-executed?
  3. If it's really necessary to clean up, could process Y delete the file when it's no longer needed? Alternatively, can you combine processes X and Y and simply delete the intermediary file there (see the sketch after this list)?
  4. A minimal working example usually helps.
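(A minimal sketch of the combined-process idea from point 3, with hypothetical process and channel names; the intermediary file is created and deleted inside the same task, so it is removed from the work directory as soon as it has been consumed:)

chIndices = Channel.from( 1, 2, 3 )    // hypothetical inputs

process xAndY {
    input:
    val(idx) from chIndices

    output:
    file("final_${idx}.txt") into chFinal

    script:
    """
    # step X: produce the intermediary file
    echo "intermediate data for ${idx}" > intermediate_${idx}.txt
    # step Y: consume it, then delete it right away
    wc -l < intermediate_${idx}.txt > final_${idx}.txt
    rm intermediate_${idx}.txt
    """
}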