These are chat archives for nextflow-io/nextflow

14th
Jan 2019
nicweb
@nicweb
Jan 14 11:02

Hello!
I'm running multiple nextflow instances from within the same working directory in parallel (to optimize cluster load). I thought that this should be possible because each instance will get a unique sessionid.
Unfortunately some instances fail to start because they can not acquire a DB lock. nextflow log shows different nextflow calls (samples) getting the same sessionid.

Am I doing something wrong, or why are different calls getting the same sessionid?
What's the best way to run multiple instances of Nextflow at the same time?

micans
@micans
Jan 14 11:03
are you giving them different work directories? (-w option)
Luca Cozzuto
@lucacozzuto
Jan 14 11:04
I would run in different folder each project
micans
@micans
Jan 14 11:04
Yes, that makes sense
Luca Cozzuto
@lucacozzuto
Jan 14 11:05
with a symbolic link to the nextflow pipe (and the other files needed)
nicweb
@nicweb
Jan 14 11:06
yes, every call has its own work dir and outdir
Luca Cozzuto
@lucacozzuto
Jan 14 11:08
I think that if you run multiple instances of a Nextflow execution in the same folder, one takes a DB lock and prevents the others from being executed
micans
@micans
Jan 14 11:10
There's also the .nextflow directory.
nicweb
@nicweb
Jan 14 11:10
For me, 4 out of 38 instances ended up with the same sessionid and therefore failed to run. The other 34 samples compute fine
yes, the .nextflow directory contains a subfolder with the sessionid and the DB
I'm kind of worried that two runs got the same sessionid (which should be unique according to the documentation).
Jordi Deu-Pons
@jordeu
Jan 14 11:23
Hi! One question, is it possible to use system environment variables in a config file?
nicweb
@nicweb
Jan 14 11:27
Anyway thanks for the input! I'll go for different subfolders for each instance in the future.
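A minimal sketch of that per-subfolder approach, combining the suggestions above (the sample names, the runs/ layout, and the main.nf pipeline script are all hypothetical; the actual nextflow invocation is left commented out since it depends on your pipeline):

```shell
# Give each Nextflow instance its own subfolder, so each run gets its
# own .nextflow directory and history DB and no locks collide.
# (sample names, directory layout, and main.nf are hypothetical)
for sample in sampleA sampleB sampleC; do
    mkdir -p "runs/$sample"
    # symlink the pipeline script (and any other files needed) into the subfolder
    ln -sf "$PWD/main.nf" "runs/$sample/main.nf"
    # (cd "runs/$sample" && nextflow run main.nf -w work &)
done
echo "prepared run folders for: sampleA sampleB sampleC"
```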
Jordi Deu-Pons
@jordeu
Jan 14 11:40
Hey, sorry, I answered my own question. It works just like this: varname="${MY_ENV_VAR}". Thanks.
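For reference, a minimal config sketch of that interpolation (the parameter name and environment variable are made up): environment variables can be referenced inside double-quoted strings in the config file.

```groovy
// nextflow.config -- MY_ENV_VAR is a hypothetical environment variable;
// it is interpolated when the config is parsed
params.outdir = "${MY_ENV_VAR}/results"
```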
Johannes Alneberg
@alneberg
Jan 14 13:25
Hello! I was looking for a way to cleanout the scratch dir after the process has finished. So I found this: nextflow-io/nextflow#230, but I can't find how this would be specified in the workflow. Is this replaced by afterScript?
KochTobi
@KochTobi
Jan 14 13:30
@alneberg I think they solved it using the scratch true directive
Johannes Alneberg
@alneberg
Jan 14 13:32
Hmm, but that one doesn't mention when the tempdir is removed. From my current Sarek runs, it doesn't seem to be cleaned out at all if not by the external system.
Johannes Alneberg
@alneberg
Jan 14 13:49
Thanks @pditommaso, I found that in the code as well. Just don't know how to specify cleanup=true which my limited groovy knowledge tells me I need to. Or is scratch=true sufficient?
Paolo Di Tommaso
@pditommaso
Jan 14 13:50
it's a process directive, therefore process.scratch = true in the config file
Johannes Alneberg
@alneberg
Jan 14 13:52
And that would set cleanup = true as well? Sorry for being slow here
Paolo Di Tommaso
@pditommaso
Jan 14 13:54
maybe I'm the slow one .. :)
where does cleanup = true come from?
Johannes Alneberg
@alneberg
Jan 14 13:55
The first line you linked to: if( scratch && cleanup != false ) {
I interpret that line in this way: if (scratch AND cleanup == true) {, am I reading that wrong?
Stephen Kelly
@stevekm
Jan 14 14:00

@nicweb

running multiple nextflow instances from within the same working directory in parallel (to optimize cluster load)

Just wondering, how does this optimize cluster load? Not sure I understand the advantage over just having the main pipeline instance handle it all?

Paolo Di Tommaso
@pditommaso
Jan 14 14:01
@alneberg yes, because by default it's null, so only scratch needs to be set
basically it was meant only as a way to disable it
Stephen Kelly
@stevekm
Jan 14 14:02
@alneberg @pditommaso I thought that Nextflow automatically removes everything it places in the scratch dir after execution completes, right?
However would this only be limited to items that were staged in? Or does it apply to the entire scratch dir?
Paolo Di Tommaso
@pditommaso
Jan 14 14:02
only when process.scratch = true
@alneberg also, among other things, cleanup is undocumented, so ignore it (it does not make much sense, I may remove it)
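Putting the exchange above together, enabling scratch execution (and the automatic cleanup that comes with it) for all processes is a one-line config sketch:

```groovy
// nextflow.config
// Run every task in a node-local scratch directory; Nextflow cleans the
// scratch dir up after the task completes, no separate cleanup flag needed.
process.scratch = true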
Johannes Alneberg
@alneberg
Jan 14 14:08
It all makes sense now. Thanks for your explanation! Turns out, the basic problem is that I am not using scratch=true at all for the profile I am using. sigh
Paolo Di Tommaso
@pditommaso
Jan 14 14:09
great
hydriniumh2
@hydriniumh2
Jan 14 14:15
Hi, I couldn't find this stated explicitly anywhere: when using an AWS Batch queue as the Nextflow executor, does Nextflow automatically spin up additional instances to handle parallel processes, or does it only scale when the required cpus/memory change?
KochTobi
@KochTobi
Jan 14 14:15
the compute environment takes care of the scaling when using awsbatch
hydriniumh2
@hydriniumh2
Jan 14 14:16
So if I was aligning multiple samples in parallel it would create the required number of instances?
KochTobi
@KochTobi
Jan 14 14:19
yes the compute environment attached to your queue will create spot requests.
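For context, a typical AWS Batch setup in the config looks something like this (the queue name and S3 bucket are placeholders; scaling itself is configured on the Batch compute environment in AWS, not in Nextflow):

```groovy
// nextflow.config -- AWS Batch executor sketch; the compute environment
// attached to the queue decides how many instances to spin up
process.executor = 'awsbatch'
process.queue = 'my-batch-queue'     // placeholder Batch queue name
workDir = 's3://my-bucket/work'      // placeholder S3 work directory
```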
hydriniumh2
@hydriniumh2
Jan 14 14:23
Thanks!
Luca Cozzuto
@lucacozzuto
Jan 14 15:55
Dear all, is there a way to run a script in case there is an error in a process execution?
micans
@micans
Jan 14 15:56
Interesting ... you could try bash trap in the script section itself?
Luca Cozzuto
@lucacozzuto
Jan 14 15:58
never heard of it... how do I use it?
I'm interested because in case an execution fails, I want to create an empty file
micans
@micans
Jan 14 15:59
you can do that with
if ! mycmnd my arguments etc etc; then
   > emptyfile
fi
you can still then control exit status
Luca Cozzuto
@lucacozzuto
Jan 14 16:02
how?
micans
@micans
Jan 14 16:02
exit 1
Luca Cozzuto
@lucacozzuto
Jan 14 16:02
aha
let me try
and many thanks for your help
micans
@micans
Jan 14 16:02
:+1:
(remember trap, it's really cool and useful)
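A small sketch of the trap idea (the placeholder filename and failing command are hypothetical): an ERR trap fires whenever a command returns a non-zero status, so the script can drop an empty file at that point.

```shell
# On any command failure, the ERR trap creates an empty placeholder file.
# (placeholder.txt and the failing command are hypothetical; no set -e here,
# so the script continues after the trap runs)
trap 'touch placeholder.txt; echo "failure trapped"' ERR

false   # stand-in for a real command that fails; triggers the ERR trap
echo "script continues"
```

In a real process script you would typically also exit with a non-zero status after the trap fires, so the failure is still reported.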
Stephen Kelly
@stevekm
Jan 14 16:41

@lucacozzuto

I'm interested in case an execution fails I want to create an empty file

I do that like this:

grep -E 'CALLABLE|PASS' "${output_bed}" > "${output_bed_pass}" || touch "${output_bed_pass}" # exit code 1 if no matches

Replace 'grep' with your command. If the command's exit status is non-zero, the part after the || gets executed, and touches an empty file of the same name instead

using bash trap is an interesting idea; in fact I believe it's an integral part of the Nextflow .command.run scripts
however I still have trouble wrapping my head around them
Luca Cozzuto
@lucacozzuto
Jan 14 16:43
thanks a lot! I tried the solution given to me by @micans and it worked nicely
nicweb
@nicweb
Jan 14 17:53
@stevekm I don't want the (existing) pipeline to make inter-sample comparisons, so I use one nextflow run per sample and do not specify all samples in one call. Computing all samples serially, I would have to wait for each sample to finish before the next one gets processed
micans
@micans
Jan 14 18:24
@stevekm that's a nice idiom too, agree. The if statement is good if you want to do more things, e.g. exit. You could still do that e.g. false || (touch foobarzut && exit 1), but it becomes less readable.
Stephen Kelly
@stevekm
Jan 14 19:23
You can set up Nextflow to read in the list of samples and run the tasks for all of them in parallel. Nextflow does not execute in serial unless you go out of your way to force it to.
Tobias "Tobi" Schraink
@tobsecret
Jan 14 20:56
As Stephen mentioned, you can use something like the following to create an input channel with accessions called sample_accessions
params.AccessionsFile = "${baseDir}/accessions.tsv"
sample_accessions = Channel
                  .fromPath(params.AccessionsFile)
                  .splitCsv(header: true, sep: '\t', strip: true)
                  .map { row -> row.acc }
For testing purposes I like to add .take(10) (to only use the first 10 samples) at the end, which I comment out when I am done testing.