These are chat archives for nextflow-io/nextflow

6th
Jun 2018
Paolo Di Tommaso
@pditommaso
Jun 06 2018 07:16
@davidmasp rm -rf work is the easiest way; nextflow clean gives you more control over which runs you want to delete
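
A minimal sketch of the two options mentioned here, assuming the default work directory name (work) and that the commands are run from the pipeline launch directory. The function names clean_all and clean_selected are made up for illustration; nextflow log, nextflow clean and its -n (dry run) and -f (force) flags are part of the Nextflow command line.

```shell
#!/usr/bin/env bash
# Sketch only: clean_all and clean_selected are hypothetical helper names.

# Option 1: delete the whole work directory.
# This drops the cached results of every run launched from this directory.
clean_all() {
    rm -rf work
}

# Option 2: let Nextflow remove the files of one run only.
# Pass a run name as shown by `nextflow log`; with no argument,
# nextflow clean acts on the most recent run.
clean_selected() {
    nextflow log              # list previous runs with their names
    nextflow clean -n "$@"    # dry run: print what would be removed
    nextflow clean -f "$@"    # actually remove the selected run's work files
}
```

Calling clean_selected with a run name taken from the nextflow log output limits the deletion to that run, whereas clean_all also discards everything that -resume could have reused.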
David Mas-Ponte
@davidmasp
Jun 06 2018 08:39
okay, thx
Will Rowe
@will-rowe
Jun 06 2018 13:29

Hi. I'm trying to move my nextflow pipeline over to use conda for the dependencies. However, I keep receiving a 'Text file busy' error when I launch it using LSF. I have also tried a simple qc pipeline with a single conda env and still receive the following:

Command error:
  /usr/bin/env: Rscript: Text file busy

Sorry if this is a problem with my setup / HPC, I can't work it out.

micans
@micans
Jun 06 2018 13:33
I only know this error as a UNIX error, where indeed the text file or executable would be busy. In this case I'm not sure whether that's env or Rscript, probably the latter. It would mean the file system is still busy with that executable. I can't exclude other possibilities (not a deeply experienced UNIX hacker myself), but that has definitely been my experience.
Paolo Di Tommaso
@pditommaso
Jun 06 2018 13:40
@will-rowe frankly I've never seen it. I would suggest replicating the error by just executing the task launcher with bash .command.run, and asking your sysadmins
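
One way to follow this suggestion, sketched under assumptions: the work directory of the failing task (Nextflow prints it in the error report) is passed as the only argument, and rerun_task is a hypothetical helper name. The .command.run launcher is the file Nextflow generates inside each task directory.

```shell
#!/usr/bin/env bash
# Sketch only: rerun_task is a hypothetical helper, not a Nextflow command.

# Re-run one task outside the pipeline, from its own work directory.
rerun_task() {
    local task_dir="$1"
    local status=0
    # Use a subshell so the caller's working directory is left untouched.
    ( cd "$task_dir" && bash .command.run ) || status=$?
    echo "exit status: $status"
    return "$status"
}
```

If the 'Text file busy' message shows up again when the launcher is run by hand on the same node, the problem is reproducible without Nextflow or LSF in the loop, which is a useful detail to pass on to the sysadmins.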
Will Rowe
@will-rowe
Jun 06 2018 13:42
okay - thanks both for your help!
micans
@micans
Jun 06 2018 14:49
I considered breaking up our pipeline into two logical parts (1) aligner index creation (2) rnaseq processing -- they are currently in the same main.nf. They would still share a lot of config, so I thought perhaps split into two files buildindex.nf and main.nf. However, this seems to go against NF philosophy (one project, one repository, one pipeline, one main.nf). Is that correct? Are the options to either have two repositories, or to have everything in a single main.nf?
Luca Cozzuto
@lucacozzuto
Jun 06 2018 14:51
@micans why this?
micans
@micans
Jun 06 2018 14:51
Why what?
Luca Cozzuto
@lucacozzuto
Jun 06 2018 14:52
:) why splitting the pipeline
micans
@micans
Jun 06 2018 14:53
I don't see it as splitting; I view it as two pipelines currently in the same script. One provides a resource for the other, but it is never run as a single pipeline.
Luca Cozzuto
@lucacozzuto
Jun 06 2018 14:55
So if for you they are two pipelines, it makes sense to have them in two different repos. However, building the index is just a single process, I guess... or do you have more?
micans
@micans
Jun 06 2018 14:57
(btw, I am fairly agnostic, perhaps my idea is not good, but it's the idea I have right now).
The index part builds indexes for three different aligners. But there is some general admin-config that is shared between this part and the rnaseq part.
Luca Cozzuto
@lucacozzuto
Jun 06 2018 14:58
ok so you want to make 3 different indexes starting from a sequence
store them and use them when they are needed
micans
@micans
Jun 06 2018 14:59
Not sure about sequence. The indexes are for genomes.
Luca Cozzuto
@lucacozzuto
Jun 06 2018 14:59
by the second pipeline
micans
@micans
Jun 06 2018 15:00
yes -- I need to make genome indexes rarely. These indexes are used by the aligners in the rnaseq pipeline.
Luca Cozzuto
@lucacozzuto
Jun 06 2018 15:00
I don't know about your environment, but we found that recreating the indexes is cheaper than storing them
so we create them, use them and delete them afterwards, storing only the compressed genome/transcriptome
but this depends on the environment
cloud etc
micans
@micans
Jun 06 2018 15:02
Interesting, and unexpected.
Luca Cozzuto
@lucacozzuto
Jun 06 2018 15:03
moreover using containers allows you to have the same index (so you don't risk strange behaviours)
micans
@micans
Jun 06 2018 15:03
In our environment storing is cheap. I don't know about the run-time cost of creating them -- fairly new to this job. But it's a point I had not considered.
I'm still interested in the original question; file-level modularisation seems to give some scope for organisation.
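
A minimal sketch of the file-level split being asked about, not a recommendation from the Nextflow developers. It assumes both scripts sit in the same project directory and that the shared settings live in a file called shared.config, which is a hypothetical name, as is the run_pipeline helper. The buildindex.nf and main.nf names come from the question; nextflow run and its -c option are part of the Nextflow command line.

```shell
#!/usr/bin/env bash
# Sketch only: run_pipeline and shared.config are hypothetical names.

# Launch any script of the project with the same shared configuration.
# Extra arguments (for example -resume) are passed through unchanged.
run_pipeline() {
    local script="$1"
    shift
    nextflow run "$script" -c shared.config "$@"
}

# Rarely: (re)build the aligner indexes.
#   run_pipeline buildindex.nf
# Routinely: RNA-seq processing that uses those indexes.
#   run_pipeline main.nf -resume
```

A nextflow.config file placed in the launch directory is also picked up automatically by either script, so the explicit -c is only needed when the shared settings are kept under a different name.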
Luca Cozzuto
@lucacozzuto
Jun 06 2018 15:04
you are a lucky man! :) We always have problems with disk space. However, consider that creating the index on the fly is relatively fast, and you can also do other stuff in the meantime (like checking the quality of the raw reads, trimming them, etc...)
Francesco Strozzi
@fstrozzi
Jun 06 2018 16:24
hi guys, I was wondering if you have also experienced problems with the position of the when directive in the process. I have just seen that the when directive is ignored if it is not right before the script section. Also, if the publishDir directive is placed just above the when directive, publishDir is not executed.
Paolo Di Tommaso
@pditommaso
Jun 06 2018 16:26
directives should be before any input/output/when: block
Paolo Di Tommaso
@pditommaso
Jun 06 2018 16:41
provided that, it should work regardless of the order
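
To make the layout concrete, here is a small sketch that writes an example script to disk. It assumes the DSL2 syntax of current Nextflow releases, which is newer than this conversation; the file name when_order.nf, the process name hello and the parameter run_hello are all invented for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: writes a hypothetical example script, when_order.nf.
cat > when_order.nf <<'EOF'
params.run_hello = true

process hello {
    // Directives go first, ahead of every labelled block.
    // Their order relative to each other should not matter.
    publishDir 'results'
    tag 'demo'

    output:
    path 'hello.txt'

    when:
    params.run_hello

    script:
    """
    echo hello > hello.txt
    """
}

workflow {
    hello()
}
EOF

# To try it (requires Nextflow):
#   nextflow run when_order.nf
#   nextflow run when_order.nf --run_hello false
```

With publishDir and tag placed above output:, when: and script:, the process matches the layout described above; moving a directive below when: reproduces the arrangement that caused the problem reported in the question.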
Francesco Strozzi
@fstrozzi
Jun 06 2018 17:31
:+1: ok thanks