These are chat archives for nextflow-io/nextflow

22nd
Mar 2017
Anthony Underwood
@aunderwo
Mar 22 2017 07:17
Hi. Still exploring use cases for Nextflow and have a couple of (hopefully) quick questions.
1) I know that there isn't the ability to import/include other modules. Looking at the examples in the awesome list I see there are some workflow files which are over 1000 loc long. This becomes quite cumbersome and hard to read. Do you have any recommendations on how to make the workflow process easy to understand?
2) I see that a .dot file of the DAG can only be produced on execution. Is there a way to generate a diagram of all possible paths through the DAG, for instance showing forking processes that can take two alternate paths?
Paolo Di Tommaso
@pditommaso
Mar 22 2017 08:20
well, I would start by saying that with NF a fully featured workflow can be written in a single script file of 500-1000 lines of code
In other languages you may need more lines of code just to define a single command, see here
with this approach you can easily read exactly which commands are in your pipeline, and the interface of each of them in terms of input/output, without having to dig into several other files.
Paolo Di Tommaso
@pditommaso
Mar 22 2017 08:25
this is the main reason why modules weren't taken into consideration at the beginning
though the ability to support sub-workflows will be added at some point
that said, you can still shorten the main script by using templates or external script commands
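A minimal sketch of the template approach mentioned above, written in the channel syntax used elsewhere in this chat. The parameter name, file names and the `my_aligner` tool are made up for illustration; only the `template` keyword and the `templates/` directory convention come from Nextflow itself.

```groovy
// main.nf (sketch)
params.reads = 'data/*.fastq'          // hypothetical input location
reads_ch = Channel.fromPath(params.reads)

process align {
    input:
    file reads from reads_ch

    output:
    file 'aligned.txt' into aligned_ch

    script:
    // The command body lives in templates/align.sh, next to main.nf.
    // That file could contain, for example:
    //   #!/bin/bash
    //   my_aligner ${reads} > aligned.txt     (my_aligner is a made-up tool)
    template 'align.sh'
}
```

The process declaration keeps only the input/output interface, while the shell code moves to its own file, where it can also be tested separately.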
Paolo Di Tommaso
@pditommaso
Mar 22 2017 08:31
also, with NF you don't need a separate workflow process for every single command, as happens in other systems. NF processes are meant to exploit parallelism
thus if you have two or more sequential commands it's perfectly fine to group them in the same NF process.
Regarding point 2: no, it's not possible at this time
Paolo Di Tommaso
@pditommaso
Mar 22 2017 09:11
@aunderwo hope it helps
Anthony Underwood
@aunderwo
Mar 22 2017 09:59
Thanks @pditommaso - useful info. I can see template could be useful
Paolo Di Tommaso
@pditommaso
Mar 22 2017 09:59
nice
Tim Diels
@timdiels
Mar 22 2017 10:27
How do you run something (create database) before the first run of a process, but only if the process ever gets run, and something after its last run (delete database)?
Paolo Di Tommaso
@pditommaso
Mar 22 2017 10:29
but only if the process ever gets run
what do you mean ?
Tim Diels
@timdiels
Mar 22 2017 10:30
When I'm resuming a run and the process has already completed all its work, so it's cached
So it doesn't have to run
Paolo Di Tommaso
@pditommaso
Mar 22 2017 10:32
by database do you mean one or more files stored in the file system, or a SQL/NoSQL backend?
Tim Diels
@timdiels
Mar 22 2017 10:36
not stored in the file system, they are Decypher blast databases stored on special proprietary hardware
They may be stored as files, but I've no access to them
Paolo Di Tommaso
@pditommaso
Mar 22 2017 10:38
storeDir could be used for this kind of use case, but I'm not sure it applies to yours
what I'm missing is the interface of that system
just files or a proprietary API / tool?
Tim Diels
@timdiels
Mar 22 2017 10:40
No, I don't think so. Decypher is a cluster, the created databases are stored on that cluster. The interface is a proprietary CLI
Something like dc_run -create dbname input.fasta
and then dc_run -query other.fasta -db dbname to blast other.fasta against dbname
Paolo Di Tommaso
@pditommaso
Mar 22 2017 10:43
so you need to run dc_run -create dbname input.fasta only the very first time
Tim Diels
@timdiels
Mar 22 2017 10:44
Yes
And after the last run I need to run dc_run -delete dbname
Paolo Di Tommaso
@pditommaso
Mar 22 2017 10:44
does it provide any command to check if that database already exists?
Tim Diels
@timdiels
Mar 22 2017 10:45
Yes
Paolo Di Tommaso
@pditommaso
Mar 22 2017 10:45
if so, you could have as the first process a command that creates the database only if it does not exist
Tim Diels
@timdiels
Mar 22 2017 10:46
You can call it dc_run -exists dbname (I don't remember the commands exactly, shouldn't matter though)
Right, I could simply prefix the process with it. But what if there's a race condition?
Or that's why you suggest a process in front of the other?
Paolo Di Tommaso
@pditommaso
Mar 22 2017 10:47
yes, this should be a process on which all the others depend
Tim Diels
@timdiels
Mar 22 2017 10:47
input -> create_if_not_exists -> do_work -> delete_if_last
Paolo Di Tommaso
@pditommaso
Mar 22 2017 10:47
yes
Tim Diels
@timdiels
Mar 22 2017 10:48
Yeah I think that could work, thanks
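The create-then-work chain agreed on above might look roughly like this. It is only a sketch: the `dc_run` flags are the ones quoted from memory in this chat and may not match the real Decypher CLI, it is assumed that `dc_run -exists` signals the result through its exit status, and all parameter, channel and file names are hypothetical.

```groovy
// Sketch only: dc_run flags and exit-status behaviour are assumptions
params.dbname   = 'mydb'
params.db_fasta = 'input.fasta'
params.query    = 'other.fasta'

db_fasta = file(params.db_fasta)
query    = file(params.query)

process create_if_not_exists {
    input:
    file fasta from db_fasta

    output:
    file 'db.ready' into db_ready     // marker file, used only for ordering

    """
    dc_run -exists ${params.dbname} || dc_run -create ${params.dbname} ${fasta}
    touch db.ready
    """
}

process do_work {
    input:
    file marker from db_ready         // forces this process to wait for the create step
    file q from query

    output:
    file 'hits.txt' into hits

    """
    dc_run -query ${q} -db ${params.dbname} > hits.txt
    """
}
```

Because `do_work` consumes the marker emitted by `create_if_not_exists`, it cannot start before the database check has finished, which is what removes the race condition. With many query files, the marker would have to be turned into a reusable value (for example with `db_ready.first()`) so that every task can read it.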
Paolo Di Tommaso
@pditommaso
Mar 22 2017 10:49
the delete can also be implemented with the completion handler
see this example that sends an email on completion
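A hedged sketch of the completion-handler idea for the delete step. `workflow.onComplete` is the Nextflow handler referred to above; the `dc_run -delete` command is taken from this chat and `params.dbname` is a hypothetical parameter.

```groovy
// Sketch: delete the Decypher database once the whole pipeline has finished.
// Assumes dc_run is on the PATH of the machine that launches Nextflow.
params.dbname = 'mydb'   // hypothetical database name

workflow.onComplete {
    def proc = ['dc_run', '-delete', params.dbname].execute()
    proc.waitFor()
    println "dc_run -delete exited with status ${proc.exitValue()}"
}
```

The handler fires at the end of every run, including resumed runs in which all tasks were cached, so the delete command should tolerate a database that no longer exists.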
Tim Diels
@timdiels
Mar 22 2017 11:08
How is executor.$local.cpus enforced? Asking because of processes like do_work which will be idly waiting for Decypher to finish.
Paolo Di Tommaso
@pditommaso
Mar 22 2017 11:10
I see, well that process will consume in any case a cpu ..
Tim Diels
@timdiels
Mar 22 2017 11:10
Right, for each job. Can I set cpus 0?
Paolo Di Tommaso
@pditommaso
Mar 22 2017 11:11
I think it should do the trick
Tim Diels
@timdiels
Mar 22 2017 11:16
Alright fingers crossed for no divisions by zero
Paolo Di Tommaso
@pditommaso
Mar 22 2017 11:17
ahah
there should not be !
Tim Diels
@timdiels
Mar 22 2017 11:24
Great
Paolo Di Tommaso
@pditommaso
Mar 22 2017 11:24
:v:
Tim Diels
@timdiels
Mar 22 2017 13:26
Is there a way of running https://gitlab.psb.ugent.be/deep_genome/pipeline without changing .nextflow/scm and without giving them a gitlab account (a private token is apparently required)?
$ nextflow run -latest -name test -hub psbgitlab deep_genome/pipeline -r nextflow
Picked up JAVA_TOOL_OPTIONS: -XX:ParallelGCThreads=1
N E X T F L O W  ~  version 0.23.4
Pulling deep_genome/pipeline ...
Not a valid Nextflow project -- The repository `https://gitlab.psb.ugent.be/deep_genome/pipeline` must contain a the script `main.nf` or the file `nextflow.config`
This check appears to ignore the -r nextflow argument: there is a main.nf file at the root of that branch, but not on master
Paolo Di Tommaso
@pditommaso
Mar 22 2017 14:50
um being public it should be possible
I'm checking what's wrong
Paolo Di Tommaso
@pditommaso
Mar 22 2017 14:57
@timdiels is that supposed to be a NF pipeline ?
Maxime Garcia
@MaxUlysse
Mar 22 2017 15:00
I think the NF pipeline is on the nextflow branch on the repo
I'm guessing that's why they're using -r nextflow
Paolo Di Tommaso
@pditommaso
Mar 22 2017 15:01
I see, in any case the nextflow.config must exist in the project root on the main branch, otherwise it won't work
Paolo Di Tommaso
@pditommaso
Mar 22 2017 15:07
@MaxUlysse a short suggestion
this snippet
ch_normalize_vcf = Channel.create()

if ( 'normalize' in workflowSteps ) {
    ch_masked_vcfs_vep.mix( ch_intersections ).set { ch_normalize_vcf }
}
else {
    ch_masked_vcfs_vep.mix( ch_intersections ).set { ch_annotate }

    // So we don't get stuck in an infinite loop
    ch_normalize_vcf.close()
}
can be refactored as
if ( 'normalize' in workflowSteps ) {
    ch_masked_vcfs_vep.mix( ch_intersections ).set { ch_normalize_vcf }
}
else {
    ch_masked_vcfs_vep.mix( ch_intersections ).set { ch_annotate }

    // So we don't get stuck in an infinite loop
    Channel.empty().set { ch_normalize_vcf }
}
Maxime Garcia
@MaxUlysse
Mar 22 2017 15:11
Thanks a lot, I'll tell them. I just forked this project to make a small PR: they were trying to get the git revision, and I suggested using workflow.scriptId instead
Paolo Di Tommaso
@pditommaso
Mar 22 2017 15:12
I'm saying that because I saw that in CAW there are a lot of Channel.create() calls that are actually not needed
Maxime Garcia
@MaxUlysse
Mar 22 2017 15:12
I almost removed them all
We did some refactoring
Paolo Di Tommaso
@pditommaso
Mar 22 2017 15:13
great
Maxime Garcia
@MaxUlysse
Mar 22 2017 15:13
Someone has a new interest in the project, so we're having a new set of eyes to help us ;-)
Paolo Di Tommaso
@pditommaso
Mar 22 2017 15:14
nice :)
Paolo Di Tommaso
@pditommaso
Mar 22 2017 15:27
@MaxUlysse one more tip
Maxime Garcia
@MaxUlysse
Mar 22 2017 15:27
I always welcome tips
Paolo Di Tommaso
@pditommaso
Mar 22 2017 15:27
I would suggest using local variables in the functions
for example here
could be
def checkReferenceMap(referenceMap) {
  // Loop through all the references files to check their existence
  def referenceDefined = true
  referenceMap.each{
    referenceFile, fileToCheck ->
    def test = checkRefExistence(referenceFile, fileToCheck)
    if( !test ) referenceDefined = false
  }
  return referenceDefined
}
or even
def checkReferenceMap(referenceMap) {
  // Loop through all the references files to check their existence
  def referenceDefined = true  // reassigned inside the closure, so it cannot be final
  referenceMap.each{
    referenceFile, fileToCheck ->
    final test = checkRefExistence(referenceFile, fileToCheck)
    if( !test ) referenceDefined = false
  }
  }
  return referenceDefined
}
the point is that using def/final avoids setting the variable in the global context, which prevents possible conflicts
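The scoping rule behind this tip can be seen in a few lines of plain Groovy, which Nextflow scripts build on. The function and variable names below are invented for the illustration.

```groovy
// Plain Groovy script illustrating declared vs undeclared variables
def leaky() {
    counter = 1        // no def: stored in the script binding, visible everywhere
}

def scoped() {
    def local = 1      // def: exists only inside this function
    return local
}

leaky()
scoped()

assert binding.hasVariable('counter')    // leaked into the global context
assert !binding.hasVariable('local')     // stayed local
```

An undeclared variable such as `counter` can silently collide with a channel or parameter of the same name elsewhere in the pipeline script, which is the conflict the tip is meant to avoid.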
Tim Diels
@timdiels
Mar 22 2017 15:30
@pditommaso I'm converting an existing pipeline to nextflow in the nextflow branch, will merge to master when done
Maxime Garcia
@MaxUlysse
Mar 22 2017 15:30
Ok, thanks a lot, I'll correct it right away, we do want to avoid collisions, conflicts and such
Paolo Di Tommaso
@pditommaso
Mar 22 2017 15:32
@timdiels I see, in that case once you merge it you will be able to pull from the repo ..
Tim Diels
@timdiels
Mar 22 2017 15:41
@pditommaso ok, thanks
If I add an empty nextflow.config to master, would it be able to pull the nextflow branch then?
Paolo Di Tommaso
@pditommaso
Mar 22 2017 15:43
nope
I think you will need at least to specify the default branch
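For reference, the default branch can be declared in the `manifest` scope of nextflow.config. This is a sketch of what such a file on master might contain; whether it is sufficient for the repository discussed here is not confirmed in this conversation.

```groovy
// nextflow.config on the repository's main branch (sketch)
manifest {
    defaultBranch = 'nextflow'   // branch to use when no -r option is given
    mainScript    = 'main.nf'    // this is already the default script name
}
```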
Mike Smoot
@mes5k
Mar 22 2017 20:21
Hi @pditommaso, a few of my users have updated to version 0.24.0 and are now getting this error: "ERROR ~ A DataflowVariable can only be assigned once. Only re-assignments to an equal value are allowed." on pipelines that worked with version 0.23.4. We're not having any luck tracking down what might be triggering this. Any ideas?
Paolo Di Tommaso
@pditommaso
Mar 22 2017 20:22
do you have the stack trace?
please open an issue and upload it there
Mike Smoot
@mes5k
Mar 22 2017 20:24
Will do.
Mike Smoot
@mes5k
Mar 22 2017 20:30
Here you go: nextflow-io/nextflow#308