These are chat archives for nextflow-io/nextflow

18th
Sep 2017
Ashley S Doane
@DoaneAS
Sep 18 2017 04:40
can someone advise on a basic question: I would like to set 12 env variables that depend on the reference genome I pass as a parameter. In bash I would simply use if statements. Currently my nextflow.config uses the env scope to set all the variables to hg38 references, but I want to set these to mm10 when needed. I could make a different profile for each genome and includeConfig, or is there a better way? thanks :)
Phil Ewels
@ewels
Sep 18 2017 06:10
Hi @DoaneAS - we came across the same thing and solved it like this: https://www.slideshare.net/tallphil/standardising-swedish-genomics-analyses-using-nextflow/20
(slide 20 in case the link doesn't do the right thing - talk from the Nextflow meeting last week so great timing!)
You can see it in action here and here in one of our pipelines.
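[Editor's note: the approach on that slide boils down to keeping a map of per-genome reference paths in the config and selecting one with a --genome parameter. A minimal sketch; the genome keys, attribute names, and paths below are placeholders, not the pipeline's actual config:]

```groovy
// nextflow.config -- sketch of a per-genome reference map (paths are placeholders)
params {
  genome = 'hg38'          // default; override with --genome mm10
  genomes {
    'hg38' {
      fasta = '/refs/hg38/genome.fa'
      gtf   = '/refs/hg38/genes.gtf'
    }
    'mm10' {
      fasta = '/refs/mm10/genome.fa'
      gtf   = '/refs/mm10/genes.gtf'
    }
  }
}
```

In main.nf the right set is then picked up with something like `ref_fasta = params.genomes[ params.genome ].fasta`, so no if-statements are needed.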
Laurent Modolo
@l-modolo
Sep 18 2017 08:59

Hi everyone,
I am trying to use Nextflow on an SGE cluster but I run into the following error:

Command executed:

  bash
   source /usr/share/modules/init/bash
   module use /applis/PSMN/Modules
   module load Base/psmn
   module load use.own
   umask 002
   ls .

         module load pigz/2.3.4
         pigz -p 4 -c  2017_04_14_PLBD11_wt_R3.fastq > 2017_04_14_PLBD11_wt_R3.fastq.gz
         module load file_handle/0.1.1
         file_handle.py -c -e -f 2017_04_14_PLBD11_wt_R3.fastq.gz
  ls -l

Command exit status:
  1

Command output:
  (empty)

Command error:
  /bin/bash: .command.sh: No such file or directory

Work dir:
  /scratch/cburny/readthroughpombe/work/86/8933614ddd2b675bb6e619f018fb43

There is no .command.sh in the work dir. The script works flawlessly with Docker or on a local computer and I don’t understand this error.

Thank you!

Anthony Underwood
@aunderwo
Sep 18 2017 09:02

Hi @pditommaso I got AWS to work in the end but there were a few gotchas:

1) Don't create S3 buckets in eu-west-2 (London), the data couldn't be accessed. Fine when created in eu-west-1
2) my publishDir was specified as publishDir ${output_dir}/summary , mode: 'copy'. This didn't work since output_dir is specified as s3://my-bucket/outputs and it doesn't appear that the subdir summary can be created in S3 via Nextflow. If I just specified `publishDir output_dir , mode: 'copy'` all was good.
3) More worryingly when I used the block

workflow.onComplete = { 
    if( workflow.success ) { ['sudo','rm','-rf', workDir].execute().waitFor() }
    println( workflow.success ? "Done!" : "Oops .. something went wrong" )
}

as shown in https://github.com/pditommaso/paraMSA the workflow failed to publish the outputs from the final step, since the workflow.onComplete step had deleted them. It doesn't appear to wait for the publishDir copy step to finish

@l-modolo How does your config look like?
Laurent Modolo
@l-modolo
Sep 18 2017 09:37
@aunderwo It looks like this for this job:
$get_fastq_name {
    executor = "sge"
    cpus = 4
    memory = "5GB"
    time = "24h"
    queueSize = 1000
    pollInterval = '60sec'
    queue = 'E5-2670deb128A,E5-2670deb128B,E5-2670deb128C,E5-2670deb128D,E5-2670deb128E,E5-2670deb128F'
    penv = 'openmp16'
  }
Anthony Underwood
@aunderwo
Sep 18 2017 09:44

@l-modolo doesn't seem to be anything unexpected there. I hadn't come across the multiple queues before but that matches the specified syntax. I have run jobs on SGE without a problem with a simpler config, but all directives should be recognised.

Are there other files in the work_dir /scratch/cburny/readthroughpombe/work/86/8933614ddd2b675bb6e619f018fb43? Are the permissions OK?

Laurent Modolo
@l-modolo
Sep 18 2017 09:53

@aunderwo They seem fine to me:

drwxr-xr-x 2 lmodolo lbmc   14 sept. 18 09:13 ./
drwxr-xr-x 3 lmodolo lbmc    9 sept. 15 16:44 ../
lrwxrwxrwx 1 lmodolo lbmc   73 sept. 18 09:13 2017_04_14_PLBD11_wt_R3.fastq -> /scratch/cburny/readthroughpombe/data/fastq/2017_04_14_PLBD11_wt_R3.fastq
-rw-r--r-- 1 lmodolo lbmc    0 sept. 18 09:13 .command.begin
-rw-r--r-- 1 lmodolo lbmc  122 sept. 18 09:13 .command.err
-rw-r--r-- 1 lmodolo lbmc  122 sept. 18 09:13 .command.log
-rw-r--r-- 1 lmodolo lbmc    0 sept. 18 09:13 .command.out
-rw-r--r-- 1 lmodolo lbmc 2,6K sept. 15 16:44 .command.run.1
-rw-r--r-- 1 lmodolo lbmc    2 sept. 18 09:13 .command.trace
-rw-r--r-- 1 lmodolo lbmc    1 sept. 18 09:13 .exitcode

There is this line in the file .command.run.1:

/bin/bash -ue /scratch/cburny/readthroughpombe/work/86/8933614ddd2b675bb6e619f018fb43/.command.sh
Anthony Underwood
@aunderwo
Sep 18 2017 09:54
anything in .command.run.1 ?
Paolo Di Tommaso
@pditommaso
Sep 18 2017 09:59
@aunderwo regarding 1) is it an issue with the eu-west-2 region? might be worth opening an issue and continuing the discussion there
2) I guess the problem is with the syntax, it should be publishDir "${output_dir}/summary" , mode: 'copy'
like in bash, $ is required to interpolate variables in a string
3) interesting .. I should check that, however it was mainly a workaround, I would like to add a proper declarative cleanup setting
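[Editor's note: this quoting behaviour is plain Groovy - only double-quoted strings interpolate `${...}`. A quick illustration; the variable name is just the one from the discussion above:]

```groovy
// Groovy string quoting: double quotes interpolate, single quotes do not
def output_dir = 's3://my-bucket/outputs'
println "${output_dir}/summary"   // prints: s3://my-bucket/outputs/summary
println '${output_dir}/summary'   // prints the literal text: ${output_dir}/summary
```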
Anthony Underwood
@aunderwo
Sep 18 2017 10:05

2) That was the syntax I used

2) my publishDir was specified as publishDir ${output_dir}/summary , mode: 'copy'. This didn't work since output_dir is specified as s3://my-bucket/outputs and it doesn't appear that the subdir summary can be created in S3 via Nextflow. If I just specified `publishDir output_dir , mode: 'copy'` all was good.

Paolo Di Tommaso
@pditommaso
Sep 18 2017 10:07
I don't see " around ${output_dir}/summary
that won't work
Anthony Underwood
@aunderwo
Sep 18 2017 10:08
@pditommaso regarding 1) - yes it is an issue with eu-west-2. Probably just worth noting for the moment not to use eu-west-2

@pditommaso Checking back on my code it did have quotes

process run_ariba_summary {
  publishDir "${output_dir}/summary", mode: 'copy'

  input:
  file summary_tsv from summary_channel.collect()

  output:
  file "ariba_summary.*"

  """
  ariba summary ariba_summary ${summary_tsv}
  """

}

This doesn't work on S3

@pditommaso btw did you see my previous post

Any idea why nextflow asks for a private token for a public repo on gitlab.com?

nextflow run https://gitlab.com/phe-oxford-public/nextflow_mykrobe_predictor
N E X T F L O W  ~  version 0.25.7
Pulling phe-oxford-public/nextflow_mykrobe_predictor ...
Missing Gitlab private token -- Check file: /Users/anthony/.nextflow/scm
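[Editor's note: before the fix mentioned below, a workaround was to declare a GitLab provider in the scm file the error points at. A sketch; the user and token values are placeholders, and a token should normally only be needed for private repos:]

```groovy
// ~/.nextflow/scm -- sketch with placeholder credentials
providers {
    gitlab {
        user  = 'your-user'
        token = 'your-private-token'
    }
}
```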
Paolo Di Tommaso
@pditommaso
Sep 18 2017 10:13
checking ..
Anthony Underwood
@aunderwo
Sep 18 2017 10:48
When running on AWS and specifying a m4.2xlarge 8vCPU 32GB RAM instance, how would you restrict nextflow to only submitting 4 jobs per node?
Paolo Di Tommaso
@pditommaso
Sep 18 2017 11:55
you can set the maxCpus per node, not the max jobs per node
what's the use case for that?
Anthony Underwood
@aunderwo
Sep 18 2017 11:57
I need min 10GB per process but the 2xlarge has 4GB / CPU. I can't find a high RAM instance on EC2
Paolo Di Tommaso
@pditommaso
Sep 18 2017 11:59
if you define the memory, NF will allocate the jobs according to that
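[Editor's note: concretely, a memory directive like the sketch below should make Nextflow pack at most three 10 GB tasks onto a 32 GB m4.2xlarge; the process selector name is a placeholder:]

```groovy
// nextflow.config -- with 10 GB per task, a 32 GB node fits at most 3 concurrently
process {
    $my_process {
        memory = '10 GB'
    }
}
```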
Anthony Underwood
@aunderwo
Sep 18 2017 12:09
how does NF know how much memory the worker nodes have?
Paolo Di Tommaso
@pditommaso
Sep 18 2017 12:10
it fetches them from the instance settings
Anthony Underwood
@aunderwo
Sep 18 2017 12:12
instance settings? Does it know how much memory a m4.xlarge instance has?
Are these retrieved dynamically from AWS or do you hard code EC2 instance settings in NF code?
Paolo Di Tommaso
@pditommaso
Sep 18 2017 12:14
java api
Anthony Underwood
@aunderwo
Sep 18 2017 12:14
cool!
Paolo Di Tommaso
@pditommaso
Sep 18 2017 12:15
:)
Paolo Di Tommaso
@pditommaso
Sep 18 2017 12:33
@aunderwo regarding the Gitlab I think there's a bug, could you please open an issue for that?
Anthony Underwood
@aunderwo
Sep 18 2017 12:37
@pditommaso Done - nextflow-io/nextflow#457
Paolo Di Tommaso
@pditommaso
Sep 18 2017 12:37
thanks!
Paolo Di Tommaso
@pditommaso
Sep 18 2017 12:49
@aunderwo I've uploaded a patch that should fix the problem with GitLab
define this var in your environment
export NXF_VER=0.25.8-SNAPSHOT
then run NF as usual
Anthony Underwood
@aunderwo
Sep 18 2017 13:54
@pditommaso Yay! fast work that did the trick :)
Paolo Di Tommaso
@pditommaso
Sep 18 2017 13:55
cool
Ashley S Doane
@DoaneAS
Sep 18 2017 18:12
@ewels thanks!
Venkat Malladi
@vsmalladi
Sep 18 2017 19:31
Anyone have a good example of how to give command line usage the params that are expected in a workflow?
Paolo Di Tommaso
@pditommaso
Sep 18 2017 19:33
if( params.help ) {
  print workflowUsage()
  exit 0
}

def workflowUsage() {
  """
  --foo Parameter `foo` enables this
  --bar Parameter `bar` allows that
  ... 
  """
}
Venkat Malladi
@vsmalladi
Sep 18 2017 19:34
@pditommaso thanks
Paolo Di Tommaso
@pditommaso
Sep 18 2017 19:34
:+1:
tho, we are planning to improve this, see #168
oops it's not that
see #144
Venkat Malladi
@vsmalladi
Sep 18 2017 19:37
+1
I will follow