These are chat archives for nextflow-io/nextflow

13th
Mar 2019
Steve Marshall
@stevemmarshall_gitlab
Mar 13 12:43
Hi... I'm having an issue trying to use my AWS Batch Job Definition in my nextflow.config... I have this line process.containter = 'job-definition://Batch-Genomics-CompBio-Dev' but I get Caused by:
Invalid AWS Batch job definition -- provide a Docker image name or a Batch job definition name
I can put in my ECR image and that works but then it creates a new job definition without some custom configuration... was hoping to use my existing job def
Paolo Di Tommaso
@pditommaso
Mar 13 12:45
it's supposed to work, double check def name spelling, permissions, region, etc
Steve Marshall
@stevemmarshall_gitlab
Mar 13 12:46
I pass my secrets and my region
do you need to include the version?
Paolo Di Tommaso
@pditommaso
Mar 13 12:46
I would say no
check with the awscli that you are able to retrieve it
Steve Marshall
@stevemmarshall_gitlab
Mar 13 12:49
With the AWS CLI I ran this command and it worked fine: aws batch describe-job-definitions --status ACTIVE
It lists the job definitions and my Docker image.
Philip Jonsson
@kpjonsson
Mar 13 13:08
Hi @pditommaso. Re: #1071, our cluster admins have confirmed that both -R and -M are per slot on our LSF cluster. As I pointed out in the thread, I can't say how common that is, but it is indeed the case on our cluster.
Dhanaprakash
@dhanaprakashj
Mar 13 13:14

Hi, I am writing a pipeline and I have problems with the handling of gzip files in Nextflow. The command throws an indexing error in Nextflow, but the "executed command" runs successfully in the shell. If I unzip the files, the step works in Nextflow, but I get the same indexing error for the next step as well (even though this time all input files are unzipped).

I wrote a small nf script to test the process alone, to see if there are any errors in receiving inputs from the previous process, but that doesn't seem to be the case: I get the same error and the "executed command" works in the shell.

Has anyone faced this kind of situation and can suggest a solution?
Evan Floden
@evanfloden
Mar 13 13:16
Can you provide the code snippet and error?
Dhanaprakash
@dhanaprakashj
Mar 13 13:18

`#!/usr/bin/env nextflow

params.dataDir = "/home/dhajam/project/nextflow_development/Data"
params.gzin = "/home/dhajam/project/nextflow_development/test/NIST7035_TAAGGCGA_L001_BWA_Alignment_haplotypeCaller.vcf"
params.recalin = "/home/dhajam/project/nextflow_development/test/NIST7035_TAAGGCGA_L001_BWA_Alignment_haplotypeCaller_recalibrate_SNP.recal"
params.tranchesin = "/home/dhajam/project/nextflow_development/test/NIST7035_TAAGGCGA_L001_BWA_Alignment_haplotypeCaller_recalibrate_SNP.tranches"
params.outf = "/home/dhajam/project/nextflow_development/test/recalibrated_snps.vcf"
params.alignmentReference = "$params.dataDir/Genomes/GRCh38_no_alt_analysis_set/GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set.fna"

gzIN = file(params.gzin)
recalIN = file(params.recalin)
tranchesIN = file(params.tranchesin)
outF = file(params.outf)

process applyVQSR {
tag "$gz"

 input:
 file gz from gzIN
 file recal from recalIN
 file tranches from tranchesIN
 file out from outF

 output:
 file "output_file" into outff

 """

java -jar /home/apps/gatk/current/gatk-package-4.0.7.0-local.jar ApplyVQSR -R ${params.alignmentReference} -V ${gz} -mode SNP --recal-file ${recal} --tranches-file ${tranches} -O ${out}
"""
}

workflow.onComplete {
println ( workflow.success ? "\nDone!": "It better work")
}
`

Oops, not properly formatted.

ERROR ~ Error executing process > 'applyVQSR (NIST7035_TAAGGCGA_L001_BWA_Alignment_haplotypeCaller.vcf)'

Caused by:
Process applyVQSR (NIST7035_TAAGGCGA_L001_BWA_Alignment_haplotypeCaller.vcf) terminated with an error exit status (2)

Command executed:

java -jar /home/apps/gatk/current/gatk-package-4.0.7.0-local.jar ApplyVQSR -R /home/dhajam/project/nextflow_development/Data/Genomes/GRCh38_no_alt_analysis_set/GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set.fna -V NIST7035_TAAGGCGA_L001_BWA_Alignment_haplotypeCaller.vcf -mode SNP --recal-file NIST7035_TAAGGCGA_L001_BWA_Alignment_haplotypeCaller_recalibrate_SNP.recal --tranches-file NIST7035_TAAGGCGA_L001_BWA_Alignment_haplotypeCaller_recalibrate_SNP.tranches -O recalibrated_snps.vcf

Command exit status:
2

Command output:
(empty)


A USER ERROR has occurred: Input NIST7035_TAAGGCGA_L001_BWA_Alignment_haplotypeCaller_recalibrate_SNP.recal must support random access to enable queries by interval. If it's a file, please index it using the bundled tool IndexFeatureFile


Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.

Dhanaprakash
@dhanaprakashj
Mar 13 13:25
Any thoughts, anyone? If someone would like to reproduce the error, I can provide a link to download the datasets as well (they are all publicly accessible).
Evan Floden
@evanfloden
Mar 13 13:33
Did you try running the .command.sh file in the work directory?
Dhanaprakash
@dhanaprakashj
Mar 13 13:36
Yes, the .command.sh file runs successfully. But the .nf file doesn't run.
Evan Floden
@evanfloden
Mar 13 13:55
Best to put the reference file (alignmentReference) in a channel and declare it as an input.
Evan Floden
@evanfloden
Mar 13 14:01
I'm not understanding why out is in the input and why the expected output file is set to output_file and not ${out}.
Dhanaprakash
@dhanaprakashj
Mar 13 14:13
@evanfloden I am sorry, my mistake. This is a sample script I wrote to test it, and I didn't notice that. Nevertheless, the same error still persists.
Steve Marshall
@stevemmarshall_gitlab
Mar 13 14:26
@pditommaso I found the issue... a small typo :) process.containter should be process.container
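For reference, a minimal nextflow.config sketch with the corrected key. The job definition name is the one quoted in the thread; the executor line, the queue name and the region are assumptions added for illustration and would need to match your own AWS Batch setup.

```groovy
// nextflow.config (sketch; queue and region values are hypothetical)
process.executor  = 'awsbatch'
process.queue     = 'my-batch-queue'   // hypothetical queue name
// Note the spelling: "container", not "containter"
process.container = 'job-definition://Batch-Genomics-CompBio-Dev'
aws.region        = 'us-east-1'        // hypothetical region
```

With the misspelled key the container setting presumably stayed empty, which would be consistent with the "provide a Docker image name or a Batch job definition name" error reported earlier.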
Daniel E Cook
@danielecook
Mar 13 14:32
I see the groupBy operator is deprecated...what has it been replaced by?
Paolo Di Tommaso
@pditommaso
Mar 13 14:58
@stevemmarshall_gitlab ooooops :D
@danielecook depends what you have to do, most of the time groupTuple can replace it
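As an illustration of the suggestion above, here is a small sketch of groupTuple collecting values that share a key. The sample and file names are invented, and Channel.from reflects the syntax in use at the time of this chat.

```groovy
// Each item is a [key, value] pair; the names are made up for the example.
Channel
    .from( ['sampleA', 'a_1.fq'], ['sampleB', 'b_1.fq'], ['sampleA', 'a_2.fq'] )
    .groupTuple()   // groups by the first element of each tuple by default
    .view()
```

Here sampleA ends up paired with a list of both of its files and sampleB with a list holding its single file. Whether this covers a given groupBy use case depends on how the keys were computed before.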
Venkat Malladi
@vsmalladi
Mar 13 15:43
I am trying to use: $workflow.revision [$workflow.commitId] but not having any luck
Maxime Garcia
@MaxUlysse
Mar 13 15:45
@vsmalladi What are you trying to do?
Venkat Malladi
@vsmalladi
Mar 13 15:46
@MaxUlysse trying to output at runtime what version of the pipeline I am using
Maxime Garcia
@MaxUlysse
Mar 13 15:47
In the manifest scope you can specify a version for your pipeline
Venkat Malladi
@vsmalladi
Mar 13 15:47
okay what about commitId or does that not work?
Maxime Garcia
@MaxUlysse
Mar 13 15:48
commitId only works if it's run from a git repo
Basically, if a workflow.revision exists, then I'm showing it, or else it's the commitId if it exists, or else it's the scriptId (which I cut to ten characters like the others).
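The fallback described above could be sketched roughly like this inside a Nextflow script. This is not the code from the pipeline being discussed; the variable name versionLabel is hypothetical, and only the workflow.revision, workflow.commitId and workflow.scriptId properties come from Nextflow itself.

```groovy
// Prefer the revision, then the commit ID, then a shortened script ID.
// The Elvis operator (?:) falls through whenever the left side is null or empty.
def versionLabel = workflow.revision ?: workflow.commitId ?: workflow.scriptId.take(10)

log.info "Pipeline version: ${versionLabel}"
```

When the script is not run from a git repository, the first two values are expected to be empty, so the shortened script ID is what gets printed.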
Venkat Malladi
@vsmalladi
Mar 13 15:50
@MaxUlysse thanks. Trying to run it from a local git repository
Venkat Malladi
@vsmalladi
Mar 13 15:57
Will look at your code
Laurence E. Bernstein
@lebernstein
Mar 13 16:27
@dhanaprakashj Have you gotten it to work? Remember that you will need to stage (specify as inputs) both the gz file AND the index since these commands expect the index to be located in the same directory as the gz file.
Dhanaprakash
@dhanaprakashj
Mar 13 18:19
@lebernstein Thank you for the comment, I got it to work after reading it. I staged the index file and it worked like a charm. It was quite strange, though, because the previous process also required the index file, but since the files have the same name and are located in the same folder, that one worked.
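A sketch of what staging the index next to its file might look like, written in the same style as the test script earlier in the thread. The placeholder paths, the ".idx" suffix for the index, and the names recalIdxIN, recal_idx and recalibrated_ch are assumptions; the thread does not show the final working script.

```groovy
#!/usr/bin/env nextflow

// All paths are placeholders; the ".idx" suffix for the index is an assumption.
params.reference = "/path/to/reference.fna"
params.vcf       = "/path/to/sample.vcf"
params.recal     = "/path/to/sample_recalibrate_SNP.recal"
params.recalIdx  = "${params.recal}.idx"
params.tranches  = "/path/to/sample_recalibrate_SNP.tranches"

vcfIN      = file(params.vcf)
recalIN    = file(params.recal)
recalIdxIN = file(params.recalIdx)
tranchesIN = file(params.tranches)

process applyVQSR {
    tag "$vcf"

    input:
    file vcf from vcfIN
    file recal from recalIN
    // Not referenced in the command; declared so it is staged beside the .recal file.
    file recal_idx from recalIdxIN
    file tranches from tranchesIN

    output:
    file "recalibrated_snps.vcf" into recalibrated_ch

    script:
    """
    java -jar /home/apps/gatk/current/gatk-package-4.0.7.0-local.jar ApplyVQSR \\
        -R ${params.reference} -V ${vcf} -mode SNP \\
        --recal-file ${recal} --tranches-file ${tranches} \\
        -O recalibrated_snps.vcf
    """
}
```

Each declared input is linked into the task's work directory under its own file name, so the index lands in the same directory as the file it belongs to. The output name is also written literally here so that the output declaration matches what the command produces.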
Laurence E. Bernstein
@lebernstein
Mar 13 20:48

So I created a simple workflow to test this. The workflow works fine the first time I touch a file in the watch directory, but the second time it hangs after submitting process 1A.

The offending line is " file configs from workflowConfigs"
If I remove it, the workflow proceeds properly.

sampleJson = Channel
   .watchPath( '/home/laurence.e.bernstein/nextflow/input/sample*.json', 'create,modify' )

// These will allow us to stage the whole directory (all files in it)
workflowConfigs = Channel.fromPath("/home/laurence.e.bernstein/nextflow/config", type: 'dir')

process '1A_create_config' {

  input:
    file sampleJson

  output:
    file "sample.config" into sample_config_ch

  script:
  {
  """
    echo "dummy" > sample.config
  """
  }
}

process '2A_do_it' {

  input:
    file sampleConfig from sample_config_ch
    // Problem line is below
    file configs from workflowConfigs

  output:
    val true into done_ch

  script:
    """
      touch "did_it.txt"
    """
}

Anyone have any ideas?
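One possible explanation, which nobody confirmed in this chat: Channel.fromPath creates a queue channel that emits the config directory only once, so 2A_do_it can pair it with the first sample.config and then has nothing left to read. A sketch of a workaround under that assumption is to pass the directory as a plain file value, which can be reused for every task. The fragment below is meant to replace the workflowConfigs definition and the second process in the script above; it still relies on sample_config_ch from the first process.

```groovy
// Assumption: the directory only needs to be staged, not watched.
// file() returns a single path value rather than a queue channel,
// so it is not used up after the first task.
workflowConfigs = file("/home/laurence.e.bernstein/nextflow/config")

process '2A_do_it' {

  input:
    file sampleConfig from sample_config_ch
    file configs from workflowConfigs

  output:
    val true into done_ch

  script:
    """
      touch "did_it.txt"
    """
}
```

Keeping the channel and turning it into a value channel with the first operator should have the same effect.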

Steve Marshall
@stevemmarshall_gitlab
Mar 13 21:43
Quick question regarding the working dir and Docker + AWS: if you don't specify a working dir, does it simply use the current dir inside the container? It seems like it defaults to /tmp?
Yasset Perez-Riverol
@ypriverol
Mar 13 22:44
I’m receiving an error in my workflow: terminated with an error exit status (137) -- Execution is retried. I have increased the memory but nothing changed. The process runs perfectly fine outside the container. Any idea?