Brandon Cazander
Has anyone had success with composing workflows from multiple pipelines that are in different repositories?
Hi @pditommaso , I was running parallel processes with AWS Batch, and because the jobs run in parallel, each job was overwriting the other jobs' folders. So I included an R script that creates a separate path for each process based on its driver gene name. This R script generates .png files; I was running 10,000 driver-gene processes in parallel, and inside the publish dir I can see 10,000 directories each containing a .png. But I need to collect all the .png files into one directory. Can you please suggest the best way to solve this?
1 reply
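A pattern that may help, sketched with hypothetical process, script, and parameter names: embed the driver gene name in each output filename so every task writes a unique file, then publish all PNGs into a single shared directory.

```groovy
// Sketch only (names hypothetical): each task produces "<gene>.png",
// so 10,000 parallel tasks can publish into one directory without collisions.
process plot_driver_gene {
    publishDir "${params.outdir}/plots", mode: 'copy', pattern: '*.png'

    input:
    val gene

    output:
    path "${gene}.png"

    script:
    """
    Rscript make_plot.R ${gene} ${gene}.png
    """
}
```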
Guillaume Noell
Hi @pditommaso , I've successfully run channel.sql.fromQuery("SELECT * FROM CSVREAD('test.csv') where foo>=2;") . Is it also possible to use an H2 csv table for the sqlInsert ?
John Ma
Hi, I just want to confirm one thing: if subworkflow is not executed, will subworkflow.out always be an empty array? If that's not always the case, how can I detect whether a subworkflow has been executed?
Hi everyone, is there a way to parallelize a specific command in a process? I am using Prokka in one of the processes and am looking for a way to parallelize this step. Any advice would be greatly appreciated.
3 replies
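For reference, Prokka exposes a `--cpus` flag, so the usual approach is to give the process multiple CPUs and pass `task.cpus` through. A sketch with illustrative values and names:

```groovy
// Sketch (names and cpu count are illustrative): request CPUs via the
// directive and forward them to the tool so Nextflow's scheduling matches.
process annotate {
    cpus 8

    input:
    path assembly

    output:
    path "annot/*"

    script:
    """
    prokka --cpus ${task.cpus} --outdir annot ${assembly}
    """
}
```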
Steven P. Vensko II

I am trying to run some processes using google-lifesciences. It appears the GATK Docker image from the Broad is causing issues:

Error executing process > 'lens:procd_fqs_to_procd_alns:raw_alns_to_procd_alns:bams_to_base_qual_recal_w_indices:gatk_index_feature_file (Homo_sapiens_assembly38.dbsnp138.vcf.gz)'

Caused by:
  Process `lens:procd_fqs_to_procd_alns:raw_alns_to_procd_alns:bams_to_base_qual_recal_w_indices:gatk_index_feature_file (Homo_sapiens_assembly38.dbsnp138.vcf.gz)` terminated with an error exit status (9)

Command executed:

  gatk IndexFeatureFile  -I Homo_sapiens_assembly38.dbsnp138.vcf.gz

Command exit status:

Command output:

Command error:
  Execution failed: generic::failed_precondition: pulling image: docker pull: running ["docker" "pull" "broadinstitute/gatk:"]: exit status 1 (standard error: "failed to register layer: Error processing tar file(exit status 1): write /opt/miniconda/lib/python3.6/__pycache__/selectors.cpython-36.pyc: no space left on device\n")

Work dir:

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

I've tried upping the lifeSciences.bootDiskSize but that doesn't seem to help. Any tips?

4 replies
John Ma


I have the following code:

    input:
        path gex_metrics, stageAs: 'gex.txt'
        path adt_metrics, stageAs: 'adt.txt'
        val output_dir
    output:
        path "metrics_summary.csv", emit: metrics_csv
    publishDir "${output_dir}", mode: 'copy'
    exec:
        for (one_line in file('gex.txt').readLines()){

I have changed the file declaration in exec between 'gex.txt' and gex_metrics, but in both cases I got a "No such file" error referring to a nonexistent file in output_dir (in this case, "${output_dir}/gex.txt"). I'm using DSL2.

Can anyone think of a solution for this? Thanks!
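One thing worth checking, offered as a sketch under the assumption that the `exec:` block is the issue: native `exec:` code runs inside the Nextflow JVM rather than in the task's working directory, so a bare `file('gex.txt')` resolves against the launch directory. Resolving against `task.workDir` explicitly may behave differently (process and file names hypothetical):

```groovy
// Sketch: in an exec: block, resolve staged files against task.workDir
// instead of relying on a relative file() lookup.
process summarise_metrics {
    input:
    path gex_metrics, stageAs: 'gex.txt'

    output:
    path 'metrics_summary.csv'

    exec:
    def lines = task.workDir.resolve('gex.txt').readLines()
    task.workDir.resolve('metrics_summary.csv').text = lines.join('\n')
}
```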

Michael Bale

Hi all, I'm having trouble with getting nextflow to email me on completion of the pipeline -- specifically, the workflow.onComplete handler is either not working or I have a weird error.
My config file where the workflow.onComplete is housed has:

workflow.onComplete {
    sendMail(
        to: params.recipient,
        subject: "pipeline execution: ${params.name}",
        body: """
        Pipeline execution summary
        Completed at: ${workflow.complete}
        Duration    : ${workflow.duration}
        Success     : ${workflow.success}
        workDir     : ${workflow.workDir}
        exit status : ${workflow.exitStatus}
        Error report: ${workflow.errorReport ?: '-'}
        """
    )
}


And I'm getting either a compilation error at the open brace for workflow.onComplete or an error that says

Unknown method invocation `onComplete` on ConfigObject type -- Did you mean?

Any help you can provide would be greatly appreciated!

1 reply
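For context, the `Unknown method invocation onComplete on ConfigObject` error typically means the handler was defined in `nextflow.config`; `workflow.onComplete` is a script-level construct and belongs in the pipeline script itself. A minimal sketch, with recipient and subject values assumed:

```groovy
// In main.nf, not nextflow.config: define the completion handler at script level.
workflow.onComplete {
    sendMail(
        to: params.recipient,
        subject: "pipeline execution: ${params.name}",
        body: "Completed at: ${workflow.complete}\nSuccess: ${workflow.success}"
    )
}
```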
Ben R. Alexander

Hello @pditommaso and the rest of the nextflow community,

I am a big fan of the approach taken by Nextflow, and I am actively pushing people to adopt it as the principal data flow control language for our genetics community at the major pharmaceutical firm where I work. Our local high-performance UNIX cluster is entirely inadequate to the multitude of algorithms that we need to run, however, so running on a cloud provider is essential. Lately I’ve run up against a challenge when running with the AWS Batch mode, and I’m hoping that someone can suggest a workaround.

In our module-based, DSL2-driven environment I've implemented a structure that I thought would be flexible, using some initial processes to gather (and check the existence of) files, and then making symbolic links to those files wherever they originate, so that I could access them in subsequent steps managed by Nextflow. I've been using an approach along these lines:

process gather_essential_files {
    input:
        val genotype_bim_name
        val genotype_bed_name
        val genotype_fam_name
        val working_dir

    output:
        path("$genotype_bim_name"), optional: true, emit: geno_bim
        path("$genotype_bed_name"), optional: true, emit: geno_bed
        path("$genotype_fam_name"), optional: true, emit: geno_fam

    script:
        """
        ln -s $working_dir/$genotype_bim_name $genotype_bim_name
        ln -s $working_dir/$genotype_bed_name $genotype_bed_name
        ln -s $working_dir/$genotype_fam_name $genotype_fam_name
        """
}

After a process such as this one I can treat all the files in subsequent processes as if they're local, and the logic all seems very clean. This approach runs wonderfully in my local environment, and making those file links is naturally super fast. When I try to run it on AWS Batch, however, I don't get file links; instead each of those files gets uploaded to the S3 bucket where Nextflow is managing its process-specific work directories. This uploading/copying approach would make sense to me if we were dealing with S3-based data (given that S3 isn't mounted like a real file system), but we aren't: I'm using Amazon FSx to access those files, so they behave like any other local files. Here's my question: is there some parameter/trick that would let me actually create symbolic links instead of copying the files every time they are referenced? We use lots of large data files and want to run in a massively parallel environment, and if we are forced to copy those data files every time a process needs them, the whole approach becomes less viable. I would love a configuration parameter that would let me say "link instead of copying" in the context of the AWS Batch profile. Does anybody have a recommendation for me?

Thanks for any insights, Ben

1 reply
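One knob that may be relevant here, offered as a sketch rather than a verified fix: Nextflow's `stageInMode` directive controls how task inputs are staged (`copy`, `link`, `symlink`, `rellink`). Whether it takes effect for shared-filesystem paths under the AWS Batch executor would need testing in this setup:

```groovy
// nextflow.config sketch: request symlink staging for all processes.
process {
    stageInMode = 'symlink'
}
```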
Vlad Kiselev
Hi all! Maybe a naive question, but is it possible to pass a value from a shell script to an output tuple?
Vlad Kiselev
thanks, all sorted!
1 reply
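For anyone finding this later: the usual mechanism is the `env` output qualifier, which captures a shell variable set in the script block into the process output. A sketch with hypothetical names:

```groovy
// Sketch: MYVAL is set by the shell script and captured into the output tuple.
process compute_value {
    input:
    val id

    output:
    tuple val(id), env(MYVAL)

    script:
    """
    MYVAL=\$(echo ${id} | wc -c)
    """
}
```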

Hi, I've been trying to port my codebase to DSL2, but I have hit a snag. I have a bunch of paired reads that I put in a channel using fromFilePairs and chunk using splitFastq. I pass this to Trimmomatic and from that process I emit tuples of ID, and the two paired file chunks that a given instance has trimmed (tuple val(id), file("${id}_trim_1P.fq"), file("${id}_trim_2P.fq")).

I previously was running the following to merge the fq files of forward & reverse chunks:

trimmomatic.out.trimmed_reads.collectFile() { item ->
    [ "${item[0]}_trim_1P.fq", item[1] ]
}.set{ collected_trimmed_first_reads }
trimmomatic.out.trimmed_reads.collectFile() { item ->
    [ "${item[0]}_trim_2P.fq", item[2] ]
}.set{ collected_trimmed_second_reads }
collected_trimmed_first_reads.merge( collected_trimmed_second_reads )
                             .map { item -> tuple(item[0].simpleName, item[0], item[1]) }
                             .set{ collected_trimmed_reads }

But in DSL2, with merge deprecated (I think actually removed as it fails to run after the deprecation warning), I'd like to move to the suggested new pattern using join. However, I cannot figure out what the pattern is that allows join to behave like merge.

I thought maybe to add an integer "key" to each element of the collected file lists produced, but I could only think of doing that in terms of merge too.

The next step in my workflow absolutely requires these chunks to be collected back up, so I can't sidestep this issue.

Thanks in advance for any suggestions on how I can do this better!
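One way to make `join` play the role of `merge` here, sketched under the assumption that a shared sample key can be recovered from the collected filenames: derive the same key on both channels, then `join` on it.

```groovy
// Sketch: strip the _trim_1P/_trim_2P suffix to recover a shared sample key,
// then join the two collected channels on that key.
collected_trimmed_first_reads
    .map { f -> tuple( f.name - ~/_trim_1P\.fq$/, f ) }
    .join( collected_trimmed_second_reads
               .map { f -> tuple( f.name - ~/_trim_2P\.fq$/, f ) } )
    .set { collected_trimmed_reads }
```

Unlike `merge`, this does not depend on the two channels emitting items in the same order, which is also why `join` is the recommended replacement.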

Could be a 30s question - is it possible to get the trace log to report the name of the host executing them, or am I best to run hostname within the process?
Reece Hart

Are there alternatives to installing nextflow with curl get.nextflow.io | bash?

Although the curl-pipe-sh practice has become commonplace, it's quite inappropriate for a controlled environment.
(Let's please not debate this here. I'm interested only in answers to the narrow question about whether alternatives exist.)

2 replies
Moritz E. Beber
You can always download the bash script and inspect it before running it, or what do you mean?
1 reply
James Fellows Yates
Does anyone know a way to customise or change the colours for the nextflow console?
OS Dark mode (on Ubuntu 20.04) makes the console pretty unreadable (particularly output...)
Filipe Alves
@jfy133:matrix.org Why don't you use the VS Code IDE?
1 reply
James Fellows Yates
(I do use VSCod(ium), but I'm looking with the console to make very small prototypes to test certain concepts, I'm not actually developing anything in the console)
Filipe Alves
Try searching how to customise a groovy console.
I found this online https://stackoverflow.com/questions/47893514/how-to-change-the-font-of-groovyconsole
Hope this helps
1 reply
Pablo Riesgo-Ferreiro

Hi people, I stumbled upon an issue about which I am clueless. I have a small piece of R code that I run with Rscript and that uses the sequenza library.

Rscript -e 'test <- sequenza::sequenza.extract("${seqz}", verbose = TRUE);'

The above fails when ${seqz} contains the absolute or relative path to a symbolic link, but if it has the "real" path to the file it works. Does someone have a hypothesis for what may be happening?

Nah don't listen to me, this is something else....
Pablo Riesgo-Ferreiro
Same file copied in different locations works or does not work... but consistently the one that works always works... agggh
doesn't seem like a nextflow issue, sorry for the noise
Luca Cozzuto
dear all
sometimes I stumble on a problem with R libraries
using nextflow + singularity
it is like R is looking for libraries in the user space instead of looking at the container
I put this in my nextflow.config but it is not helping:
env {
    R_PROFILE_USER   = "/.Rprofile"
    R_ENVIRON_USER   = "/.Renviron"
}
any clue?
1 reply
Luca Cozzuto
the script is also running with the parameter --vanilla
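If a user-level library directory is shadowing the container's libraries, one more `env` entry sometimes helps; the path below is a placeholder and depends on where R keeps its libraries inside that particular container:

```groovy
// nextflow.config sketch: point R away from any user-space library directory.
env {
    R_PROFILE_USER = "/.Rprofile"
    R_ENVIRON_USER = "/.Renviron"
    R_LIBS_USER    = "/usr/local/lib/R/site-library"  // placeholder path
}
```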
Cagatay Aydin
Hello People,
We have parallel workers that consume messages coming from a server and then invoke nextflow run. The problem is that if the number of messages is high, these parallel workers initialize Nextflow at almost the same time. This causes the following error to occur frequently: Can't lock file: /home/myhomedir/.nextflow/history -- Nextflow needs to run in a file system that supports file locks. I suspect this happens because when one Nextflow process puts a lock on $HOME/.nextflow/history, another Nextflow process tries to lock the same file before it is released by the former. Is this intended? Any ideas how to properly handle this without dirty workarounds?
Luca Cozzuto
sorry are you running nextflow in parallel?
Cagatay Aydin
yes, each worker (12 in total) executes a nextflow process independently; if there is more than one job in the queue, they can run in parallel

I checked the source code, it is raised from here in modules/nextflow/src/main/groovy/nextflow/util/HistoryFile.groovy:

            try {
                while( true ) {
                    lock = fos.getChannel().tryLock()
                    if( lock ) break
                    if( System.currentTimeMillis() - ts < 1_000 )
                        sleep rnd.nextInt(75)
                    else {
                        error = new IllegalStateException("Can't lock file: ${this.absolutePath} -- Nextflow needs to run in a file system that supports file locks")
                        break
                    }
                }
                if( lock ) {
                    return action.call()
                }
            }

The problem is, it tries to lock for a sec (if I'm reading Java correctly) and then quits if it can't. Am I not supposed to run multiple nextflow processes in parallel?

Cagatay Aydin
One workaround could be to use a different .nextflow/history file path for each worker, but apparently it is hardcoded in the same file.
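On that workaround: the history file lives under the Nextflow home directory, and that directory can be relocated per worker with the `NXF_HOME` environment variable, so the path is not truly fixed from the caller's point of view. A sketch (the worker ID variable is hypothetical):

```shell
# Give each worker its own Nextflow home so per-run metadata
# (including the history file) does not collide across workers.
WORKER_ID="${WORKER_ID:-1}"
export NXF_HOME="$HOME/.nextflow-worker-$WORKER_ID"
# then launch as usual, e.g.: nextflow run main.nf ...
echo "$NXF_HOME"
```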
Cagatay Aydin
Also note that the error string is not entirely correct: the filesystem does support file locks in my case; it's just that the file itself is locked by another Nextflow process within a tiny time window
Luca Cozzuto
well I think you are going a bit against the nextflow philosophy here
you should use nextflow to parallelize more than parallelize nextflow
or maybe you can do a nextflow of nextflows...
Cagatay Aydin
I'm not sure, but that might not be possible in our case, because the workers consume messages from another server, prepare the arguments for the nextflow command, and then call nextflow with those arguments
Luca Cozzuto
but then you have an orchestrator that is not working as an orchestrator
Nextflow should submit the jobs to your server; that's what it's for
Steven P. Vensko II

I've got a curious issue -- I am running Nextflow on a cluster that I typically do not use. Many of my processes are getting errors like the following:

[b8/551435] NOTE: Process `lens:manifest_to_dna_procd_fqs:trim_galore (VanAllen_antiCTLA4_2015/p017/ad-770067)` terminated for an unknown reason -- Likely it has been terminated by the external system -- Execution is retried (1)

Yet, if I go to the work directory, the process is clearly still running:

(base) [spvensko@longleaf-login4 5cb96b7bf20f816c13f22f8f0e3b08]$ realpath .
(base) [spvensko@longleaf-login4 5cb96b7bf20f816c13f22f8f0e3b08]$ ls -lhdrt *
lrwxrwxrwx 1 spvensko users   75 Oct 21 13:11 VanAllen_antiCTLA4_2015-p013-nd-780020_1.fastq.gz -> /pine/scr/s/p/spvensko/fastqs/VanAllen_antiCTLA4_2015/SRR2780020_1.fastq.gz
lrwxrwxrwx 1 spvensko users   75 Oct 21 13:11 VanAllen_antiCTLA4_2015-p013-nd-780020_2.fastq.gz -> /pine/scr/s/p/spvensko/fastqs/VanAllen_antiCTLA4_2015/SRR2780020_2.fastq.gz
-rw-r--r-- 1 spvensko users 3.8G Oct 21 13:23 VanAllen_antiCTLA4_2015-p013-nd-780020_1_trimmed.fq.gz
-rw-r--r-- 1 spvensko users 3.3K Oct 21 13:23 VanAllen_antiCTLA4_2015-p013-nd-780020_1.fastq.gz_trimming_report.txt
-rw-r--r-- 1 spvensko users  630 Oct 21 13:23 VanAllen_antiCTLA4_2015-p013-nd-780020_2.fastq.gz_trimming_report.txt
-rw-r--r-- 1 spvensko users 2.0G Oct 21 13:30 VanAllen_antiCTLA4_2015-p013-nd-780020_2_trimmed.fq.gz

Anyone seen this behavior before?

1 reply
Is there a way to count the number of elements in a channel and exit if there aren't enough files?
I'm looking for something similar to .ifEmpty(), but instead of empty, I want it to do something if there are fewer than 5 values in it.
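A sketch that may cover this, using the `count` operator together with Nextflow's `error` function (the channel name is hypothetical):

```groovy
// Sketch: count the channel's elements and abort the run if fewer than 5.
my_files_ch
    .count()
    .map { n ->
        if( n < 5 )
            error "Expected at least 5 input files, found ${n}"
        return n
    }
```

Note that `count` only emits once the source channel is exhausted, so the check fires after all inputs have been seen, not before downstream processes start.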