Bede Constantinides
Solution: set NXF_ASSETS to a path without a quota
Ignacio Eguinoa
Hi all, I was wondering if there is any resource to help parse Nextflow files into a tree of syntax elements (an AST), ideally using Python. I'm working on a small tool that tries to extract some metadata from the processes, channels, etc. and needs to parse the workflow definition into objects,
similar to what the javalang package does in Python to parse generic Java code (https://pypi.org/project/javalang/)
George Seed
So, I've got a channel that reads my samplesheet and grabs the second column, and counts how many unique elements it has:
        .map{it -> [it[2]]}

How can I use this as the 'size' part of a groupTuple? I tried:


But it did not like it - complained about the value type etc. All thoughts gladly received!

George Seed
Re. my post last night, the solution posted here nextflow-io/nextflow#1702 does work - for some reason - so that's nice.
Vlad Kiselev
I have a channel with tuples consisting of some files and a value. I would like to run a process on this channel. Specifically, I'd like to repeat the process for each value (which is part of the input tuple) and I would also like to .collect() the files from each tuple at the same time. Has anyone done this before? I've already read about each and combine, so my input channel is in the right format. The problem I have is .collect(): I'm not sure how to incorporate it in the input tuple.
Wenhui Wu
@robsyme Hi Robert, may I ask how you set the NCBI_API_KEY environment variable? I set it in my .bash_profile, but it doesn't seem to work for Nextflow. Thanks!
Michael Bale
Is there a way to have Nextflow set the publishDir to an S3 bucket when not running on AWS Batch?
Toni Hermoso Pulido
Good morning! Is it possible to generate a DAG dot file without the need of running a whole pipeline?
Szymon Szyszkowski
Hello everyone, I have an NF pipeline running on a local server. The pipeline reports the ABORTED status when I inspect it with nextflow log, but I cannot find the reason for this failure. Could you give me some explanation?

Hi, in DSL2, do functions allow the passing in of channels? I was hoping to clean up a workflow where a similar sequence of operations is applied to channels, but it looks like channels passed into functions are demoted to plain Java collections.

The error reads:

No signature of method: java.util.LinkedHashMap.collectFile() is applicable for argument types: (LinkedHashMap, Script_77e02758$_collect_file_tuples_closure1) values: [[storeDir:null, sort:hash], Script_77e02758$_collect_file_tuples_closure1@54e2fe]
hey guys, newbie here who's trying to get a quick proof of concept for using nextflow. I have some questions and hope people could give me some guidance (^-^)
1) After a run, does nextflow do any sort of automatic cleanup of its persistent volume claim if we're using kubernetes?
Chris Wright
I'm having trouble implementing a particular data flow pattern in Nextflow: essentially, tasks are being blocked by non-dependent calculations when a channel requires grouping on a key. A full example is in the gist below:
Central to this is not knowing how to use the items in one channel to allow the non-blocking grouping of another channel.
Arijit Panda
I am trying to run an SGE job using the current environment settings. In a normal SGE submission, the -V parameter is used for this. Can someone suggest which Nextflow option enables it? Thanks
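One option that may fit here is the clusterOptions process directive, which forwards extra native flags to the scheduler's submit command. A minimal config sketch, assuming the SGE executor is in use and that your qsub accepts -V:

```groovy
// nextflow.config (sketch)
process {
    executor       = 'sge'
    clusterOptions = '-V'   // passed through to qsub: export the submitting shell's environment
}
```

Setting it at the top of the process scope applies it to every process; it can also be placed inside a single process definition if only some jobs need it.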
Hi all. I'm making a config file to specify error strategies for different processes in my pipeline. For one process I would like to increase the number of cpus allocated in case of errors 140 and 143, and to just retry 10 times for all other kinds of errors. I tried the two following syntaxes, but neither worked:
process {
    withName: structural_alignment {
        if (task.exitStatus in 140..143 )
            errorStrategy = 'retry'
            cpus   = {2 * task.attempt}
            maxRetries = 5
            errorStrategy = 'retry'
            maxRetries = 10
    }
}
process {
    withName: structural_alignment {
        if (task.exitStatus in 140..143 ) {
            errorStrategy = 'retry'
            cpus   = {2 * task.attempt}
            maxRetries = 5
        }
        else {
            errorStrategy = 'retry'
            maxRetries = 10
        }
    }
}
Also, how can I specify only errors 140 and 143 and not all the others in between? I tried to separate them with a comma, but it didn't work
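A possible direction, offered as a sketch rather than a tested fix: the config file is evaluated before any task runs, so a plain if statement has no task object to look at. Values that depend on the task are normally written as closures, which Nextflow evaluates per task attempt. To match only 140 and 143, a Groovy list can replace the range. The 'terminate' fallback and the CPU scaling below are illustrative assumptions; swap in 'retry' if every error should be retried.

```groovy
// nextflow.config (sketch; retry counts and CPU scaling are illustrative)
process {
    withName: structural_alignment {
        // A list matches only 140 and 143; the range 140..143 would also match 141 and 142
        errorStrategy = { task.exitStatus in [140, 143] ? 'retry' : 'terminate' }
        maxRetries    = 10
        // Re-evaluated on each attempt: 2 CPUs first, then 4, then 6, ...
        cpus          = { 2 * task.attempt }
    }
}
```

Note that this scales CPUs on every retry, not only after 140 or 143; making the CPU count depend on the previous exit status would need more than this sketch shows.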

Hi guys.
I have a simple question (I think). I just need to save my process output in an external directory, so I'm using the publishDir directive for it. However, I need 2 things: i) to use variables (collected from the input tuple); ii) to create the directory, because it doesn't exist.
I successfully ran it like this:

    publishDir "${params.out_dir}/${tcga_project}/${tcga_barcode}/RSEM_quantification", mode: 'move'

    file STAR_bam_file from STAR_alignment_bam
    set val(sample_UUID), val(tcga_barcode), val(tcga_project) from samples_ch_3

But it doesn't create the directory. So I tried switching to this:

    outDir = file("${params.out_dir}/${tcga_project}/${tcga_barcode}/RSEM_quantification")
    publishDir outDir, mode: 'move'

    file STAR_bam_file from STAR_alignment_bam
    set val(sample_UUID), val(tcga_barcode), val(tcga_project) from samples_ch_3

However, it fails with the error:
No such variable: tcga_project

Any help with this situation? Thanks :)

Hi all,
If exit 1, "ERROR MSG" is from a sub-workflow, the message will not be printed out to console, though it is inside .nextflow.log. I must use NXF_ANSI_LOG=false nextflow run main.nf to force the exit message to console.
Can anyone let me know why and how to solve it?
Hi all! I am trying to find how to set a profile in a config file. For example, I'm using a nf-core package, and am sick of running -profile docker at the command line level, and instead I would like to have in my nextflow.config file profile = docker such that it successfully imports all of the config options specified by the docker profile. thanks
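As far as I know there is no profile = ... setting in the config file. One workaround is to put the settings you rely on from that profile directly into your own config. A sketch, assuming the docker profile you use essentially switches Docker on (check the pipeline's nextflow.config for anything else it sets):

```groovy
// $HOME/.nextflow/config or ./nextflow.config (sketch)
docker {
    enabled = true   // same effect as passing -with-docker on every run
}
```

Anything else the profile configures would have to be copied across in the same way.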
Brandon Cazander
I'm using collectFile in my workflow to combine the results of several upstream processes, and setting storeDir. However, when I resume my workflow (using AWS Batch executor), even though the upstream processes are correctly cached, the downstream process that consumes the output of collectFile always re-executes. I've tried sorting the input to collectFile with no luck.
Do you think it's possible to decouple the executor destination from the user pipeline? My aim is to have one "architecture pipeline" that decides the executor, based on different metrics, for each step of another pipeline (the "user workflow"). I'm thinking of dynamically generating the user pipeline file, but I don't know whether this can be achieved with built-in functionality in the tool. Thanks in advance.
How do I use a Docker image from a private repo (user and password needed)? I cannot find this in the docker scope:
docker {
    enabled = true
}
process {
    withName:toy {
        container = 'user.abc.io/test/example'
    }
}
How come

  .fromFilePairs( "${params.reads}/*{1,2}*.fastq.gz", size: 2 )
  .into { paired_reads }

does not work when the file starts with a number (like 7126-MS-1_100_S1_L005_R1_001.fastq.gz and 7126-MS-1_100_S1_L005_R2_001.fastq.gz or 21VR067049-211104_S1_L001_R1_001.fastq.gz and
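A guess at the cause, not a confirmed diagnosis: with *{1,2}* the alternation can match any 1 or 2 in the name, and these file names contain several, so the two mates may end up with different grouping keys. Anchoring the alternation to the _R marker usually avoids that. A sketch, assuming all reads follow the _R1_001 / _R2_001 naming shown above (the directory name is hypothetical):

```groovy
// Sketch: tie the read-number alternation to the `_R` marker
params.reads = 'data'   // hypothetical directory holding the fastq.gz files

Channel
    .fromFilePairs("${params.reads}/*_R{1,2}_001.fastq.gz", size: 2)
    .view()   // each item is [ pair_id, [ R1_file, R2_file ] ]
```

Checking the output of .view() first is a quick way to confirm the pairs are grouped as expected before wiring the channel into a process.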

Moritz E. Beber
What is the recommended way to write a CSV file from nextflow? I can't seem to import the CSVPrinter https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVPrinter.html
David Mas-Ponte
Hello all, I am using .collate() in my pipeline and then an implementation of this which works elegantly. My problem is with the -resume option. Some processes get cached, but the rest are re-scheduled. I am assuming the order changes from run to run before the collate operator, and thus assumes input has changed. Could this be it?
Also, what is the “canonical” way of sorting a channel? Maybe this?
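I can't say whether there is an official canonical way, but one common pattern is to gather everything with toSortedList, re-emit the items with flatMap, and only then collate. A DSL2 sketch with stand-in values:

```groovy
// Sketch: sort first, then collate, so the chunks are the same on every run
Channel
    .of('c', 'a', 'd', 'b')   // stand-in for the real, unordered channel
    .toSortedList()           // emits a single list, sorted
    .flatMap { it }           // re-emits the items one by one, in sorted order
    .collate(2)               // [a, b] then [c, d]
    .view()
```

The trade-off is that toSortedList has to wait for every upstream item before anything is emitted, so downstream tasks cannot start until the whole channel is complete.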
Brandon Cazander
When running with the AWS Batch executor, is it the main nextflow process that publishes files (using the publishDir directive), or do the individual processes do this themselves?
Hello group members, I am trying a sample nextflow script using Azure Blob storage for the work directory. I am getting an error and can't tell whether my setup has anything to do with it. I'd appreciate your response. Here is the error when trying to run my sample script connected to Azure Blob storage

nextflow run tutorial.nf -w az://nextflow-work/work

How do I deal with an empty path channel in a process (DSL2)? I saw a solution for an empty value channel (https://github.com/nextflow-io/patterns/blob/master/docs/process-when-empty.adoc)

params.a = '*.txt'
params.b = ''
params.c = ''

process test {
   tag "test "
   echo true

   path file1
   path file2
   val  value1

   echo $file1
   echo $file2
   echo $value1


workflow {
    a_ch = Channel.fromPath(params.a)
    b_ch = Channel.fromPath(params.b)

The Error message:

N E X T F L O W  ~  version 21.04.3
Launching `code/test2.nf` [modest_colden] - revision: b4078eef6a
Missing `fromPath` parameter
could try a_ch = params.a ? Channel.fromPath(params.a) : [] maybe?
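Expanding on that suggestion as a sketch (DSL2, not tested against 21.04.3): the idea is to call fromPath only when the parameter is set, pass an empty list otherwise, and branch on it in the script. The process and variable names below are illustrative.

```groovy
nextflow.enable.dsl = 2

params.a = '*.txt'
params.b = ''          // optional: leave empty when there is no second file

process test {
    input:
    path file1
    path file2         // receives an empty list when params.b is not set

    script:
    // An empty list is falsy in Groovy, so this selects the fallback line
    def second = file2 ? "echo ${file2}" : "echo 'no second file'"
    """
    echo ${file1}
    ${second}
    """
}

workflow {
    a_ch = Channel.fromPath(params.a)
    b_in = params.b ? Channel.fromPath(params.b) : []
    test(a_ch, b_in)
}
```

When params.b is set, both inputs are queue channels and are consumed pairwise, so the number of tasks is limited by the shorter channel.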
I'm trying to use Docker in Nextflow script and I'm getting the following error:

"connect to the Docker daemon socket at unix:///var/run/docker.sock: Post "http://%2Fvar%2Frun%2Fdocker.sock/v1.24/containers/create?name=nxf-8IoBF1Fj7HBmfAtOGRGpW5Fp": dial unix /var/run/docker.sock: connect: permission denied.
See 'docker run --help'."

Does anyone have any advice on how to handle this?



Alan Hoyle
In a profile in the Nextflow config, it sets process.errorStrategy = 'finish'. Can someone remind me how to override that at the command line? I want to set it to 'retry' (and I assume I can do something similar to set maxTries = 4)
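One approach that should work, assuming an extra config file passed with -c takes precedence over the profile in nextflow.config (which is how I understand the config precedence rules), is a small override file. The file name is hypothetical, and note the directive is maxRetries:

```groovy
// override.config (hypothetical file name)
process {
    errorStrategy = 'retry'
    maxRetries    = 4
}
```

It would then be added to the usual command with -c override.config alongside the existing -profile option.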
Nicolas Mendoza


I am trying to run nextflow inside a docker and from there: run other dockers in the host machine. For that I have built a docker image with nextflow and docker inside, as so:

FROM nextflow/nextflow

# Adds ability to run docker in the host
RUN amazon-linux-extras install docker -y

WORKDIR /nextflow

But when running the below line, it says Command error: /bin/bash: .command.sh: No such file or directory

docker run -v /var/run/docker.sock:/var/run/docker.sock custom/nextflow bash -c 'nextflow run hello -with-docker nextflow/examples'

Does anyone have an idea on how to fix this?

Richard Corbett
Hi all, is there a way to "flush" the publish command? I understand it is asynchronous by default, but I have a pipeline that runs for 7-10 days and some of the useful results are complete after the first day. Would be good if I could request the results from day 1 are "published" soon after completion.
Hello everyone. I'm running NF as part of a custom-made script wrapping MATLAB commands. Some of my sub-processes fail, but I can't work out why:
Caused by:
Process init_Config (3) terminated for an unknown reason -- Likely it has been terminated by the external system
Looking at the log files of the other 'init_Config' processes, nothing differentiates them. There are no error flags around wall-time or resource usage. How do you troubleshoot that?
Scott Yourstone

Hello everyone,

Is it possible to update process information using withLabel on the command line? So something like:

nextflow run hello_world.nf -process.withLabel:hello_world.cpus 2

I know that something as simple as this works:

nextflow run hello_world.nf -process.cpus 2

but I want to target a particular process.
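I'm not aware of a command-line form for selectors, but the same effect can be had with a small config file passed via -c. A sketch, assuming the process carries the label hello_world (the file name is hypothetical):

```groovy
// label_override.config (hypothetical file name)
process {
    withLabel: hello_world {
        cpus = 2
    }
}
```

Using withName: instead of withLabel: in the same position targets a process by its name rather than by label.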

Håkon Kaspersen

hello everyone, I am using singularity images from biocontainers for most of my processes, where they are pre-downloaded and used directly (as the cluster nodes don't have internet access). However, I am having issues with the Prokka image from that site. I pull the images from biocontainers with singularity pull, and use that image directly in the nextflow pipeline. However, it seems that nextflow doesn't want to use the prokka image directly, it rather wants to download it from docker. Note that it works fine for all other processes that have been downloaded the same way. Does anyone know anything about this image? https://biocontainers.pro/tools/prokka
(Note: I have tried several versions of the singularity image, all give the same result).
My nextflow.config:

params.container_dir    = 'path/to/images'

singularity {
        enabled         = true
        autoMounts      = true
}

The container is called like this in the prokka process:

process PROKKA {
        container "${container_dir}/prokka:1.14.5--pl526_1"

Thanks in advance!
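Two things that may be worth checking, offered as guesses: my understanding is that Nextflow only treats the container value as a local image file when it is an absolute path (or a file:// URL), and otherwise interprets it as a registry name and tries to pull it; and the variable is defined as params.container_dir but referenced as container_dir. A config sketch under those assumptions, with a hypothetical directory and image file name:

```groovy
// nextflow.config (sketch; directory and image file name are hypothetical)
params.container_dir = '/absolute/path/to/images'

singularity {
    enabled    = true
    autoMounts = true
}

process {
    withName: PROKKA {
        // Point at the file that `singularity pull` actually wrote to disk
        container = "${params.container_dir}/prokka_1.14.5--pl526_1.sif"
    }
}
```

Comparing this value with how the working processes reference their images might show what is different for Prokka.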

Hi, I have a (stupid?) question :). What is the syntax for comments in the script block?
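A short sketch of how I understand it: anything outside the script string is Groovy, so // and /* */ apply there, while the text inside the triple quotes is handed to the interpreter (Bash by default), so # applies there. The process name is illustrative.

```groovy
process example {
    // Groovy comment: valid anywhere outside the script string

    script:
    // Also a Groovy comment; it never reaches the shell
    """
    # Shell comment: this line ends up in the generated .command.sh
    echo "hello"
    """
}
```

One caveat: inside a double-quoted script string, Groovy still interpolates $variables even on lines that are shell comments, so an undefined $name in a comment can raise an error.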
I am trying to figure out how to separate and categorize some files based on some string values in their names. For example, if the files are named:




How can I separate the files based on f0, f1, f2, f3 and have each category saved under a different name? Then I want to separate the same files based on the numbers after the "r" but before the underscore. There are a lot of different numbers so I can't be specific in the script about the numbers to select. These same files will then be separated again based on the number before "_R1". So in the end, I will have a category of files based on f-number then a subcategory based on r number then another subcategory under that based on a number.


Pablo Riesgo-Ferreiro
hi all,
I have searched for error handling in Nextflow docs without success. Can someone point me in the right direction?
I just want to do some sanity checks on the inputs of a process and raise a controlled error if certain error conditions are met.
Groovy's assert did not seem to work ...
forget it, Groovy's assert methods do work!
rubber duck effect
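For anyone landing here later, a minimal sketch of two ways to raise a controlled error: Groovy's assert with a message, and Nextflow's built-in error function. The parameter name and messages are illustrative.

```groovy
params.input = null   // hypothetical parameter, normally supplied as --input

// Groovy assert: stops the run with the message when the condition is false
assert params.input : "Please provide an input file with --input"

// Nextflow's error function: same idea, without the assertion-style trace
if( !file(params.input).exists() ) {
    error "Input file not found: ${params.input}"
}
```

The same assert form can also be placed at the top of a process script: block, before the command string, to check that process's input values.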

Hi all,

The challenge in my code is to create the input for pairwise comparisons. Based on the comparison list, I would like to select the samples that are in each group. I have tried the following, but without complete success, as I get a DataflowBroadcast around DataflowStream[?]

bams = Channel.from(["sampleX_1", "groupX", "bamX_1", "baiX_1"], 
                            ["sampleX_2", "groupX", "bamX_2", "baiX_2"], 
                            ["sampleY_1", "groupY", "bamY_1", "baiY_1"], 
                            ["sampleY_2", "groupY", "bamY_2", "baiY_2"],
                            ["sampleZ_1", "groupZ", "bamZ_1", "baiZ_1"], 
                            ["sampleZ_2", "groupZ", "bamZ_2", "baiZ_2"])

 comparison_list = Channel.from(["groupX", "groupY"],["groupX", "groupZ"])

 group_input = comparison_list_file.map{ it ->
                                                    def bam_by_group = bam.groupTuple(by:1)
                                                    def compare1 = it[0]
                                                    def compare2 = it[1]
                                                    def group_ctrl = bam_by_group.first{ it[1] == compare1}
                                                    def group_case = bam_by_group.first{ it[1] == compare2}
                                                    def group_input = group_ctrl.combine(group_case)

                                                    return group_input.view() // - desired outcome, see below
                                                    }.view() // results into - DataflowBroadcast around DataflowStream[?]

I would like the result to be in this format:

        // [[sampleX_1, sampleX_2], groupX, [bamX_1, bamX_2], [baiX_1, baiX_2], [sampleY_1, sampleY_2], groupY, [bamY_1, bamY_2], [baiY_1, baiY_2]]
        // [[sampleX_1, sampleX_2], groupX, [bamX_1, bamX_2], [baiX_1, baiX_2], [sampleZ_1, sampleZ_2], groupZ, [bamZ_1, bamZ_2], [baiZ_1, baiZ_2]]

This is what group_input.view() prints, but how can I use that?
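One way to get that layout without calling channel operators inside map is to group the BAMs once and then attach them to the comparison list with two keyed combine steps. A DSL2 sketch using the example data above; the intermediate variable names are mine, and cas stands for the case group because case is a reserved word in Groovy:

```groovy
nextflow.enable.dsl = 2

workflow {
    bams = Channel.of(
        ["sampleX_1", "groupX", "bamX_1", "baiX_1"],
        ["sampleX_2", "groupX", "bamX_2", "baiX_2"],
        ["sampleY_1", "groupY", "bamY_1", "baiY_1"],
        ["sampleY_2", "groupY", "bamY_2", "baiY_2"],
        ["sampleZ_1", "groupZ", "bamZ_1", "baiZ_1"],
        ["sampleZ_2", "groupZ", "bamZ_2", "baiZ_2"])

    comparison_list = Channel.of(["groupX", "groupY"], ["groupX", "groupZ"])

    // One item per group, group name moved to the front to act as the key:
    // [group, [samples], [bams], [bais]]
    by_group = bams
        .groupTuple(by: 1)
        .map { samples, group, bam, bai -> [group, samples, bam, bai] }

    group_input = comparison_list
        // attach the control group: [ctrl, cas, ctrlSamples, ctrlBams, ctrlBais]
        .combine(by_group, by: 0)
        // move the case name to the front so it becomes the next key
        .map { ctrl, cas, cs, cb, ci -> [cas, ctrl, cs, cb, ci] }
        // attach the case group:
        // [cas, ctrl, ctrlSamples, ctrlBams, ctrlBais, casSamples, casBams, casBais]
        .combine(by_group, by: 0)
        // rearrange into the requested layout
        .map { cas, ctrl, cs, cb, ci, ks, kb, ki -> [cs, ctrl, cb, ci, ks, cas, kb, ki] }

    group_input.view()
}
```

Because group_input is an ordinary channel here, it can be passed straight to a process; the order in which the two comparisons are emitted is not guaranteed.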


Christopher Hartl

On a related note, I'm trying to group process outputs together for downstream paired processing. I have a digest file that looks like


and I have a basic pipeline that drives from per-column channels and looks like

aln_bams = bwamem(samples_ch, read1_ch, read2_ch)
filt_bams = mapq_filter(aln_bams, 30)

but now i need to get to bam channels that look like

[cell1_trt1.bam, cell1_trt2.bam, cell2_trt1.bam, cell2_trt2.bam]
[cell1_ctl.bam, cell1_ctl.bam, cell2_ctl.bam, cell2_ctl.bam]

the issue is I can't find a way to join the filtered bams with the original IDs. In DSL=1, I could merge; but using DSL=2 I need an index on which to join, and there's no such thing as .mapWithIndex.

Any ideas?

Arijit Panda
Is there any example/usage available of publishDir in DSL2 ?
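A minimal DSL2 sketch; the directive is written the same way as in DSL1, inside the process body. The process name, output file and params.outdir are illustrative.

```groovy
nextflow.enable.dsl = 2

params.outdir = 'results'   // hypothetical output location

process SAY_HELLO {
    // Copy declared outputs into results/hello once the task completes
    publishDir "${params.outdir}/hello", mode: 'copy'

    output:
    path 'hello.txt'

    script:
    """
    echo 'Hello' > hello.txt
    """
}

workflow {
    SAY_HELLO()
}
```

Only files declared in the output: block are published, so anything else written in the task directory stays in the work folder.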
Arijit Panda
Also, is it possible to set the publishDir in the configuration file?
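Yes, as far as I know: publishDir can be set per process from the config with a selector. A sketch, assuming a hypothetical process called SAY_HELLO and a hypothetical params.outdir:

```groovy
// nextflow.config (sketch; process name and paths are hypothetical)
params.outdir = 'results'

process {
    withName: SAY_HELLO {
        publishDir = [ path: "${params.outdir}/hello", mode: 'copy' ]
    }
}
```

In the config the directive takes a map (or a list of maps for several targets), which keeps output locations out of the pipeline script itself.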