Kevin Brick
@kevbrick
Obviously, immediately after posting, I found the solution :). Seems that process.maxForks defaulted to 1. Adding process.maxForks = 10 to my config remedied the issue. New config is:
process.executor = 'slurm'
process.maxForks = 10
poonchilam
@chilampoon

I want to create directories for each sample containing one or multiple fastq files:

samplesheet:

sample, fastq
A, A_run1.fastq.gz
A, A_run2.fastq.gz
B, B.fastq.gz

Then one input channel element would be a directory with the two A fastqs, and another would be a directory with the B fastq. After that, I want to run the parallel processing on those fastq directories.
How can I do this? Thanks.
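A possible sketch (untested; it assumes a samplesheet with header columns sample,fastq, and params.samplesheet plus the STAGE_SAMPLE_DIR name are hypothetical): parse the sheet with splitCsv, group the fastq paths per sample with groupTuple, and let a process stage each group into a per-sample directory.

// params.samplesheet and STAGE_SAMPLE_DIR are illustrative names, not from the original question
Channel
    .fromPath(params.samplesheet)
    .splitCsv(header: true)
    .map { row -> tuple(row.sample, file(row.fastq)) }
    .groupTuple()
    .set { per_sample_ch }

process STAGE_SAMPLE_DIR {
    input:
    tuple val(sample), path(fastqs)

    output:
    tuple val(sample), path("${sample}_dir")

    script:
    """
    mkdir -p ${sample}_dir
    cp -L ${fastqs} ${sample}_dir/
    """
}

Downstream processes can then take tuple val(sample), path(sample_dir) as input and run in parallel per directory.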

Bill Flynn
@wflynny

I have a Groovy function that takes considerable time to compute. It looks like it always runs using the local executor regardless of what process.executor is set to. I'd like to force this job to run as a Slurm job, for example.

#!/usr/bin/env nextflow

include { read_stats } from './functions.nf'

process COUNT_READS {
    tag "$record.output_id"
    time '30m'
    executor 'slurm'

    input:
      val record
    output:
      val new_record

    exec:
    new_record = read_stats(record)
}

Using exec: or script: in the process doesn't seem to change the behavior. Any advice?

6 replies

Similar question: any way I can use native execution to dynamically generate a file in a process's work dir? Currently I'm using something like the following, but it's ugly.

process foo {
    input:
      val record

    script:
    samplesheet_content = construct_samplesheet_content(record)
    """
    samplesheet_file=${record.output_id}.csv
    cat <<-EOF > \$samplesheet_file
$samplesheet_content
EOF
    tool --input \$samplesheet_file
    """
}

All my attempts to generate the file before the triple-quoted string lead to the file being created in the submission/project dir rather than the process work dir.

1 reply
daudn
@daudn

Quick question.

I have:

Output:
    file("*.png") optional true into png_output
    file("*.html") optional true into html_output

The process takes input from all different files and concatenates the results. These two are optional outputs.

How do I make the inputs optional?

Still trying to figure out how to make input optional.

7 replies
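One workaround often used for optional inputs (a DSL2-style sketch assuming a single optional file; the names are illustrative, not from the original pipeline): pass a placeholder file when the real input is absent and test for it in the script block.

params.optional_png = params.optional_png ?: 'NO_FILE'   // hypothetical param

process use_optional {
    input:
    path png    // receives either a real file or the NO_FILE placeholder

    script:
    def png_arg = png.name != 'NO_FILE' ? "--png $png" : ''
    """
    tool $png_arg
    """
}

workflow {
    use_optional( file(params.optional_png) )
}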
cwoehle
@cwoehle
Hi all, I'm trying to define different profiles and I'm finding it difficult not to violate the rule, mentioned in a Danger note in the Nextflow docs, against setting attributes in the same scope both inside and outside a profiles context. I also think that some public pipelines may not always follow this recommendation 100%. So I'm wondering: does breaking this rule directly lead to unexpected behavior/errors, or is it more a matter of advised programming practice? I did not find a clear explanation in the Nextflow docs and would be thankful for any clarification.
3 replies
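For what it's worth, a minimal config sketch that stays on the safe side of that rule keeps each setting either entirely inside profiles or entirely outside (the profile names and values here are illustrative only):

// every process.* setting lives inside a profile, never both inside and outside
profiles {
    standard {
        process.executor = 'local'
        process.memory   = '4 GB'
    }
    cluster {
        process.executor = 'slurm'
        process.memory   = '16 GB'
    }
}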
awgymer
@awgymer

I'm looking for the cleanest way to batch txt files into groups of 100. I thought maybe I could use .collate with a path channel, but then I realised I don't know how to generate a number for each resultant batch file ($i below). I thought something like the following, but I can't quite get there.

Any help appreciated

text_files = channel.fromPath(params.in + '/*.txt', type: 'file')

process CAT_TXT {
    input:
    path text_file

    output:
    path '*.txt', emit: batched_txt

    script:
    """
    cat $text_file > batch_$i.txt
    """
} 

workflow {
    CAT_TXT(text_files.collate(100))
}
5 replies
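One possible way to attach a batch number (an untested sketch; the AtomicInteger counter is just a workaround, not an official API for this): number each collated group before it reaches the process, so the id travels with the files.

import java.util.concurrent.atomic.AtomicInteger

process CAT_TXT {
    input:
    tuple val(batch_id), path(text_file)

    output:
    path '*.txt', emit: batched_txt

    script:
    """
    cat $text_file > batch_${batch_id}.txt
    """
}

workflow {
    def batch_counter = new AtomicInteger(0)   // simple counter; numbering follows emission order

    batches = channel
        .fromPath(params.in + '/*.txt', type: 'file')
        .collate(100)
        .map { files -> tuple(batch_counter.getAndIncrement(), files) }

    CAT_TXT(batches)
}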
mjr1ch
@mjr1ch
How can I use more than one publishDir?
1 reply
Not sure how I can pick which one in my script. I want to make an error folder and have only pipeline failures directed there.
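A rough sketch of the mechanism (DSL2-style; the directory names and file patterns are hypothetical): publishDir can be declared more than once in a process, with the pattern option deciding which outputs land where.

process foo {
    publishDir "${params.outdir}/results", mode: 'copy', pattern: '*.txt'
    publishDir "${params.outdir}/errors",  mode: 'copy', pattern: '*.err'

    output:
    path '*.txt'
    path '*.err', optional: true

    script:
    """
    run_tool > result.txt 2> run_tool.err
    """
}

Note that publishDir only acts on outputs of tasks that complete successfully, so routing genuine pipeline failures to an error folder would still need extra handling (for example an errorStrategy plus a cleanup step, or copying files in a workflow.onError handler).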
chocolatebarbrain
@chocolatebarbrain:matrix.org
[m]
I have a very bizarre situation. I am running a containerized process through Nextflow that seems to cap at about 7% CPU per core. But when I get the actual docker run command using this technique (https://stackoverflow.com/questions/32758793/how-to-show-the-run-command-of-a-docker-container), it runs at 100% CPU per core as expected.
1 reply
chocolatebarbrain
@chocolatebarbrain:matrix.org
[m]
This is on nextflow 21.04.3 build 5560
chocolatebarbrain
@chocolatebarbrain:matrix.org
[m]
The process creates a lot of temp files, so is this some sort of issue with nextflow watching them?
Laurence E. Bernstein
@lebernstein

I want to load some parameters from a JSON file and the code to do so is quite easy.

      import groovy.json.JsonSlurper

      File testJsonFile = new File(inputJson)
      JsonSlurper slurper = new JsonSlurper()
      slurper.parseText(testJsonFile.text).each { k, v -> params[k] = v }

However, I would like to encapsulate this code for use in all my workflows. How do I do this? I've tried a lot of different things, but if I put the code in another file, the params are not scoped to the main workflow. If I create another workflow and try to pass the params out, they are in channels, not variables, and can't be assigned properly (someone was talking about the inability to convert using getVal() in another post, which is essentially the problem).
I also get a list of samples that I convert to a Channel, but that works just fine since passing channels is no problem.
This might be possible using a groovy method but I wasn't able to get that to link into my workflow properly.
Any thoughts from the experts?

3 replies
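Not a direct answer to the encapsulation question, but worth noting: Nextflow can also load params from a JSON (or YAML) file at launch time with -params-file, which covers the simple case without any custom Groovy.

# params.json might contain, e.g., {"genome": "GRCh38", "outdir": "results"}  (illustrative values)
nextflow run main.nf -params-file params.json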
KyleStiers
@KyleStiers
Is it possible to use a closure in a shell block (this doesn't work as is, but is what I want basically)? Or perhaps you can save the closure to a variable before the shell block? This type of collect works wonderfully in script blocks, but unfortunately I really need this to function in a shell block.
i.e.
process foo {
    echo true

    input:
    tuple val(x), val(y) from test_ch.collect()

    shell:
    """
    necessary_shell_command --option "!{y.collect{"$it,"}.join()}"
    """
}
2 replies
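A sketch of one possible workaround (untested): compute the joined string with plain Groovy before the command literal inside the shell: block, then reference it with !{} so the closure never has to live inside the quoted command.

process foo {
    echo true

    input:
    tuple val(x), val(y) from test_ch.collect()

    shell:
    joined = y.collect { "${it}," }.join()   // evaluated as Groovy before the command runs
    '''
    necessary_shell_command --option "!{joined}"
    '''
}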
Sam Birch
@hotplot
For future reference, if anybody is getting "Access to '<process>.out' is undefined since process doesn't declare any output" for a process that does declare outputs, it might be because you have a comma after the end of the previous process in your workflow declaration.
Sunit Jain
@sunitj

Hello! Possibly a question that's been discussed before, but can someone point me to a solution for concatenating multilane fastq files (single or paired) by sample names? I'm using Nextflow 20 with AWS Batch and DSL1. So far, I have:

if (params.singleEnd) {
        Channel
        .fromPath(params.reads)
        .ifEmpty { exit 1, "Cannot find any reads matching: ${params.reads}." }
        .map {it -> tuple(params.prefix, it)}
        .groupTuple()
        .set { ch_reads_concat }
    } else {
        Channel
        .fromFilePairs(params.reads, flat:true)
        .ifEmpty { exit 1, "Cannot find any reads matching: ${params.reads}." }
        .map {it -> tuple(params.prefix, it)}
        .groupTuple()
        .set { ch_reads_concat }
    }

Unfortunately, this returns values in the channel of the form:

[ SN1, 
  [ SN1_L1_R1.fq.gz,  SN1_L2_R1.fq.gz,  SN1_L3_R1.fq.gz ], 
  [ SN1_L1_R2.fq.gz,  SN1_L2_R2.fq.gz,  SN1_L3_R2.fq.gz ]
]

I'm not sure how to handle this in my process. Also, do I need separate processes for SE and PE?

Sunit Jain
@sunitj
I'm sorry, I meant of the form:
[SN1, 
  [
    [SN1_L1, SN1_L1_R1.fastq.gz, SN1_L1_R2.fastq.gz],  # Lane 1
    [SN1_L2, SN1_L2_R1.fastq.gz, SN1_L2_R2.fastq.gz],  # Lane 2
    [SN1_L3, SN1_L3_R1.fastq.gz, SN1_L3_R2.fastq.gz]   # Lane 3
  ]
]
2 replies
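One possible reshaping (untested sketch, paired-end case only): collect the R1s and R2s out of the per-lane triples so a single concatenation process can cat each mate across lanes.

ch_reads_concat
    .map { sample, lanes ->
        def r1 = lanes.collect { it[1] }   // second element of each lane triple
        def r2 = lanes.collect { it[2] }   // third element of each lane triple
        tuple(sample, r1, r2)
    }
    .set { ch_for_concat }

// then, in a process: set val(sample), file(r1), file(r2) from ch_for_concat
// and e.g. cat ${r1} > ${sample}_R1.fastq.gz ; cat ${r2} > ${sample}_R2.fastq.gz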
Jack Morrice
@jackmo375

Hi everyone, I have a short question about writing values in the script context that can be passed out through channels. For example, I am trying to do something like this:

process test {
    output:
        val x
    script:
        x = 0
        """
#!/usr/bin/env python
x = 137  # update x to something
        """
}

How can I get this to pass 137 through the output channel?

3 replies
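A sketch of one pattern that may help here (assuming the value can be produced by the command itself): capture it as an environment variable with the env output qualifier, with the Python result assigned in a small bash wrapper.

process test {
    output:
        env X

    script:
        """
        X=\$(python3 -c 'print(137)')   # the one-liner stands in for the real computation
        """
}

Another common alternative is to have the script write the value to a small file and declare that file as a path output.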
KyleStiers
@KyleStiers

I feel like I asked this before, but I can't seem to find if it ever got answered. I am running a process with the shell block. I need to save the result of a shell (bash) command to a variable so that I can dynamically place it in a text file. I can't seem to sort out how to get the result of the command to be stored as a bash variable AND use that in another shell command in the shell block of Nextflow.

It looks something like this:

process foo {

shell:
"""
        link=`curl -s url --data "path/to/data/" | jq -r .ocs.data.url`
        sed s:#LINK:${link}:g > content.txt
"""
}
11 replies
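A sketch of how this might look (untested; template.txt is a hypothetical input for sed, since the original snippet gives sed no input): in a shell: block, plain bash $ syntax refers to bash variables and !{...} to Nextflow variables, so the curl result can be stored and reused without extra escaping.

process foo {
    shell:
    '''
    link=$(curl -s url --data "path/to/data/" | jq -r .ocs.data.url)
    sed "s|#LINK|${link}|g" template.txt > content.txt
    '''
}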
Sam Birch
@hotplot
Does Nextflow not cache processes that specify no inputs?
Sam Birch
@hotplot
Nevermind, it was a resume issue
Jemma Nelson
@fwip
I'm hitting the "jobs stuck in RUNNABLE" issue in AWS Batch when following the tutorial here: https://genomics-nf.workshop.aws/nf101.html . I've followed the instructions on the AWS debugging page (https://aws.amazon.com/premiumsupport/knowledge-center/batch-job-stuck-runnable-status/) with no luck. The only remaining clue I have is a "configuration conflict" warning that appears on the job's detail page (posted as a screenshot).
Jemma Nelson
@fwip
I didn't set any CPU/memory options in the configuration. For reference, this is the script file used in the tutorial: https://github.com/seqeralabs/nextflow-tutorial/blob/master/script7.nf
2 replies
Jemma Nelson
@fwip
Are there any reference documents I could refer to? It would be really nice to enable our users to "burst" to AWS when our on-premises cluster is backlogged.
Bill Flynn
@wflynny

Is there a good way to dump all workflow metadata to a file? Something like:

import groovy.json.*

workflow_json = new JsonBuilder(workflow).toString()

yields a StackOverflowError. Any suggestions?
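A sketch of one workaround (assuming only selected fields are needed): serialising a hand-picked map of workflow properties avoids feeding JsonBuilder the whole workflow object, which presumably holds references it cannot walk without recursing deeply.

import groovy.json.JsonBuilder

workflow.onComplete {
    def meta = [
        runName    : workflow.runName,
        commandLine: workflow.commandLine,
        start      : workflow.start.toString(),
        complete   : workflow.complete.toString(),
        duration   : workflow.duration.toString(),
        success    : workflow.success
    ]
    file('workflow_metadata.json').text = new JsonBuilder(meta).toPrettyString()
}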

Vivek Rai
@raivivek
Hi everyone! I'm trying to use the each keyword with the new DSL2 syntax but seem to be running into this error. I'm not sure what the issue is:
nextflow.enable.dsl = 2

test1 = Channel.fromList(["1", "2", "3"])
test2 = Channel.fromList(["a", "b"])

workflow {
    test(test1, test2)
}

process test {
    echo true

    input:
        val(x)
        each val(y)

    "echo $x + $y"
}
This gives the error: No such variable: y
Vivek Rai
@raivivek
Okay, so after a little bit of exploration, it appears that changing val(y) to just y does the trick.
Jemma Nelson
@fwip
Anyone with AWS experience know if the above "Configuration conflict" warnings are normal/expected?
Jemma Nelson
@fwip

for instance, maybe how nextflow communicates with AWS Batch is just a little outdated, but still works fine?

I should be in contact with an AWS person in the coming week or two, so I'll try to remember to follow up here if I find a resolution.

daudn
@daudn

I am looking into saving a WF report alongside other files (generated by the WF)

I know there are two ways to do this: -with-report /path/to/file, or adding a report {} scope in my nextflow.config. I was wondering, is there any way to add this at the end of the WF?

Something like

workflow.onComplete{
    // generate and save the report
    // to a dynamic location (based on variables from the WF)
}

I only want a final report if the WF completes, and would like to save it to a dynamic location, hence the above two methods don't fit the use case.

Thank you! Any help greatly appreciated! :)

2 replies
Thomas A. Christensen II
@millironx:matrix.org
[m]

Is there a way to get the memory allocated to a process the same way one can get the cpus allocated to a process?

java -Xmx${task.memory} -jar example.jar -threads ${task.cpus}

results in

java -Xmxnull -jar example.jar -threads 12
3 replies
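A sketch that may help (it assumes memory is actually set for the process, since task.memory is null otherwise): task.memory is a MemoryUnit object, so converting it explicitly, e.g. with toGiga(), yields a number that fits the -Xmx flag.

process example {
    cpus 12
    memory '8 GB'   // illustrative values; without a memory setting task.memory stays null

    script:
    """
    java -Xmx${task.memory.toGiga()}g -jar example.jar -threads ${task.cpus}
    """
}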
Egan Lohman
@eganlohman

How do I execute a process per file path in DSL2? I.e. the DSL2 equivalent of this:

params.inputs = "$baseDir/data/reads/*_1.fq.gz"

Channel.fromPath(params.inputs, checkIfExists: true).set{ samples_ch }

process foo {
  echo true
  input:
  file x from samples_ch

  script:
  """
  echo your_command --input $x
  """
}

I've tried the following, but get: groovyx.gpars.dataflow.expression.DataflowInvocationExpression cannot be cast to java.nio.file.FileSystem
// My main.nf

nextflow.enable.dsl=2

include { analysis } from './modules/analysis.nf'
include { configuration } from './modules/configuration.nf' 

workflow {

    configuration(file(params.sample_sheet))

    Channel.fromPath(configuration.out.analysis_node_inputs).set{ analysis_inputs }
    analysis(params.analysis_folder, params.fastq_folder)
}

// My analysis.nf process

nextflow.enable.dsl=2

process analysis {

    label 'tso500CtDna'
    maxForks 1

    input:
        file analysis_input from analysis_inputs
        path analysis_folder
        path fastq_folder

    script:
        """
            ...
        """  
}
1 reply
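For the simple example at the top of the question, a possible DSL2 equivalent (untested sketch) just declares the path input without from and wires the channel in the workflow block:

nextflow.enable.dsl = 2

params.inputs = "$baseDir/data/reads/*_1.fq.gz"

process foo {
    echo true

    input:
    path x

    script:
    """
    echo your_command --input $x
    """
}

workflow {
    samples_ch = Channel.fromPath(params.inputs, checkIfExists: true)
    foo(samples_ch)
}

The cast error in the longer example may come from passing a process output channel into Channel.fromPath, which expects a path or glob string rather than a channel.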
Michael L Heuer
@heuermh
Hello @aloraine! Some of those are nf-core-specific questions, you may wish to check out the nf-core #rnaseq Slack channel, invite link https://nf-co.re/join/slack
A. Loraine
@aloraine
Thanks @heuermh! I just now posted the question in their Slack.
philDTU
@philDTU
How can I use a tuple inside of a workflow environment?
workflow run_it{
    take:
        IM_A_TUPLE
    main:
       tuple val(PID), val(SAMPLE), val(CONTROL), val(SEQ_TYPE)  from IM_A_TUPLE
1 reply
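As a sketch of one way this is usually handled (process-style input declarations like the tuple ... from line aren't valid inside a workflow body): destructure the tuple with a map operator instead.

workflow run_it {
    take:
        IM_A_TUPLE
    main:
        IM_A_TUPLE
            .map { pid, sample, control, seq_type ->
                // the four fields are plain variables inside this closure
                tuple(pid, sample, control, seq_type)
            }
            .set { parsed_ch }
}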
Arnaud Ceol
@arnaudceol
Hi! I have a channel problem to submit: I need to process all files with extension .tar.gz in a folder structure organized like /somepath.../<group>/<user>/.
In the process, I need to be aware of the <group> and <user>, because two tar.gz files with the same name may be present in the folders of different users.
I tried the following: Channel.fromPath( '/somepath .../.tar.gz' ), and Channel.fromPath( '/somepath/.tar.gz' ).map { file -> tuple(file.baseName, file) }, but I cannot retrieve the full path (the one including group and user).
Any suggestion? Thanks in advance!
Arnaud Ceol
@arnaudceol
Sorry, the stars disappeared; the correct commands are Channel.fromPath( '/somepath .../**.tar.gz' ) and Channel.fromPath( '/somepath/**.tar.gz' ).map { file -> tuple(file.baseName, file) }
2 replies
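One possible way to recover the group and user (untested sketch using standard java.nio.Path navigation on the matched files):

Channel
    .fromPath('/somepath/**.tar.gz')
    .map { f ->
        def user  = f.parent.fileName.toString()          // immediate parent directory
        def group = f.parent.parent.fileName.toString()   // directory above that
        tuple(group, user, f.baseName, f)
    }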
Nathan Spix
@njspix

Hello all, my NF pipeline (DSL2-based) is quitting with the following output:

N E X T F L O W  ~  version 21.04.0
Launching `../template_pipeline.nf` [agitated_borg] - revision: 1e4c39e837
[-        ] process > luo:fastq_qc:fastqc             -
[-        ] process > luo:fastq_qc:multiqc            -
[-        ] process > luo:cutadapt_luo                -
[-        ] process > luo:cutadapt_qc:fastqc          -
[-        ] process > luo:cutadapt_qc:multiqc         -
[-        ] process > luo:align_index:align           -
[-        ] process > luo:align_index:index           -
[-        ] process > luo:align_qc:pileup             -
[-        ] process > luo:align_qc:biscuitqc          -
[-        ] process > luo:align_qc:preseq             -
[-        ] process > luo:align_qc:stats              -
[-        ] process > luo:align_qc:plot_bamstats      -
[-        ] process > luo:align_qc:multiqc            -
[-        ] process > luo:polish_bams:clean_bam       -
[-        ] process > luo:polish_bams:index           -
[-        ] process > luo:feature_coverage:bam_to_bed -
/root

The pipeline completes normally if I comment out the last step in the pipeline (feature_coverage). Any ideas what might be causing this?
The lack of any meaningful error message is perplexing!

1 reply
mbahin
@mbahin
Hi all, I have a very basic question... I have a channel which is composed of 2 elements (got it from a "fromPath" expression); how can I access one of the elements (like I would do list[0] for an element in a list in other languages, for example)?
2 replies
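A short sketch of the usual options (channels are streams rather than lists, so elements are taken with operators instead of indexing):

ch = Channel.fromPath('data/*.txt')            // illustrative source

// option 1: a value channel holding only the first element
first_el = ch.first()

// option 2: gather all elements into a single list, then index it
ch.collect().map { items -> items[0] }.view()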
paulderaadt
@paulderaadt

I load an lmod Python module in my process scope. I have a Python script in my workflow bin/ dir, and this script has #!/usr/bin/env python3
as its shebang. My process crashes with:

  /usr/bin/env: python3
  : No such file or directory

Any tips?

paulderaadt
@paulderaadt
And the solution: Windows line endings, which broke the shebang... >:(
Jeffrey Massung
@massung

Is there an easy way for me to specify that an output (path) is also a value channel (e.g. using .first())? Right now if I need a single file output to be used by multiple downstream processes I end up doing something like:

process A {
  output:
    path("foo.txt") into foo
}

process B {
  foo.multiMap {
    B: it
    C: it
  }.set { foo_multi }

  input:
    path foo.foo_multi.B
}

process C {
  input:
    path foo.foo_multi.C
}

But I dislike that the multiMap is in process B. It'd be better if it was in A, or even better, defined as a value output in A (e.g. something like path("foo.txt") into foo as value).
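A couple of DSL1-era patterns that may cover this (sketched, not verified against the original pipeline): declare the output into several channels at once, or turn the single output channel into a value channel with first() so any number of downstream processes can read it.

process A {
  output:
    path("foo.txt") into foo_for_b, foo_for_c   // one declaration, two target channels
}

// or, keeping a single output channel:
foo_value = foo.first()   // value channel; reusable by B and C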

zburkett
@zburkett
Hi everyone, is there a way to get the paths for all items that are added with an include in a workflow? For example, main.nf has an include for a workflow; that workflow has a number of modules that are added via an include, and I'm trying to get the paths to those specific modules. I know this can be worked out by parsing the text of the workflow, but I was curious if there's some inbuilt functionality that can be used to obtain the same thing.
btyukodi
@btyukodi

Hi Everyone,

I am new to it and have just started to read Nextflow's documentation. I found that one can specify a scratch directory for the execution. Once the task is complete, one can use the stageOutMode directive to copy the output files from scratch to storeDir.

The output files to be copied are specified by the output directive. My question is the following: is it possible to specify entire directories as output so that they would be copied recursively from scratch to storeDir? If so, how? Thanks a lot!

2 replies
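A minimal sketch of what this might look like (assuming a path output that matches a directory is staged out as a whole; the directive values and names below are illustrative):

process make_outputs {
    scratch true
    stageOutMode 'copy'
    storeDir 'results/store'

    output:
    path 'outdir'            // the directory itself is the declared output

    script:
    """
    mkdir -p outdir
    run_tool --outdir outdir
    """
}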
KyleStiers
@KyleStiers
I have a pipeline that essentially is always running, and I'd like to generate reports, traces, and logs only when a run actually finds things to process. The majority of the time it doesn't find anything to run on and completes successfully without having done anything. Is there a clean Nextflow-y way to do this that doesn't error out? Or maybe just a check on the starting channel to see if it's empty, which kills the job in a way that wouldn't produce these output files?
5 replies
Janet Zhou
@jz940

Hey all, I'm having the same issue as #677, where there's a dependency error when downloading Nextflow:

CAPSULE: Downloading dependency org.pf4j:pf4j-update:jar:2.3.0
CAPSULE: Downloading dependency commons-codec:commons-codec:jar:1.10
CAPSULE EXCEPTION: Error resolving dependencies. while processing attribute Allow-Snapshots: false (for stack trace, run with -Dcapsule.log=verbose)
Unable to initialize nextflow environment

The complete error output is pretty similar to that issue, where it seemed to be a problem with Java, but I have Java 11 (and that issue was resolved 3 years ago and doesn't apply here).

CAPSULE EXCEPTION: Error resolving dependencies. while processing attribute Allow-Snapshots: false
java.lang.RuntimeException: Error resolving dependencies.
    at capsule.DependencyManager.resolve0(DependencyManager.java:382)
    at capsule.DependencyManager.resolveDependencies(DependencyManager.java:314)

Anyone run into this?