Paolo Di Tommaso
@pditommaso
if you find out, tell us! :satisfied:
arontommi
@arontommi
Hi!
Is there any way of running Nextflow 20.10 and later offline?
hkim
@wisekh6
Hi, the following (using ch_input and an ifEmpty channel) prints out only the first element (X 1) from ch_input, but I would like to make it print out all of ch_input. Would you please help?
#!/usr/bin/env nextflow

nextflow.enable.dsl=2

Channel.from(   ['X', 1], 
                ['Y', 2], 
                ['Z', 3])
    .set{ ch_input }

ch_empty = Channel.empty()

process foo {
  echo true

  input:
    tuple val(a), val(b)
    val(is_empty)

  when:
    is_empty == 'EMPTY'

  script:
  """
  echo ${a} ${b}
  """
}


workflow {

    foo(ch_input, 
        ch_empty.ifEmpty('EMPTY'))

}
The current output is
N E X T F L O W  ~  version 20.07.1
Launching `empty_ch.nf.sh` [grave_poincare] - revision: ffb2150e56
executor >  local (1)
[6d/538d2c] process > foo (1) [100%] 1 of 1 ✔
X 1
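(One possible fix, sketched below and untested: ifEmpty('EMPTY') yields a queue channel whose single item is consumed by the first task, so foo runs only once; converting it to a value channel with first() lets it be reused for every element of ch_input.)
```
// minimal sketch: a value channel is reusable, so every tuple from ch_input
// gets paired with 'EMPTY' and foo runs once per tuple
workflow {
    foo(ch_input,
        ch_empty.ifEmpty('EMPTY').first())
}
```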
Jemma Nelson
@fwip
I ended up going with solution B. If there's a better method, please let me know.
namiller2015
@namiller2015

Hi all,
I am new to nextflow and have been trying to work through this tutorial
using a free EC2 linux instance on AWS.
https://github.com/seqeralabs/nextflow-tutorial
However, on step 2 when I try to run the docker image through nextflow I am getting an error because the files created by docker do not have permission to execute. This is described in this bug report
nextflow-io/nextflow#1295

I have tried the solution mentioned there to run the container as root but didn't have any luck. I'd like to solve this without running as root as I'd like to eventually use nextflow and docker in a production environment.

Any advice would be appreciated. Thanks!

Johnathan D.
@bjohnnyd
Is there an idiomatic way, or even any way, to express situations where the output of a process is an input to itself for some specific N times? For example, a process to call variants and then using the output variant calls as an additional input (a DB of variants) to the next iteration of the same process. And do this iteratively up to some sample/timepoint, using the most recent output as the input DB?
Johnathan D.
@bjohnnyd
Essentially, a logic similar to this:
all_inputs = (sample_name, timepoint, bam) | sort { it[1] }
db = all_inputs | map {it[0,2]} | first | callVariants
results_channel = Channel.empty()
results_channel.push(db)

for (sample_name, timepoint, bam) in skip_first(all_inputs) {
    result = callVariants(sample_name, bam, db)
    db = result.out.vcf
    results_channel.push(db)
}
}

results_channel | ...
Luca Cozzuto
@lucacozzuto
so you are asking for a for loop
I remember @pditommaso shouting at me when I asked for that some years ago :)
Steven P. Vensko II
@spvensko
I resolved my AWS Batch issue from a few days ago -- it appears Alpine-based images do not play well with Nextflow and Batch.
reganhayward
@reganhayward
Hi there - I'm using Nextflow for a dualrnaseq pipeline (https://github.com/nf-core/dualrnaseq). Everything runs fine and I get the pipeline-completed message, but after each run I get a warning message about some files that weren't able to be published - yet when I look, they exist in the /work directory and in the final output. I've got log files etc., but I'll keep this first message short.
Nick Swainston
@NickSwainston
Is it possible to have a DSL2 process where the output is a tuple of files with the 'includeInputs true' option? I have tried the following ways, without success
output:
 tuple file("*${label}.yaml"), file("*fits") includeInputs true
 tuple file("*${label}.yaml"), file("*fits"), includeInputs true
 tuple file("*${label}.yaml"), file("*fits", includeInputs true)
Paolo Di Tommaso
@pditommaso
output:
 tuple path("*${label}.yaml"), path("*fits"), includeInputs: true
RenzoTale88
@RenzoTale88
Hi, I'm writing a workflow using DSL2 in Nextflow. My question is quite simple: does anyone know how to parse the stdout of a process into a channel and process one line at a time? For example, assume the stdout is a list of chromosome IDs to extract from a VCF, and the next process will create a sub-VCF for each chromosome and process these separately. Is there an easy way to do this? Thanks in advance to anyone for the help! :)
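(One possible approach, sketched below and untested, with bcftools and the params.vcf name as assumptions: capture the process stdout and feed it through the splitText operator so each line becomes a separate channel item.)
```
// minimal DSL2 sketch: emit chromosome IDs on stdout, then split per line
process listChromosomes {
    input:
    path(vcf)

    output:
    stdout

    script:
    """
    bcftools index --stats ${vcf} | cut -f1
    """
}

workflow {
    vcf_ch = Channel.fromPath(params.vcf)
    listChromosomes(vcf_ch)
    listChromosomes.out
        .splitText()
        .map { it.trim() }
        .view()           // each chromosome ID is now a separate item
}
```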
Raoul J.P. Bonnal
@helios
@pditommaso I read a tweet about nextflow plugins. Does it mean that we can include the database libraries as plugin for those who want to enable the fromSQL channel ?
Tuur Muyldermans
@tmuylder
Hi all! Does someone have experience with running STAR in Nextflow? Indexing works out perfectly (inspired as well by https://github.com/CRG-CNAG/CalliNGS-NF/), but for some reason NF is complaining that it could not open readFilesIn, even though .command.sh clearly uses the correct reads and they are present (linked) in the work directory. Shot in the dark here, but I can share more ofc.
Raoul J.P. Bonnal
@helios
@tmuylder can you read the file from within the work directory? Are you using it with containers? If yes, can you read the reads from within the container?
Robert Syme
@robsyme
Hi all.
I'm passing a custom groovy object into a process, which is causing issues with the resume function. Even if the groovy object is constructed in the same way each time, the hashing function produces a different output. Is there a way I can tell nextflow how the object should be hashed?
Robert Syme
@robsyme

I think the key is in the hasher function [1]. If the object being hashed doesn't fit one of the pre-determined types (Integer, Map, File, etc.), then its hash is computed from value.hashCode(). I can override this method and implement my own. Doing this does indeed give nice, predictable hashes for my object, but then Nextflow prints the warning "WARN: [Process:CreateSampleSheetLaneSplit (H3V5LDSXY)] Unable to resume cached task -- See log file for details".
The log doesn't contain any extra information, though.

[1] https://github.com/nextflow-io/nextflow/blob/553c3119cc81012c3c4667ba280f174b0a7c7f8b/modules/nf-commons/src/main/nextflow/util/CacheHelper.java#L104-L196
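(A minimal Groovy sketch of the hashCode override described above, using a hypothetical Sample class; it only illustrates making the hash a function of the field values rather than of object identity.)
```
import groovy.transform.EqualsAndHashCode

// hypothetical custom object passed into a process: @EqualsAndHashCode makes
// hashCode()/equals() depend only on the fields, so equal objects hash
// identically across runs
@EqualsAndHashCode
class Sample implements Serializable {
    String id
    String lane
}
```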

Robert Syme
@robsyme
I've opened up an issue that describes the behaviour I'm seeing: nextflow-io/nextflow#1811
Paolo Di Tommaso
@pditommaso
it may be solved by having your class implement both the CacheFunnel and Java Serializable interfaces
Robert Syme
@robsyme
Good suggestion Paolo, but unfortunately, no dice. I've updated the example repo and will update the issue to reflect this attempt.
Richard Corbett
@RichardCorbett

Hi folks. I'm switching to DSL2 and loving it! Does anyone know how to use the workflow.onComplete function in DSL2? In non-DSL2 scripts I would keep this at the bottom:

workflow.onComplete {
    final_results_ch.view{ println "[complete] $it"}
    println "Pipeline $workflow.scriptName completed at: $workflow.complete"
    println "Execution status: ${ workflow.success ? 'OK' : 'failed' }"
}

but when I include that block after my main workflow in DSL2, I get a "Failed to invoke `workflow.onComplete` event handler" error.
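(A minimal DSL2 sketch, untested: workflow.onComplete itself still works when placed at script level; note that channels created inside the workflow block, such as final_results_ch above, are not visible from the handler, which may be what triggers the error here.)
```
workflow.onComplete {
    // only workflow.* properties are referenced here
    println "Pipeline $workflow.scriptName completed at: $workflow.complete"
    println "Execution status: ${ workflow.success ? 'OK' : 'failed' }"
}
```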

Feroze Fazaludeen
@ferozef

Hi, I'm new to this place and to bioinformatics. I have an RNA-seq dataset I'm trying to analyse. I came across the Nextflow pipeline and it ran well once, but this time it gives the error "Input samplesheet not specified". I have specified it in my bash file but it still gives the error. Can anyone please advise what should be done and what to check?

#!/bin/bash

/research/groups/malmgroup/Feroze/nextflow/nextflow run nf-core/rnaseq --singleEnd --reads "/research/groups/malmgroup/Feroze/microglia_transcriptome_Vs_durg_interaction/Dataset_collected_by_ahmed_151120/Homeostatic/homeostatic/data_set/*" --genome GRCm38 -profile singularity --forwardStranded --clip_r1 8 --max_memory "50.GB" --max_cpus "20" -name "mouse_homeostatic_dataset_3" --outdir "/research/groups/malmgroup/Feroze/microglia_transcriptome_Vs_durg_interaction/Dataset_collected_by_ahmed_151120/Homeostatic/homeostatic"

RavinPoudel
@raveenpoudel_twitter

Running co-assembly with megahit:

Below is the output from one of the channels that I want to pass to megahit for co-assembly. Rather than one sample at a time, for co-assembly a list of forward reads and a list of reverse reads need to be passed. How can we convert the channel into two lists, one with forward reads and the other with reverse reads? Or any suggestions on how to pass output from a channel for a co-assembly?

Following is channel.view() output:
['R99-H7-Death-15-G04-CTTCCAAC_sub', /project/gbru_fy21_tomato_ralstonia/nextflowRun/work/43/508e9fc49d1d091314c81f0d5b0510/R99-H7-Death-15-G04-CTTCCAAC_sub.trimmed_1.00.0_0.cor.fastq.gz, /project/gbru_fy21_tomato_ralstonia/nextflowRun/work/43/508e9fc49d1d091314c81f0d5b0510/R99-H7-Death-15-G04-CTTCCAAC_sub.trimmed_2.00.0_0.cor.fastq.gz, 'R99-H7-Death-18-B05-CTGTACAG_sub', /project/gbru_fy21_tomato_ralstonia/nextflowRun/work/8f/9c45b8f645e742c51220b224fa28cf/R99-H7-Death-18-B05-CTGTACAG_sub.trimmed_1.00.0_0.cor.fastq.gz, /project/gbru_fy21_tomato_ralstonia/nextflowRun/work/8f/9c45b8f645e742c51220b224fa28cf/R99-H7-Death-18-B05-CTGTACAG_sub.trimmed_2.00.0_0.cor.fastq.gz

# script for co-assembly:
time megahit --k-min 27 --k-max 127 --k-step 10 -m 0.98 -t 12 --min-count 2 --out-dir megahit_output --kmin-1pass --min-contig-len 300 -1 /project/gbru_fy21_tomato_ralstonia/nextflowRun/work/43/508e9fc49d1d091314c81f0d5b0510/R99-H7-Death-15-G04-CTTCCAAC_sub.trimmed_1.00.0_0.cor.fastq.gz, /project/gbru_fy21_tomato_ralstonia/nextflowRun/work/8f/9c45b8f645e742c51220b224fa28cf/R99-H7-Death-18-B05-CTGTACAG_sub.trimmed_1.00.0_0.cor.fastq.gz -2 /project/gbru_fy21_tomato_ralstonia/nextflowRun/work/43/508e9fc49d1d091314c81f0d5b0510/R99-H7-Death-15-G04-CTTCCAAC_sub.trimmed_2.00.0_0.cor.fastq.gz,/project/gbru_fy21_tomato_ralstonia/nextflowRun/work/8f/9c45b8f645e742c51220b224fa28cf/R99-H7-Death-18-B05-CTGTACAG_sub.trimmed_2.00.0_0.cor.fastq.gz

# More generic syntax:

time megahit --k-min 27 --k-max 127 --k-step 10 -m 0.98 -t 12 --min-count 2 --out-dir megahit_output --kmin-1pass --min-contig-len 300 -1 sample_1_R1.fastq.gz,sample_2_R1.fastq.gz -2 sample_1_R2.fastq.gz,sample_2_R2.fastq.gz
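(A minimal DSL2 sketch of one way to do this, untested and with hypothetical file names: collect all forward reads into one list and all reverse reads into another, then join them with commas inside the script.)
```
nextflow.enable.dsl=2

// hypothetical channel of (sample_id, R1, R2) tuples
Channel.of(
        ['sampleA', file('sampleA_R1.fastq.gz'), file('sampleA_R2.fastq.gz')],
        ['sampleB', file('sampleB_R1.fastq.gz'), file('sampleB_R2.fastq.gz')] )
    .set { reads_ch }

process coassembly {
    input:
    path(fwd)    // all forward reads, staged together
    path(rev)    // all reverse reads, staged together

    script:
    """
    megahit --min-contig-len 300 --out-dir megahit_output -1 ${fwd.join(',')} -2 ${rev.join(',')}
    """
}

workflow {
    coassembly(
        reads_ch.map { id, r1, r2 -> r1 }.collect(),
        reads_ch.map { id, r1, r2 -> r2 }.collect() )
}
```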


Robin F Chan
@robinfchan

Hey guys, hoping I'm just missing something -- I'm trying to set up a Nextflow pipe on AWS Batch that depends on a tool that requires an absolute path as a parameter. I'm trying to figure out how to retrieve the relevant path within the actual process instance. For example, I'm trying to process:

s3://my-bucket/data/sample777/file.fastq

it is staged for the process instance at:
s3://my-bucket/nextflow_runs/57/work/57/2f5a4e1d6db353f2734020ba7ea340/file.fastq

How can I return the updated absolute path for the staged file so that I can feed it into the command-line tool in the script block?

Eg I want

input:
path my_fastq from ch_fastq

script:
"""
echo "${XXX}/${my_fastq}"
"""

To spit out s3://my-bucket/nextflow_runs/57/work/57/2f5a4e1d6db353f2734020ba7ea340/file.fastq instead of just file.fastq

Running locally it's obvious Nextflow just creates symlinks to the original files, but it's unclear how to achieve the above in AWS Batch. Any ideas? Trying to avoid having to move data off S3.

Steven P. Vensko II
@spvensko_gitlab
echo "\${PWD}/${my_fastqs}" may work.
Robin F Chan
@robinfchan
@spvensko_gitlab Ahhhh thank you. It was the escape char before ${PWD}
Steven P. Vensko II
@spvensko_gitlab
No worries, glad it worked out. :+1:
ink-blot
@ink-blot

I have a process that creates several *.list files from an input file.

process createIntervals1{


    input:
        file(input_bam) from bam_for_intervals        
    output:
        file("intervals/*.list") into intz_ch
    script:

        """
        perl perl_intervals.plx  -i ${input_bam} -o 'intervals/'  

        """

}

In the next process, I want to wait for all *.list files to be generated by the previous process. Then I want to do some operations on each list file. Then I want to scatter the list files for subsequent processing.

process gatherIntervalsandScatter{

input:
    file(intaz) from intz_ch.collect()
output:
    file("collected/*.list") into intz_ch3
script:

    """

        mkdir collected
        cp *.list collected/

    """

}

I want to process the output channel of the previous process using a separate process for each individual file.

process processIntervalsIndividually{

input:
    file(intaz2) from intz_ch3
output:

    file("collected2/*.list") into intz_ch4    

script:

    """

    echo ${intaz2}
    mkdir collected2
    cp *.list collected2/    

    """

}

Unfortunately, the final process is receiving a list of all files at once and only a single process is executed. Accordingly, the line "echo ${intaz2}" in the final process outputs something like:

1.list 2.list 3.list 4.list 5.list

How can I modify my approach so that a separate process is executed for each individual '.list' file?


If I scatter the files from a file system path using something like:

Channel.fromPath("/scratch/intervals/*.list").set{intz_ch3}

and then output it into the final process "processIntervalsIndividually" I get the desired behaviour where 5 different processes are executed. I would like a similar thing to occur when just using the temporary output from the preceding process.
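(One possible fix, sketched below and untested: collect() packs all files into a single emission, so the downstream process sees them as one item; applying flatten() to the gathered output emits each .list file separately again, giving one task per file.)
```
// minimal sketch: fan the gathered output back out, one item per .list file
intz_ch3
    .flatten()
    .set { intz_single_ch }

// processIntervalsIndividually then reads:  file(intaz2) from intz_single_ch
```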

Nick Swainston
@NickSwainston

Is there a way to put the working directory of a process into the .command.sh? So something like this:

process print_pwd {
    """
    echo \$pwd
    """
}

Will output something like this to the .command.sh

echo /home/user/work/aa/hashksjfklasjdfklasd

I also don't want this to print the scratch directory if I'm using temporary storage
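(A minimal sketch, untested: task.workDir is resolved by Nextflow when the script is generated, so the absolute work directory ends up written into .command.sh rather than being expanded at runtime; whether this interacts with scratch staging the way you want would need checking.)
```
process print_pwd {
    script:
    """
    echo ${task.workDir}
    """
}
```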

Tuur Muyldermans
@tmuylder
Hi all, I'm using a Docker container while running my Nextflow pipelines (defined in the process or in the config file). I'm struggling with the issue that the data can't be accessed in the process. I'm guessing that the data needs to be mounted into the Docker container and for some reason this is not working. The files are all present in the work dir of the process, but they are symlinked, and I guess that the folders with the actual data are not mounted into the container. Any ideas? (for processes like fastqc and trimmomatic it is working, however not for the STAR alignment)
Paolo Di Tommaso
@pditommaso
NF automatically binds the required data volumes as long as the required data is specified as input: file or input: path (better)
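(A minimal sketch of the point above, with hypothetical names: declaring the reads and index as path inputs lets Nextflow stage them into the task work directory and bind the required host paths into the container; passing a bare string path as a val would not be mounted.)
```
process starAlign {
    input:
    path(reads)    // staged into the work dir and bound into the container
    path(index)

    script:
    """
    STAR --genomeDir ${index} --readFilesIn ${reads}
    """
}
```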
chbk
@chbk
Are there any plans to let nextflow mount more than one volume when running on a kubernetes cluster? As of now you can only specify one storageClaimName in the config file.
Lumimar
@Lumimar

Hi all, I am trying to take the output of .simpleName (within the script section) and use it in publishDir, but although .simpleName works, it seems that the variable generated is not available to publishDir, as I get a null-value warning. Any idea how I can fix this to make the variable available to publishDir? Many thanks!
```
process foo {

publishDir "$params.outdir/$patient_code/", pattern: '*.mpileup', mode: 'copy' // "publishDir path contains a variable with a null value"

input:
val i from BamFile

output:
file '*.mpileup' into R_mpileup

script:
def patient_code=i.simpleName   // this works, but is not picked up by publishDir
"""
bash SNPs.sh  "$i" 
"""

}
```
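(One possible fix, sketched below and untested: a def variable inside the script block is local to the script and not visible to directives, but directives can reference the process inputs directly, so the name can be derived in the publishDir path itself.)
```
process foo {

    // the input value `i` is visible to the directive, unlike the script-local def
    publishDir "$params.outdir/${i.simpleName}/", pattern: '*.mpileup', mode: 'copy'

    input:
    val i from BamFile

    output:
    file '*.mpileup' into R_mpileup

    script:
    """
    bash SNPs.sh "$i"
    """
}
```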

Lorena Pantano
@lpantano
Does somebody know what happens with the jobs that are affected by the Docker rate limit? I am running in Batch; jobs exited with 0 after the error about not being able to pull the container, and Nextflow doesn't complain, it just moves on. Are those jobs lost? If the command really doesn't run, Nextflow should fail, right?
Paolo Di Tommaso
@pditommaso
I guess it should fail
move those containers to quay.io
Lorena Pantano
@lpantano
ok, thanks!
Jacques Dainat
@Juke34
Hi, I'm looking for information to understand how Nextflow deals with parallelization and CPU usage when we do not use any scheduler. Any information somewhere?
Jacques Dainat
@Juke34
e.g. let's say I use a computer with 48 CPUs: if I run a workflow that parallelizes 10 tasks that use 10 CPUs each, will Nextflow see that it needs 100 CPUs but only 48 are available (will it hold some tasks until CPUs become available)? Will it behave as a scheduler (can it replace a scheduler)?
Do tasks spawn in parallel?
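(For reference, a minimal config sketch of how this is typically controlled, with illustrative numbers: as far as I understand, the local executor tracks the cpus each task requests and queues tasks once the configured CPU budget is used up, so on a single machine it does act as a simple scheduler and tasks run in parallel up to that limit.)
```
// nextflow.config (illustrative values)
executor {
    name = 'local'
    cpus = 48            // total CPUs the local executor may hand out
}

process {
    cpus = 10            // each task requests 10 CPUs; at most 4 run at once
}
```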