John Ma
@JohnMCMa

Is it possible to create files with the native execution mode of a process? For example, I attempted the following:

process WRITE_FASTP_METRICS{
    input:
        val (rna_result)
        val (adt_result)
    output:
        path "fastp_metrics.csv"
    exec:
        write_out = file("fastp_metrics.csv")
        rna_result.forEach{key, value ->
            write_out << key << ',' << value << '\n'
        }
        adt_result.forEach{key, value ->
            write_out << key << ',' << value << '\n'
        }
}

But fastp_metrics.csv is not created in the work directory, causing this error: Missing output file(s) `fastp_metrics.csv` expected by process `WRITE_FASTP_METRICS (1)`
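A minimal sketch of one possible fix, assuming the file should be created inside the task's own work directory: in an exec block a bare file('name') resolves against the launch directory, not the task directory, so task.workDir is used explicitly here (an untested rewrite of the exec body):

exec:
// resolve against the task work dir so the declared output is found
def write_out = task.workDir.resolve('fastp_metrics.csv')
rna_result.each { key, value ->
    write_out << "${key},${value}\n"
}
adt_result.each { key, value ->
    write_out << "${key},${value}\n"
}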

anoronh4
@anoronh4

i'm wondering if we can pass in a container as a variable, as i want to test the same process over various versions of a software package. something like this:

process A {
    container = container_label

    input:
    tuple val(container_label), path(inputFile)
    ...
}

this code did not work, however. can it be done in another way?

rpetit3
@rpetit3:matrix.org
[m]
You could make a parameter to do it at run time, something like
process A {
    container = params.container_label

    input:
    path(inputFile)
    ...
}
anoronh4
@anoronh4
isn't that still the same issue? params.container_label is just one value, i still want the input channels to affect the container directive
1 reply
i think i got it:
process A {
    input:
    tuple val(container_label), path(inputFile)
    ...

    script:
    task.container = container_label
    ...
}
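A minimal sketch of an alternative, assuming a dynamic directive fits here: directive values can be closures evaluated per task after inputs are bound, so the container can be derived from an input value without touching task.container directly.

process A {
    // evaluated for each task, once the input tuple is available
    container { container_label }

    input:
    tuple val(container_label), path(inputFile)
    ...
}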
rpetit3
@rpetit3:matrix.org
[m]
let us know if that works!
anoronh4
@anoronh4
@rpetit3:matrix.org it does!
rpetit3
@rpetit3:matrix.org
[m]
nice to know! thanks for sharing
Luca Cozzuto
@lucacozzuto
Mmm, why not pass it as a parameter? I'm passing a number of things in my workflows.
emily-kawabata
@emily-kawabata
Hi everyone,
Does anyone know if there will be any Nextflow workshops in the near future? I see that there was one in July of 2020 and another one in May of this year hosted by ecseq, and I was wondering if a similar event will be taking place in the future.
2 replies
9d0cd7d2
@9d0cd7d2:matrix.org
[m]
Hi all! I'm very interested in the tool, as it seems to cover a lot of the integrations we need for a particular project (Slurm, buckets, Singularity, etc), but my worry is that our project is mostly about CFD workflows and only a small part AI, and apparently Nextflow seems quite tied to bio and genomics workflows. Do you think we can use it anyway?
5 replies
xmzhuo
@xmzhuo
Hey All,
For azurebatch, is it possible to define two pool types (with autoScale and different vmTypes) in autoPoolMode?
Ghost
@ghost~61847cca6da037398489d4e6
In watchPath, is it OK to use wildcards in the directory and in the file name at the same time? For example: watchPath('/myfolder/*/logs/*.log', 'create'). It doesn't seem to work for me.
zhemingfan
@zhemingfan
Hi everyone, I'm relatively new to Nextflow. For the following code, I'm getting an error where I'm unable to retrieve the index file ([E::idx_find_and_load] Could not retrieve index file for 'merged_sorted.vcf.gz'), even though the folder points to the correct path, and running this command normally outside of Nextflow works fine. Would anyone happen to know how to fix this?
process generate_readset {
    tag "$sample_id"
    cpus 48

    input:
    tuple val(read_name), val(chromosome1), val(chromosome2), val(cuteSV_pos1), val(cuteSV_pos2),
        val(sniffle_pos1), val(sniffle_pos2),
        path(cuteSV_vcf), path(sniffles_vcf) from vcf_input

    output:
    path 'complete_read_set.txt' into receiver

    script:
    """
    ${bcftools_1_11} view --threads ${task.cpus} $cuteSV_vcf -r chr$chromosome1:$cuteSV_pos1-$cuteSV_pos2 > complete.txt
    """
}
Ghost
@ghost~61847cca6da037398489d4e6
@zhemingfan You need to stage the VCF index file; adding the index file as an input should solve your problem. In addition, your output file has to actually be named complete_read_set.txt (your script writes complete.txt).
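A minimal sketch of the input change, assuming each VCF ships with a matching .tbi index that the upstream channel can supply (the val fields are omitted for brevity):

input:
    tuple path(cuteSV_vcf), path(cuteSV_tbi),
        path(sniffles_vcf), path(sniffles_tbi) from vcf_input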
Paul Cantalupo
@pcantalupo
I'm having problems pulling a github repo with nextflow. The repo is private and part of an Organization of which I am an Owner. I created an SCM file with my personal username and am able to pull personal private repos with nextflow pull. But when I try to pull a private Organizational repo, I get the following: Remote resource not found: https://api.github.com/repos/PATH/TO/contents/main.nf. What am I doing wrong?
xmzhuo
@xmzhuo

Hey All,
I have an error related to Nextflow azurebatch. The first process, using a default D4_v3 VM, works alright, but in the second process I fail to request a larger VM (I set it via queue, but apparently it is not working; am I making some naive mistake?)
'''

Error executing process > 'secondprocess'

Caused by:
Cannot find a VM for task 'secondprocess' matching this requirements: type=Standard_D4_v3, cpus=16, mem=14 GB, location=eastus
'''

The config file I used:

process {
       executor = 'azurebatch'
}

docker {
    enabled = true
}

azure {
  batch {
    location = 'eastus'
    accountName = 'xxxbatch'
    accountKey = 'xxx'
    autoPoolMode = true
    allowPoolCreation = true 
    deletePoolsOnCompletion = true 
    deleteJobsOnCompletion = true 
    pools {
        small { 
            autoScale = true
            vmType = 'Standard_D4_v3'
            vmCount = 5
            maxVmCount = 50
        }    
        large { 
            autoScale = true
            vmType = 'Standard_D16_v3'
            vmCount = 5
            maxVmCount = 50
        }    
    }
  }
  storage {
    accountName = "xxx"
    accountKey = "xxx"
  }
}

process {
    withName: firstprocess {
        queue = 'small'
    }
   withName: secondprocess {
        queue = 'large'
    }
}
Greg Gavelis Code Portfolio
@ggavelis
Hi all, simple question: I'm trying to create a channel from a list of tuples, val(sample_id) and path(input_file). Though I've checked that the input file exists, I am encountering the error 'path value cannot be null'. Does anyone know what is wrong with my syntax? https://pastebin.com/YbSG28a3. BTW: are tuples even the best approach for this? Since the sample_id string is embedded in the path to the input_file, shouldn't there be a way to extract the sample_id from the file path?
3 replies
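On the last point, a minimal sketch of deriving sample_id from the path rather than hand-building tuples, assuming the id is the file's base name (directory and glob are hypothetical):

Channel
    .fromPath('data/*.fastq.gz')
    // simpleName strips the directory and all extensions
    .map { f -> tuple(f.simpleName, f) }
    .set { samples_ch }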
Hugues Fontenelle
@huguesfontenelle
Hello @pditommaso and friends :-)
I'm wondering, is there any reason why the cpus directive is not used with the local executor (via the Docker --cpus param)? After all, the Docker --memory param is used.
10 replies
Matteo Schiavinato
@MatteoSchiavinato
I have a question that perhaps isn't very complicated to answer: is it possible to pass Nextflow a dataframe of input files/information (e.g. one sample per line) instead of passing paths in multiple channels and then combining them into a single tuple? I was surprised not to find much about it online, so I wondered whether the option exists.
4 replies
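A minimal sketch of the usual pattern, reading a samplesheet with the splitCsv operator (the column names are hypothetical):

Channel
    .fromPath(params.samplesheet)
    .splitCsv(header: true)
    // one tuple per sample row
    .map { row -> tuple(row.sample_id, file(row.fastq_1), file(row.fastq_2)) }
    .set { samples_ch }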
Bede Constantinides
@bede
Hi all, I'm using a shared system where home directories have quotas, and I keep getting disk quota errors with nextflow run despite specifying an unquota'd path with -w (aka -work-dir). Any ideas? In this thread, Paolo suggests -w is the solution… https://groups.google.com/g/nextflow/c/401Tp_6H57k/m/va8ACNeTAQAJ
2 replies
$ nextflow run nf-core/viralrecon -w /users/xxx/test --help
N E X T F L O W  ~  version 21.04.0
Pulling nf-core/viralrecon ...
Disk quota exceeded
Bede Constantinides
@bede
Solution: set NXF_ASSETS to a path without a quota
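A minimal sketch of that solution, with a hypothetical unquota'd path; NXF_ASSETS controls where pulled pipelines are stored:

$ export NXF_ASSETS=/scratch/xxx/nextflow/assets
$ nextflow run nf-core/viralrecon -w /scratch/xxx/work --help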
Ignacio Eguinoa
@ieguinoa
Hi all, I was wondering if there is any resource to help parse Nextflow files into a tree of syntax elements (AST), ideally using Python. I'm working on a small tool that tries to extract some metadata from the processes, channels, etc. and needs to parse the workflow definition into objects.
Similar to what the javalang package does in Python for parsing generic Java code (https://pypi.org/project/javalang/).
George Seed
@georgeseed_twitter
So, I've got a channel that reads my samplesheet and grabs the second column, and counts how many unique elements it has:
lane_calc
        .map{it -> [it[2]]}
        .flatten()
        .unique()
        .count()

How can I use this as the 'size' part of a groupTuple? I tried:

aligned_bams
        .groupTuple(by:0,size:lane_calc)

But it did not like it - complained about the value type etc. All thoughts gladly received!
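A minimal sketch using groupKey, assuming aligned_bams emits (key, bam) pairs: size must be a plain value, not a channel, so combine attaches the computed count to every element and groupKey tells groupTuple the expected group size, letting it emit each group as soon as it is complete.

lane_count = lane_calc
    .map { it[2] }
    .unique()
    .count()

aligned_bams
    .combine(lane_count)                                  // append the count n to each (key, bam)
    .map { key, bam, n -> tuple(groupKey(key, n), bam) }  // size-aware grouping key
    .groupTuple()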

George Seed
@georgeseed_twitter
Re. my post last night, the solution posted here nextflow-io/nextflow#1702 does work - for some reason - so that's nice.
Vlad Kiselev
@wikiselev
I have a channel with tuples consisting of some files and a value. I would like to run a process on this channel. Specifically, I'd like to repeat the process for each value (which is part of the input tuple), and I would also like to .collect() the files from each tuple at the same time. Has anyone done this before? I've already read about each and combine, so my input channel is of the right format. The problem I have is with .collect() - not sure how to incorporate it into the input tuple.
3 replies
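A minimal sketch of one way to structure this, assuming the channel emits (files, value) tuples; the process name and script command are hypothetical, and whether this matches your exact tuple shapes is worth checking:

ch_values = input_ch.map { files, value -> value }.unique().toList()
ch_files  = input_ch.map { files, value -> files }.collect()

process PER_VALUE {
    input:
    each value        // repeat the process for every value
    path(files)       // all collected files staged together

    script:
    """
    analyse.py --param ${value} ${files}
    """
}

workflow {
    PER_VALUE(ch_values, ch_files)
}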
Wenhui Wu
@wwu7_gitlab
@robsyme Hi Robert, may I ask how you set the NCBI_API_KEY environment variable? I set it in my .bash_profile, but it doesn't seem to work for Nextflow. Thanks!
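A minimal sketch using the env scope in nextflow.config, which exports the variable into every task's execution environment (the value is a placeholder):

env {
    NCBI_API_KEY = 'xxxx'
}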
Michael Bale
@michaelbale
Is there a way to have Nextflow set the publishDir to an S3 bucket when not using AWS Batch?
4 replies
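A minimal sketch, assuming AWS credentials are available to Nextflow (e.g. via the aws config scope or the environment); the bucket and path are hypothetical:

publishDir 's3://my-bucket/results', mode: 'copy'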
Toni Hermoso Pulido
@toniher
Good morning! Is it possible to generate a DAG dot file without needing to run the whole pipeline?
10 replies
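For reference, a hedged sketch: newer Nextflow releases (22.06.0-edge onward) add a -preview flag that resolves the workflow without executing any tasks, which can be combined with -with-dag; whether your version supports it is worth checking.

$ nextflow run main.nf -preview -with-dag flowchart.dot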
Szymon Szyszkowski
@PROJECT-DEFIANT
Hello everyone, I have a Nextflow pipeline running on a local server. The pipeline logs an ABORTED status when I check it with nextflow log, but I cannot find the reason for this failure message; could you provide me some explanation?
Matthew
@SamThePsychoticLeprechaun

Hi, in DSL2, do functions allow the passing in of channels? I was hoping to clean up a workflow where a similar sequence of operations is applied to channels, but it looks like channels passed into functions are demoted to plain Java collections.

The error reads:

No signature of method: java.util.LinkedHashMap.collectFile() is applicable for argument types: (LinkedHashMap, Script_77e02758$_collect_file_tuples_closure1) values: [[storeDir:null, sort:hash], Script_77e02758$_collect_file_tuples_closure1@54e2fe]
1 reply
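A minimal sketch of the usual workaround: a named DSL2 sub-workflow (take/main/emit) accepts and returns real channels, unlike a plain function (the name and collectFile arguments here are hypothetical):

workflow COLLECT_TUPLES {
    take:
    ch_in

    main:
    // channel operators work here, because ch_in is still a channel
    merged = ch_in.collectFile(name: 'merged.txt', sort: 'hash')

    emit:
    merged
}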
UchicagoZchen138
@UchicagoZchen138
hey guys, newbie here trying to get a quick proof of concept going with nextflow. I have some questions and hope people could give me some guidance (^-^)
1) After a run, does nextflow do any sort of automatic cleanup of its persistent volume claim if we're using kubernetes?
Chris Wright
@chrisnrg_twitter
I'm having trouble implementing a particular data-flow pattern in Nextflow: essentially, tasks are blocked by non-dependent calculations when a channel requires grouping on a key. A full example is in the gist below:
https://gist.github.com/cjw85/d334352e49ddd2e8bf2bd8e3891f3fe5
Central to this is not knowing how to use the items in one channel to allow the non-blocking grouping of another channel.
1 reply
Arijit Panda
@arpanda
I am trying to run an SGE job using the current environment settings. In normal SGE usage, the -V parameter is used; can someone suggest the Nextflow option that enables it? Thanks
1 reply
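A minimal sketch: the clusterOptions directive forwards native options verbatim to the scheduler, so -V reaches qsub unchanged.

process {
    clusterOptions = '-V'
}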
gianmore
@gianmore
Hi all. I'm making a config file to specify error strategies for different processes in my pipeline. For one process I would like to increase the number of CPUs allocated in case of errors 140 and 143, and to just retry 10 times for all other kinds of errors. I tried the two following syntaxes, but neither worked:
process {
    withName: structural_alignment {
        if (task.exitStatus in 140..143)
            """
            errorStrategy = 'retry'
            cpus = { 2 * task.attempt }
            maxRetries = 5
            """
        else
            errorStrategy = 'retry'
            maxRetries = 10
    }
}
process {
    withName: structural_alignment {
        if (task.exitStatus in 140..143) {
            errorStrategy = 'retry'
            cpus = { 2 * task.attempt }
            maxRetries = 5
        }
        else {
            errorStrategy = 'retry'
            maxRetries = 10
        }
    }
}
gianmore
@gianmore
Also, how can I specify only errors 140 and 143 and not all the others in between? I tried separating them with a comma, but it didn't work.
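A minimal sketch of the closure-based approach: config settings are static, so per-attempt logic belongs in closures, and a list literal [140, 143] matches only those two codes rather than the whole range in between. Whether task.exitStatus is visible inside the cpus closure on retry is an assumption worth verifying.

process {
    withName: structural_alignment {
        errorStrategy = 'retry'
        maxRetries    = 10
        // scale CPUs only after a 140/143 failure (assumption: the previous
        // attempt's exitStatus is available in this closure)
        cpus          = { task.exitStatus in [140, 143] ? 2 * task.attempt : 2 }
    }
}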
gpalou4
@gpalou4

Hi guys.
I have a simple doubt (I think). I just need to save my process output in an external directory, so I'm using the publishDir directive. However, I need two things: i) to use variables (collected from the input tuple); ii) to create the directory, because it doesn't exist.
I successfully run it like this:

    publishDir "${params.out_dir}/${tcga_project}/${tcga_barcode}/RSEM_quantification", mode: 'move'

    input:
    file STAR_bam_file from STAR_alignment_bam
    set val(sample_UUID), val(tcga_barcode), val(tcga_project) from samples_ch_3

But it doesn't create the directory. So I tried switching to this:

    outDir = file("${params.out_dir}/${tcga_project}/${tcga_barcode}/RSEM_quantification")
    outDir.mkdirs()
    publishDir outDir, mode: 'move'

    input:
    file STAR_bam_file from STAR_alignment_bam
    set val(sample_UUID), val(tcga_barcode), val(tcga_project) from samples_ch_3

However it states the error:
No such variable: tcga_project

Any help in this situation? Thanks :)

2 replies
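A hedged note with a sketch: publishDir normally creates a missing target directory itself, and a dynamic path should stay inside the directive, which is evaluated per task after the input tuple is bound; code at the process top level runs before inputs exist, which is why mkdirs() cannot see tcga_project.

// evaluated for each task, once tcga_project and tcga_barcode are bound
publishDir "${params.out_dir}/${tcga_project}/${tcga_barcode}/RSEM_quantification", mode: 'move'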
hukai916
@hukai916
Hi all,
If exit 1, "ERROR MSG" is called from a sub-workflow, the message is not printed to the console, though it is in .nextflow.log. I must use NXF_ANSI_LOG=false nextflow run main.nf to force the exit message to the console.
Can anyone let me know why, and how to solve it?
Regards,
nickhsmith
@nickhsmith
Hi all! I am trying to find out how to set a profile in a config file. For example, I'm using an nf-core package and am sick of passing -profile docker on the command line; instead, I would like to have profile = docker in my nextflow.config file so that it imports all of the config options specified by the docker profile. Thanks
3 replies
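A minimal sketch: to my knowledge there is no profile setting inside the config file itself (profiles are selected on the command line), but the options a profile carries can be set directly in nextflow.config.

docker.enabled = true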
Brandon Cazander
@brandoncazander
I'm using collectFile in my workflow to combine the results of several upstream processes, and setting storeDir. However, when I resume my workflow (using AWS Batch executor), even though the upstream processes are correctly cached, the downstream process that consumes the output of collectFile always re-executes. I've tried sorting the input to collectFile with no luck.
4 replies
9d0cd7d2
@9d0cd7d2:matrix.org
[m]
Do you think it's possible to decouple the executor destination from the user pipeline? My purpose is to have one "architecture pipeline" deciding the executor, based on different metrics, for each step of another pipeline (the "user workflow"). I'm thinking of generating the user pipeline file dynamically, but I don't know if it is possible to achieve this as built-in functionality in the tool. Thanks in advance.
3 replies
xmzhuo
@xmzhuo
How do I use Docker from a private repo (user and password needed)? I can not find this in the docker scope:
docker {
    enabled = true
}
process {
    withName: toy {
        container = 'user.abc.io/test/example'
    }
}
7 replies
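A minimal sketch: the docker scope has no credential settings; Nextflow uses the host's Docker daemon, so logging in once on the host lets the private image be pulled by the container directive above.

$ docker login user.abc.io   # credentials are stored by Docker itself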
Young
@erinyoung

How come

Channel
  .fromFilePairs( "${params.reads}/*{1,2}*.fastq.gz", size: 2 )
  .into { paired_reads }

does not work when the files start with a number (like 7126-MS-1_100_S1_L005_R1_001.fastq.gz and 7126-MS-1_100_S1_L005_R2_001.fastq.gz, or 21VR067049-211104_S1_L001_R1_001.fastq.gz and 21VR067049-211104_S1_L001_R2_001.fastq.gz)?

8 replies
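A minimal sketch of a tighter glob: in *{1,2}* the {1,2} alternative can match a 1 or 2 anywhere in the name (for example the 1 in 7126), which breaks the pair grouping; anchoring it on the read-number token avoids that (the _001 suffix is taken from the example names).

Channel
    .fromFilePairs("${params.reads}/*_R{1,2}_001.fastq.gz", size: 2)
    .set { paired_reads }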
Moritz E. Beber
@Midnighter
What is the recommended way to write a CSV file from nextflow? I can't seem to import the CSVPrinter https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVPrinter.html
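A minimal sketch of a dependency-free alternative: plain Groovy string handling writes a small CSV without commons-csv (the rows and output name are hypothetical).

def rows = [['sample', 'count'], ['a', 1], ['b', 2]]
// join each row with commas, then join rows with newlines
file('summary.csv').text = rows*.join(',').join('\n') + '\n'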
David Mas-Ponte
@davidmasp
Hello all, I am using .collate() in my pipeline, in an implementation that works elegantly. My problem is with the -resume option: some processes get cached, but the rest are re-scheduled. I am assuming the order changes from run to run before the collate operator, and Nextflow thus assumes the input has changed. Could this be it?
Also, what is the “canonical” way of sorting a channel? Maybe this?
Channel.from(2,1,3,4).toSortedList().fromList()
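A minimal sketch of the sorting idiom: fromList is only a channel factory, not an operator, so flatten is what turns the sorted list back into individual items.

Channel.from(2, 1, 3, 4)
    .toSortedList()
    .flatten()
    .view()   // 1, 2, 3, 4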