Paul Cantalupo
@pcantalupo
Where does the Nextflow log value called REVISION ID come from? How is it calculated?
shenkers
@shenkers

With DSL2, is there a way to make staging of files from S3 lazy? When I was working with vanilla Nextflow, I could get it to delay staging files until the process starts executing by taking a channel containing S3 path strings and mapping it to file() in the input block of the process.

process example {
input:
  file('input.txt') from s3pathStrings.map{ file(it) }
....
}

with DSL2 it now looks like this:

data = s3pathStrings.map{ file(it) }

process example {
input:
  file('input.txt')
....
}

workflow{
  example(data)
}

In the first version (before DSL2), Nextflow would schedule some concurrent tasks, and as those tasks executed it would trigger the staging of the remote files they needed. In the DSL2 version, all the files are staged in the top-level scope, before any "example" tasks start executing.

Since from isn't part of DSL2, it doesn't seem possible to use this trick any more. Is there another way to do this with DSL2? It was nice to have this behaviour because, when there are a lot of files to stage at the same time, it sometimes causes S3 connections to time out. It was also a little nicer than limiting the parallel transfers, because it helps prioritize which files to stage first so that the first tasks can start executing more quickly.

LiterallyUniqueLogin
@LiterallyUniqueLogin

Hi all, new to Nextflow here. Two related questions:

I want the following in my config file:

process {
  publishDir {
    path = '.'
    mode = 'link'
    enabled = { task.ext.publish }
  }
}

so that I can turn on publication in my processes just by setting task.ext.publish to true or false as needed. (This way I don't have to re-specify the path or the link mode each time.)

However, there are two issues: in the config file, it seems that task.ext.publish is interpreted as process.publishDir.task.ext.publish, which isn't what I want. Also, I can't seem to set ext.publish from my process; the period in ext.publish true seems to be tripping things up.

Any advice? Thanks!
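
One possibility, as a minimal and untested sketch (MY_PROCESS is a hypothetical name): express publishDir as a map in the config so the enabled closure stays a closure, and set ext.publish per process with a withName selector rather than from inside the process body.

process {
    publishDir = [
        path: '.',
        mode: 'link',
        enabled: { task.ext.publish ?: false }  // assumed to be evaluated lazily per task
    ]

    withName: 'MY_PROCESS' {
        ext.publish = true
    }
}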

kojix2
@kojix2

Hi! When I try to run nf-core on my local PC for the first time in a while, I get the following error. Does anyone know the cause?

nextflow run nf-core/rnaseq -profile test,docker -r 3.3

N E X T F L O W  ~  version 21.09.1-edge
Launching `nf-core/rnaseq` [disturbed_meucci] - revision: 8094c42add [3.3]

ERROR: Validation of pipeline parameters failed!


* --hostnames: expected type: String, found: JSONObject ({"cfc":[".hpc.uni-tuebingen.de"],"utd_sysbio":["sysbio.utdallas.edu"],"utd_ganymede":["ganymede.utdallas.edu"],"genouest":[".genouest.org"],"cbe":[".cbe.vbc.ac.at"],"genotoul":[".genologin1.toulouse.inra.fr",".genologin2.toulouse.inra.fr"],"crick":[".thecrick.org"],"uppmax":[".uppmax.uu.se"],"icr_davros":[".davros.compute.estate"],"imperial":[".hpc.ic.ac.uk"],"binac":[".binac.uni-tuebingen.de"],"imperial_mb":[".hpc.ic.ac.uk"]})
kojix2
@kojix2
After reinstalling Nextflow, the error no longer occurs. It may have something to do with the fact that I recently updated Ubuntu to 21.10.
I could not figure out how to uninstall Nextflow, so I simply ran the installer one more time: curl -s https://get.nextflow.io | bash. Then I moved the created nextflow file onto my PATH. Now it works properly.
maxulysse
@maxulysse:matrix.org
[m]
@kojix2 can you come over to the nf-core Slack? I think I remember some discussion about this issue.
maxulysse
@maxulysse:matrix.org
[m]
Nevermind, found the related issue: nf-core/tools#1304
Jeffrey Massung
@massung

Is it possible for me to access the params such that I can write them out as an output? In my particular workflow, several of the params are things like "password needed to unzip a file", which I'd like to save for posterity in the output location. I'm basically trying to do:

import groovy.json.JsonBuilder

params_json = new JsonBuilder(params).toString()

process xxx {
    shell:
        '''echo !{params_json} > params.json'''
}

But it's not letting me. Is there some other nice way for me to do this?
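
One possible workaround, as a minimal sketch with an assumed process name (write_params): pass the JSON string in as a value input so the shell block can interpolate it with !{}, and write it through a quoted heredoc to sidestep quoting problems.

import groovy.json.JsonBuilder

process write_params {
    input:
    val params_json

    output:
    path 'params.json'

    shell:
    '''
cat > params.json <<'EOF'
!{params_json}
EOF
    '''
}

workflow {
    write_params( Channel.value( new JsonBuilder(params).toPrettyString() ) )
}

The heredoc lines are left unindented so bash finds the EOF terminator at the start of a line.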

9 replies
Juan MONROY-NIETO
@jmonroynieto
I've been using this as boilerplate code to find reads in the current directory. I would like to make it less verbose, but that would need extended globbing, and my initial tests have been unsuccessful even with all possible escaping sequences when trying +([0-9]). How do you all deal with this?
Channel.fromFilePairs(["*_S[0-9]_L00[1-9]_{R1,R2}_001.fastq.gz",
                        "*_S[0-9][0-9]_L00[1-9]_{R1,R2}_001.fastq.gz",
                        "*_S[0-9][0-9][0-9]_L00[1-9]_{R1,R2}_001.fastq.gz",
                        "*_S[0-9][0-9][0-9][0-9]_L00[1-9]_{R1,R2}_001.fastq.gz"]).set { illumina_q }
Steven P. Vensko II
@spvensko_gitlab
Does nextflow clean -before <desired_run> delete the CACHED directories it utilized (if they are from an earlier run prior to <desired_run>)?
xmzhuo
@xmzhuo

Hey all. I'm having trouble concatenating two files with cat under a shell block. Outside of a Nextflow run, everything works flawlessly.
shell:
'''
echo "read1: NA12878_S1_L001_R1_001.fastq.gz NA12878_S1_L002_R1_001.fastq.gz"
echo "read2: NA12878_S1_L001_R2_001.fastq.gz NA12878_S1_L002_R2_001.fastq.gz"
echo $(ls)
cat NA12878_S1_L001_R1_001.fastq.gz NA12878_S1_L002_R1_001.fastq.gz > read1.fastq.gz
cat NA12878_S1_L001_R2_001.fastq.gz NA12878_S1_L002_R2_001.fastq.gz > read2.fastq.gz
'''

Command exit status:
1

Command output:
read1: NA12878_S1_L001_R1_001.fastq.gz NA12878_S1_L002_R1_001.fastq.gz
read2: NA12878_S1_L001_R2_001.fastq.gz NA12878_S1_L002_R2_001.fastq.gz
NA12878_S1_L001_R1_001.fastq.gz NA12878_S1_L001_R2_001.fastq.gz NA12878_S1_L002_R1_001.fastq.gz NA12878_S1_L002_R2_001.fastq.gz

Command error:
cat: NA12878_S1_L001_R1_001.fastq.gz: No such file or directory
cat: NA12878_S1_L002_R1_001.fastq.gz: No such file or directory

hukai916
@hukai916
Hello all,
Does anyone know if it is possible to mix more than one language in the script block? E.g. run a specific bash command first, then run the rest with Python? Thanks!
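
A minimal sketch of the idea, with illustrative names not taken from the thread: the script block is just text handed to bash, so a second interpreter can be invoked inline, for example with a Python heredoc after the bash commands.

process mixed_example {
    output:
    path 'line_count.txt'

    script:
    """
    ls -1 > files.txt
    python3 - <<'PY'
n = sum(1 for _ in open('files.txt'))
with open('line_count.txt', 'w') as out:
    print(n, file=out)
PY
    """
}

The Python lines are left unindented so that bash finds the PY terminator at the start of a line and Python sees valid top-level code.
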
Pavel Borobov
@blvp

Hi all, how much does it cost to run a workflow using Tower and AWS Batch + FSx for Lustre? I'm working on a customer project right now and want to evaluate this option. Perhaps some guidance towards the documentation and pricing model would help.
Using AWS Batch with an S3-only work directory is not very efficient.

What is the commercial license price?

Steffen Fehrmann
@sfehrmann
(replying to a message from August 11, 2019 12:50 PM)
Hi all, I'm looking for a method to convert paired-end data from bcl2fastq process output into fastq file pair tuples. @happykhan's method is the only one I found. Is there some more elegant way with DSL2? Currently I have a standard bcl2fastq process that emits *.fastq.gz, and I can read in the sample sheet, but I'd need to match sample sheet names against file names. Is there some way to use something like the .fromFilePairs factory on an existing channel?
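
One way to approximate fromFilePairs on an existing channel, as a minimal sketch (BCL2FASTQ.out.fastqs and read_pairs_ch are assumed names): derive a pairing key from each file name and group in twos.

BCL2FASTQ.out.fastqs
    .flatten()
    .map { fq ->
        // strip the read number so R1/R2 of the same sample and lane share a key
        def key = fq.name.replaceAll(/_R[12]_001\.fastq\.gz/, '')
        tuple(key, fq)
    }
    .groupTuple(size: 2)
    .set { read_pairs_ch }
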
Ernesto Lowy
@elowy01
Hi, I have a question about a process that uses a Python block and emits files in DSL2.
This is the workflow:
nextflow.enable.dsl=2
process createFiles {
    output:
    path("*.txt", emit: apath)

    script:
    """
    #!/usr/bin/env python

    filenames = ['a.txt', 'b.txt', 'c.txt']

    for f in filenames:
        with open(f, "w") as wf:
            wf.write("hello\\n")
        print(f)
    """
}
process printContent {
    input:
        path(x)

    script:
    """
    cat $x
    """
}

workflow {
    createFiles()
    printContent(createFiles.out.apath)
}
The first process (createFiles) creates the files using a Python block and emits a channel with the paths of the created files, and the second (printContent) prints the contents of each file.
My question is:
When I run this workflow, I see the following information from NF:
executor >  local (2)
[67/0c496d] process > createFiles  [100%] 1 of 1 ✔
[1b/36a13c] process > printContent [100%] 1 of 1
Ernesto Lowy
@elowy01
And it seems that the three files generated by createFiles are emitted in a single channel, and all the files are taken together by one printContent task instead of each being analysed in an independent printContent job.
Can you let me know how to analyse each of the files independently?
Thanks
Luca Cozzuto
@lucacozzuto
Hi @elowy01 :) You can either create the file names as a channel outside your block of code, as suggested by @pcantalupo, or use the flatten operator.
Paul Cantalupo
@pcantalupo
@lucacozzuto where do you add flatten? Can you post the code? thank you
Luca Cozzuto
@lucacozzuto
here
workflow {
    createFiles()
    printContent(createFiles.out.apath.flatten())
}
Paul Cantalupo
@pcantalupo
ahh, I tried that but forgot the (). Thank you
Luca Cozzuto
@lucacozzuto
Hi all, do you know how to change the behaviour when uploading the bin folder to AWS?
I found that soft links are not preserved and are lost.
I opened an issue, but I'm not sure about it: nextflow-io/nextflow#2427
Luca Cozzuto
@lucacozzuto
Well it looks like links are not possible in S3. Mmm
Luca Cozzuto
@lucacozzuto
Well, another question: is there any variable that Nextflow sets when uploading the bin folder to S3, for accessing that folder from any process?
cc @pditommaso :)
John Ma
@JohnMCMa

Is it possible to create files with the native execution mode of a process? For example, I attempted the following:

process WRITE_FASTP_METRICS{
    input:
        val (rna_result)
        val (adt_result)
    output:
        path "fastp_metrics.csv"
    exec:
        write_out = file("fastp_metrics.csv")
        rna_result.forEach{key, value ->
            write_out << key << ',' << value << '\n'
        }
        adt_result.forEach{key, value ->
            write_out << key << ',' << value << '\n'
        }
}

But the fastp_metrics.csv is not created in the work directory, causing this error: Missing output file(s) `fastp_metrics.csv` expected by process `WRITE_FASTP_METRICS (1)`
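
One likely fix, as a sketch built on the assumption that file('fastp_metrics.csv') is resolved outside the task directory in an exec block (and that rna_result and adt_result are maps): resolve the output path against task.workDir so the declared output is created where Nextflow looks for it.

process WRITE_FASTP_METRICS {
    input:
        val (rna_result)
        val (adt_result)
    output:
        path "fastp_metrics.csv"
    exec:
        def write_out = task.workDir.resolve('fastp_metrics.csv')
        write_out.text = ''                      // create/truncate the file in the task work dir
        (rna_result + adt_result).each { key, value ->
            write_out << "${key},${value}\n"     // one CSV line per entry
        }
}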

anoronh4
@anoronh4

I'm wondering if we can pass in a container as a variable, as I want to test the same process over various versions of a piece of software. Something like this:

process A {
    container = container_label

    input:
    tuple val(container_label), path(inputFile)
    ...
}

This code did not work, however. Can it be done in another way?

rpetit3
@rpetit3:matrix.org
[m]
You could make a parameter to do it at run time, something like:
process A {
    container = params.container_label

    input:
    path(inputFile)
    ...
}
anoronh4
@anoronh4
Isn't that still the same issue? params.container_label is just one value; I still want the input channels to affect the container directive.
I think I got it:
process A {
    input:
    tuple val(container_label), path(inputFile)
    ...

    script:
    task.container = container_label
    ...
}
rpetit3
@rpetit3:matrix.org
[m]
let us know if that works!
anoronh4
@anoronh4
@rpetit3:matrix.org it does!
rpetit3
@rpetit3:matrix.org
[m]
nice to know! thanks for sharing
Luca Cozzuto
@lucacozzuto
Mmm, why not pass it as a parameter? I'm passing a number of things that way in my workflows.
emily-kawabata
@emily-kawabata
Hi everyone,
Does anyone know if there will be any Nextflow workshops in the near future? I see that there was one in July 2020 and another in May of this year hosted by ecSeq, and I was wondering whether a similar event will be taking place in the future.
9d0cd7d2
@9d0cd7d2:matrix.org
[m]
Hi all! I'm very interested in the tool, as it seems to cover a lot of the integrations that we need for a particular project (Slurm, buckets, Singularity, etc.), but my worry is that our project is mostly about CFD workflows with a small AI part, and apparently Nextflow seems quite tied to bio and genomics workflows. Do you think we can use it anyway?
xmzhuo
@xmzhuo
Hey all,
For Azure Batch, is it possible to define two pool types (with autoScale for different vmTypes) in autoPoolMode?
Ghost
@ghost~61847cca6da037398489d4e6
In watchPath, is it OK to use a wildcard in the directory and in the file name at the same time? For example: watchPath('/myfolder/*/logs/*.log', 'create'). It doesn't seem to work for me.
zhemingfan
@zhemingfan
Hi everyone, I'm relatively new to Nextflow. For the following code, I'm getting an error where I'm unable to retrieve the index file ([E::idx_find_and_load] Could not retrieve index file for 'merged_sorted.vcf.gz') even though the folder points to the correct path, and running this command normally outside of Nextflow works fine. Would anyone happen to know how to fix this?
process generate_readset {
    tag "$sample_id"
    cpus 48

    input:
    tuple val(read_name), val(chromosome1), val(chromosome2), val(cuteSV_pos1), val(cuteSV_pos2),
        val(sniffle_pos1), val(sniffle_pos2),
        path(cuteSV_vcf), path(sniffles_vcf) from vcf_input

    output:
    path 'complete_read_set.txt' into receiver

    script:
    """
    ${bcftools_1_11} view --threads ${task.cpus} $cuteSV_vcf -r chr$chromosome1:$cuteSV_pos1-$cuteSV_pos2 > complete.txt
    """
}