Nextflow community chat moved to Slack! https://www.nextflow.io/blog/2022/nextflow-is-moving-to-slack.html
With DSL2, is there a way to make staging of files from S3 lazy? When I was working with vanilla Nextflow I could delay staging files until the process starts executing, by taking a channel that contains S3 path strings and mapping it to file() in the input block of the process:
process example {
    input:
    file('input.txt') from s3pathStrings.map{ file(it) }
    ...
}
with DSL2 it now looks like this:
data = s3pathStrings.map{ file(it) }

process example {
    input:
    file('input.txt')
    ...
}

workflow {
    example(data)
}
In the first version (before DSL2), Nextflow would schedule some concurrent tasks, and as those tasks executed it would trigger the staging of the remote files they needed. In the DSL2 version, all the files are staged in the top-level scope, before any "example" tasks start executing.
Since the from keyword isn't part of DSL2, it doesn't seem possible to use this trick any more. Is there another way to do this with DSL2? It was nice to have this behaviour because, when there are a lot of files to stage at the same time, the S3 connections sometimes time out. It was also a little nicer than simply limiting the number of parallel transfers, because it prioritizes which files to stage first so that the first tasks can start executing sooner.
Hi all, new to Nextflow here. Two related questions:
I want the following in my config file:
process {
    publishDir {
        path = '.'
        mode = 'link'
        enabled = { task.ext.publish }
    }
}
So that I can turn on publication in my processes just by setting task.ext.publish to true or false as needed. (This way I don't have to re-specify the path or the link mode each time.)
However, there are two issues. In the config file, it seems that task.ext.publish is interpreted as process.publishDir.task.ext.publish, which isn't what I want. Also, I can't seem to set ext.publish from my process; the period in ext.publish true seems to be screwing things up.
Any advice? Thanks!
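For what it's worth, here is a minimal sketch of the same idea written with the map-style publishDir config. Whether the enabled option accepts a lazily evaluated closure (the way path does) is an assumption worth verifying, and MY_PROCESS is a placeholder name:

// nextflow.config (sketch only)
process {
    publishDir = [
        path: '.',
        mode: 'link',
        enabled: { task.ext.publish ?: false }   // assumed: closure evaluated per task
    ]

    // set the flag per process from the config instead of from the process body
    withName: 'MY_PROCESS' {
        ext.publish = true
    }
}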
Hi! When I try to run nf-core on my local PC after a long time, I get the following error. Does anyone know the cause?
nextflow run nf-core/rnaseq -profile test,docker -r 3.3
N E X T F L O W ~ version 21.09.1-edge
Launching `nf-core/rnaseq` [disturbed_meucci] - revision: 8094c42add [3.3]
ERROR: Validation of pipeline parameters failed!
* --hostnames: expected type: String, found: JSONObject ({"cfc":[".hpc.uni-tuebingen.de"],"utd_sysbio":["sysbio.utdallas.edu"],"utd_ganymede":["ganymede.utdallas.edu"],"genouest":[".genouest.org"],"cbe":[".cbe.vbc.ac.at"],"genotoul":[".genologin1.toulouse.inra.fr",".genologin2.toulouse.inra.fr"],"crick":[".thecrick.org"],"uppmax":[".uppmax.uu.se"],"icr_davros":[".davros.compute.estate"],"imperial":[".hpc.ic.ac.uk"],"binac":[".binac.uni-tuebingen.de"],"imperial_mb":[".hpc.ic.ac.uk"]})
curl -s https://get.nextflow.io | bash
Then I moved the created nextflow file into my PATH. Now it works properly.
Is it possible for me to access the params such that I can write them as an output? In my particular workflow, several of the params used are things like "password needed to unzip file", which I'd like to save for posterity in the output location. I'm basically trying to do:
params_json = new JsonBuilder(params).toString()

process xxx {
    shell:
    '''echo !{params_json} > params.json'''
}
But it's not letting me. Is there some other nice way for me to do this?
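One possible direction (a sketch, untested, with made-up names like DUMP_PARAMS): build the JSON in the workflow, pass it to a process as a val, and write the file from an exec: block so no shell quoting gets in the way.

import groovy.json.JsonBuilder

process DUMP_PARAMS {
    publishDir '.', mode: 'copy'

    input:
    val json_text

    output:
    path 'params.json'

    exec:
    // write directly into the task work directory; no shell involved
    task.workDir.resolve('params.json').text = json_text
}

workflow {
    DUMP_PARAMS( Channel.value( new JsonBuilder(params).toPrettyString() ) )
}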
+([0-9]). How do you all deal with this?
Channel.fromFilePairs(["*_S[0-9]_L00[1-9]_{R1,R2}_001.fastq.gz",
                       "*_S[0-9][0-9]_L00[1-9]_{R1,R2}_001.fastq.gz",
                       "*_S[0-9][0-9][0-9]_L00[1-9]_{R1,R2}_001.fastq.gz",
                       "*_S[0-9][0-9][0-9][0-9]_L00[1-9]_{R1,R2}_001.fastq.gz"]).set { illumina_q }
Hey all. I'm having trouble cat-ing two files together under shell. Outside of a Nextflow run, everything works flawlessly.
shell:
'''
echo "read1: NA12878_S1_L001_R1_001.fastq.gz NA12878_S1_L002_R1_001.fastq.gz"
echo "read2: NA12878_S1_L001_R2_001.fastq.gz NA12878_S1_L002_R2_001.fastq.gz"
echo $(ls)
cat NA12878_S1_L001_R1_001.fastq.gz NA12878_S1_L002_R1_001.fastq.gz > read1.fastq.gz
cat NA12878_S1_L001_R2_001.fastq.gz NA12878_S1_L002_R2_001.fastq.gz > read2.fastq.gz
'''
Command exit status:
1
Command output:
read1: NA12878_S1_L001_R1_001.fastq.gz NA12878_S1_L002_R1_001.fastq.gz
read2: NA12878_S1_L001_R2_001.fastq.gz NA12878_S1_L002_R2_001.fastq.gz
NA12878_S1_L001_R1_001.fastq.gz NA12878_S1_L001_R2_001.fastq.gz NA12878_S1_L002_R1_001.fastq.gz NA12878_S1_L002_R2_001.fastq.gz
Command error:
cat: NA12878_S1_L001_R1_001.fastq.gz: No such file or directory
cat: NA12878_S1_L002_R1_001.fastq.gz: No such file or directory
Hi all, how much does it cost to run a workflow using Tower and AWS Batch + FSx for Lustre? I'm working on a customer project right now and want to evaluate this option. Perhaps some guidance towards the documentation and pricing model would help.
Using AWS Batch with an S3-only work directory is not very efficient.
What is the commercial license price?
*.fastq.gz, and I can read in the sample sheet, but I'd need to match the sample sheet names to the filenames. Is there some way to use something like the fromFilePairs factory on an existing channel?
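A possible direction (sketch only; the samplesheet.csv name, its sample column, and taking the sample id as the first underscore-separated token of the filename are all assumptions): key both the sample sheet rows and the files by sample id, then join them instead of using fromFilePairs.

workflow {
    samples = Channel
        .fromPath('samplesheet.csv')
        .splitCsv(header: true)
        .map { row -> tuple(row.sample, row) }

    reads = Channel
        .fromPath('*.fastq.gz')
        .map { f -> tuple(f.name.tokenize('_')[0], f) }   // assumed sample id scheme
        .groupTuple()

    samples.join(reads).view()
}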
nextflow.enable.dsl=2

process createFiles {
    output:
    path "*.txt", emit: apath

    script:
    """
    #!/usr/bin/env python
    filenames = ['a.txt', 'b.txt', 'c.txt']
    for f in filenames:
        with open(f, "w") as wf:
            wf.write("hello\\n")
        print(f)
    """
}

process printContent {
    input:
    path(x)

    script:
    """
    cat $x
    """
}

workflow {
    createFiles()
    printContent(createFiles.out.apath)
}
executor > local (2)
[67/0c496d] process > createFiles [100%] 1 of 1 ✔
[1b/36a13c] process > printContent [100%] 1 of 1 ✔
workflow {
    createFiles()
    printContent(createFiles.out.apath.flatten())
}
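For what it's worth, flatten() unpacks a single emission containing a list into one emission per element, so the downstream process runs once per file instead of once in total. A standalone sketch of that behaviour (separate from the pipeline above):

workflow {
    // emitted as a single item: [a.txt, b.txt, c.txt]
    Channel.of(['a.txt', 'b.txt', 'c.txt'])
        .view { "without flatten: $it" }

    // emitted as three items: a.txt, b.txt, c.txt
    Channel.of(['a.txt', 'b.txt', 'c.txt'])
        .flatten()
        .view { "with flatten: $it" }
}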
Is it possible to create files with the native execution mode of a process? For example, I attempted the following:
process WRITE_FASTP_METRICS {
    input:
    val(rna_result)
    val(adt_result)

    output:
    path "fastp_metrics.csv"

    exec:
    write_out = file("fastp_metrics.csv")
    rna_result.forEach { key, value ->
        write_out << key << ',' << value << '\n'
    }
    adt_result.forEach { key, value ->
        write_out << key << ',' << value << '\n'
    }
}
But the fastp_metrics.csv is not created in the work directory, causing this error: Missing output file(s) `fastp_metrics.csv` expected by process `WRITE_FASTP_METRICS (1)`
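If it helps, my understanding is that files written from an exec: block need to be created inside the task work directory (e.g. via task.workDir) for Nextflow to pick them up as declared outputs. A minimal sketch of that idea, assuming rna_result and adt_result are Maps:

process WRITE_FASTP_METRICS {
    input:
    val(rna_result)   // assumed to be a Map of key -> value
    val(adt_result)   // assumed to be a Map of key -> value

    output:
    path "fastp_metrics.csv"

    exec:
    // resolve the output against the task work directory so the declared output is found
    def write_out = task.workDir.resolve('fastp_metrics.csv')
    rna_result.each { key, value -> write_out << "${key},${value}\n" }
    adt_result.each { key, value -> write_out << "${key},${value}\n" }
}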
I'm wondering if we can pass in a container as a variable, as I want to test the same process across various versions of a piece of software. Something like this:
process A {
    container = container_label

    input:
    tuple val(container_label), path(inputFile)
    ...
}
This code did not work, however. Can it be done in another way?
process A {
    container = params.container_label

    input:
    path(inputFile)
    ...
}
process A {
    input:
    tuple val(container_label), path(inputFile)
    ...

    script:
    task.container = container_label
    ...
}
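Another option to sketch (assuming the container directive accepts a dynamic closure over input values, as other directives such as tag or memory do; the image names and data.txt are placeholders):

process A {
    // assumed: evaluated per task after the input tuple is bound
    container { container_label }

    input:
    tuple val(container_label), path(inputFile)

    script:
    """
    echo "container: $container_label, input: $inputFile"
    """
}

workflow {
    A(Channel.of(
        ['quay.io/biocontainers/some_tool:1.0', file('data.txt')],
        ['quay.io/biocontainers/some_tool:2.0', file('data.txt')]
    ))
}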
process generate_readset {
    tag "$sample_id"
    cpus 48

    input:
    tuple val(read_name), val(chromosome1), val(chromosome2), val(cuteSV_pos1), val(cuteSV_pos2),
          val(sniffle_pos1), val(sniffle_pos2),
          path(cuteSV_vcf), path(sniffles_vcf) from vcf_input

    output:
    path 'complete_read_set.txt' into receiver

    script:
    """
    ${bcftools_1_11} view --threads ${task.cpus} $cuteSV_vcf -r chr$chromosome1:$cuteSV_pos1-$cuteSV_pos2 > complete.txt
    """
}