I want to use an if block in the directives section of a process, and if the condition is false I want to stop running and fail.
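One possible way (a minimal sketch, assuming the condition can be checked at script level; params.required_flag is a made-up name) is Nextflow's built-in error function, which aborts the run with a message:
// abort early if the required condition does not hold
if( !params.required_flag )
    error "required_flag must be set to true -- aborting"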
I have a workflow that involves splitting up files per chromosome and then merging them later. To make the workflow a bit more flexible, I first pull the chromosomes out of the reference file using a bit of grep, so I have a channel with all the chromosome names. I can then do something like this:
input:
tuple val(id), path(file) from channel_a
each chr from chromosomes
output:
tuple val(id), val(chr), path(outfile) into channel_b
then I can group things up:
channel_b.groupTuple(by: 0)
and use that as input for the next process.
My question is: since the number of chromosomes is constant for any given run of the workflow, can I extract that value (e.g. map{ it.readLines().size() }) and feed it into groupTuple? I thought perhaps I could assign this value to a variable and then pass that variable to the groupTuple call, but this doesn't work (the type of the variable is something fancy, not an Int).
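One approach (a sketch, assuming the chromosome names sit in a plain file; params.chrom_list is a made-up name): compute the count with ordinary Groovy outside any channel, so it is a plain Integer rather than a dataflow value, and pass it as groupTuple's size option:
def n_chr = file(params.chrom_list).readLines().size()   // plain Integer, not a channel
channel_b.groupTuple(by: 0, size: n_chr)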
Hi, this config worked for me:
singularity.enabled = true
singularity.runOptions = "--bind /path:/path"
Many thanks for your suggestion @tomraulet, it's finally working for me too!
Can I generate a val output emit from a script inside a process? I tried to emit as val:
process test {
output:
val val_var , emit: val_var
shell:
"""
val_var=test
"""
}
Error:
Caused by:
Missing value declared as output parameter: val_var
I also tried to emit as env
process test{
output:
env val_var , emit: val_var
shell:
"""
val_var=test
"""
}
When I tried to use it in the downstream process by calling test.out.val_var, I got:
Caused by:
No such property: val_var for class: ScriptC3863517AF925202A24F63BCD0003707
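For reference, a minimal sketch of the env-output pattern that should work under DSL2, assuming test() is invoked inside a workflow block before test.out is referenced (the "No such property" error often points at the output being accessed outside that scope):
process test {
    output:
    env val_var, emit: val_var

    shell:
    '''
    val_var=test
    '''
}

workflow {
    test()
    test.out.val_var.view()
}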
Hello, I'm just starting out in Nextflow and seeking some strategic advice. What I'm trying to achieve is to run a workflow over a list of input samples. I can set up a workflow for a single sample, but how do I push multiple samples through it? (Most examples in the docs show a single process.) Here's where I'm stuck:
params.samples = ["samples/a","samples/b","samples/c"]
samples_ch = Channel.fromPath(params.samples)   // build a queue channel from the sample paths
process step1 {
input:
file sample from samples_ch
output:
file 'result.bam' into step1_ch
...
}
process step2 {
input:
file bam from step1_ch
output:
file 'result.vcf' into step2_ch
...
}
This runs for sample a but not the rest; I suspect it's because step2 only accepts one thing from step1_ch? I can see two general strategies: either make a workflow for a single sample and then import that into a multi-sample wrapper, or enable each process to accept multiple inputs. Any advice would be greatly appreciated! Thanks
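A minimal sketch of the first strategy under DSL2 (where the input/output blocks drop the from/into clauses); a process invocation runs once per item arriving on its input channel, so no explicit loop over samples is needed:
workflow {
    samples_ch = Channel.fromPath(params.samples)
    step1(samples_ch)      // runs once per sample file
    step2(step1.out)       // consumes each BAM as it is produced
}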
.def file with singularity build. This proved to work, but I am unsure how to make this portable. What are best practices regarding images, and how should I do this to optimize functionality and user-friendliness?
If I see that correctly, the error strategy "ignore" will lead to workflow.success = true at the end. When running many samples I would indeed like to ignore a single failing one, but then check at the very end whether any of them failed. Is there a trace/object accessible in main.nf where one could verify that at the end? Something like:
Sample | success
-------|--------
1      | true
1      | true
1      | true
1      | true
1      | false
1      | true
1      | true
This would allow cleaning up, e.g., published data that would otherwise be orphaned files.
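One possible check (a sketch, assuming tracing is enabled in nextflow.config with trace { enabled = true; file = 'trace.txt' }): read the trace file in the onComplete handler and look for tasks that failed despite being ignored:
workflow.onComplete {
    // the trace file has a status column; ignored failures are still marked FAILED
    def failed = file('trace.txt').readLines().findAll { it.contains('FAILED') }
    if( failed )
        log.warn "${failed.size()} task(s) failed despite errorStrategy 'ignore'"
}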
I am having an issue where an imported module has an implicit workflow.onComplete handler. When I run the main workflow, the imported workflow.onComplete handler is being triggered, I assume because wf2's "workflow" is in the wf1 namespace. Example code:
# wf1.nf
include { subworkflow } from "./wf2"
// wf1 implicit workflow
workflow {
main:
println('wf1 implicit workflow called')
}
// pulls wf2 implicit workflow.onComplete into this namespace and executes
----------------------------------------
# wf2.nf
//explicitly named workflow that is imported to wf1
workflow subworkflow {
main:
println('wf2 as subworkflow called')
}
// wf2 implicit workflow
workflow {
main:
println('wf2 implicit workflow called')
}
// wf2 implicit workflow.onComplete handler
workflow.onComplete {
log.info('wf2 implicit workflow completed')
}
The command and output are:
$ nextflow run wf1.nf
N E X T F L O W ~ version 21.10.2
Launching `wf1.nf` [awesome_euclid] - revision: e050c16fcb
wf1 implicit workflow called
wf2 implicit workflow completed
Is there a way to avoid this namespace clash while keeping the workflow.onComplete handler for wf2? Or do I need to pull the subworkflow in the example above out into its own separate file and have wf1 import directly from that?
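A sketch of that second option (file names are illustrative): the shared logic moves into its own module file, so wf2.nf keeps its implicit workflow and onComplete handler without wf1 ever loading them:
// modules/sub.nf -- contains only the named workflow, no handlers
workflow subworkflow {
    main:
    println('subworkflow called')
}

// wf1.nf and wf2.nf then both use:
include { subworkflow } from './modules/sub'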
output:
tuple val(sampleId), path("fastq_files/*_R{1,2}_001.fastq.gz"), emit: fastq
I have a situation where I need some dynamic input values for a process, which I will need to fetch from a datastore as part of the pipeline. I was wondering what the best/accepted way of making these values available to the process as variables is? My initial thought is to have the script that grabs the values from the datastore output a JSON file, and then use a JSON reader in the process that requires them to access them.
Something like:
process proc1 {
    output:
    path 'patient_data.json'

    script:
    """
    python get_patient_data.py
    """
}

process proc2 {
    input:
    path patient_data_file
    path other_file

    output:
    path 'some_output.file'

    script:
    // parse the staged JSON when the task is created
    def patient_data = new groovy.json.JsonSlurper().parseText(patient_data_file.text)
    """
    the_command --opt1 ${patient_data['val1']} --opt2 ${patient_data['val2']} ${other_file}
    """
}
Is this a reasonable solution? (I wasn't sure how to properly create the JsonSlurper, so that part above is a best guess.)
Hi all, I am getting a java.nio.file.ProviderMismatchException when I run the following script:
process a {
output:
file _biosample_id optional true into biosample_id
script:
"""
touch _biosample_id
"""
}
process b {
input:
file _biosample_id from biosample_id.ifEmpty{file("_biosample_id")}
script:
def biosample_id_option = _biosample_id.isEmpty() ? '' : "--biosample_id \$(cat _biosample_id)"
"""
echo \$(cat ${_biosample_id})
"""
}
I'm using a slightly modified version of the Optional Input pattern. Any ideas on why I'm getting the java.nio.file.ProviderMismatchException?
workflow mywf {
take:
data_dir
main:
task1(data_dir)
task2(data_dir) // should wait for task1 to complete before starting
}
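A sketch of the usual way to get that ordering (assuming task2 declares a matching extra input): have task2 consume something task1 emits, since the data dependency is what serializes the two tasks:
workflow mywf {
    take:
    data_dir

    main:
    done = task1(data_dir)     // task1's output channel acts as a ready signal
    task2(data_dir, done)      // task2 only starts once the signal arrives
}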
.collect() ends up generating more than 9k symlinked files in the same folder for each sample. Is there any way to collect them separately?
b 3 times on the output of a:
nextflow.enable.dsl = 2
process a {
input:
val x
output:
val y
exec:
y = x.toUpperCase()
}
process b {
input:
val x
val n
output:
val y
exec:
y = "$x$n"
}
workflow {
x = channel.value('a')
n = channel.of(1..3)
// I know these lines would work.
//p = a(x)
//b(p, n) | collect | view
// Is there any way to do it all in one pipeline?
a(x) | b(???, n) | collect | view
}
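One option (a sketch): since b takes two channels, nest the calls instead of binding the intermediate result to a variable, which keeps it a single expression:
workflow {
    x = channel.value('a')
    n = channel.of(1..3)
    b( a(x), n ) | collect | view
}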
I want to get a channel of the form:
['lib1', 'species1']
['lib1', 'species2']
['lib2', 'species3']
My process is parsing a kraken2 report text file to find any species present above some threshold per lib:
process select_species {
input:
tuple val(library_id), path(kraken_report)
val(threshold)
output:
tuple val("${library_id}"), stdout , emit: species_list
script:
"""
awk '\$1>${threshold} && \$4=="S" { print \$NF }' ${kraken_report} | grep -v sapiens
"""
}
This gives me a tuple that contains newlines, but I feel like I'm only 1 magic nextflow command away from getting my desired output.
Current output:
[P01900, coli
ananatis
oryzae
acnes
barophilus
VB_PmiS-Isfahan
]
Desired output:
[P01900, coli]
[P01900, ananatis]
[P01900, barophilus]
...
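The missing step is probably just splitting the captured stdout into lines. A sketch (where exactly to hook this in is an assumption):
select_species.out.species_list
    .flatMap { id, species ->
        // emit one [id, species] pair per non-empty line of stdout
        species.readLines().findAll { it }.collect { line -> [ id, line.trim() ] }
    }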
I'm trying to write the contents of a channel, whose items are tuples, to a file by converting them to strings. Ideally I want a CSV file that has a row per tuple. Something like this:
output:
tuple patient, sample, "${outfile}.txt" into fileChannel
}
fileChannel
    .map { patient, sample, outfile -> "${patient},${sample},${outfile}\n" }
    .collectFile(name: "myoutfile.csv", sort: true, storeDir: "mydir")
But I'm getting a No such variable: patient error upon running the pipeline. I'm using the latest version of Nextflow and am wondering if the map syntax I'm using is outdated?
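If this is DSL2, the likely culprit is the output declaration rather than the map: tuple elements there need explicit val()/path() qualifiers. A sketch of the corrected declaration (the emit name is made up):
output:
tuple val(patient), val(sample), path("${outfile}.txt"), emit: rows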
nextflow.exception.UnexpectedException: Oops.. something wrong happened while creating task 'trim' unique id -- Offending keys: [
- type=java.util.UUID value=f69e3acd-c192-4574-9c02-8921bdaf695a,
- type=java.lang.String value=trim,
- type=java.lang.String value=trimmomatic PE -phred33 -threads 6 ${READS[0]} ${READS[1]} ${sample_id}_1_trim.fastq.gz ${sample_id}_1_UP_trim.fastq.gz \
${sample_id}_2_trim.fastq.gz ${sample_id}_2_UP_trim.fastq.gz \
ILLUMINACLIP:"TruSeq3-PE-2.fa":2:30:10 \
LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 CROP:75 MINLEN:36
rm -rf ${sample_id}_1_UP_trim.fastq.gz ${sample_id}_2_UP_trim.fastq.gz
,
- type=java.lang.String value=/home/hieu/G4500_new/singularity_imgs/trimmomatic.sif,
- type=java.lang.String value=sample_id,
- type=java.lang.String value=12-GAAE61_S8556-S8756,
- type=java.lang.String value=READS,
- type=nextflow.util.ArrayBag value=[FileHolder(sourceObj:/samples_new/12-GAAE61_S8556-S8756_R1.fastq.gz, storePath:/samples_new/12-GAAE61_S8556-S8756_R1.fastq.gz, stageName:12-GAAE61_S8556-S8756_R1.fastq.gz), FileHolder(sourceObj:/samples_new/12-GAAE61_S8556-S8756_R2.fastq.gz, storePath:/samples_new/12-GAAE61_S8556-S8756_R2.fastq.gz, stageName:12-GAAE61_S8556-S8756_R2.fastq.gz)],
- type=java.lang.String value=$,
- type=java.lang.Boolean value=true]
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.codehaus.groovy.reflection.CachedConstructor.invoke(CachedConstructor.java:72)
at org.codehaus.groovy.reflection.CachedConstructor.doConstructorInvoke(CachedConstructor.java:59)
at org.codehaus.groovy.runtime.callsite.ConstructorSite$ConstructorSiteNoUnwrap.callConstructor(ConstructorSite.java:84)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallConstructor(CallSiteArray.java:59)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callConstructor(AbstractCallSite.java:263)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callConstructor(AbstractCallSite.java:286)
at nextflow.processor.TaskProcessor.computeHash(TaskProcessor.groovy:1988)
at nextflow.processor.TaskProcessor$computeHash$55.callCurrent(Unknown Source)
at nextflow.processor.TaskProcessor.createTaskHashKey(TaskProcessor.groovy:1975)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.codehaus.groovy.runtime.callsite.PlainObjectMetaMethodSite.doInvoke(PlainObjectMetaMethodSite.java:43)
at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite$PogoCachedMethodSiteNoUnwrapNoCoerce.invoke(PogoMetaMethodSite.java:193)
at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.callCurrent(PogoMetaMethodSite.java:61)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:185)
at nextflow.processor.TaskProcessor.invokeTask(TaskProcessor.groovy:591)
at nextflow.processor.InvokeTaskAdapter.call(InvokeTaskAdapter.groovy:59)
at groovyx.gpars.dataflow.operator.DataflowOperatorActor.startTask(DataflowOperatorActor.java:120)
at groovyx.gpars.dataflow.operator.ForkingDataflowOperatorActor.access$001(ForkingDataflowOperatorActor.java:35)
at groovyx.gpars.dataflow.operator.ForkingDataflowOperatorActor$1.run(ForkingDataflowOperatorActor.java:58)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: Unable to hash content: /samples_new/12-GAAE61_S8556-S8756_R1.fastq.gz
at nextflow.util.CacheHelper.hashFileContent(CacheHelper.java:350)
at nextflow.util.CacheHelper.hashFile(CacheHelper.java:261)
at nextflow.util.CacheHelper.hasher(CacheHelper.java:186)
at nextflow.util.CacheHelper.hasher(CacheHelper.java:183)
at nextflow.util.CacheHelper.hasher(CacheHelper.java:111)
at nextflow.util.CacheHelper.hasher(CacheHelper.java:107)
at nextflow.util.CacheHelper.hashUnorderedCollection(CacheHelper.java:376)
at nextflow.util.CacheHelper.hasher(CacheHelper.java:174)
at nextflow.util.CacheHelper.hasher(CacheHelper.java:178)
at nextflow.util.CacheHelper.hasher(CacheHelper.java:111)
at nextflow.util.CacheHelper.hasher(CacheHelper.java:107)
at nextflow.util.CacheHelper$hasher$12.call(Unknown Source)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:148)
at nextflow.processor.TaskProcessor.computeHash(TaskProcessor.groovy:1984)
... 18 common frames omitted
Caused by: java.nio.file.AccessDeniedException: /samples_new/12-GAAE61_S8556-S8756_R1.fastq.gz
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:84)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.n
Can someone tell me how to use the nextflow CsvSplitter in Groovy code? I can't even seem to figure out the correct constructor.
import nextflow.splitter.CsvSplitter
CsvSplitter.options([file: 'foo.csv', sep: '\t'])
but I get a type mismatch which I don't know how to fix, or even whether this is the right way to do it in general.
groovy.lang.MissingMethodException: No signature of method: static nextflow.splitter.CsvSplitter.options() is applicable for argument types: (LinkedHashMap) values: [[file:foo.csv, sep: ]]
Possible solutions: options(java.util.Map), options(java.util.Map), options(java.util.Map), print(java.lang.Object), print(java.io.PrintWriter)
at ConsoleScript8.runScript(ConsoleScript8:6)
at nextflow.script.BaseScript.runDsl1(BaseScript.groovy:163)
at nextflow.script.BaseScript.run(BaseScript.groovy:200)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
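Going by the "Possible solutions: options(java.util.Map)" hint in the trace, options() looks like an instance method rather than a static one. A sketch, assuming CsvSplitter has a no-arg constructor:
import nextflow.splitter.CsvSplitter

def splitter = new CsvSplitter()
splitter.options( sep: '\t' )   // note: whether 'file' belongs in this map is unclear from the error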
Hi y'all! Maybe someone can help me. I have two channels, one containing normal and one containing tumor samples. I need to create a third channel that contains all tumor samples that do not have a matching key among the normal samples. Sort of a group operation spitting out only the 'remainder'. My idea now was to convert the keys from normal to a list and then filter the tumor samples based on the key not being in the list:
normallist = cram_variant_calling_normal_cross.map{patient, meta, cram, crai -> [patient]}.toList()
tumor_only = cram_variant_calling_tumor_cross.filter{ patient, meta, cram, crai ->
!(normallist.contains(patient))
}
This does not work. I have also tried to use collect(), collect().toList(), and .subscribe onNext: { normallist.add(it) }. I have googled around a bit and found https://github.com/nextflow-io/nextflow/discussions/2547 and https://github.com/nextflow-io/nextflow/discussions/2275, but neither has helped solve my problem so far. Any hints? Or is there a much easier way to achieve the above?
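One simpler route (a sketch; the tuple arity is taken from the map call above): join with remainder: true emits unmatched keys with null placeholders for the missing side, which can then be filtered out:
tumor_only = cram_variant_calling_tumor_cross
    .join( cram_variant_calling_normal_cross, remainder: true )
    .filter { patient, meta, cram, crai, n_meta, n_cram, n_crai -> n_meta == null }
    .map { it[0..3] }   // keep only the tumor-side fields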
profile block refers specifically to doing it inside the same config file? nf-core profiles regularly override process settings, whilst the pipelines themselves contain process settings in non-profile sections, yet there are seemingly no issues.