```
nextflow.enable.dsl=2

process runFacets {
    publishDir 'results'
    container 'repo/image:v1'

    input:
    path hapmapvcf
    path normalbam
    path tumorbam

    script:
    samplename = tumorbam.toString()
    """
    snp-pileup-wrapper.R \
        --snp-pileup-path /usr/bin/snp-pileup \
        --vcf-file $hapmapvcf \
        --normal-bam $normalbam \
        --tumor-bam $tumorbam \
        --output-prefix $samplename

    run-facets-wrapper.R \
        --counts-file $samplename.snp_pileup.gz \
        --sample-id $samplename \
        --purity-cval 1000 \
        --cval 500 \
        --everything \
        -fl /usr/local/lib/R/site-library/ \
        -D /host/data/facets/
    """
}
```
Groovy interprets `$samplename.snp_pileup` as `${samplename.snp_pileup}`, i.e. as a property access on the variable. You want `${samplename}.snp_pileup.gz`.
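In other words, the corrected second call inside the script block would be:

```
run-facets-wrapper.R \
    --counts-file ${samplename}.snp_pileup.gz \
    --sample-id $samplename \
    --purity-cval 1000 \
    --cval 500 \
    --everything \
    -fl /usr/local/lib/R/site-library/ \
    -D /host/data/facets/
```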
Error executing process > 'runFacets (1)'
Caused by:
Process `runFacets (1)` terminated with an error exit status (1)
Command executed:
snp-pileup-wrapper.R --snp-pileup-path /usr/bin/snp-pileup --vcf-file hapmap_3.3.hg38.vcf.gz --normal-bam blood.bam --tumor-bam biopsy.bam --output-prefix biopsy
run-facets-wrapper.R --counts-file biopsy.snp_pileup.gz --sample-id biopsy --purity-cval 1000 --cval 500 --everything -fl /usr/local/lib/R/site-library/ -D /host/data/facets/
Command exit status:
1
Command output:
Failed to read VCF file: No such file or directory
You could call the `.toString()` method and then manipulate the string:

```
process only_has_file {
    input:
    path x from ch

    output:
    tuple path("result.txt"), val(sample_id) into output_ch

    shell:
    sample_id = x.toString()
    sample_id = sample_id - "string to subtract"
    """
    echo "these are your results" > result.txt
    """
}
```
I have a process that monitors progress, to help me resume more cleanly (I am deleting files and starting from a list, so there is no way for Nextflow to know that I no longer want to process those files). I have a problem with the publishDir directive: the files I am looking for stay buried in the working directory. This is my process:
```
process monitor {
    executor 'local'
    publishDir "monitor", pattern: "*.ok", mode: 'move'

    output:
    path "*.ok" optional true into null_ch

    exec:
    SRA_ok.collectFile(name: "sra.ok", newLine: true)
    FASTQ_ok.collectFile(name: "fastq.ok", newLine: true)
    KRAKEN_ok.collectFile(name: "kraken.ok", newLine: true)
    isMOLTYPE_ok.collectFile(name: "molType.ok", newLine: true)
}
```
What could I do to have the file moved to my expected directory?
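One possible direction (a sketch only, not a confirmed fix from this thread): `collectFile` writes its result into its own staging area rather than into the task work directory, so the `output:`/`publishDir` pair never sees the files. The operator's `storeDir` option writes the collected file straight to a chosen directory instead:

```
// Sketch, assuming SRA_ok etc. are the channels used in the process above:
SRA_ok.collectFile(name: 'sra.ok', newLine: true, storeDir: 'monitor')
FASTQ_ok.collectFile(name: 'fastq.ok', newLine: true, storeDir: 'monitor')
KRAKEN_ok.collectFile(name: 'kraken.ok', newLine: true, storeDir: 'monitor')
isMOLTYPE_ok.collectFile(name: 'molType.ok', newLine: true, storeDir: 'monitor')
```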
```
#!/usr/bin/env nextflow
nextflow.enable.dsl=2

process splitSequences {
    publishDir 'results/orig'

    input:
    path dotfa

    output:
    path 'seq_*'

    """
    csplit $dotfa '%^>%' '/^>/' '{*}' -f seq_
    """
}

process revRecords {
    publishDir 'results/rev'

    input:
    path x

    output:
    path 'rev_*'

    script:
    samplename = x.toString()
    """
    cat $x | rev > rev_${samplename}
    """
}

workflow {
    dotfa = Channel.fromPath('./*.fa')
    splitSequences(dotfa) | revRecords
}
```
@Itachibal
Hi, I have a question regarding Nextflow channels. It is easy to retrieve the name of the file (sample_ID) with the fromFilePairs channel factory method. But I have many BAM files and I am not able to retrieve the name or sample_ID. I tried all the other channel factory methods, but they just create a channel with the files. Is there any way to get the sample_ID/name, or is it only possible for paired-end reads? Thanks in advance.
You could create a process that extracts the sample names using the read groups:

```
cat ${bamlist} | while read bam; do samtools view -H "\${bam}" | grep "^@RG" | tr "\t" "\\n" | grep ^SM: -m1 | cut -d ':' -f 2 | tr "\\n" "," >> out.csv && echo \${bam} >> out.csv; done
```
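A rough sketch of such a process (the process and file names are illustrative; it assumes `bamlist` contains one BAM path per line and mirrors the one-liner above):

```
process extractSampleIds {
    input:
    path bamlist

    output:
    path 'out.csv'

    script:
    """
    while read bam; do
        sample=\$(samtools view -H "\$bam" | grep "^@RG" | tr "\\t" "\\n" | grep "^SM:" -m1 | cut -d ':' -f 2)
        echo "\${sample},\${bam}" >> out.csv
    done < ${bamlist}
    """
}
```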
Hi, is it possible to specify the name of .nextflow.log, or to make a copy of .nextflow.log, when executing `nextflow run` (via some option)? We have difficulty pinning down problems when supporting end-users, who often overwrite .nextflow.log (and .nextflow.log.1, ..., 9) over several rounds of troubleshooting. It is error-prone to ask them to cp .nextflow.log somewhere whenever they execute `nextflow run`.
You could run `tail -F ${launchDir}/.nextflow.log >> \$(whoami)_\$(randomchar).log`, and then use `workflow.onComplete` or `workflow.onError` to copy that new log file to a safe location before the process gets shut down. I found out about the -F option here.
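A minimal sketch of the onComplete part (the destination naming is illustrative, not from the thread):

```
// Copy the launch-directory log to a per-run file so later runs do not overwrite it.
workflow.onComplete {
    def src  = file("${workflow.launchDir}/.nextflow.log")
    def dest = file("${workflow.launchDir}/logs/${workflow.runName}.nextflow.log")
    dest.parent.mkdirs()
    src.copyTo(dest)
}
```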
I know I can pass the job name through clusterOptions, but I was wondering whether there is a way that is independent of the cluster system (like the memory directive etc.). We use LSF, where this would be `bsub -J $clusterJobName`. We'd use the name to select cluster jobs by job type, for statistics and monitoring.
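For reference, the executor-specific route mentioned in the question looks like this at the process level (LSF flag shown; the process and job name are just examples):

```
process align {
    // Executor-specific: forward the LSF job-name flag directly.
    clusterOptions '-J align_jobs'

    input:
    path reads

    output:
    path 'aligned.txt'

    script:
    """
    echo "aligning ${reads}" > aligned.txt
    """
}
```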