`nextflow clean -before` failed with `ERROR ~ Unexpected error [EOFException]`. Is it a known issue?
Try `nextflow -log log clean -before` and open an issue with the produced log file.
One thing that I think is missing is a default setting that identifies which process created which directory in the work directory
`pair_id` instead of the file name?
```
output:
set pair_id, file("mergebam.fastqtosam.bwa.bam") into MergeBamAlignment_output
```
(pair_id, bam_file)
..
```
sam_and_bam_ch = BwaMem_output.phase(FastqToSam_output) { left, right -> tuple(left[0], left[1], right[1]) }
```
```
set pair_id, file(sam), file(bam) from sam_and_bam_ch
```
The transforming closure has to go in a `map` operator; a closure passed directly to `phase` is interpreted as the matching-key function instead:
```
sam_and_bam_ch = BwaMem_output.phase(FastqToSam_output).map { left, right -> tuple(left[0], left[1], right[1]) }
```
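For reference, a minimal sketch of the `phase` + `map` pattern (the channel contents and file names here are made up, not from the pipeline above):

```nextflow
// two channels keyed by sample id as the first tuple element
left_ch  = Channel.from( ['s1', 'a.sam'], ['s2', 'b.sam'] )
right_ch = Channel.from( ['s2', 'b.bam'], ['s1', 'a.bam'] )

// phase pairs items whose keys match (first element by default);
// map then reshapes each matched pair into a flat tuple (id, sam, bam)
paired_ch = left_ch.phase(right_ch)
                   .map { left, right -> tuple(left[0], left[1], right[1]) }
```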
Hi!
I successfully started a cluster on AWS!! Thanks for your help!
I tried to start a test pipeline (SciLifeLab/NGI-RNAseq), and the first time I executed it, it was going fine with this command:
```
~/nextflow run SciLifeLab/NGI-RNAseq -profile docker -w $s3workDir --reads s3readsPath --outdir s3OutDir --fasta s3genomeFa --gtf s3GTFfile
## where s3readsPath is an s3 directory containing several samples
```
```
N E X T F L O W ~ version 0.25.3-SNAPSHOT
Launching `SciLifeLab/NGI-RNAseq` [fabulous_panini] - revision: 395121e2c5 [master]
WARN: Access to undefined parameter `help` -- Initialise it to a default value eg. `params.help = some_value`
=========================================
NGI-RNAseq : RNA-Seq Best Practice v1.2
=========================================
[...]
[warm up] executor > ignite
Fetching EC2 prices (it can take a few seconds depending your internet connection) ..
[32/92bc08] Submitted process > fastqc (...)
[b3/43e8a4] Submitted process > trim_galore (...)
[a2/feee1f] Submitted process > makeSTARindex (...)
```
However, I realized I was using too many samples for a test (although none of them are more than 700 KB), so I pressed CTRL+C to stop the execution and re-ran with only one pair of samples:
```
~/nextflow run SciLifeLab/NGI-RNAseq -profile docker -w $s3workDir --reads s3ModPath --outdir s3OutDir --fasta s3genomeFa --gtf s3GTFfile
```
But Nextflow threw an error:
```
N E X T F L O W ~ version 0.25.3-SNAPSHOT
Launching SciLifeLab/NGI-RNAseq
[hopeful_joliot] - revision: 395121e2c5 [master]
ERROR ~ Unexpected error [UnsupportedOperationException]
```
The path to the pair of samples is the same as in the first command, except that I specifically look for one sample name within the s3 directory: `sample*{1,2}.fastq.gz`. I'm not sure if the error comes from the pipeline or from Nextflow itself; where should I look?
Many thanks for your time,
```
1:1+ 2:1+ 3:1+ 4:1+
5:1+ 6:1+ 7:1+ 8:1+
9:1+ 10:1+ 11:1+ 12:1+
13:1+ 14:1+ 15:1+ 16:1+
17:1+ 18:1+ 19:1+ 20:1+
21:1+ 22:1+ X:1+ Y:1+ MT:1+ GL000207.1:1+ G .. etc
```
and I'd like each value prefixed with `-L`, like:
```
-L 1:1+ -L 2:1+ -L 3:1+ -L 4:1+
```
`splitText` is enough, you don't need to pass it as a file:
```
Channel.fromPath(params.contigs)
       .splitText()
       .set { intervals }
```
Declare the input as `val intervals`; the `intervals` variable holds the tab-separated line, so you will need to split it and prefix the values with `-L`, e.g.:
```
def intervalOpts = intervals.tokenize('\t').collect{ "-L $it" }.join(" ")
```
(use a new name such as `intervalOpts` so you don't shadow the `intervals` input value)
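Put together, a minimal sketch of the `splitText` approach (the process name and command are hypothetical):

```nextflow
// one channel item per line of the contigs file
Channel.fromPath(params.contigs)
       .splitText()
       .set { intervals }

process callVariants {
    input:
    val intervals

    script:
    // build e.g. "-L 1:1+ -L 2:1+ ..." from one tab-separated line
    def intervalOpts = intervals.tokenize('\t').collect{ "-L $it" }.join(" ")
    """
    echo $intervalOpts
    """
}
```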
You can turn it into a `file` when you need it :) Instead of `splitText` you can also use an `each` repeater, e.g. declare the `intervals` input as shown below:
```
input:
each intervals from file(params.contigs).readLines()
..
```
the `groups.list` file
```
input:
each intervals from file(params.contigs).readLines().findAll{ it }
```
(`findAll{ it }` drops the empty lines)
Hi Paolo, I hate to bother you again, but I need more insight into Nextflow:
1) `-resume` without `storeDir` seems to overdo it: `-resume` restarted jobs that had failed and then completed successfully on retry, and then it restarted everything downstream as well, even though some of the downstream jobs were complete. Is this expected? How can I avoid it? Is manually deleting some `work` folders OK, or does one have to edit the log file?
2) Does Nextflow update the `work/hash` of a failed job after that job is successfully retried? Why?
3) It seems like Nextflow dismisses some successfully completed jobs as terminated/aborted ones ... Is there a way to make Nextflow more resilient on our LSF cluster, maybe by adjusting `pollInterval`, `dumpInterval`, `queueStatInterval`, or `exitReadTimeout`?
2 more questions unrelated to cluster execution:
4) Can one access the configuration `profile` flag from inside the pipeline script (to make some processes execute only on a cluster, for example)? How?
5) Is it possible to use two `publishDir` statements per process, to make some files go into one folder and others into a different one?:
```
publishDir path: getIntermediateDir('pairsam_run'), pattern: "*.pairsam.gz"
publishDir path: getOutDir('stats_run'), pattern: "*.stats", mode: "copy"
```
`workflow.profile`, see https://www.nextflow.io/docs/latest/metadata.html
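For question 4, a sketch of gating a process on the active profile via the `when:` directive (the `cluster` profile name and process name are assumptions):

```nextflow
process clusterOnly {
    // only run this process when launched with: -profile cluster
    when:
    workflow.profile == 'cluster'

    script:
    """
    echo 'running on the cluster'
    """
}
```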
The `nextflow` trace shows a process as a FAILED one, but its `work/hash` directory is full with all the results:
```
43 ad/4613a9 4681790 map_runs (library:HeLa1 run:lane4 chunk:11) FAILED - 2017-07-25 19:20:50.513 59m 37s 17m 25s 596.0% 9.4 GB 9.9 GB 7.7 GB 2.8 GB
```
The `.exitcode` in `work/ad/4613a9.../` is 0, yet `map_runs (library:HeLa1 run:lane4 chunk:11)` was re-submitted.
On `resume`, `nextflow` restarted from `map_runs (library:HeLa1 run:lane4 chunk:11)`:
```
84 85/ffb814 4681937 map_runs (library:HeLa1 run:lane4 chunk:11) COMPLETED 0 2017-07-25 20:20:27.630 1h 2m 55s 57m 19s 644.8% 10.1 GB 10.6 GB 18.8 GB 15.1 GB
```
distiller
on their own
So `ad/4613a9` failed, then `85/ffb814` completed; yet `ad/4613a9` is full with results and has `.exitcode=0`.
`.nextflow.log.1` holds the previous run; check `.nextflow.log` for the stuff that's running now, after the resume.
```
NXF_VER=0.25.3-SNAPSHOT nextflow run ..etc
```
Does setting `NXF_VER=0.25.3-SNAPSHOT` make `nextflow` update itself?