These are chat archives for nextflow-io/nextflow

18th Mar 2019
Pierre Lindenbaum
@lindenb
Mar 18 11:10

Hi all, can you explain why my step named callBcftools is not cached when I run my workflow twice, even though it was successful the first time?

My nextflow file has not changed (see the sha1sum).

I use a custom executor 'ccrt'; could that be the cause of the problem?

$ java -Dccc.project=fg0073 -jar /ccc/work/cont007/fg0019/lindenbp/packages/nextflow/nextflow.jar  run -resume -work-dir /ccc/scratch/cont007/fg0073/lindenbp/20190314.TROUCARD.NEXTFLOW/work 20190314.troucard.nf
N E X T F L O W  ~  version 18.11.0-edge
Launching `20190314.troucard.nf` [desperate_elion] - revision: dbcdb5f132
[warm up] executor > local
[warm up] executor > ccrt
[6c/5d8759] Cached process > HengLiBed (snpeff data)
[10/c90a41] Cached process > makeBed (collect regions)
[e3/88ed36] Cached process > snpEffDataDir (snpeff data)
[99/bd8219] Cached process > makePedigree (make pedigree)
[8c/9118cc] Cached process > makeBamList (collect BAMs)
[23/d564ce] Submitted process > callBcftools (call chr1:10000-177417)
[e8/303a19] Submitted process > callBcftools (call chr1:227417-267719)
[4c/da10a4] Submitted process > allCalled (allCalled)
[53/643505] Submitted process > mergeChromosomes (merge chr1)
[61/4b7e11] Submitted process > annotChromosome (annot chr1)
[4d/1371b0] Submitted process > makeVcfListAnnotContigs (making list from 1 vcfs)
[6b/b164ac] Submitted process > mergeVcf (making list from vcf.list )

$ sha1sum 20190314.troucard.nf 
b165013948310756b053dea7205671eeba4069a8  20190314.troucard.nf

$ java -Dccc.project=fg0073 -jar /ccc/work/cont007/fg0019/lindenbp/packages/nextflow/nextflow.jar  run -resume -work-dir /ccc/scratch/cont007/fg0073/lindenbp/20190314.TROUCARD.NEXTFLOW/work 20190314.troucard.nf
N E X T F L O W  ~  version 18.11.0-edge
Launching `20190314.troucard.nf` [maniac_jones] - revision: dbcdb5f132
[warm up] executor > local
[warm up] executor > ccrt
[99/bd8219] Cached process > makePedigree (make pedigree)
[10/c90a41] Cached process > makeBed (collect regions)
[e3/88ed36] Cached process > snpEffDataDir (snpeff data)
[6c/5d8759] Cached process > HengLiBed (snpeff data)
[8c/9118cc] Cached process > makeBamList (collect BAMs)
[43/7c80f4] Submitted process > callBcftools (call chr1:10000-177417)
[04/f86324] Submitted process > callBcftools (call chr1:227417-267719)
[ea/e350b7] Submitted process > allCalled (allCalled)
[6e/42a2a0] Submitted process > mergeChromosomes (merge chr1)
[6c/bd4e4b] Submitted process > annotChromosome (annot chr1)
[c8/6b228d] Submitted process > makeVcfListAnnotContigs (making list from 1 vcfs)
[b9/875bc4] Submitted process > mergeVcf (making list from vcf.list )

$ sha1sum 20190314.troucard.nf 
b165013948310756b053dea7205671eeba4069a8  20190314.troucard.nf

Here is a snippet of the target callBcftools:

process callBcftools {
    executor 'ccrt'
    errorStrategy 'finish'
    tag "call ${chrom}:${start}-${end}"
    maxForks 80
    input:
        file bams from bam_list
        set chrom,start,end from split_bed.splitCsv(header: false,sep:'\t',strip:true)
    output:
        set chrom,start,end,file("${chrom}_${start}_${end}.calling.vcf.gz"),file("${chrom}_${start}_${end}.calling.vcf.gz.tbi") into called_vcfs
    script:
    """
    set -o pipefail
    ${modulesLoads()}

    test \\! -f "${workflow.workDir}/STOP"

    bcftools mpileup --regions "${chrom}:${start}-${end}" -a 'FORMAT/AD' -a 'FORMAT/DP' -a 'INFO/AD' -O u -f ${REF} --bam-list ${bams}  --output jeter.bcf
    bcftools call --ploidy GRCh37 --multiallelic-caller -O u --variants-only -o jeter2.bcf jeter.bcf
    rm jeter.bcf
    bcftools norm -f ${REF}  -O z -o "${chrom}_${start}_${end}.calling.vcf.gz" jeter2.bcf
    rm jeter2.bcf
    bcftools index --tbi --force "${chrom}_${start}_${end}.calling.vcf.gz"
    """
}

thanks for your help

P.

Matthew Pocock
@drdozer
Mar 18 11:13
sorry for the noob question - are there pre-canned docker containers for nextflow which contain e.g. swissprot and ncbi non-redundant, ready for blasting against?
Luca Cozzuto
@lucacozzuto
Mar 18 11:14
Hi @drdozer. It's bad practice to store large DBs in containers.
Normally you store programs in containers and keep the DBs outside
Matthew Pocock
@drdozer
Mar 18 11:15
@lucacozzuto sure - but there's nothing wrong with these things being in a Docker volume, is there?
or should it always be on the host fs?
Luca Cozzuto
@lucacozzuto
Mar 18 11:18
Having very large Docker images can cause memory issues.
It is better to keep them outside
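The pattern Luca describes can be sketched in a nextflow.config fragment: keep the reference databases on the host filesystem and bind-mount them into every task container instead of baking them into the image. This is an illustrative sketch, not code from the chat; the host path `/data/blastdb` and the `BLASTDB` convention are assumptions for the BLAST use case Matthew mentioned.

```groovy
// Hypothetical nextflow.config sketch: the image holds only the tools,
// while the large databases live on the host and are mounted read-only.
docker {
    enabled = true
    // bind the host DB directory into every task container (path illustrative)
    runOptions = '-v /data/blastdb:/blastdb:ro'
}

env {
    // BLAST tools consult the BLASTDB variable to locate the databases
    BLASTDB = '/blastdb'
}
```

This keeps the image small and lets several pipelines share one copy of the database.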
Matthew Pocock
@drdozer
Mar 18 11:21
OK - understood.
Paolo Di Tommaso
@pditommaso
Mar 18 12:53
@lindenb are you using a shared file system ?
Matthew Pocock
@drdozer
Mar 18 13:01
@pditommaso Right now, for playing about, I have a single big machine that I'll be running on. But ultimately we would be running pipelines on multiple clusters, so shared filesystems will become a problem at some point
Paolo Di Tommaso
@pditommaso
Mar 18 13:01
I was asking the other guy, provided you aren't in the same org :smile:
Matthew Pocock
@drdozer
Mar 18 13:06
oops! yeah, no, not the same
Paolo Di Tommaso
@pditommaso
Mar 18 13:06
what's your org btw, if I may ask ..
Pierre Lindenbaum
@lindenb
Mar 18 13:19
@pditommaso yes
Paolo Di Tommaso
@pditommaso
Mar 18 13:19
see the lenient option here
another reason could be a non-deterministic channel, for example one collecting results from upstream parallel processes
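The lenient option Paolo refers to is the `cache` directive's `'lenient'` mode, which hashes input files by path, size, and last-modified time only; on shared filesystems, where finer-grained file attributes can differ between runs, the default mode may produce spurious cache misses. A minimal sketch, reusing the process name from Pierre's snippet above:

```groovy
// Per-process: relax input-file hashing so -resume works reliably
// on shared filesystems
process callBcftools {
    cache 'lenient'
    // ... rest of the process as in the snippet above
}
```

It can also be enabled globally with `process.cache = 'lenient'` in nextflow.config.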
Pierre Lindenbaum
@lindenb
Mar 18 13:25
@pditommaso thanks, currently testing with lenient ... :clock1:
Pierre Lindenbaum
@lindenb
Mar 18 13:36
@pditommaso that worked ! many thanks ! :-)
Paolo Di Tommaso
@pditommaso
Mar 18 13:37
cool, maybe we have to promote it as default at some point
KochTobi
@KochTobi
Mar 18 14:06
@pditommaso thanks for your incredibly fast response on #1081 I will try that as soon as my current run is finished :+1:
Paolo Di Tommaso
@pditommaso
Mar 18 14:06
welcome :smile:
Maxime Garcia
@MaxUlysse
Mar 18 14:12
@pditommaso Is there any way to make the dump operator work without the -dump-channels flag ?
Paolo Di Tommaso
@pditommaso
Mar 18 14:13
how ?
Maxime Garcia
@MaxUlysse
Mar 18 14:13
Like by enabling it in the script if someone uses a previously made flag
I don't want to break behavior of Sarek
Paolo Di Tommaso
@pditommaso
Mar 18 14:14
why should it be broken?
Maxime Garcia
@MaxUlysse
Mar 18 14:14
I'm replacing our own made verbose cockroach command by the dump operator
Paolo Di Tommaso
@pditommaso
Mar 18 14:15
LOL
Maxime Garcia
@MaxUlysse
Mar 18 14:15
so I want to deprecate it
Paolo Di Tommaso
@pditommaso
Mar 18 14:15
ah, because you were using a custom param
Maxime Garcia
@MaxUlysse
Mar 18 14:15
that's it
Paolo Di Tommaso
@pditommaso
Mar 18 14:15
no, now it only works with the -dump-channels flag
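For context, the behavior Paolo describes can be sketched as follows; the channel name `called_vcfs` is borrowed from Pierre's snippet earlier and is illustrative only:

```groovy
// The dump operator labels a channel for debugging. Its output is
// printed only when the run is launched with the -dump-channels option:
//   nextflow run main.nf -dump-channels          // dump all tagged channels
//   nextflow run main.nf -dump-channels vcfs     // dump only this tag
called_vcfs.dump(tag: 'vcfs')
```

Without the flag, `dump` is a no-op, which is why a script-level switch cannot enable it on its own.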
Maxime Garcia
@MaxUlysse
Mar 18 14:16
ok
I'll keep our own flag, but display a message saying that it's deprecated then
Thanks for your help
Paolo Di Tommaso
@pditommaso
Mar 18 14:17
:+1:
KochTobi
@KochTobi
Mar 18 15:40
@pditommaso I am using a slurm executor. In this setting the conda environment is always created on the headnode. Is there a setting to submit the conda-env create command to slurm like any other job? (The headnode is not meant to do computing and therefore takes longer than the slurm compute nodes to create the environment)
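Although there is no built-in setting for this, one possible workaround is to build the environment on a compute node yourself and then point Nextflow at the ready-made environment by path, since the `conda` directive also accepts an existing environment directory. This is a sketch under assumptions; the env path and environment file name are illustrative.

```groovy
// Hypothetical workaround: first create the env on a compute node, e.g.
//   sbatch --wrap "conda env create -p /shared/envs/mytools -f environment.yml"
// then reference the existing environment by path in nextflow.config,
// so nothing is built on the headnode:
process.conda = '/shared/envs/mytools'
```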
Paolo Di Tommaso
@pditommaso
Mar 18 15:41
no
KochTobi
@KochTobi
Mar 18 15:41
Ok :)
Paolo Di Tommaso
@pditommaso
Mar 18 15:41
:)
Félix C. Morency
@fmorency
Mar 18 21:10

lenient becoming default

:D