These are chat archives for nextflow-io/nextflow

6th
Oct 2016
amacbride
@amacbride
Oct 06 2016 18:37
Hi @pditommaso -- I'm seeing intermittent issues with missing output files (that are actually there). I saw this thread from a few months back (http://www.gitterforum.com/discussion/nextflow-io-nextflow?page=196), regarding Docker, but I'm still seeing it after upgrading to the latest version.
This is in a cluster configuration, with the work dir on NFS, using SLURM and Docker. Any suggestions?
It's both intermittent and non-deterministic.
amacbride
@amacbride
Oct 06 2016 18:51
process counts {
    echo true
    // counts by gene_id 
    tag { sample_name }
    cpus 1
    //memory "5 G"
    clusterOptions "--mem=5120"

    storeDir { "${params.results}/$runid/$sample_name/exp" }

    //publishDir { "${params.results}/$sample_name/exp" }

    input:
        set sample_name, file(bam) from sorted1
        file resources

    output:
        set sample_name, file('counts.txt') into counts

    script:
        template "counts.sh"
}


----

#!/bin/bash

set -x
{

$params.htseq_path/htseq-count \
    ${params.config.counts.htseqCountArgs} \
    ${bam} \
    ${resources}/Homo_sapiens.GRCh37.75.gtf > counts.txt

} 2>&1
Paolo Di Tommaso
@pditommaso
Oct 06 2016 19:01
do you mean the job completes successfully but the run stops because it cannot find the expected output files?
amacbride
@amacbride
Oct 06 2016 19:31

Correct.

```
Missing output file(s) `counts.txt` expected by process `counts (D05_313-1-014-TBB-ABCDE)`
```
Paolo Di Tommaso
@pditommaso
Oct 06 2016 19:32
um
amacbride
@amacbride
Oct 06 2016 19:32
It's not just this particular step, but it's representative. I've never seen this particular error with local disk, or when using SLURM in a single-node config.
And when I look in the work dir, the file is always there.
Paolo Di Tommaso
@pditommaso
Oct 06 2016 19:33
is it happening also with other NF pipelines ?
amacbride
@amacbride
Oct 06 2016 19:34
I'm only working on this one NF pipeline, so I don't have any other examples.
Paolo Di Tommaso
@pditommaso
Oct 06 2016 19:35
I see
you may want to try with this one for example
it should be enough to run
amacbride
@amacbride
Oct 06 2016 19:36
One question I had was the timing of when NF checks for the existence of the output file. Is it asynchronous with respect to the execution of the subshell, or synchronous?
Paolo Di Tommaso
@pditommaso
Oct 06 2016 19:36
nextflow run cbcrg/shootstrap -with-docker -process.executor slurm
synchronous
you can try to set process.scratch = true
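
The chat does not establish why the file is reported missing. One way to probe the timing question above is to poll for the file from the node where Nextflow runs and see whether it only becomes visible after a delay. The sketch below is a hypothetical diagnostic helper (`wait_for_file` is not part of Nextflow), and the idea that the file appears late over NFS is an assumption, not something confirmed in this conversation.

```shell
#!/bin/bash
# Hypothetical diagnostic helper, not part of Nextflow.
# Polls once per second until the file exists or the timeout (seconds) expires.
wait_for_file() {
    local path="$1" timeout="${2:-30}" waited=0
    while [ ! -e "$path" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "still missing after ${waited}s: $path" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "visible after ${waited}s: $path"
}
```

Called as `wait_for_file /path/to/task/dir/counts.txt 60` right after a failure, a non-zero delay in the message would suggest the file was written but not yet visible to that node.
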
amacbride
@amacbride
Oct 06 2016 19:39
I will try that as well -- I was interested in the ram-disk option, but had shied away since it was marked as experimental.
Paolo Di Tommaso
@pditommaso
Oct 06 2016 19:39
me as well, but I've never managed to stress test it
amacbride
@amacbride
Oct 06 2016 19:40
I'll let you know what I find out -- thanks!
Paolo Di Tommaso
@pditommaso
Oct 06 2016 19:40
ok
welcome
amacbride
@amacbride
Oct 06 2016 19:41
If I'm putting it in a config file, would this be correct?
Paolo Di Tommaso
@pditommaso
Oct 06 2016 19:41
yes
amacbride
@amacbride
Oct 06 2016 19:41
process {
    executor = "slurm"
    queue = "clia"
    scratch = "true"
}
Paolo Di Tommaso
@pditommaso
Oct 06 2016 19:41
without quotes, it's a bool
amacbride
@amacbride
Oct 06 2016 19:42
gotcha
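
For reference, the config from the exchange above with the correction applied, so that `scratch` is a boolean rather than the string "true". This is only a sketch: it writes the block to a file via a heredoc, and the `config_file` variable and its default name `nextflow.config` are assumptions for illustration. The `executor` and `queue` values are the ones posted in the chat.

```shell
#!/bin/bash
# Target file; defaults to nextflow.config in the current directory (assumed name).
config_file="${1:-nextflow.config}"

# Same settings as posted in the chat, with scratch unquoted so it is a boolean.
cat > "$config_file" <<'EOF'
process {
    executor = "slurm"
    queue = "clia"
    scratch = true
}
EOF
```
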
amacbride
@amacbride
Oct 06 2016 19:57
Does NF clean up each of the tmpdir folders as each process exits?
Paolo Di Tommaso
@pditommaso
Oct 06 2016 19:58
nope
amacbride
@amacbride
Oct 06 2016 19:58
so everything would just be left in /tmp, as if it were the work dir?
Paolo Di Tommaso
@pditommaso
Oct 06 2016 19:59
you are referring to scratch, right?
amacbride
@amacbride
Oct 06 2016 19:59
Yes.
Paolo Di Tommaso
@pditommaso
Oct 06 2016 20:00
the temp data is supposed to be created in the scratch area provided by the cluster via TMPDIR
(default)
thus it should be removed by SLURM
though I'm not 100% sure that SLURM uses that var name
amacbride
@amacbride
Oct 06 2016 20:15
Ah, OK. I took a look at the generated shell scripts, and it doesn't look like the scratch dir is removed. In a future release, it might be nice to have that as an option, since once the outputs are copied at the end of the process, the scratch dir could be deleted.
Paolo Di Tommaso
@pditommaso
Oct 06 2016 20:15
makes sense
you may want to open a feature request for that
Paolo Di Tommaso
@pditommaso
Oct 06 2016 20:23
:+1:
amacbride
@amacbride
Oct 06 2016 20:23
Just did, though I wasn't sure how to tag it as an enhancement.
Paolo Di Tommaso
@pditommaso
Oct 06 2016 20:23
that's perfect, thanks
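
The cleanup idea requested above can be sketched in plain shell. This is not the wrapper script Nextflow generates; `run_in_scratch` is a hypothetical function that only illustrates the pattern discussed: create a scratch dir under `TMPDIR` (falling back to `/tmp`), run the command there, copy the declared output back to the work dir, then delete the scratch dir.

```shell
#!/bin/bash
# Hypothetical sketch of "run in scratch, copy outputs back, then clean up".
# Usage: run_in_scratch WORK_DIR OUTPUT_NAME COMMAND [ARGS...]
run_in_scratch() {
    local work_dir="$1" output_name="$2"
    shift 2
    # Resolve the work dir to an absolute path before changing directory.
    work_dir="$(cd "$work_dir" && pwd)" || return 1

    # Scratch area: TMPDIR if the cluster provides it, otherwise /tmp.
    local scratch_dir
    scratch_dir="$(mktemp -d "${TMPDIR:-/tmp}/scratch.XXXXXX")" || return 1

    # Run the command inside the scratch dir, in a subshell so the
    # caller's current directory is left unchanged.
    local status=0
    (
        cd "$scratch_dir" || exit 1
        "$@"
    ) || status=$?

    # Copy the expected output back only if the command succeeded.
    if [ "$status" -eq 0 ]; then
        cp "$scratch_dir/$output_name" "$work_dir/" || status=$?
    fi

    # Remove the scratch dir whether or not the task succeeded.
    rm -rf "$scratch_dir"
    return "$status"
}
```

For example, `run_in_scratch "$PWD" counts.txt sh -c 'echo 42 > counts.txt'` leaves `counts.txt` in the current directory and nothing behind in the scratch area. A real option would likely want to keep the scratch dir on failure for debugging, which is a design choice this sketch does not make.
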
amacbride
@amacbride
Oct 06 2016 23:31
Dumb question: I cloned the nextflow repo so that I could do some in-depth debugging. But when I follow the build instructions, it seems to nuke my local changes. (Example: I modified ./src/main/groovy/nextflow/Const.groovy to just change the app version so that I could verify that I'm running with my modified sources. But after doing make compile ; make pack; make install the file is back to the unmodified version.)
I'm not familiar with Gradle, but I'm obviously missing something... :)
amacbride
@amacbride
Oct 06 2016 23:55
D'oh. Figured it out -- didn't realize that file was autogenerated.