These are chat archives for nextflow-io/nextflow

Jan 2017
Jan 27 2017 00:37
Hi all. @pditommaso, I have a question about storeDir vs. publishDir. On a multi-machine SLURM cluster, I'm seeing intermittent process failures where NF doesn't seem to see the output file of a process, even though it's there. (The process runs ls -alt on the expected output file, and the file is listed, but the process fails anyway.)
process concatJuctions {
    echo true
    tag { sample_name }
    cpus 1
    memory "${config.clustermem.concatJuctions}"
    storeDir { "${params.results}/$runid/$sample_name/fus" }
    //publishDir { "${params.results}/$runid/$sample_name/fus" }

    input:
        set sample_name, file('chimeric_junctions') from fusion_jxns.groupTuple()

    output:
        set sample_name, file("chimeric_junctions_merged") into junctions

    when:
        params.product == "product2"

    script:
        template ""
}
How does NF determine whether the output file of a process is there? I need the output of this process later on (so I heeded the warning not to use publishDir because it's asynchronous), but I'm wondering why NF doesn't think the file is there.
Jan 27 2017 00:44
Command output:

  concatJuctions: C12_B-RNA-011
  + echo 'concatJunctions step, chimeric_junctions'
  concatJunctions step, chimeric_junctions
  + /usr/local/setup/nextflow/bin/ chimeric_junctions1 chimeric_junctions2 chimeric_junctions3 chimeric_junctions4

  Concatenated STAR junctions: chimeric_junctions_merged
        concatenate: chimeric_junctions1
        concatenate: chimeric_junctions2
        concatenate: chimeric_junctions3
        concatenate: chimeric_junctions4
  + ls -alt chimeric_junctions_merged
  -rw-r--r-- 1 ngsdaemon ngs_automation 8857804 Jan 27 00:09 chimeric_junctions_merged
  + sleep 30
Any suggestions would be appreciated.
Jan 27 2017 01:06
I'm using 'scratch', in case that sheds any light on it.
Paolo Di Tommaso
Jan 27 2017 13:12
NF determines job completion by checking for the existence of a file named .exitcode, which is created by the job itself.
Most likely this is caused by high latency in your shared file system, i.e. the files are created by a remote compute node but are not made immediately visible, through the shared file system, to the node where the NF application is running.
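To illustrate why a slow shared file system can make a finished task look failed, here is a minimal bash sketch of a poll-with-timeout completion check. It is not Nextflow's implementation: `wait_for_exitcode` and its arguments are hypothetical, and the only detail taken from the answer above is that completion is signalled by a `.exitcode` file written by the job.

```shell
# Hypothetical helper, NOT Nextflow's actual code: wait until a task's
# .exitcode file is visible in its work directory, or give up after a timeout.
# Usage: wait_for_exitcode <work_dir> <timeout_seconds>
wait_for_exitcode() {
  local work_dir=$1
  local timeout=$2
  local waited=0
  # On a laggy shared file system the file may already exist on the compute
  # node but not yet be visible here, so poll instead of checking only once.
  # -s also skips a file that was created but not yet written.
  while [ ! -s "$work_dir/.exitcode" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "no .exitcode in $work_dir after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  # Print the exit status the job wrote
  cat "$work_dir/.exitcode"
}
```

If the file only becomes visible after the timeout has expired, a check like this reports a failure even though the job succeeded, which matches the symptom described at the top of this thread.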
Paolo Di Tommaso
Jan 27 2017 13:18
What file system are you running on your cluster?
You can check with this command: stat -f -c %T /your/shared/path
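The check above can be wrapped in a small bash sketch that also flags common network file systems, where the visibility latency described earlier is most likely. The helper name `check_shared_fs` and the list of file system names are assumptions for illustration; it relies on GNU coreutils `stat` (the `-f -c %T` form used above), which may not be available on other platforms.

```shell
# Hypothetical helper: print the file system type of a path and flag
# network file systems, where files written on a compute node may take
# a while to become visible on the node running Nextflow.
# Assumes GNU coreutils `stat`.
check_shared_fs() {
  local path=$1
  local fstype
  # Propagate stat's failure (e.g. nonexistent path) to the caller
  fstype=$(stat -f -c %T "$path") || return 1
  case "$fstype" in
    nfs*|lustre|gpfs|cifs|smb*)
      echo "$fstype (network file system: visibility latency possible)" ;;
    *)
      echo "$fstype" ;;
  esac
}
```

For example, `check_shared_fs /your/shared/path` on an NFS mount should print a line starting with `nfs`.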
Trevor Tanner
Jan 27 2017 18:03
When using NF with SLURM, does NF adjust the maxForks value based on each sbatch job assignment? I noticed this line in the log: Jan-27 10:39:52.975 [main] DEBUG nextflow.processor.TaskProcessor - Creating operator > extractSRA -- maxForks: 24, and was curious because some of the nodes in the partition have 24 threads while others have 64.
Paolo Di Tommaso
Jan 27 2017 18:04
Nope, that value depends on the number of CPUs on the driver node, i.e. the node where NF is launched.
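Since the answer above ties the logged maxForks to the driver node's CPU count, a quick way to see that count is a one-function bash sketch. The name `driver_cpus` is hypothetical; it assumes GNU coreutils `nproc` and falls back to `getconf _NPROCESSORS_ONLN` where `nproc` is missing. Run it on the node where you launch Nextflow, not inside a SLURM job.

```shell
# Hypothetical helper: report how many CPUs the current (driver) node sees.
# Prefer GNU coreutils nproc; fall back to getconf on systems without it.
driver_cpus() {
  if command -v nproc >/dev/null 2>&1; then
    nproc
  else
    getconf _NPROCESSORS_ONLN
  fi
}

echo "driver node CPUs: $(driver_cpus)"
```

For comparison, `sinfo -N -o "%N %c"` lists the CPU count of each SLURM node, which is why 24-thread and 64-thread compute nodes do not change the value logged on the driver.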
Trevor Tanner
Jan 27 2017 18:07
Ah, gotcha, thanks. Do the specs of the driver node matter much? I.e. is a single-core machine just as good as a multi-core one for 1000s of tasks?
Paolo Di Tommaso
Jan 27 2017 18:09
Well, as usual, bigger is better :)
Though NF is not resource-hungry, some activities benefit from multiple CPUs, for example publishDir.
Trevor Tanner
Jan 27 2017 18:14
haha, I figured that was probably the case :smile: good to know - I have to give it a minimum of 8 cores anyhow (no node sharing at our center), so hopefully that'll do the trick
Jan 27 2017 23:20
Hmm. The latency should be very small: it's a high-end ZFS box serving NFS over a 10 Gigabit Ethernet link. Thanks for the info, I'll keep poking around.
Jan 27 2017 23:31
I don't think it's related, but I am seeing this error on 0.23.1:
Command error: line 88: nxf_kill: command not found