These are chat archives for nextflow-io/nextflow

4th
Aug 2015
Sascha Steinbiss
@satta
Aug 04 2015 08:50
hey... are there any plans to implement reusing of temporary files...?
Sascha Steinbiss
@satta
Aug 04 2015 09:04
I'm using spread() with protein/genome chunks, some of which can be multiple chromosomes:
exn_prot_chunk = ref_pep.splitFasta( by: 100)
exn_genome_chunk = pseudochr_seq_exonerate.splitFasta( by: 3)
process run_exonerate {
    cache 'deep'
    // this process can fail for rogue exonerate processes
    errorStrategy 'ignore'

    input:
    set file('genome.fasta'), file('prot.fasta') from exn_genome_chunk.spread(exn_prot_chunk)

    output:
    file 'exn_out' into exn_results

    """
    exonerate -E false --model p2g --showvulgar no --showalignment no \
      --showquerygff no --showtargetgff yes --percent 80 \
      --ryo \"AveragePercentIdentity: %pi\n\" prot.fasta \
       genome.fasta > exn_out
    """
}
and that really blows up my scratch directory ;)
there's a separate copy of the genome chunks for each job. would be nice to have them symlinked to one shared copy etc
Paolo Di Tommaso
@pditommaso
Aug 04 2015 10:16
@satta Actually it is possible
Sascha Steinbiss
@satta
Aug 04 2015 10:16
ah, cool!
Paolo Di Tommaso
@pditommaso
Aug 04 2015 10:17
by default splitFasta creates the chunks in memory, and the process then saves them to files
but you can specify splitFasta( by: x, file: true)
by doing that chunks will be saved to files and symlinked by the process
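A minimal sketch of that suggestion applied to the two channel definitions from the snippet above. It assumes `ref_pep` and `pseudochr_seq_exonerate` are the same FASTA channels used there, and that the chunk sizes stay unchanged; only the `file: true` option is new. It has not been run against the original pipeline.

```groovy
// Sketch only: same splits as before, but each chunk is written to a file once
// instead of being held in memory and written out again for every task.
// ref_pep and pseudochr_seq_exonerate are assumed to be the channels from the snippet above.
exn_prot_chunk   = ref_pep.splitFasta( by: 100, file: true )
exn_genome_chunk = pseudochr_seq_exonerate.splitFasta( by: 3, file: true )
```

The `run_exonerate` process itself should not need to change: its `file('genome.fasta')` and `file('prot.fasta')` inputs would then be staged as symlinks to the shared chunk files rather than as per-job copies.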
Sascha Steinbiss
@satta
Aug 04 2015 10:18
perfect!
Paolo Di Tommaso
@pditommaso
Aug 04 2015 10:18
it should also reuse them when resuming the execution
Sascha Steinbiss
@satta
Aug 04 2015 10:18
thanks, will try that right away
Paolo Di Tommaso
@pditommaso
Aug 04 2015 10:19
ok, let me know
Sascha Steinbiss
@satta
Aug 04 2015 10:19
great! now I just need to delete all the existing temp stuff on lustre to make room ;)
will do
Paolo Di Tommaso
@pditommaso
Aug 04 2015 10:19
:)