These are chat archives for nextflow-io/nextflow

29th
Jan 2015
Andrew Stewart
@andrewcstewart
Jan 29 2015 00:52
process foo {

  output:
    stdout myChan

"""
#!/usr/bin/env python

print [1,2,3]
"""
}

process bar {

  input:
    val myChan

"""
something $val
"""
}
Sascha Steinbiss
@satta
Jan 29 2015 09:05
@pditommaso thanks! I have already seen these examples
but what I sometimes want is something like:
process run_augustus {
    if (params.run_exonerate) {
        input:
        file 'augustus.hints' from exn_hints
        file 'pseudo.pseudochr.fasta' from pseudochr_seq
    } else {
        input:
        file 'pseudo.pseudochr.fasta' from pseudochr_seq
    }

    def hintsfile = ""
    if (params.run_exonerate) {
        hintsfile = "--hintsfile=augustus.hints"
    }

    output:
    file 'augustus.gff3' into augustus_gff3
    stdout into statuslog

    """
    export AUGUSTUS_CONFIG_PATH=${params.AUGUSTUS_CONFIG_PATH}
    ${params.AUGUSTUS_DIR}/augustus \
        --species=${params.AUGUSTUS_SPECIES} \
        --stopCodonExcludedFromCDS=false \
        --protein=off --codingseq=off --strand=both \
        --genemodel=${params.AUGUSTUS_GENEMODEL} --gff3=on \
        ${hintsfile} \
        --noInFrameStop=true \
        --extrinsicCfgFile=extrinsic.cfg \
        pseudo.pseudochr.fasta > augustus.full.tmp
    augustus_to_gff3.lua < augustus.full.tmp \
        | ${params.GT_DIR}/bin/gt gff3 -sort -tidy -retainids \
        > augustus.full.tmp.2
    augustus_mark_partial.lua augustus.full.tmp.2 > augustus.gff3
    """
}
otherwise I would have to duplicate code to support this process with and without hints, right?
(I have already put the process creating exn_hints into an if condition)
Paolo Di Tommaso
@pditommaso
Jan 29 2015 13:37
@andrewcstewart The stdout output captures the task's standard output, so whatever data structure you print, the resulting channel will always emit a string value.
@andrewcstewart If you want to handle it as a list, you will have to convert it to one, with something like `myChan.map { it.trim().tokenize(',') }`
Paolo Di Tommaso
@pditommaso
Jan 29 2015 13:42
@andrewcstewart though you should also remove the square brackets at the beginning and the end of that string ..
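A minimal Python sketch of the conversion described above, assuming the task printed the text `[1, 2, 3]` followed by a newline. The helper name `parse_printed_list` is hypothetical; it only mirrors the trim, strip-brackets and tokenize steps and is not part of Nextflow.

```python
def parse_printed_list(captured):
    """Turn the captured stdout text of a printed list into a list of strings."""
    # Drop surrounding whitespace (including the trailing newline), like trim()
    text = captured.strip()
    # Remove the square brackets at the beginning and the end, if present
    if text.startswith("[") and text.endswith("]"):
        text = text[1:-1]
    # Split on commas, strip each token and discard empty ones
    return [token.strip() for token in text.split(",") if token.strip()]


if __name__ == "__main__":
    print(parse_printed_list("[1, 2, 3]\n"))  # ['1', '2', '3']
```

The items stay strings, so an extra conversion step is still needed if numbers are wanted. Unlike a plain tokenize on the comma, this sketch also strips the spaces around each item.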
Paolo Di Tommaso
@pditommaso
Jan 29 2015 13:49
@satta I see, basically you have a conditional input
I don't think that your code is going to work
Sascha Steinbiss
@satta
Jan 29 2015 13:50
it doesn't ;)
Paolo Di Tommaso
@pditommaso
Jan 29 2015 13:51
yes, it has not been designed to work in that way
Sascha Steinbiss
@satta
Jan 29 2015 13:51
I think for now I will work around that by having two processes creating augustus_gff3 and enable/disable them with an if-condition around them
but it would have been nice not having to duplicate code
Paolo Di Tommaso
@pditommaso
Jan 29 2015 13:52
the idea is that the process clearly defines the input/output interface of the task
I think you could also manage in this way
ah no, I was proposing a bad idea
let me think a bit about your case
yes, the ideal solution should not duplicate the code
however, I would suggest configuring your PATH in the nextflow config file instead of using ${params.AUGUSTUS_DIR} in your task
Sascha Steinbiss
@satta
Jan 29 2015 13:56
right
will do that. thanks :)
btw, another thing I find weird to do right now: using a file created in one process in many others downstream...
currently I have to do:
pseudochr_seq_tRNA = Channel.create()
pseudochr_seq_ncRNA = Channel.create()
pseudochr_seq_exonerate = Channel.create()
pseudochr_seq_augustus = Channel.create()
pseudochr_seq_snap = Channel.create()
pseudochr_seq_augustus_ctg = Channel.create()
pseudochr_seq_make_gaps = Channel.create()
pseudochr_seq_dist = Channel.create()
pseudochr_seq_tmhmm = Channel.create()
pseudochr_seq_orthomcl = Channel.create()
pseudochr_seq_splitsplice = Channel.create()
pseudochr_seq.separate(pseudochr_seq_tRNA, pseudochr_seq_ncRNA,
                       pseudochr_seq_augustus, pseudochr_seq_augustus_ctg,
                       pseudochr_seq_snap, pseudochr_seq_dist,
                       pseudochr_seq_make_gaps, pseudochr_seq_splitsplice,
                       pseudochr_seq_tmhmm, pseudochr_seq_orthomcl,
                       pseudochr_seq_exonerate) { a -> [a, a, a, a, a, a, a, a, a, a, a]}
to be able to use them as inputs
surely there has to be a better way...
Paolo Di Tommaso
@pditommaso
Jan 29 2015 13:58
yes, it turns out that this is the most requested enhancement
Sascha Steinbiss
@satta
Jan 29 2015 13:58
:)
Paolo Di Tommaso
@pditommaso
Jan 29 2015 13:59
Surely, there will be an improvement regarding this in the next releases
Sascha Steinbiss
@satta
Jan 29 2015 13:59
great! nice to hear that
Paolo Di Tommaso
@pditommaso
Jan 29 2015 13:59
coming back to your original question, you could manage in this way
Sascha Steinbiss
@satta
Jan 29 2015 13:59
listening
Paolo Di Tommaso
@pditommaso
Jan 29 2015 14:00
let the channel exn_hints emit a pair of values instead of simply a file
the first value could be the condition true/false
and the second value the actual file
Sascha Steinbiss
@satta
Jan 29 2015 14:01
ah I see... then only use the file if the first value is true
Paolo Di Tommaso
@pditommaso
Jan 29 2015 14:01
yes
you can move the condition into the script part
Sascha Steinbiss
@satta
Jan 29 2015 14:02
and in the case without hints just have a `Channel.from([false, ...])` etc.
anyway, I get the idea... thanks!
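One way to picture the pair-of-values idea is the following Python sketch. The function name `hints_option` and the tuple layout are assumptions made for illustration; in the pipeline itself the same check would live in the script part of the process.

```python
def hints_option(pair):
    """Return the hints flag for a (use_hints, hints_file) pair, or an empty string."""
    use_hints, hints_file = pair
    # Only pass the hints file when the first value of the pair is true
    return "--hintsfile=%s" % hints_file if use_hints else ""


if __name__ == "__main__":
    print(hints_option((True, "augustus.hints")))
    print(repr(hints_option((False, "augustus.hints"))))
```

With a false first value the flag collapses to an empty string, so the command line stays valid without duplicating the process.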
Paolo Di Tommaso
@pditommaso
Jan 29 2015 14:03
how many times do you need to run run_augustus?
Sascha Steinbiss
@satta
Jan 29 2015 14:03
just once
Paolo Di Tommaso
@pditommaso
Jan 29 2015 14:04
So use `Channel.just([false, ..])`
Sascha Steinbiss
@satta
Jan 29 2015 14:04
cool
Paolo Di Tommaso
@pditommaso
Jan 29 2015 14:05
not Channel.from, because the latter transforms each list item into a separate emission
Sascha Steinbiss
@satta
Jan 29 2015 14:05
ah, right. because it's just one parameter
Paolo Di Tommaso
@pditommaso
Jan 29 2015 14:05
yes
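The difference can be pictured with a small Python analogy. The generators `emit_each` and `emit_once` are hypothetical stand-ins for the behaviour described for Channel.from and Channel.just in this conversation, and `"no_hints"` is a placeholder value; none of this is Nextflow code.

```python
def emit_each(items):
    """Analogy for Channel.from as described: every list item is its own emission."""
    for item in items:
        yield item


def emit_once(value):
    """Analogy for Channel.just as described: the whole value is one emission."""
    yield value


if __name__ == "__main__":
    pair = [False, "no_hints"]
    print(list(emit_each(pair)))  # two emissions, one per item
    print(list(emit_once(pair)))  # one emission holding the whole pair
```

A process that expects the pair as a single input therefore needs the second form.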
Sascha Steinbiss
@satta
Jan 29 2015 14:05
I see. sounds good, will try that later
Paolo Di Tommaso
@pditommaso
Jan 29 2015 14:06
ok, but remember to declare the input with a set qualifier, is that clear?
for example:
Sascha Steinbiss
@satta
Jan 29 2015 14:10
yes that's clear. I used that before
Paolo Di Tommaso
@pditommaso
Jan 29 2015 14:10
input:
 set run_exonerate, file('exn_hints') from exn_hints
ah, ok if so
Sascha Steinbiss
@satta
Jan 29 2015 14:10
    exn_prot_chunk = ref_pep.splitFasta( by: 20)
    exn_genome_chunk = pseudochr_seq_exonerate.splitFasta( by: 3)
    process run_exonerate {
        input:
        set file('genome.fasta'), file('prot.fasta') from exn_genome_chunk.spread(exn_prot_chunk)
...
}
works nicely!
Paolo Di Tommaso
@pditommaso
Jan 29 2015 14:11
Great!
Sascha Steinbiss
@satta
Jan 29 2015 14:14
final question: suppose I want to find out if a process has completed while the whole workflow is running
the trace.csv is a way to do that, right? or have a statuslog channel which takes log messages I send out manually
Paolo Di Tommaso
@pditommaso
Jan 29 2015 14:17
The problem with the trace.csv is that it is not written immediately, for performance reasons
Sascha Steinbiss
@satta
Jan 29 2015 14:17
hmmm
Paolo Di Tommaso
@pditommaso
Jan 29 2015 14:18
you could try tracing the log, but that is not a great solution
Sascha Steinbiss
@satta
Jan 29 2015 14:18
is there a way to hook into start and end of a process?
what I do now is do a statuslog.bind("ncRNA detection started") before the script part
but I haven't found a way to capture the end unless doing output: stdout into statuslog and then echo'ing a message after the script
but that's rather hackish
Paolo Di Tommaso
@pditommaso
Jan 29 2015 14:19
You should create an output dedicated to monitoring the process execution
Yes, I think this part has to be improved
Sascha Steinbiss
@satta
Jan 29 2015 14:21
thanks
Paolo Di Tommaso
@pditommaso
Jan 29 2015 14:21
welcome
Sascha Steinbiss
@satta
Jan 29 2015 15:58
hmmm. I notice that in processes taking output from .collectFile() as input, nothing is cached
could that be due to slight differences in the result, leading to different result hashes?
could it help to canonicalize the input?
Paolo Di Tommaso
@pditommaso
Jan 29 2015 17:07
This is a known problem that has to be addressed
for now a trick is using the cache directive
Sascha Steinbiss
@satta
Jan 29 2015 17:08
ah ok
Paolo Di Tommaso
@pditommaso
Jan 29 2015 17:08
wait no
Sascha Steinbiss
@satta
Jan 29 2015 17:08
kk
Paolo Di Tommaso
@pditommaso
Jan 29 2015 17:09
ah, OK
yes, using `cache 'deep'`
will solve the problem
because by default the hash keys for files are created from the file path, size and last modified time
Sascha Steinbiss
@satta
Jan 29 2015 17:11
nice, thanks -- will try that and re-run. that step takes ~1h without caching
ah and 'deep' will actually md5sum/sha the file
Paolo Di Tommaso
@pditommaso
Jan 29 2015 17:12
yes
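A rough Python sketch of the two kinds of key discussed above, to show why a regenerated file with identical content can still miss the cache. The function names `shallow_key`, `deep_key` and `demo`, the file names, and the choice of MD5 are all assumptions for illustration; the conversation does not say which hash function Nextflow actually uses.

```python
import hashlib
import os
import tempfile


def shallow_key(path):
    """Key built from file path, size and last modified time (metadata only)."""
    info = os.stat(path)
    meta = "%s|%d|%d" % (os.path.abspath(path), info.st_size, info.st_mtime_ns)
    return hashlib.md5(meta.encode("utf-8")).hexdigest()


def deep_key(path):
    """Key built from the file content itself."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def demo():
    """Write identical content to two paths and compare both kinds of key."""
    with tempfile.TemporaryDirectory() as tmp:
        first = os.path.join(tmp, "a.gff3")
        second = os.path.join(tmp, "b.gff3")
        for path in (first, second):
            with open(path, "w") as handle:
                handle.write("same content\n")
        same_shallow = shallow_key(first) == shallow_key(second)
        same_deep = deep_key(first) == deep_key(second)
        return same_shallow, same_deep


if __name__ == "__main__":
    print(demo())  # (False, True)
```

The metadata-based keys differ because the paths differ, while the content-based keys match, which is the behaviour a content hash is meant to provide.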
Sascha Steinbiss
@satta
Jan 29 2015 17:12
I see
still, I will have to sort the files somehow to make up for out of order results, right?
Paolo Di Tommaso
@pditommaso
Jan 29 2015 17:13
that's not necessary
Sascha Steinbiss
@satta
Jan 29 2015 17:13
ok
Paolo Di Tommaso
@pditommaso
Jan 29 2015 17:13
collectFile sorts the content automatically, so it is consistent
Sascha Steinbiss
@satta
Jan 29 2015 17:13
cool! thanks :)
Paolo Di Tommaso
@pditommaso
Jan 29 2015 17:13
happy to help
Sascha Steinbiss
@satta
Jan 29 2015 17:13
that's enough questions for today
Paolo Di Tommaso
@pditommaso
Jan 29 2015 17:13
:)
enjoy
Andrew Stewart
@andrewcstewart
Jan 29 2015 22:55
is it possible to pull a pipeline from bitbucket using deployment keys instead of user name?