These are chat archives for nextflow-io/nextflow

19th Dec 2014
Michael L Heuer
@heuermh
Dec 19 2014 00:43

@pditommaso I gave a leftJoin operation a go but couldn't make it work; I also tried simulating it using map and reduce and couldn't get that to work either. Unfortunately I'm not much of a Groovy developer, and the operations in DataflowExtensions.groovy and the GPars stuff aren't exactly what I'm used to from RxJava and Apache Spark.

Consider it a feature request then. :)
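Something like this, say (the leftJoin operator is hypothetical, just to show the semantics I have in mind):
left  = Channel.from( ['a', 1], ['b', 2] )
right = Channel.from( ['a', 'x'] )
// emit every left item, matched by its first element as the key, paired
// with the corresponding right value, or null when there is no match
left.leftJoin(right).subscribe { println it }
// => [a, 1, x]
// => [b, 2, null]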

A second question -- and perhaps I asked this before in a different way -- what is the best way to consume the same channel in multiple process blocks? I'm using
alignmentReadPairs = Channel.create()
// tap copies each item into alignmentReadPairs while also passing it downstream
readPairs = reads1.phase(reads2).map{ read1, read2 -> [ read1[0], read1[1], read2[1] ] }.tap(alignmentReadPairs)

process one {
  input:
    set s, file(r1), file(r2) from readPairs
  // ...
}
process two {
  input:
    set s, file(r1), file(r2) from alignmentReadPairs
  // ...
}
Paolo Di Tommaso
@pditommaso
Dec 19 2014 08:23
@heuermh OK, if you open a feature request with an example showing exactly how the leftJoin operator should work, I would be happy to try to include it in the next release.
Regarding your second question, your way is fine. You can also use the into operator. For example:
reads2 = Channel.create()
reads1 = Channel.create()
// into copies each value emitted by the source into both target channels
Channel.from(1,2,3).into(reads1, reads2)

reads1.subscribe { println "1> $it" }
reads2.subscribe { println "2> $it" }
But it's almost identical.
Sascha Steinbiss
@satta
Dec 19 2014 14:01
hi
hmm why does this run only once for the first chunk of sequences?
if I use each genome_chunk instead of file genome_chunk in the predict_ncRNA process, I get multiple processes but also a warning ;)
Sascha Steinbiss
@satta
Dec 19 2014 14:07
ah it has to do with the second input
is there any way to use the same value from the pressed_model channel multiple times?
Paolo Di Tommaso
@pditommaso
Dec 19 2014 14:09
hi
yes, basically you need to say that you are expecting a single value
Sascha Steinbiss
@satta
Dec 19 2014 14:10
I see
Paolo Di Tommaso
@pditommaso
Dec 19 2014 14:10
you can do that writing pressed_model.first()
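for example (a rough sketch; pressed_model is from your snippet, while genome_chunks and the script body are guesses):
process predict_ncRNA {
  input:
    file genome_chunk from genome_chunks
    // first() turns pressed_model into a single-value channel, so the same
    // model file is reused by every task instead of being consumed by the
    // first task only
    file model from pressed_model.first()

  """
  # placeholder for the actual prediction command
  run_prediction --model $model $genome_chunk > ncRNA.gff
  """
}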
Sascha Steinbiss
@satta
Dec 19 2014 14:10
any hints about the most elegant way to do that?
ah
let me try...
sweet
thanks!
Paolo Di Tommaso
@pditommaso
Dec 19 2014 14:11
:)
welcome
Sascha Steinbiss
@satta
Dec 19 2014 14:12
I feel welcomed :)
Paolo Di Tommaso
@pditommaso
Dec 19 2014 14:12
of course
anyway I need to explain that better in the doc
Sascha Steinbiss
@satta
Dec 19 2014 14:12
just playing around with nxf as a possible candidate for automating an annotation pipeline
Paolo Di Tommaso
@pditommaso
Dec 19 2014 14:12
nice
Sascha Steinbiss
@satta
Dec 19 2014 14:12
very shiny! like it so far
Paolo Di Tommaso
@pditommaso
Dec 19 2014 14:13
thanks
may I know what lab/institute are you from ?
Sascha Steinbiss
@satta
Dec 19 2014 14:13
do you have any users yet who have built pipelines with web frontends?
sure. parasite genomics, sanger institute
Paolo Di Tommaso
@pditommaso
Dec 19 2014 14:14
oh, nice
regarding a web frontend, no
but we are planning to build one for our own web service
Sascha Steinbiss
@satta
Dec 19 2014 14:14
I guess I will need job queuing, status reporting etc.
I guess the best way would be to have a status reporting channel whose consumer updates a database table...
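roughly like this (a sketch; updateStatusTable and run_id are made up):
status = Channel.create()

// processes (or their subscribers) would push progress events into 'status';
// the subscriber turns each event into a database write
status.subscribe { evt ->
  updateStatusTable(params.run_id, evt)   // made-up helper doing the actual SQL
}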
Paolo Di Tommaso
@pditommaso
Dec 19 2014 14:15
well yes, but the job queuing would be handled by nextflow
(submitting requests to a cluster)
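e.g. a couple of lines in the config file are enough (a sketch; the executor and queue names are just examples):
process {
  // nextflow submits each task as a job on the cluster scheduler
  executor = 'lsf'   // or 'sge', ...
  queue    = 'normal'
}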
Sascha Steinbiss
@satta
Dec 19 2014 14:16
well, I don't mean queuing of individual processes within a job
I mean keeping multiple users' workflow instances apart ;)
canceling workflow runs, etc.
Paolo Di Tommaso
@pditommaso
Dec 19 2014 14:17
you mean, running each user request as a separate job (?)
Sascha Steinbiss
@satta
Dec 19 2014 14:18
as an individual workflow instance. my idea was to have the user upload their data/parameters, then build a configuration file from that and run the workflow with that config file
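e.g. generate something like this per submission (rough sketch; file name and params are made up):
// user123.config -- written by the web frontend from the uploaded data
params {
  genome  = '/data/uploads/user123/genome.fa'
  out_dir = '/data/results/user123'
}
and then launch it with:
nextflow -c user123.config run annotate.nf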
Paolo Di Tommaso
@pditommaso
Dec 19 2014 14:19
makes a lot of sense
Sascha Steinbiss
@satta
Dec 19 2014 14:19
which then submits all the processes etc.
but... multiple users may have multiple data sets to process, and so on. so one needs to take care of this one level higher
Paolo Di Tommaso
@pditommaso
Dec 19 2014 14:20
I see
Sascha Steinbiss
@satta
Dec 19 2014 14:20
well, I will come up with something once I get to that stage... it's already very useful to have a proper pipeline engine instead of shell scripts that have to be run in a specific order ;)
Paolo Di Tommaso
@pditommaso
Dec 19 2014 14:20
yep
feel free to post any questions to the nextflow forum
Sascha Steinbiss
@satta
Dec 19 2014 14:22
absolutely! thanks for the help
Paolo Di Tommaso
@pditommaso
Dec 19 2014 14:22
thanks for your feedback
it's very appreciated
cheers, p