These are chat archives for nextflow-io/nextflow

9th
Jun 2016
Phil Ewels
@ewels
Jun 09 2016 07:55
@pditommaso: To double-check: your example above would need toList() to make the MultiQC process run once on all the results from those channels, rather than once per result, right?
process multiqc {
  input:
  // toList() gathers every item of each channel, so the process runs only once
  file ('fastqc/*') from fastqc_results.toList()
  file ('trim_galore/*') from trim_galore_results.toList()
  // ...

  script:
  '''
  multiqc .
  '''
}
Paolo Di Tommaso
@pditommaso
Jun 09 2016 07:56
Yes, you are right. Sorry I didn't mention it
Phil Ewels
@ewels
Jun 09 2016 07:56
No problem, just wanted to check :+1:
Maxime Garcia
@MaxUlysse
Jun 09 2016 08:51
Hello and thanks again for all the help yesterday
But I'm still having some trouble
I have a list of lists as the output of a process,
something like:
[[patientID], [sampleID_2, sampleID_3, sampleID_1], [sampleID_2.md.real.bam, sampleID_3.md.real.bam, sampleID_1.md.real.bam], [sampleID_2.md.real.bai, sampleID_1.md.real.bai, sampleID_3.md.real.bai]]
and for the next process, I want to iterate over each sample with its corresponding files
so I'm thinking about just getting the sampleIDs and creating a new Channel with the corresponding files, rebuilt from these sampleIDs
but it seems to me there is probably a better and more intelligent way to do that
I tried to use collect() or collectNested() to get the kind of output I want, but I didn't manage to make it work
Maxime Garcia
@MaxUlysse
Jun 09 2016 08:57
any ideas?
Paolo Di Tommaso
@pditommaso
Jun 09 2016 08:58
A bit tricky, but I think something like this should work
sourceChannel
   .flatMap { pid, samples, bams, bais ->
      // emit one [pid, sample, bam, bai] tuple per sample
      [
        [pid, samples[0], bams[0], bais[0]],
        [pid, samples[1], bams[1], bais[1]],
        [pid, samples[2], bams[2], bais[2]]
      ]
   }
this assumes samples, bams, etc. are in the same order
if not, you will have to sort them first
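To see what the `flatMap` closure above actually emits, here is a minimal plain Groovy sketch (no Nextflow needed). It assumes the record has exactly the shape of Maxime's example, with strings standing in for the real BAM/BAI files; the spread operator `*` stands in for the way `flatMap` hands the tuple's elements to the closure parameters. `toPerSample`, `record` and `tuples` are hypothetical names used only for illustration.

```groovy
// Same closure body as the flatMap above, kept in a variable so it can be called directly
def toPerSample = { pid, samples, bams, bais ->
    [
        [pid, samples[0], bams[0], bais[0]],
        [pid, samples[1], bams[1], bais[1]],
        [pid, samples[2], bams[2], bais[2]]
    ]
}

// One record shaped like the example output (strings instead of file paths)
def record = [
    ['patientID'],
    ['sampleID_2', 'sampleID_3', 'sampleID_1'],
    ['sampleID_2.md.real.bam', 'sampleID_3.md.real.bam', 'sampleID_1.md.real.bam'],
    ['sampleID_2.md.real.bai', 'sampleID_1.md.real.bai', 'sampleID_3.md.real.bai']
]

// Spread the four inner lists into the four closure parameters
def tuples = toPerSample(*record)
tuples.each { println it }
```

With the lists exactly as in the example, the second tuple pairs `sampleID_3` with `sampleID_1.md.real.bai`, because the BAI list is not in the same order as the others: this is the ordering caveat in practice.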
Maxime Garcia
@MaxUlysse
Jun 09 2016 09:00
looks nice
Paolo Di Tommaso
@pditommaso
Jun 09 2016 09:00
I would suggest isolating the problem in a small script and testing it until you get the right transformation
maybe using nextflow console can help
Maxime Garcia
@MaxUlysse
Jun 09 2016 09:00
I'll look into this console
thanks for the advice
Paolo Di Tommaso
@pditommaso
Jun 09 2016 09:01
welcome
Maxime Garcia
@MaxUlysse
Jun 09 2016 09:12
Definitely better, I just need to order my samples, bams and bais, because somehow the order is lost in the process
But one last problem: I don't have a fixed number of samples
Paolo Di Tommaso
@pditommaso
Jun 09 2016 09:13
um, you like difficult things :)
Maxime Garcia
@MaxUlysse
Jun 09 2016 09:13
I know
Paolo Di Tommaso
@pditommaso
Jun 09 2016 09:14
but the sizes of bams, bais and samples match, I hope
Maxime Garcia
@MaxUlysse
Jun 09 2016 09:14
Of course, it would be problematic if they didn't
Paolo Di Tommaso
@pditommaso
Jun 09 2016 09:15
well, you have a full-featured programming language just waiting to be used :)
write a small helper function that, given those lists, reshapes them as you need
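def reshape( samples, bams, bais ) {
  def len = samples.size()
  def result = []
  // build one [sample, bam, bai] tuple per index
  for( int i=0; i<len; i++ ) { result << [ samples[i], bams[i], bais[i] ] }
  return result
}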
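Since the order of the three lists gets lost upstream, an index-based loop can pair a sample with another sample's file. One alternative, sketched here under stated assumptions, is to match files by name instead of by position. This is a swapped-in technique, not the helper suggested above: it assumes every file name starts with `<sampleID>.` (as in `sampleID_1.md.real.bam`) and that sample IDs contain no dots. `reshapeByName` is a hypothetical name, and it works for any number of samples.

```groovy
// Hypothetical helper (not part of Nextflow): one [pid, sampleID, bam, bai] tuple per sample.
// Assumption: file basenames look like "<sampleID>.<suffix>" and IDs contain no dots.
def reshapeByName(pid, List samples, List bams, List bais) {
    // Map each file to the sample ID taken from its basename, up to the first dot;
    // toString() lets this work for plain strings as well as file paths
    def bySample = { List files ->
        files.collectEntries { f ->
            def name = f.toString().tokenize('/')[-1]
            [(name.substring(0, name.indexOf('.'))): f]
        }
    }
    def bamOf = bySample(bams)
    def baiOf = bySample(bais)
    // Look files up by sample ID, so the original order of the lists does not matter
    samples.sort(false).collect { id -> [pid, id, bamOf[id], baiOf[id]] }
}

// Example record shaped like the one discussed above (strings instead of files)
def record = [
    ['patientID'],
    ['sampleID_2', 'sampleID_3', 'sampleID_1'],
    ['sampleID_2.md.real.bam', 'sampleID_3.md.real.bam', 'sampleID_1.md.real.bam'],
    ['sampleID_2.md.real.bai', 'sampleID_1.md.real.bai', 'sampleID_3.md.real.bai']
]

def tuples = reshapeByName(*record)
tuples.each { println it }
```

In a pipeline it could then be plugged into the same operator, e.g. `sourceChannel.flatMap { pid, samples, bams, bais -> reshapeByName(pid, samples, bams, bais) }`. Matching by name trades the positional assumption for a naming-convention assumption, so it only helps if the file names reliably carry the sample ID.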
Maxime Garcia
@MaxUlysse
Jun 09 2016 09:16
good idea thanks
Hugues Fontenelle
@huguesfontenelle
Jun 09 2016 10:59
I'm here to potentially help/be helped on #176
Mike Smoot
@mes5k
Jun 09 2016 21:37
@pditommaso Hi Paolo, has anyone thought about nextflow in Jupyter?
Paolo Di Tommaso
@pditommaso
Jun 09 2016 21:43
there was a proposal from some people at the University of Uppsala (if I'm not wrong), but it never materialised
I also took a quick look at Apache Zeppelin, which could be more Nextflow-friendly
Mike Smoot
@mes5k
Jun 09 2016 21:44
Haven't seen Zeppelin - will have to check that out. Here's another one: http://beakernotebook.com/
Beaker looks like it lets you use different languages simultaneously and share data between them, which is pretty interesting.
Paolo Di Tommaso
@pditommaso
Jun 09 2016 21:45
I couldn't say which is better between beaker and zeppelin
the good thing about Zeppelin is that it's used as the interactive environment for Spark
the API looks quite simple; I guess a Nextflow kernel wouldn't be too difficult to implement
but I have no resources for that
Mike Smoot
@mes5k
Jun 09 2016 21:48
Sure - I was just checking on what might already exist. The suggestion of nextflow in a notebook just came up in a meeting. If this is something we want to do I think we'd do our best to contribute.
Paolo Di Tommaso
@pditommaso
Jun 09 2016 21:49
I think it could make sense: the most common complaint about Jupyter is the lack of support for heavy computational workloads
Nextflow could bring that ability
Mike Smoot
@mes5k
Jun 09 2016 21:51
Exactly, and with a much simpler concurrency model than IPython Parallel!
Paolo Di Tommaso
@pditommaso
Jun 09 2016 21:51
yep :)
out of curiosity, what's your org?
(I think I've asked but I forgot ..)
Mike Smoot
@mes5k
Jun 09 2016 21:52
I work for Synthetic Genomics
Paolo Di Tommaso
@pditommaso
Jun 09 2016 21:53
cool
Mike Smoot
@mes5k
Jun 09 2016 21:54
they pay me to work on cool things, so I'm pretty happy!
Paolo Di Tommaso
@pditommaso
Jun 09 2016 21:54
that's the good thing about research ;)
interesting company, are you related to JCVI?
Mike Smoot
@mes5k
Jun 09 2016 22:01
We have a few collaborations with them and we share a founder, but otherwise we're separate entities.