These are chat archives for nextflow-io/nextflow

10th
Apr 2018
Brad Langhorst
@bwlang
Apr 10 2018 03:09
I'm trying (https://gist.github.com/bwlang/29ebe8044b04579522989ac921fa7bb3) to implement a suggestion by @mes5k to process tiles in batches, but without success yet. Can anybody point me in the right direction?
The batching seems to be working as I want, but I can't find a good way to consume the set of fastqs downstream.
Paolo Di Tommaso
@pditommaso
Apr 10 2018 06:34
@bwlang I've commented on your gist
Vladimir Kiselev
@wikiselev
Apr 10 2018 10:51
Hi Paolo, is there any way to expose kubernetes secrets to a pod running a NF pipeline?
Brad Langhorst
@bwlang
Apr 10 2018 10:53
I've updated my gist with more output... thanks for any hints.
Vladimir Kiselev
@wikiselev
Apr 10 2018 11:02
Hi @pditommaso, is there any way to expose kubernetes secrets to a pod running a NF pipeline?
Paolo Di Tommaso
@pditommaso
Apr 10 2018 11:33
I don't think so, but it would be interesting to learn more about your use case. Please file an issue on GitHub and let's continue the discussion there
Paolo Di Tommaso
@pditommaso
Apr 10 2018 11:41
@bwlang ok, but I'm missing what exactly you are trying to do
do you need to batch together all the read pairs in the library?
Brad Langhorst
@bwlang
Apr 10 2018 12:44
hi @pditommaso: I'm trying to batch up a set of per-tile fastq files for alignment (dynamically, based on their sizes). Once I have it working I'll use seqtk mergepe <(zcat !{read1fqs.flatten()}) <(zcat !{read2fqs.flatten()}) | ...
Paolo Di Tommaso
@pditommaso
Apr 10 2018 12:44
my handle is @pditommaso :)
:+1:
Brad Langhorst
@bwlang
Apr 10 2018 12:44
too quick on the enter key ;)
Paolo Di Tommaso
@pditommaso
Apr 10 2018 12:45
I see
Brad Langhorst
@bwlang
Apr 10 2018 12:45
maybe I'm making this too hard… or doing it in the wrong place?
Paolo Di Tommaso
@pditommaso
Apr 10 2018 12:45
I need to understand better
so you have all these records
[library:NoBS-1000ng-1, flowcell:HCVMLDMXX, lane:1, tile:1101, read1:/mnt/galaxy/novaseq_staging/180403_A00336_0024_BHCVMLDMXX//fastq/L_1_1101/L2_NoBS-1000ng-1.1.fastq.gz, read2:/mnt/galaxy/novaseq_staging/180403_A00336_0024_BHCVMLDMXX//fastq/L_1_1101/L2_NoBS-1000ng-1.3.fastq.gz, read1_bytes:5170772], 
[library:NoBS-1000ng-1, flowcell:HCVMLDMXX, lane:1, tile:1102, read1:/mnt/galaxy/novaseq_staging/180403_A00336_0024_BHCVMLDMXX//fastq/L_1_1102/L2_NoBS-1000ng-1.1.fastq.gz, read2:/mnt/galaxy/novaseq_staging/180403_A00336_0024_BHCVMLDMXX//fastq/L_1_1102/L2_NoBS-1000ng-1.3.fastq.gz, read1_bytes:5234610], 
[library:NoBS-1000ng-1, flowcell:HCVMLDMXX, lane:1, tile:1103, read1:/mnt/galaxy/novaseq_staging/180403_A00336_0024_BHCVMLDMXX//fastq/L_1_1103/L2_NoBS-1000ng-1.1.fastq.gz, read2:/mnt/galaxy/novaseq_staging/180403_A00336_0024_BHCVMLDMXX//fastq/L_1_1103/L2_NoBS-1000ng-1.3.fastq.gz, read1_bytes:5093533], 
[library:NoBS-1000ng-1, flowcell:HCVMLDMXX, lane:1, tile:1104, read1:/mnt/galaxy/novaseq_staging/180403_A00336_0024_BHCVMLDMXX//fastq/L_1_1104/L2_NoBS-1000ng-1.1.fastq.gz, read2:/mnt/galaxy/novaseq_staging/180403_A00336_0024_BHCVMLDMXX//fastq/L_1_1104/L2_NoBS-1000ng-1.3.fastq.gz, read1_bytes:5144167], 
[library:NoBS-1000ng-1, flowcell:HCVMLDMXX, lane:1, tile:1105, read1:/mnt/galaxy/novaseq_staging/180403_A00336_0024_BHCVMLDMXX//fastq/L_1_1105/L2_NoBS-1000ng-1.1.fastq.gz, read2:/mnt/galaxy/novaseq_staging/180403_A00336_0024_BHCVMLDMXX//fastq/L_1_1105/L2_NoBS-1000ng-1.3.fastq.gz, read1_bytes:5017766], 
[library:NoBS-1000ng-1, flowcell:HCVMLDMXX, lane:1, tile:1106, read1:/mnt/galaxy/novaseq_staging/180403_A00336_0024_BHCVMLDMXX//fastq/L_1_1106/L2_NoBS-1000ng-1.1.fastq.gz, read2:/mnt/galaxy/novaseq_staging/180403_A00336_0024_BHCVMLDMXX//fastq/L_1_1106/L2_NoBS-1000ng-1.3.fastq.gz, read1_bytes:5120467], 
[library:NoBS-1000ng-1, flowcell:HCVMLDMXX, lane:1, tile:1107, read1:/mnt/galaxy/novaseq_staging/180403_A00336_0024_BHCVMLDMXX//fastq/L_1_1107/L2_NoBS-1000ng-1.1.fastq.gz, read2:/mnt/galaxy/novaseq_staging/180403_A00336_0024_BHCVMLDMXX//fastq/L_1_1107/L2_NoBS-1000ng-1.3.fastq.gz, read1_bytes:5035847], 
[library:NoBS-1000ng-1, flowcell:HCVMLDMXX, lane:1, tile:1108, read1:/mnt/galaxy/novaseq_staging/180403_A00336_0024_BHCVMLDMXX//fastq/L_1_1108/L2_NoBS-1000ng-1.1.fastq.gz, read2:/mnt/galaxy/novaseq_staging/180403_A00336_0024_BHCVMLDMXX//fastq/L_1_1108/L2_NoBS-1000ng-1.3.fastq.gz, read1_bytes:4996449], 
[library:NoBS-1000ng-1, flowcell:HCVMLDMXX, lane:1, tile:1109, read1:/mnt/galaxy/novaseq_staging/180403_A00336_0024_BHCVMLDMXX//fastq/L_1_1109/L2_NoBS-1000ng-1.1.fastq.gz, read2:/mnt/galaxy/novaseq_staging/180403_A00336_0024_BHCVMLDMXX//fastq/L_1_1109/L2_NoBS-1000ng-1.3.fastq.gz, read1_bytes:4937475], 
[library:NoBS-1000ng-2, flowcell:HCVMLDMXX, lane:1, tile:1101, read1:/mnt/galaxy/novaseq_staging/180403_A00336_0024_BHCVMLDMXX//fastq/L_1_1101/L2_NoBS-1000ng-2.1.fastq.gz, read2:/mnt/galaxy/novaseq_staging/180403_A00336_0024_BHCVMLDMXX//fastq/L_1_1101/L2_NoBS-1000ng-2.3.fastq.gz, read1_bytes:4575688]
Brad Langhorst
@bwlang
Apr 10 2018 12:47
I have a csv file containing 32k entries (1 for each library, lane, flowcell and tile). I want to group those by library and align 100M of fastq at once. Right now I do one job per tile, which is too short (loading the index is most of the execution time).
Paolo Di Tommaso
@pditommaso
Apr 10 2018 12:47
I'm understanding that, so you will have a task execution handling many read pairs
what about the library, flowcell, lane and tile identifiers
can they differ in the same run?
Brad Langhorst
@bwlang
Apr 10 2018 12:49
flowcell and library don't, but lane and tile do.
Paolo Di Tommaso
@pditommaso
Apr 10 2018 12:49
lane and tile do.
how will you use different lane / tile in the same command?
or, more simply:
Brad Langhorst
@bwlang
Apr 10 2018 12:50
i’ll use the fastqs that are generated per lane/tile (1 fastq pair for each library, lane and tile)
sorry if I'm not being clear… I can try to draw something...
Paolo Di Tommaso
@pditommaso
Apr 10 2018 12:52
I think I've understood, but an example of the command would be better
Brad Langhorst
@bwlang
Apr 10 2018 12:52
I'm running this now for a single read pair:
seqtk mergepe !{read1} !{read2} \
    | trimadap -5 AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT -3 AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -3 AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT -3 ATCTCGTATGCCGTCTTCTGCTTG -3 CTGTCTCTTATACACATCTCCGAGCCCACGAGAC -3 CTGTCTCTTATACACATCTGACGCTGCCGACGA 2> !{library}_!{flowcell}_!{lane}_!{tile}.log.trim \
    | bwa mem -p -t !{task.cpus} -R"@RG\tID:!{library}\tSM:!{library}" !{genome} - 2> !{library}_!{flowcell}_!{lane}_!{tile}.log.bwamem \
    | sambamba view -t 2 -S -f bam -o !{library}_!{flowcell}_!{lane}_!{tile}.aln.bam /dev/stdin;
I'd like to make that:
seqtk mergepe <(zcat !{read1s.flatten()}) <(zcat !{read2s.flatten()}) \
    | trimadap -5 AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT -3 AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -3 AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT -3 ATCTCGTATGCCGTCTTCTGCTTG -3 CTGTCTCTTATACACATCTCCGAGCCCACGAGAC -3 CTGTCTCTTATACACATCTGACGCTGCCGACGA 2> !{library}_!{flowcell}_!{lanes}_!{tiles}.log.trim \
    | bwa mem -p -t !{task.cpus} -R"@RG\tID:!{library}\tSM:!{library}" !{genome} - 2> !{library}_!{flowcell}_!{lanes}_!{tiles}.log.bwamem \
    | sambamba view -t 2 -S -f bam -o !{library}_!{flowcell}_!{lanes}_!{tiles}.aln.bam /dev/stdin;
I need to eliminate some of those variables from the output log file names too… but does that convey the idea?
Paolo Di Tommaso
@pditommaso
Apr 10 2018 12:54
.flatten() is not needed
but in this command you will have many read pairs and a single tile / lane ..
this is confusing me
Brad Langhorst
@bwlang
Apr 10 2018 12:56
this will have many tiles and possibly many lanes too.
Paolo Di Tommaso
@pditommaso
Apr 10 2018 12:56
how do you specify that in this file name ?
!{library}_!{flowcell}_!{lanes}_!{tiles}.aln.bam
Brad Langhorst
@bwlang
Apr 10 2018 12:57
right.
Paolo Di Tommaso
@pditommaso
Apr 10 2018 12:57
sorry I have a meeting now
I should come back in ~ 20/30 mins
we'll continue later
Brad Langhorst
@bwlang
Apr 10 2018 12:57
or I could just come up with some kind of unique id for the lanes and tiles; I don't really care, since I'll merge these later in the process.
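something roughly like this, maybe (just a sketch - the channel and variable names are hypothetical):

batch_count = 0

grouped_ch
    .map { library, flowcell, lanes, tiles, read1s, read2s ->
        // use a running counter as the batch id instead of the lane/tile lists
        tuple(library, flowcell, "batch${++batch_count}", read1s, read2s)
    }
    .set { aln_in_ch }

// the output names then become !{library}_!{flowcell}_!{batch_id}.aln.bam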
thanks for your help! I’ll be around later too.
Paolo Di Tommaso
@pditommaso
Apr 10 2018 13:29
wouldn't it be easier to just execute a task per sample?
Brad Langhorst
@bwlang
Apr 10 2018 13:52
@pditommaso in some cases I have runs with billions of reads and 1 or 2 samples. That is not very parallelized (or friendly to the wall time limits on our cluster)
Paolo Di Tommaso
@pditommaso
Apr 10 2018 13:52
billions
just to say you have many, or is that a real number?
Brad Langhorst
@bwlang
Apr 10 2018 13:54
real number
Paolo Di Tommaso
@pditommaso
Apr 10 2018 13:54
wow big
Brad Langhorst
@bwlang
Apr 10 2018 13:54
1 or 2 right now.
Paolo Di Tommaso
@pditommaso
Apr 10 2018 13:54
2 billion a sample! wow
Brad Langhorst
@bwlang
Apr 10 2018 13:54
so low billions.
;)
Paolo Di Tommaso
@pditommaso
Apr 10 2018 13:54
cool
Brad Langhorst
@bwlang
Apr 10 2018 13:55
i remember when millions seemed ridiculously huge too.
Paolo Di Tommaso
@pditommaso
Apr 10 2018 13:55
ok, so you need to batch it :)
Brad Langhorst
@bwlang
Apr 10 2018 13:56
yeah - it's working nicely now with per-tile fastqs (~18 hours) - but I think it can be significantly improved by increasing the per-job run time.
Paolo Di Tommaso
@pditommaso
Apr 10 2018 13:57
is the current implementation already NF based?
Brad Langhorst
@bwlang
Apr 10 2018 13:59
yep… mostly - we’re migrating the rest and hope to present it at BOSC.
in production it’s 2 phases (fastq generation using picard and shell, followed by NF for alignment etc.)
Paolo Di Tommaso
@pditommaso
Apr 10 2018 14:00
it’s working nicely now with per-tile fastqs
I mean, is that part in NF?
Brad Langhorst
@bwlang
Apr 10 2018 14:01
yep - the fastq pair bits are working great.
back in a few...
Paolo Di Tommaso
@pditommaso
Apr 10 2018 14:12
let me try to sketch an implementation
Michael L Heuer
@heuermh
Apr 10 2018 14:45
@bwlang are there any public datasets similar to what you are working with?
Brad Langhorst
@bwlang
Apr 10 2018 14:51
@heuermh we'll make some of our validation data public in the next few months (methylation detection, not standard genome). I suspect there is other NovaSeq data public now - but I don't know anything specific.
Davis McCarthy
@davismcc
Apr 10 2018 15:28
Hi nextflow-ers. I'm new to nf, transitioning over from snakemake. I've got up and running very nicely with Phil Ewels's cookiecutter template, but I'm getting the following error at the very end of the run: ERROR ~ Failed to invoke `workflow.onComplete` event handler - what am I missing here?
Paolo Di Tommaso
@pditommaso
Apr 10 2018 15:29
likely there's an error in the generated code
what's the complete error stack trace?
Davis McCarthy
@davismcc
Apr 10 2018 15:30
I should look in .nextflow.log?
Paolo Di Tommaso
@pditommaso
Apr 10 2018 15:30
yep
Davis McCarthy
@davismcc
Apr 10 2018 15:32
I think the relevant snippet is:
Apr-10 16:09:38.591 [main] ERROR nextflow.script.WorkflowMetadata - Failed to invoke `workflow.onComplete` event handler
java.lang.NullPointerException: Cannot set property 'Nextflow Build' on null object
        at org.codehaus.groovy.runtime.NullObject.setProperty(NullObject.java:80)
        at org.codehaus.groovy.runtime.InvokerHelper.setProperty(InvokerHelper.java:197)
        at org.codehaus.groovy.runtime.DefaultGroovyMethods.putAt(DefaultGroovyMethods.java:272)
        at org.codehaus.groovy.runtime.dgm$509.invoke(Unknown Source)
        at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite$PogoMetaMethodSiteNoUnwrapNoCoerce.invoke(PogoMetaMethodSite.java:251)
        at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.call(PogoMetaMethodSite.java:71)
        at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
        at org.codehaus.groovy.runtime.callsite.NullCallSite.call(NullCallSite.java:35)
        at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
        at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:113)
        at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:133)
        at _nf_script_c3b707bb$_run_closure8.doCall(_nf_script_c3b707bb:254)
        at _nf_script_c3b707bb$_run_closure8.doCall(_nf_script_c3b707bb)
[...]
Paolo Di Tommaso
@pditommaso
Apr 10 2018 15:33
is this code generated by cookiecutter ?
I mean, what's the code in the onComplete part ?
Davis McCarthy
@davismcc
Apr 10 2018 15:34
i see what you mean
yes, this is code generated by cookiecutter:
workflow.onComplete {

    // Set up the e-mail variables
    def subject = "[davismcc/nf-hipsci-fibro] Successful: $workflow.runName"
    if(!workflow.success){
      subject = "[davismcc/nf-hipsci-fibro] FAILED: $workflow.runName"
    }
    def email_fields = [:]
    email_fields['version'] = params.version
    email_fields['runName'] = custom_runName ?: workflow.runName
    email_fields['success'] = workflow.success
    email_fields['dateComplete'] = workflow.complete
    email_fields['duration'] = workflow.duration
    email_fields['exitStatus'] = workflow.exitStatus
    email_fields['errorMessage'] = (workflow.errorMessage ?: 'None')
    email_fields['errorReport'] = (workflow.errorReport ?: 'None')
    email_fields['commandLine'] = workflow.commandLine
    email_fields['projectDir'] = workflow.projectDir
    email_fields['summary'] = summary
    email_fields['summary']['Date Started'] = workflow.start
    email_fields['summary']['Date Completed'] = workflow.complete
    email_fields['summary']['Pipeline script file path'] = workflow.scriptFile
    email_fields['summary']['Pipeline script hash ID'] = workflow.scriptId
    if(workflow.repository) email_fields['summary']['Pipeline repository Git URL'] = workflow.repository
    if(workflow.commitId) email_fields['summary']['Pipeline repository Git Commit'] = workflow.commitId
    if(workflow.revision) email_fields['summary']['Pipeline Git branch/tag'] = workflow.revision
    if(workflow.container) email_fields['summary']['Docker image'] = workflow.container
    email_fields['software_versions'] = software_versions
    email_fields['software_versions']['Nextflow Build'] = workflow.nextflow.build
    email_fields['software_versions']['Nextflow Compile Timestamp'] = workflow.nextflow.timestamp

    // Render the TXT template
    def engine = new groovy.text.GStringTemplateEngine()
    def tf = new File("$baseDir/assets/email_template.txt")
    def txt_template = engine.createTemplate(tf).make(email_fields)
    def email_txt = txt_template.toString()

    // Render the HTML template
    def hf = new File("$baseDir/assets/email_template.html")
    def html_template = engine.createTemplate(hf).make(email_fields)
    def email_html = html_template.toString()

    // Render the sendmail template
    def smail_fields = [ email: params.email, subject: subject, email_txt: email_txt, email_html: email_html, baseDir: "$baseDir" ]
    def sf = new File("$baseDir/assets/sendmail_template.txt")
    def sendmail_template = engine.createTemplate(sf).make(smail_fields)
    def sendmail_html = sendmail_template.toString()

    // Send the HTML e-mail
    if (params.email) {
        try {
          if( params.plaintext_email ){ throw GroovyException('Send plaintext e-mail, not HTML') }
          // Try to send HTML e-mail using sendmail
          [ 'sendmail', '-t' ].execute() << sendmail_html
          log.info "[davismcc/nf-hipsci-fibro] Sent summary e-mail to $params.email (sendmail)"
        } catch (all) {
          // Catch failures and try with plaintext
          [ 'mail', '-s', subject, params.email ].execute() << email_txt
          log.info "[davismcc/nf-hipsci-fibro] Sent summary e-mail to $params.email (mail)"
        }
    }

    // Write summary e-mail HTML to a file
    def output_d = new File( "${params.outdir}/Documentation/" )
    if( !output_d.exists() ) {
      output_d.mkdirs()
    }
    def output_hf = new File( output_d, "pipeline_report.html" )
    output_hf.withWriter { w -> w << email_html }
    def output_tf = new File( output_d, "pipeline_report.txt" )
    output_tf.withWriter { w -> w << email_txt }

    log.info "[davismcc/nf-hipsci-fibro] Pipeline Complete"

}
Paolo Di Tommaso
@pditommaso
Apr 10 2018 16:04
@davismcc sorry I was interrupted
Davis McCarthy
@davismcc
Apr 10 2018 16:05
no worries at all!
Paolo Di Tommaso
@pditommaso
Apr 10 2018 16:05
so it breaks at this line
  email_fields['software_versions']['Nextflow Build'] = workflow.nextflow.build
But I don't know why
quick workaround: delete all the onComplete stuff
that will disable the mail notification
Davis McCarthy
@davismcc
Apr 10 2018 16:07
OK, cool. I'll disable that stuff for now and come back to that later. Thanks so much for your very prompt help!
Paolo Di Tommaso
@pditommaso
Apr 10 2018 16:08
BTW nextflow has a built-in notification that does not require all that mess
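e.g. something like this (rough sketch - check the docs for the exact option and config names):

nextflow run main.nf -N you@example.com

or in nextflow.config:

notification {
    enabled = true
    to = 'you@example.com'
}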
Davis McCarthy
@davismcc
Apr 10 2018 16:08
very nice
Paolo Di Tommaso
@pditommaso
Apr 10 2018 16:08
for the specific error, I would suggest pinging the nf-core folks: https://gitter.im/nf-core/Lobby
or report an issue on the cookiecutter github repo
they will be happy to help you ;)
Davis McCarthy
@davismcc
Apr 10 2018 16:09
Thanks! I'll ping them :)
Edgar
@edgano
Apr 10 2018 16:10
@davismcc could it be that you deleted the get_software_versions process?
in the cookiecutter template there is a process to capture the SW versions. If you deleted it, software_versions would be null, and I think that's the variable giving you the error
Davis McCarthy
@davismcc
Apr 10 2018 16:11
thanks, @edgano for the suggestion, but I haven't deleted the get_software_versions process; that's still there unchanged from cookiecutter
Edgar
@edgano
Apr 10 2018 16:12
ok, then cookiecutter crew :P
Davis McCarthy
@davismcc
Apr 10 2018 16:13
:+1:
Paolo Di Tommaso
@pditommaso
Apr 10 2018 16:16
@bwlang I gave it a try, but I think the best approximation is using the number of files instead of the size; could that be a temporary solution?
something like this
count=0

Channel
  .fromPath('data.csv')
  .splitCsv(skip:1)
  .map { library, flowcell, lane, tile, read1, read2 -> tuple(library, flowcell, lane, tile, file(read1), file(read2)) }
  .groupTuple( by:2, remainder:true, size:4 )
  .println { "${++count} >> $it \n" }
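and the grouped records could then feed a process more or less like this (rough sketch - it assumes the println above is replaced with .set { grouped_ch }, and note that after groupTuple(by:2) everything except the lane becomes a list):

process align_batch {

    input:
    set libraries, flowcells, lane, tiles, file(read1s), file(read2s) from grouped_ch

    shell:
    '''
    # the staged read1/read2 files interpolate as space-separated lists here
    seqtk mergepe <(zcat !{read1s}) <(zcat !{read2s}) | ...
    '''
}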
Brad Langhorst
@bwlang
Apr 10 2018 16:19
@pditommaso: thanks so much for spending time on this! I could determine the number of libraries and make a guess about pooling evenness to get a prediction of how many fastqs to put together. I'll try that if I can't get the size bits working.
Mike Smoot
@mes5k
Apr 10 2018 16:30
@bwlang I haven't looked at your gist too deeply, but I'm guessing that if the buffering is working, then the problem is with runnable_mapping.reduce. I'm pretty sure you don't want reduce there because that consumes the whole channel. Instead I'm guessing you'll want a map in which you'll operate on the list of maps that buffer creates. And now off to a meeting...
Brad Langhorst
@bwlang
Apr 10 2018 16:32
@mes5k thanks for looking - I've since tried to reduce in a follow-on step. I'll post an update and try to understand how map could help. I think I'm getting close (or maybe that's just a story I tell myself to stay motivated ;)
Paolo Di Tommaso
@pditommaso
Apr 10 2018 16:41
The main problem with your snippet is that you are working with associative arrays instead of tuples
That makes them difficult to handle with a process
Brad Langhorst
@bwlang
Apr 10 2018 16:42
hmm - maybe I can convert them as you have done in your example above. I'll try that.
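i.e. something like this (sketch - records_ch stands in for my channel of maps):

records_ch.map { rec ->
    tuple(rec.library, rec.flowcell, rec.lane, rec.tile, file(rec.read1), file(rec.read2))
}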
Paolo Di Tommaso
@pditommaso
Apr 10 2018 16:42
IMO the groupTuple could be an approximation of what you are trying to do
The only problem is that it works with the number of samples instead of a file size limit
But I would be happy to extend it to support the batching based on file size
Let me know if the file-number approach works, then we can extend it to support the file size
Mike Smoot
@mes5k
Apr 10 2018 17:13
@pditommaso I think making better use of associative arrays in nextflow channel operators and processes would be a good project for nextflow++.
After modules, though. :)
Paolo Di Tommaso
@pditommaso
Apr 10 2018 17:13
we were talking about exactly that with @skptic today!
Mike Smoot
@mes5k
Apr 10 2018 17:38

@bwlang is this what you want:

total = 0

def check(f) {
    if ((f.size() + total) > 1000) {
        total = 0
        return true
    } else {
        total += f.size()
        return false
    }
}

Channel
    .fromPath("simple.csv")
    .splitCsv(header: true)
    // convert to a tuple
    .map{ [it['name'], file(it['file']), it['more']] }
    // buffer on file size
    .buffer{ check(it[1]) }
    // transpose lists
    .map{ it.transpose() }
    .set{ in_ch }

process do_stuff {

    input:
    set name_list, file_list, more_junk from in_ch

    output:
    stdout into results

    script:
    """
    echo ${file_list}
    """
}

results.view()

Where the csv file looks like:

name,file,more
a,buffer.nf,asdf
b,buffer_size.nf,asdf
c,channelyaml.nf,asdf
d,david_into.nf,asdf
e,dup.nf,asdf
f,file_groovy.nf,asdf
g,filter_dir.nf,asdf
h,group_triple.nf,asdf
i,indexing_list.nf,asdf
Paolo Di Tommaso
@pditommaso
Apr 10 2018 17:39
:clap: :clap: :clap:
Brad Langhorst
@bwlang
Apr 10 2018 17:51
@mes5k: wow - thank you! I'll experiment a bit with this and share my results.
Mike Smoot
@mes5k
Apr 10 2018 17:52
I'm procrastinating... :)