These are chat archives for nextflow-io/nextflow

13th
Nov 2017
Simone Baffelli
@baffelli
Nov 13 2017 09:28
How is it possible that inserting a .view() operator in an operator chain invalidates the cache?
I must be making a mess somewhere, but I can't find where
Paolo Di Tommaso
@pditommaso
Nov 13 2017 09:29
it should not
Simone Baffelli
@baffelli
Nov 13 2017 09:30
this is why I worry
Paolo Di Tommaso
@pditommaso
Nov 13 2017 09:30
if so, I can only help with a reproducible test case
Simone Baffelli
@baffelli
Nov 13 2017 09:31
I know. I just wanted to see whether anyone had ever experienced something similar
Paolo Di Tommaso
@pditommaso
Nov 13 2017 09:32
not to my knowledge
Simone Baffelli
@baffelli
Nov 13 2017 09:33
very weird
I added an operator similar to #517 and the cache is not invalidated anymore
even if I just add map{it}
I must be making an awful mess
Tim Diels
@timdiels
Nov 13 2017 11:23

When using splitCsv, how do I annotate each row with the file it came from? For example:

Channel
    .from(['file1', 'file2'])
    .magicSplitCSV()

Should output

Channel.from([
    ['file1', row1],
    ['file1', row2],
    ['file1', ...],
    ['file2', row1b],
    ['file2', row2b],
    ['file2', ...],
])
Paolo Di Tommaso
@pditommaso
Nov 13 2017 11:27
good point, I think it's not possible in the current form
it should be possible with some custom code
Tim Diels
@timdiels
Nov 13 2017 11:33
I was thinking along these lines, in pseudocode, but I don't know where to start:
while sourceChannel.notEmpty {
     file << sourceChannel
     file >> splitCSV
     while splitCSV {
          row << splitCSV
          [file, row] >> outputChannel
     }
}
Paolo Di Tommaso
@pditommaso
Nov 13 2017 11:34
this should work
Channel
    .fromPath('/your/csv/*')
    .flatMap{ file-> def rows=[]; 
        file.splitCsv(into:rows); 
        rows.each { it.add(0,file.name) }; 
        return rows 
    }
    .println()
In a nutshell:
  1. splitCsv is applied directly to the file instead of to a channel
  2. the splits are redirected into the rows list
  3. iterate over the rows to add the file name as the first column
  4. return rows
  5. flatMap does the magic (each element of the returned list is emitted as a separate item)
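For readers coming from Python, here is the same idea as a minimal sketch outside Nextflow: each file is split into rows on its own, the file name is prepended to every row, and the per-file lists are flattened into one stream, which is the role flatMap plays above. The function `annotated_rows` and the in-memory sample files are hypothetical stand-ins for real paths; Nextflow's own splitCsv options are not modeled.

```python
import csv
import io
from typing import Iterable, Iterator, List, Tuple


def annotated_rows(files: Iterable[Tuple[str, str]]) -> Iterator[List[str]]:
    """Yield each CSV row with its source file name as the first column.

    `files` is a sequence of (name, csv_text) pairs standing in for paths
    (hypothetical input format, chosen to keep the sketch self-contained).
    """
    for name, text in files:
        # Split this one file into rows, like calling splitCsv on a single file.
        rows = list(csv.reader(io.StringIO(text)))
        # Prepend the file name to every row, like rows.each { it.add(0, file.name) }.
        for row in rows:
            row.insert(0, name)
        # Yielding row by row flattens the per-file lists, as flatMap does.
        yield from rows


sample = [("file1.csv", "a,1\nb,2\n"), ("file2.csv", "c,3\n")]
for row in annotated_rows(sample):
    print(row)
```

Each printed row carries its file name first, e.g. `['file1.csv', 'a', '1']`, matching the `[file, row]` shape asked for in the question.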
Tim Diels
@timdiels
Nov 13 2017 11:46
Thanks. It's quite a different mindset from Python, though it's still imperative code. I see you've also added some operator analogs (splitCsv) to Path.
Paolo Di Tommaso
@pditommaso
Nov 13 2017 11:49
yes, NF requires switching to the dataflow/streaming paradigm
though it's not documented, the splitXxx methods can also be applied to file objects
Hugues Fontenelle
@huguesfontenelle
Nov 13 2017 13:47

Hi
What's wrong with:

vcf_input = Channel.from([
    file([analysis_path, analysis.params['vcf']].join(File.separator)),
    file([analysis_path, analysis.params['vcf.idx']].join(File.separator))
    ])

then

process anno {
    input:
    set file(vcf_file), file(idx_file) from vcf_input
}

?

I get

WARN: Input tuple does not match input set cardinality declared by process `anno` -- offending value: /Diag-NA12878.vcf

and fails anyway.

Paolo Di Tommaso
@pditommaso
Nov 13 2017 13:52
Channel.from([a,b,c]) creates a channel emitting a, b, c one after another
either
Hugues Fontenelle
@huguesfontenelle
Nov 13 2017 13:52
I see. I want a single tuple instead, so perhaps double brackets?
Paolo Di Tommaso
@pditommaso
Nov 13 2017 13:52
Channel.from([ [a,b], [c,d], .. ] )
or
vcf_input = [ [a,b], [c,d], .. ].channel()
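A toy Python model of the behaviour described above, assuming (as stated in the reply) that a single list argument to Channel.from is emitted one element at a time, while a list of lists yields one tuple per inner list. `channel_from` and `match_set` are hypothetical helpers, not Nextflow APIs; `match_set` only mimics the cardinality check behind the WARN message.

```python
from typing import Any, List


def channel_from(*args: Any) -> List[Any]:
    """Simplified model of Channel.from: a single list argument is
    unpacked, so each of its elements becomes a separate emission."""
    if len(args) == 1 and isinstance(args[0], list):
        return list(args[0])
    return list(args)


def match_set(item: Any, cardinality: int) -> bool:
    """Mimic a `set` input: the item must be a tuple of the declared size."""
    return isinstance(item, (list, tuple)) and len(item) == cardinality


# Flat list: the .vcf and the .idx are emitted separately, so neither
# emission matches `set file(vcf_file), file(idx_file)`.
flat = channel_from(["a.vcf", "a.vcf.idx"])
print([match_set(x, 2) for x in flat])    # [False, False]

# Nested list: one emission carrying both files as a pair.
nested = channel_from([["a.vcf", "a.vcf.idx"]])
print([match_set(x, 2) for x in nested])  # [True]
```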
Hugues Fontenelle
@huguesfontenelle
Nov 13 2017 13:53
tuple = Channel.from( [1, 'alpha'], [2, 'beta'], [3, 'delta'] )

process setExample {
    input:
    set x, 'latin.txt' from tuple

    """
    echo Processing $x
    cat - latin.txt > copy
    """

}
(copy pasted from link above)
Paolo Di Tommaso
@pditommaso
Nov 13 2017 13:55
actually no
read from the note on
Hugues Fontenelle
@huguesfontenelle
Nov 13 2017 13:58
Yes, OK.
Thank you!
Paolo Di Tommaso
@pditommaso
Nov 13 2017 13:58
:+1:
Tim Diels
@timdiels
Nov 13 2017 18:22
Does output: file "$var*" into ch treat ":" specially, e.g. when var = 'species:go:set'?

I get

Caused by:              
  Missing output file(s) `arath` expected by process `clime (arath:wg)`

where

tag "$geneSetName"
output:
    file "$geneSetName*" into climePublish

and geneSetName = 'arath:wg'

Tim Diels
@timdiels
Nov 13 2017 18:30
hmm, indeed, this snippet reproduces it:
process p {
    publishDir 'output'

    input:
    val weird from Channel.from(['anything'])

    output:
    file 'arath:wg' into publishDir

    script:
    """
    touch 'arath:wg'
    """
}
I'll work around it by replacing : with _
$ nextflow run .
Picked up _JAVA_OPTIONS: -Dawt.useSystemAAFontSettings=on
N E X T F L O W  ~  version 0.26.0
Launching `./main.nf` [hopeful_wiles] - revision: f986b99960
[warm up] executor > local
[93/c94b71] Submitted process > p (1)
ERROR ~ Error executing process > 'p (1)'

Caused by:
  Missing output file(s) `arath` expected by process `p (1)`

Command executed:

  touch 'arath:wg'

Command exit status:
  0

Command output:
  (empty)

Work dir:
  /home/limyreth/tmp/work/93/c94b715786e7753fb440878c4f03b8

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

 -- Check '.nextflow.log' file for details
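A small Python sketch of the workaround mentioned above: replace ":" with "_" before the name is used as an output file name. It also shows that under ordinary glob rules, as implemented by Python's fnmatch, ":" is a literal character, so the pattern "arath:wg*" by itself would match; the truncation to `arath` in the error presumably comes from how the output declaration is parsed, which this sketch does not model. `safe_name` is a hypothetical helper.

```python
import fnmatch


def safe_name(name: str, bad: str = ":", repl: str = "_") -> str:
    """Replace each character in `bad` with `repl` (hypothetical helper)."""
    for ch in bad:
        name = name.replace(ch, repl)
    return name


gene_set_name = "arath:wg"
print(safe_name(gene_set_name))  # arath_wg

# With fnmatch, ':' has no special meaning, so this glob matches.
print(fnmatch.fnmatchcase("arath:wg.txt", gene_set_name + "*"))  # True
```

Sanitizing the value once, where geneSetName is defined, keeps the tag, the touched file and the output pattern consistent with each other.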
Paolo Di Tommaso
@pditommaso
Nov 13 2017 20:13
you have a : in the file name?!