These are chat archives for nextflow-io/nextflow

13th
Oct 2017
Brian Reichholf
@breichholf
Oct 13 2017 11:30
Hi there! I have a question regarding staging files: when I specify stageInMode or stageOutMode I can choose copy or move, but is there a way that I could provide a binary to copy with? Our IT is encouraging us to use dcp https://github.com/hpc/mpifileutils
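(For context: the staging mode is set per process via the stageInMode / stageOutMode directives; a minimal sketch with hypothetical process and channel names, there is currently no hook for plugging in an external copy tool such as dcp:)

process align {
    stageInMode  'copy'   // default is 'symlink'; 'link' and 'move' are also accepted
    stageOutMode 'move'   // 'copy' and 'rsync' are the other options

    input:
    file reads from reads_ch

    """
    echo processing ${reads}
    """
}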
Luca Cozzuto
@lucacozzuto
Oct 13 2017 11:31
hi all, a quick (and silly) question: what is the best way to get the two files from the channel .fromFilePairs?
Paolo Di Tommaso
@pditommaso
Oct 13 2017 11:45
@breichholf ohh, interesting, it should be possible
open a feature request on GH please, you may also want to consider a pull request for that, I can't test the dcp tool
@lucacozzuto not sure I understand
Luca Cozzuto
@lucacozzuto
Oct 13 2017 12:12
when you use the channel .fromFilePairs you obtain a pair of files
and I would like to use them independently in a script, like read1 and read2
Paolo Di Tommaso
@pditommaso
Oct 13 2017 12:32
I see, use the flat: true option
Luca Cozzuto
@lucacozzuto
Oct 13 2017 12:34
and how do I access the two reads?
something like this?
${reads}[0]
Paolo Di Tommaso
@pditommaso
Oct 13 2017 12:35
wait
look here
then see how it is used
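(Presumably along these lines; a minimal sketch with hypothetical glob and channel names, using flat: true so each pair is emitted as separate values rather than a nested list:)

Channel
    .fromFilePairs('reads/*_{1,2}.fastq.gz', flat: true)
    .set { read_pairs }

process align {
    input:
    set val(sample_id), file(read1), file(read2) from read_pairs

    """
    echo ${sample_id}: ${read1} ${read2}
    """
}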
Luca Cozzuto
@lucacozzuto
Oct 13 2017 12:40
wonderful. I strongly suggest adding it to the official documentation as a general case
(or maybe it's there but I did not find it easily :) )
Paolo Di Tommaso
@pditommaso
Oct 13 2017 12:40
:+1:
Francesco Strozzi
@fstrozzi
Oct 13 2017 13:04
Where can I retrieve the Dockerfile used to build this container ? https://hub.docker.com/r/nextflow/rnaseq-nf/
Paolo Di Tommaso
@pditommaso
Oct 13 2017 13:04
oops
yes
wait
Francesco Strozzi
@fstrozzi
Oct 13 2017 13:06
any known issues with changing to Python 3 for that container?
Paolo Di Tommaso
@pditommaso
Oct 13 2017 13:06
umm no
Francesco Strozzi
@fstrozzi
Oct 13 2017 13:06
ok good
thanks
Francesco Strozzi
@fstrozzi
Oct 13 2017 13:13
mmm multiqc 1.1 depends on Python 2.7
Paolo Di Tommaso
@pditommaso
Oct 13 2017 13:14
bingo !
why do you need python 3?
Francesco Strozzi
@fstrozzi
Oct 13 2017 13:14
for some other code to work in the pipeline; never mind, it’s easier to update my code to make it backwards compatible with Python 2.7
Paolo Di Tommaso
@pditommaso
Oct 13 2017 13:15
or to have two different containers, which would also show the case of having two queues ..
Francesco Strozzi
@fstrozzi
Oct 13 2017 13:15
yes, also a good idea
Luca Cozzuto
@lucacozzuto
Oct 13 2017 13:16
having more containers is the best idea when using different versions of programs
actually it's one of the reasons I really like them
Francesco Strozzi
@fstrozzi
Oct 13 2017 13:18
yes
agree
it’s just that on AWS Batch it’s a slight pain in the ass now between containers, AMIs and Job Definitions ;)
but it will get better soon
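(As an aside, a minimal sketch of the two-container idea, with hypothetical image, process and channel names; each process can set its own container directive:)

process run_multiqc {
    container 'hypothetical/py2-tools:latest'   // Python 2.7, as required by multiqc 1.1

    input:
    file report_dir from reports_ch

    """
    multiqc ${report_dir}
    """
}

process run_analysis {
    container 'hypothetical/py3-tools:latest'   // Python 3 for the other code

    input:
    file sample from samples_ch

    """
    ./my_script.py ${sample}
    """
}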
Paolo Di Tommaso
@pditommaso
Oct 13 2017 13:26
Folks, when there are two input files with the same name, what should NF do:
stop, or just report a warning message?
nextflow-io/nextflow#470
(comment on the issue please)
cc @mes5k
Luca Cozzuto
@lucacozzuto
Oct 13 2017 13:35
the file name is central in Nextflow, it should be a stop and not a warning
Paolo Di Tommaso
@pditommaso
Oct 13 2017 13:35
I tend to agree
Félix C. Morency
@fmorency
Oct 13 2017 13:39
+1
Francesco Strozzi
@fstrozzi
Oct 13 2017 13:39
yes
Félix C. Morency
@fmorency
Oct 13 2017 13:45
I had issues in the past because of this behaviour
Francesco Strozzi
@fstrozzi
Oct 13 2017 14:39

guys I don’t understand how to properly use the splitCsv operator. I have a channel reading a CSV file and I want to get the individual values as inputs in a process

This is the input line in the process

set val(dbxref),val(sample_type),val(fastq_1),val(fastq_2) from encode_files.splitCsv()
Paolo Di Tommaso
@pditommaso
Oct 13 2017 14:40
main issue ?
Francesco Strozzi
@fstrozzi
Oct 13 2017 14:40
but I’m getting a warning
WARN: Input tuple does not match input set cardinality declared by process `getFiles` -- offending value: [SRA-SRR5210435, dendritic_cell, False, https://www.encodeproject.org/files/ENCFF408HJK/@@download/ENCFF408HJK.fastq.gz, https://www.encodeproject.org/files/ENCFF658PGS/@@download/ENCFF658PGS.fastq.gz]
and if I do a simple echo on those values this is what I’m getting from the process
[SRA-SRR5210435, dendritic_cell, False, https://www.encodeproject.org/files/ENCFF408HJK/@@download/ENCFF408HJK.fastq.gz, https://www.encodeproject.org/files/ENCFF658PGS/@@download/ENCFF658PGS.fastq.gz],null,null,null
so something is not working as I expect :)
where am I wrong ?
Paolo Di Tommaso
@pditommaso
Oct 13 2017 14:41
first problem: you have a 5-column CSV and a 4-value tuple in the input def
no ?
then I will tell you the second .. :)
faint ?
Francesco Strozzi
@fstrozzi
Oct 13 2017 14:59
sorry wrong copy and paste
I have the right number in the input def
Paolo Di Tommaso
@pditommaso
Oct 13 2017 15:00
you should also declare reads as files
Francesco Strozzi
@fstrozzi
Oct 13 2017 15:00
if you look at the output I am getting, it seems it is putting the whole set just into the first val and ignoring the rest
Paolo Di Tommaso
@pditommaso
Oct 13 2017 15:00
set val(dbxref),val(sample_type),file(fastq_1),file(fastq_2) from encode_files.splitCsv { c1, c2, c3, c4 -> [c1, c2, file(c3),file(c4)] }
if you look at the output I am getting, it seems it is putting the whole set just into the first val and ignoring the rest
?
Francesco Strozzi
@fstrozzi
Oct 13 2017 15:01
I get null,null,null,null
[SRA-SRR5210435, dendritic_cell, False, https://www.encodeproject.org/files/ENCFF408HJK/@@download/ENCFF408HJK.fastq.gz, https://www.encodeproject.org/files/ENCFF658PGS/@@download/ENCFF658PGS.fastq.gz],null,null,null
Paolo Di Tommaso
@pditommaso
Oct 13 2017 15:02
umm
what is printing that ?
Francesco Strozzi
@fstrozzi
Oct 13 2017 15:03
that is what I am getting, I’m just testing this code since in the full pipeline it does not work as expected. So if I declare the inputs the way I’ve shown, that is what I get if I just echo the variables
Paolo Di Tommaso
@pditommaso
Oct 13 2017 15:05
please share the code somewhere
Francesco Strozzi
@fstrozzi
Oct 13 2017 15:06
process getFiles {

    input:
    set val(dbxref),val(sample_type),val(paired_end),val(fastq_1),val(fastq_2) from encode_files.splitCsv()

    """
    echo $dbxref,$sample_type,$paired_end,$fastq_1,$fastq_2
    """

}
it’s super simple
I’m just checking if the splitCsv thing works and I keep getting the wrong assignments apparently, so I don’t know where the error is
the channel encode_files reads a single-line CSV

SRA-SRR5210435,dendritic_cell,False,https://www.encodeproject.org/files/ENCFF408HJK/@@download/ENCFF408HJK.fastq.gz,https://www.encodeproject.org/files/ENCFF658PGS/@@download/ENCFF658PGS.fastq.gz
Paolo Di Tommaso
@pditommaso
Oct 13 2017 15:12
what is the output of encode_files.splitCsv().println(); return
put it before process getFiles
Francesco Strozzi
@fstrozzi
Oct 13 2017 15:15
seems proper
[14/24ae23] Submitted process > parseEncode (1)
[SRA-SRR5210435, dendritic_cell, False, https://www.encodeproject.org/files/ENCFF408HJK/@@download/ENCFF408HJK.fastq.gz, https://www.encodeproject.org/files/ENCFF658PGS/@@download/ENCFF658PGS.fastq.gz]
Francesco Strozzi
@fstrozzi
Oct 13 2017 15:21
it’s a mystery :)
Paolo Di Tommaso
@pditommaso
Oct 13 2017 15:22
it seems it returns the line altogether ..
I need to understand if it's a feature or a bug :D
Francesco Strozzi
@fstrozzi
Oct 13 2017 15:22
that it’s not a list ?
with the [ ]
Paolo Di Tommaso
@pditommaso
Oct 13 2017 15:22
checking
Paolo Di Tommaso
@pditommaso
Oct 13 2017 15:29
A nice Friday afternoon bug !
Francesco Strozzi
@fstrozzi
Oct 13 2017 15:30
Yeah, sorry
:)
Paolo Di Tommaso
@pditommaso
Oct 13 2017 15:31
shit !
it seems splitCsv is returning by default an array of strings, not a List ...
hence it does not match the input declaration
anyhow put
from encode_files.splitCsv { c1, c2, c3, c4, c5 -> [ c1, c2, c3, file(c4), file(c5) ]}
it should work
Francesco Strozzi
@fstrozzi
Oct 13 2017 15:37
mmm
process getFiles {

    input:
    set val(dbxref),val(sample_type),val(paired_end),file(fastq_1),file(fastq_2) from encode_files.splitCsv() { c1, c2, c3, c4, c5 -> [ c1, c2, c3, file(c4), file(c5) ]}

    """
    echo $dbxref,$sample_type,$paired_end,$fastq_1,$fastq_2
    """

}
ERROR ~ No signature of method: _nf_script_f4925e15$_run_closure2$_closure4.call() is applicable for argument types: ([Ljava.lang.String;) values: [[SRA-SRR5210435, dendritic_cell, False, https://www.encodeproject.org/files/ENCFF408HJK/@@download/ENCFF408HJK.fastq.gz, ...]]
Possible solutions: any(), any(), each(groovy.lang.Closure), any(groovy.lang.Closure), each(groovy.lang.Closure), any(groovy.lang.Closure)
I’m looking into the abyss of the JVM
Paolo Di Tommaso
@pditommaso
Oct 13 2017 15:43
my fault :(
 input: 
  set a,b,c,d,e from  Channel.from(line).splitCsv().map { row -> [row[0],row[1],row[2],file(row[3]),file(row[4])] }
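(Putting the pieces together, a minimal sketch assuming the CSV lives in a hypothetical encode.csv file; the map converts each parsed row into a proper list and wraps the URLs with file():)

Channel
    .fromPath('encode.csv')
    .splitCsv()
    .map { row -> [ row[0], row[1], row[2], file(row[3]), file(row[4]) ] }
    .set { encode_files }

process getFiles {

    input:
    set val(dbxref), val(sample_type), val(paired_end), file(fastq_1), file(fastq_2) from encode_files

    """
    echo $dbxref,$sample_type,$paired_end,$fastq_1,$fastq_2
    """
}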
Francesco Strozzi
@fstrozzi
Oct 13 2017 15:46
fantastic! It’s working. Thanks!
Paolo Di Tommaso
@pditommaso
Oct 13 2017 15:47
:/
Francesco Strozzi
@fstrozzi
Oct 13 2017 15:47
so basically we need a map to explicitly convert the array to a list in order to be able to assign each value to the proper variable
Paolo Di Tommaso
@pditommaso
Oct 13 2017 15:47
need to fix this
Francesco Strozzi
@fstrozzi
Oct 13 2017 15:48
so set is a list in groovy land
Paolo Di Tommaso
@pditommaso
Oct 13 2017 15:49
no, in the NF dystopian world
it was really a bad choice, I want to change it at some point
Ghost
@ghost~598345d2d73408ce4f6ff925
Oct 13 2017 19:37
For a process like cutadapt, what's the best way to capture the stdout to later be used by other processes (like MultiQC)?

Right now my process has this output/script

    output:
    set sampleid, '*fq' into trimmed_reads
    stdout cutadapt_log
    script:
    template 'cutadapt.sh'

But it doesn't seem that MultiQC can detect anything in the cutadapt_log variable

Paolo Di Tommaso
@pditommaso
Oct 13 2017 19:41
simply capture the stdout in the cutadapt.sh and save it to a file
then declare that file as an output
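(A minimal sketch of that, assuming the cutadapt.sh template redirects its stdout to a log file, e.g. ending with cutadapt ... > cutadapt.log; the log then flows to MultiQC as a regular file output:)

    output:
    set sampleid, '*fq' into trimmed_reads
    file 'cutadapt.log' into cutadapt_log

    script:
    template 'cutadapt.sh'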
Ghost
@ghost~598345d2d73408ce4f6ff925
Oct 13 2017 19:47
Yep, that worked. Good thinking, thanks :)
Paolo Di Tommaso
@pditommaso
Oct 13 2017 19:47
:+1:
Daniel E Cook
@danielecook
Oct 13 2017 20:51
Has anyone attempted to test nextflow pipelines using continuous integration on something like travis-ci?
Paolo Di Tommaso
@pditommaso
Oct 13 2017 22:03
oh, we do that all the time