These are chat archives for nextflow-io/nextflow

23rd
Mar 2017
Roman Valls Guimera
@brainstorm
Mar 23 2017 08:21
Hello channel and @pditommaso!... Brainfart of the day for me: I'm trying to relocate the work folder to a bigger scratch space both on a cluster and AWS: https://github.com/brainstorm/bwa-nextflow-benchmark/blob/master/nextflow.config#L20 and Nextflow insists on creating it in $CWD/$HOME, blowing up quotas... what am I doing wrong this time? :_S
Wrong process.scratch variable? Should it be something like process.work instead?
Paolo Di Tommaso
@pditommaso
Mar 23 2017 08:45
work dir <> scratch dir
the work dir must be a shared directory (when using a cluster); you can define it with the CLI option -w or with workDir in the config file
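For example, a minimal nextflow.config sketch (the paths here are hypothetical, adjust them for your system):

// nextflow.config
workDir = '/shared/scratch/work'      // pipeline work directory, same as the -w CLI option
process.scratch = '/local/scratch'    // node-local scratch space where each task actually runs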
Roman Valls Guimera
@brainstorm
Mar 23 2017 09:02
process.workDir then?
Paolo Di Tommaso
@pditommaso
Mar 23 2017 09:02
nope
only workDir
Roman Valls Guimera
@brainstorm
Mar 23 2017 09:04
Thanks!
Paolo Di Tommaso
@pditommaso
Mar 23 2017 09:04
welcome
Paolo Di Tommaso
@pditommaso
Mar 23 2017 09:30
@brainstorm Hey man, star and let other people star the project!
:D
Phil Ewels
@ewels
Mar 23 2017 09:46
:star2:
Paolo Di Tommaso
@pditommaso
Mar 23 2017 09:48
;)
Phil Ewels
@ewels
Mar 23 2017 10:56
Hi @pditommaso - I have a process where I want to count the number of input BAM files. I'm doing something like this (simplified a little):
process deepTools {
    input:
    file bam from bam_dedup_deepTools.flatten().toSortedList()
    [..]

    script:
    num_bams = bam.size()
    if(num_bams < 2){
It works fine if I do have 2 or more input files, but I get a big Java stack trace if there is only one
I suspect that my input channel is no longer a BlankSeparatedList, and so size() isn't working with it any more
Any suggestions how I should go about doing this instead?
The main java error is java.util.ConcurrentModificationException
If I switch the num_bams line to just num_bams = 1 then everything works as expected
Paolo Di Tommaso
@pditommaso
Mar 23 2017 10:59
let me check
Phil Ewels
@ewels
Mar 23 2017 11:02
Maybe I can check bam.getClass() before I try to count it?
Paolo Di Tommaso
@pditommaso
Mar 23 2017 11:03
the point is that when there's only one item the list is implicitly unwrapped
Phil Ewels
@ewels
Mar 23 2017 11:04
yeah, I guessed that's what was happening
Paolo Di Tommaso
@pditommaso
Mar 23 2017 11:04
so in that case size() returns the file length instead
granted, that's something that should be improved because it's a bit tricky
what you can do is check the bam type, something like
Phil Ewels
@ewels
Mar 23 2017 11:05
if(bams instanceof UnixPath){ ?
Paolo Di Tommaso
@pditommaso
Mar 23 2017 11:05
script:
def single = bam instanceof Path 
..
just Path
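Putting it together, a minimal sketch of the script block based on Phil's snippet above:

script:
def single   = bam instanceof Path   // a lone input file is unwrapped to a plain Path
def num_bams = single ? 1 : bam.size()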
Phil Ewels
@ewels
Mar 23 2017 11:06
ok great
I'm having a forehead-smack moment as I recognise that statement from another of our pipelines
so I'm sure that we have probably asked you this before ;)
Paolo Di Tommaso
@pditommaso
Mar 23 2017 11:06
yes .. :)
and I still didn't implement a better solution ;)
Phil Ewels
@ewels
Mar 23 2017 11:07
ok brilliant - that seems to work nicely.. Hopefully I'll remember the next time I come across the same task!
Paolo Di Tommaso
@pditommaso
Mar 23 2017 11:07
you are welcome
I think you should also be able to replace .flatten().toSortedList() with .collect() provided you are using the latest version
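i.e. the input line from the snippet above could become (assuming a recent Nextflow version, per the caveat):

input:
file bam from bam_dedup_deepTools.collect()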
Phil Ewels
@ewels
Mar 23 2017 11:09
Nice, tidy!
Just managed to implement Docker + automated testing for the second of our pipelines :grin:
took a bit of work and refactoring, but was easier the second time around..
Paolo Di Tommaso
@pditommaso
Mar 23 2017 11:12
nice, is the repo public ?
Docker image still building on dockerhub though
Paolo Di Tommaso
@pditommaso
Mar 23 2017 11:14
great!
out of curiosity, are you moving on with Singularity ?
Phil Ewels
@ewels
Mar 23 2017 11:17
Nothing yet, still waiting for our cluster administrators to get back to us about it
(we're not holding our breaths though)
Paolo Di Tommaso
@pditommaso
Mar 23 2017 11:17
:smile:
sysadmins' power ;)
Phil Ewels
@ewels
Mar 23 2017 11:17
yup ;)
Paolo Di Tommaso
@pditommaso
Mar 23 2017 11:18
(leaving now)
Phil Ewels
@ewels
Mar 23 2017 11:18
To be honest, it's working nicely with parallel config setups anyway, so we don't feel a big rush for that
:+1: Thanks for the help!
Matthieu Foll
@mfoll
Mar 23 2017 16:21

Hi @pditommaso,
I need help building a channel that emits file paths from each line of a CSV file. The test CSV file looks like this:

tumor    normal
t1.bam    n1.bam
t2.bam    n2.bam

I have a process that needs to run for each line of the CSV file with 4 input files; for example, for the first line: t1.bam and n1.bam plus their indexes, which I know are called t1.bam.bai and n1.bam.bai.

I tried this:

bam = Channel.fromPath(params.data)
             .splitCsv(header: true, sep: '\t', strip: true)
             .map { row -> [ row.tumor, row.tumor+'.bai', row.normal, row.normal+'.bai' ] }
bam.println()

The output is what I expect:

[follm@hn ~]$ ~/nf_scripts/test.nf --data tn_pairs.txt 
N E X T F L O W  ~  version 0.24.0
Launching `/home/follm/nf_scripts/test.nf` [amazing_bartik] - revision: 54887762de
[t1.bam, t1.bam.bai, n1.bam, n1.bam.bai]
[t2.bam, t2.bam.bai, n2.bam, n2.bam.bai]

But when I try to use the bam channel as an input of a process declared as file like this one :

process run_pair {

  input:
  file pair from bam

  echo true

  shell:
  '''
  ls -l
  '''
}

I don’t understand the behaviour:

[41/1d2abb] Submitted process > run_pair (2)
[49/b57199] Submitted process > run_pair (1)
total 16
lrwxrwxrwx 1 follm lsu 62 Mar 23 17:19 input.1 -> /home/follm/work/tmp/0e/4a989cbb975e0101822499ae725f19/input.1
lrwxrwxrwx 1 follm lsu 62 Mar 23 17:19 input.2 -> /home/follm/work/tmp/08/da6a34e7c2a72b92e81a3387b4fa81/input.2
lrwxrwxrwx 1 follm lsu 62 Mar 23 17:19 input.3 -> /home/follm/work/tmp/ed/39ac918bea6a9264c61ae0b78c72e8/input.3
lrwxrwxrwx 1 follm lsu 62 Mar 23 17:19 input.4 -> /home/follm/work/tmp/ff/355ff2caed9fc7cb3f06f076a8f6e4/input.4
total 16
lrwxrwxrwx 1 follm lsu 62 Mar 23 17:19 input.1 -> /home/follm/work/tmp/36/ec4e9c21cd5081e8b9ef1e44dafd0e/input.1
lrwxrwxrwx 1 follm lsu 62 Mar 23 17:19 input.2 -> /home/follm/work/tmp/55/77943ec1a7aeae1be354eb492e12e7/input.2
lrwxrwxrwx 1 follm lsu 62 Mar 23 17:19 input.3 -> /home/follm/work/tmp/f6/8289df814d45e4e26a345e8cf6c15b/input.3
lrwxrwxrwx 1 follm lsu 62 Mar 23 17:19 input.4 -> /home/follm/work/tmp/96/05bd16c177f3217394e026defcc9fb/input.4

Each time, the 4 input files actually contain the names of my files rather than their contents, for example:

[follm@hn ~]$ cat /home/follm/work/tmp/0e/4a989cbb975e0101822499ae725f19/input.1
t2.bam

What’s wrong here?

Paolo Di Tommaso
@pditommaso
Mar 23 2017 16:29
That's because each tuple has 4 components, thus you will need to use as input a set with four files..
Does it make sense?
Matthieu Foll
@mfoll
Mar 23 2017 16:33
hum, I have a few pipelines where I am not doing that and it’s working, like here for example: https://github.com/IARCbioinfo/conpair-nf/blob/master/conpair.nf#L91, see how this channel is used here: https://github.com/IARCbioinfo/conpair-nf/blob/master/conpair.nf#L100
What is different in this case?
Paolo Di Tommaso
@pditommaso
Mar 23 2017 16:37
Ok, I got it. You need to transform the path name string into a file object with the file() function
though I would strongly suggest...
Matthieu Foll
@mfoll
Mar 23 2017 16:38
ok it’s working
and no need to use set as an input
Paolo Di Tommaso
@pditommaso
Mar 23 2017 16:39
though it would work as long as your tuple contains only files, I strongly suggest refactoring your code to use a set when your items are tuples
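For instance, an input along these lines (a sketch only; the variable names are hypothetical):

input:
set file(tumor), file(tumor_bai), file(normal), file(normal_bai) from bam

so each of the four files gets an explicit name inside the task.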
Matthieu Foll
@mfoll
Mar 23 2017 16:39
bam = Channel.fromPath(params.data)
             .splitCsv(header: true, sep: '\t', strip: true)
             .map { row -> [ file(row.tumor), file(row.tumor+'.bai'), file(row.normal), file(row.normal+'.bai') ] }
Paolo Di Tommaso
@pditommaso
Mar 23 2017 16:39
Yes
Matthieu Foll
@mfoll
Mar 23 2017 16:40
Ok thanks for the advice
David Trudgian
@dctrud
Mar 23 2017 17:02
Hi there. We have a question from a nextflow user on our cluster. When nextflow runs a process multiple times, it displays a sequential 'index' for the process invocation to the console. Is it possible to get hold of that number in a process? The user would like to add a sequential index to output filenames based on it.
Paolo Di Tommaso
@pditommaso
Mar 23 2017 17:03
yes, you can use task.index in the process context
David Trudgian
@dctrud
Mar 23 2017 17:03
i.e. the number 9 or 7 in:
[warm up] executor > slurm
[aa/17b28e] Submitted process > dowork (9)
[24/4aec6f] Submitted process > dowork (7)
Paolo Di Tommaso
@pditommaso
Mar 23 2017 17:03
yep, that is
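For example, a minimal sketch using the dowork process name from the log above (the input channel and output filename are hypothetical):

process dowork {
    input:
    val x from Channel.from(1..10)

    output:
    file "result_${task.index}.txt"

    script:
    """
    echo "processing ${x}" > result_${task.index}.txt
    """
}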
David Trudgian
@dctrud
Mar 23 2017 17:04
Oh, great, thanks! I must have missed that in the docs. Sorry for the trivial question.
Paolo Di Tommaso
@pditommaso
Mar 23 2017 17:04
it may be that I've forgotten to document it .. :grin:
David Trudgian
@dctrud
Mar 23 2017 17:06
Thanks again for all your work on this. Cheers.
Paolo Di Tommaso
@pditommaso
Mar 23 2017 17:26
welcome