These are chat archives for nextflow-io/nextflow

20th
Jul 2017
Tobias Neumann
@t-neumann
Jul 20 2017 11:31
is there already a feature available to import processes into a new workflow?
Phil Ewels
@ewels
Jul 20 2017 11:33
Not yet, no. But I believe that it's being actively worked on..
Paolo Di Tommaso
@pditommaso
Jul 20 2017 11:40
Unfortunately no
Tobias Neumann
@t-neumann
Jul 20 2017 11:49
oh hi @ewels - didn't know you were developing on nextflow as well
Paolo Di Tommaso
@pditommaso
Jul 20 2017 11:52
Ad honorem :)
Phil Ewels
@ewels
Jul 20 2017 11:53
I'm not really ;) More just an interested party..!
Tobias Neumann
@t-neumann
Jul 20 2017 11:54
it usually boosts my confidence that I'm using the right tools, though, when you're around :smile:
Shellfishgene
@Shellfishgene
Jul 20 2017 12:48
If I have publishDir with mode 'move' nf will run everything again even with the -resume switch on, is that right?
Paolo Di Tommaso
@pditommaso
Jul 20 2017 12:50
yes
you have to take that into account if you want to use move
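As a hedged illustration of the trade-off (process, channel, and file names here are made up): mode 'move' removes the task output from its work directory, so on -resume the cache can no longer validate the task and re-runs it, while mode 'copy' (or the default symlink) keeps the cached copy intact:

    process sortReads {
        // 'move' would remove sorted.bam from the work dir and break -resume;
        // 'copy' publishes a duplicate and leaves the cached output in place
        publishDir 'results', mode: 'copy'

        input:
        file bam from bam_ch

        output:
        file 'sorted.bam' into sorted_ch

        script:
        """
        samtools sort $bam -o sorted.bam
        """
    }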
Shellfishgene
@Shellfishgene
Jul 20 2017 12:51
Ok, makes sense :)
Tobias Neumann
@t-neumann
Jul 20 2017 13:05

I'm trying to split bash commands over several lines with \ but there's some issue

script:
    """
    samtools view -hb ${reads[0]} chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 \ 
    chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr20 chr21 chr22 chrX chrY \
    | samtools sort -@ ${params.threads} -o ${name}_noContig.bam
ERROR ~ unexpected char: '\' @ line 46, column 86.
   hr5 chr6 chr7 chr8 chr9 chr10 \
                                 ^

what am I missing?

Paolo Di Tommaso
@pditommaso
Jul 20 2017 13:05
try \\
or maybe you have a blank after \
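For reference, a sketch of the fixed block: inside a triple-quoted Nextflow script string, writing \\ emits a literal backslash into the generated bash script, and nothing (not even a trailing space) may follow it on the line:

    script:
        """
        samtools view -hb ${reads[0]} chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 \\
        chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr20 chr21 chr22 chrX chrY \\
        | samtools sort -@ ${params.threads} -o ${name}_noContig.bam
        """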
Tobias Neumann
@t-neumann
Jul 20 2017 13:06
ok it was that blank after the \ - thanks
\\ ^^
Shellfishgene
@Shellfishgene
Jul 20 2017 13:20
What does this mean exactly? WARN: No more task to compute -- Execution may be stalled (see the log file for details)
Tobias Neumann
@t-neumann
Jul 20 2017 13:21
I'm trying to replicate a channel similar to fromFilePairs by supplying file paths as a YAML file
masterChannel = Channel.from(params.files).println()
[condition1, [condition1.bam, input.bam]]
[condition2, [condition2.bam, input.bam]]

when I try to use it as input

 input:
    set val(name), file(reads) from masterChannel

the files are apparently not available in the reads list, however

this

samtools view -hb ${reads[0]} chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 \
    chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr20 chr21 chr22 chrX chrY \
    | samtools sort -@ ${params.threads} -o ${name}_noContig.bam

    SWEMBL -F i ${name}_noContig.bam -r ${reads[1]} -o ${name}_SWEMBL_raw.bed

apparently gets substituted to this

samtools view -hb input.1 chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10     chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr20 chr21 chr22 chrX chrY     | samtools sort -@ 10 -o condition1_noContig.bam

  SWEMBL -F i condition1_noContig.bam -r input.2 -o condition1_SWEMBL_raw.bed
Evan Floden
@evanfloden
Jul 20 2017 13:32
I think the prob is that condition1.bam is not a full path.
Tobias Neumann
@t-neumann
Jul 20 2017 13:34
i tried it with absolute paths as well - that's not the issue
Evan Floden
@evanfloden
Jul 20 2017 13:34
@Shellfishgene This is a warning that was recently added. Not sure it helps, but I had the same warning a while ago and Paolo said I could ignore it.
Shellfishgene
@Shellfishgene
Jul 20 2017 13:35
@skptic Yeah, it seems like everything worked fine.
Paolo Di Tommaso
@pditommaso
Jul 20 2017 13:49
You need to map the string path to file paths
Tobias Neumann
@t-neumann
Jul 20 2017 13:52
@pditommaso and how do i do that? in the Channel.from constructor?
Paolo Di Tommaso
@pditommaso
Jul 20 2017 13:55
You need a map operator after from which translates the list of path strings to a list of files
Will send an example later
Tobias Neumann
@t-neumann
Jul 20 2017 14:01
ok cool - kind of stuck on that one
Kevin Sayers
@KevinSayers
Jul 20 2017 14:23
@pditommaso for the experimental support for kubernetes... has it ever been run on google cloud? or just AWS?
Paolo Di Tommaso
@pditommaso
Jul 20 2017 14:26
It was Google, but I used a special env assembled by Univa
(I'm on mobile)
Kevin Sayers
@KevinSayers
Jul 20 2017 14:28
ok
do you think there is a possibility it will run without that env?
Paolo Di Tommaso
@pditommaso
Jul 20 2017 14:31
All the struggle is how to deploy Kubernetes with a shared file system
KubeNow is a good candidate
Kevin Sayers
@KevinSayers
Jul 20 2017 14:33
ok
Paolo Di Tommaso
@pditommaso
Jul 20 2017 14:44
@t-neumann this should do the trick
Channel.from(params.files).map { condition, list -> list.collect{ file(it)} }
Tobias Neumann
@t-neumann
Jul 20 2017 14:48

it messes with the structure unfortunately

instead of

[condition1, [condition1.bam, input.bam]]

I now only have

[condition1.bam, input.bam]

since I use the name key of

set val(name), file(reads) from masterChannel
Paolo Di Tommaso
@pditommaso
Jul 20 2017 14:49
umm, before you posted
masterChannel = Channel.from(params.files).println()
[condition1, [condition1.bam, input.bam]]
[condition2, [condition2.bam, input.bam]]
so it looks like a channel of tuples, in which the first entry is a condition and the second a list of files
Tobias Neumann
@t-neumann
Jul 20 2017 14:51
exactly! yet when I apply your .map, it apparently only returns a list of files
Paolo Di Tommaso
@pditommaso
Jul 20 2017 14:52
ah yes, it's my mistake
the map must return the new object, hence
    Channel
         .from(params.files)
         .map { condition, list ->
            def files = list.collect { file(it) }
            return tuple(condition, files)
        }
easy!
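Putting it together, assuming params.files holds [condition, [path, path]] pairs (for instance loaded from a YAML params file, as above), the wiring would look roughly like this:

    // illustrative params layout; in practice it would come from the YAML file
    params.files = [
        ['condition1', ['condition1.bam', 'input.bam']],
        ['condition2', ['condition2.bam', 'input.bam']]
    ]

    masterChannel = Channel
        .from(params.files)
        .map { condition, list ->
            // turn the path strings into file objects, keeping the [name, files] shape
            tuple(condition, list.collect { file(it) })
        }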
Kevin Sayers
@KevinSayers
Jul 20 2017 14:54
if there is no data (i.e. just echoing hello) is the shared file system still necessary? I have specified a Docker Hub container and the k8s executor, and it states the job was submitted. Nothing happens though, no errors.
Paolo Di Tommaso
@pditommaso
Jul 20 2017 14:55
umm, yes
because the NF control files need to be written to shared storage in any case
here the interesting part is that Kubernetes has a lot of storage options
Tobias Neumann
@t-neumann
Jul 20 2017 15:01
@pditommaso thanks man!! it did the trick. I really should learn some groovy, though. It's a complete blind spot for me
Paolo Di Tommaso
@pditommaso
Jul 20 2017 15:02
it's a very fun programming lang, you really should!
Tobias Neumann
@t-neumann
Jul 20 2017 15:02
yeah this channel handling and modifications are super powerful apparently
if I have two peak calling processes that each start from the same input file channel and I want to run them separately, but have another process that uses results of both once they're finished - would I use the .merge operator?
Paolo Di Tommaso
@pditommaso
Jul 20 2017 15:05
it depends; a process can have many channels as input
Tobias Neumann
@t-neumann
Jul 20 2017 15:06
so would it wait for an item from each channel before it executes?
Paolo Di Tommaso
@pditommaso
Jul 20 2017 15:06
that's by definition
a process is triggered when all inputs are available
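So, as a rough sketch (channel names and the intersect command are illustrative, not from the pipeline above), a downstream process can simply declare both result channels as inputs and will only start once an item is available on each:

    process comparePeaks {
        input:
        file peaksA from callerA_results
        file peaksB from callerB_results

        output:
        file 'common_peaks.bed'

        script:
        """
        bedtools intersect -a $peaksA -b $peaksB > common_peaks.bed
        """
    }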
Tobias Neumann
@t-neumann
Jul 20 2017 15:08
is it guaranteed that the items of the input channel are in the same order in the output channel?
Paolo Di Tommaso
@pditommaso
Jul 20 2017 15:09
yes
BUT you need to take into account that processes are executed in parallel; in other words, the execution order is not deterministic
hence the order of the outputs of a process is not the same as the order of the inputs
Tobias Neumann
@t-neumann
Jul 20 2017 15:11
ok, then it's probably better to wait until both processes have fully finished before starting the process that uses the results from both
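One hedged way to express that wait (channel names again illustrative): mix() merges the two result channels and collect() gathers every emitted item into a single list, so the downstream process fires exactly once, after both callers have finished:

    // the downstream process receives one list containing all peak files
    allPeaks = callerA_results.mix(callerB_results).collect()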
Shellfishgene
@Shellfishgene
Jul 20 2017 15:11
I need a hint: I have multiple fastq pairs per sample, i.e. sample1.L1.R{1,2}.fastq, sample1.L2.R{1,2}.fastq, sample2.L1.R{1,2}.fastq, sample2.L2.R{1,2}.fastq. How do I split them by sample for a process that runs once per sample but needs all pairs for that sample? In this example, two pairs.
Paolo Di Tommaso
@pditommaso
Jul 20 2017 15:19
(sorry, boarding; if you scroll the history there should be an example from one or two weeks ago)
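Paolo's example isn't reproduced here, but a sketch along those lines (the glob pattern is assumed from the file names above) would re-key each pair by sample and group with groupTuple:

    Channel
        .fromFilePairs('*.L*.R{1,2}.fastq')                      // id is e.g. 'sample1.L1'
        .map { id, reads -> tuple(id.tokenize('.')[0], reads) }  // re-key by sample name
        .groupTuple()                                            // [sample1, [[R1, R2], [R1, R2]]]
        .set { samples_ch }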
Shellfishgene
@Shellfishgene
Jul 20 2017 15:19
Have a good flight :)
rdacemel
@rdacemel
Jul 20 2017 19:31

we were putting the intermediate files in the wrong folders (a home folder subdirectory) and that's not the way it should be

indeed, that was it. I saw you mentioned somewhere that the work directory should be accessible to all nodes, and in our case it wasn't. Now it is working smoothly!! Thanks a lot for the support and for the wonderful tool!

Sergey Venev
@sergpolly
Jul 20 2017 21:00
Hi,
Another interesting observation and a couple of questions about the inner workings of nextflow:
I was running a nextflow pipeline on an LSF cluster that was doing simple chunking of big fastq.gz files. A major glitch in our cluster's interconnect happened, and apparently not all of the results of the chunking process were copied to the intermediates (storeDir) folder, i.e. all chunks were present in the work/hash.../ folder and the exit code was 0, but some chunks were missing from the storeDir location. I had errorStrategy='retry', and nextflow was able to relaunch the zombie jobs that failed because of the glitch. Jobs (processes) downstream of that ill-fated chunking managed to run ok on the incomplete input. In the end, the whole pipeline completed just fine.
A couple of questions in the context of this story:
(1) Why wouldn't a process that failed to copy files to storeDir just fail? Is it by design?
I checked the .command.run file in the work/hash.../ folder and there was || true right next to the commands for copying the results from work to storeDir - isn't that the reason the process didn't fail?
(2) Does nextflow have a means of checking whether input data is complete based on the upstream processes? In my case, the jobs (mapping and postprocessing) downstream of the "incomplete" chunking didn't fail or complain to me about it.
Sergey Venev
@sergpolly
Jul 20 2017 21:05
In my case the incomplete data were paired-end fastq.gz files: all 31 chunks were present for the read-1 side and only ~14 chunks were present for the read-2 side.
Downstream mapping simply ignored 31-14=17 chunks from side_1