These are chat archives for nextflow-io/nextflow

16th
Mar 2018
Luca Cozzuto
@lucacozzuto
Mar 16 2018 09:18
Hi! I think this question has already been asked, but I cannot find a solution in the docs... I have a process that outputs a file in most cases, but occasionally it does not. In that case the pipeline complains (also because this file is used by another step). How can I solve it?
Paolo Di Tommaso
@pditommaso
Mar 16 2018 09:19
what are you expecting it to do?
Luca Cozzuto
@lucacozzuto
Mar 16 2018 09:20
well in case the file is not created, the pipeline should continue and the step that needs that file should not be executed
Paolo Di Tommaso
@pditommaso
Mar 16 2018 09:20
output:
file 'x' optional true into y
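(A minimal sketch of how that declaration could sit in a full script, in the same syntax as the snippet above. The process names, the `result.txt` file name and the even/odd condition are hypothetical, chosen only so that some tasks produce no file.)

```groovy
// Hypothetical process: result.txt is written only for even inputs
process maybe_output {
    input:
    val x from Channel.from(1, 2, 3, 4)

    output:
    // 'optional true' lets a task finish without producing the file
    file 'result.txt' optional true into results

    script:
    """
    if [ \$(( ${x} % 2 )) -eq 0 ]; then echo ${x} > result.txt; fi
    """
}

// Runs only for the items that actually reached `results`
process consumer {
    input:
    file r from results

    script:
    """
    cat ${r}
    """
}
```

With this layout the downstream process is simply never triggered for tasks that emitted nothing, which is the behaviour asked for above.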
Luca Cozzuto
@lucacozzuto
Mar 16 2018 09:21
is it in documentation already? :)
Paolo Di Tommaso
@pditommaso
Mar 16 2018 09:22
if I document everything, then where's the fun ? :satisfied:
Luca Cozzuto
@lucacozzuto
Mar 16 2018 09:22
at a certain point you should make the documentation a wiki so that we can help with this
:)
Maxime Garcia
@MaxUlysse
Mar 16 2018 09:23
:+1:
Paolo Di Tommaso
@pditommaso
Mar 16 2018 09:23
folks, wiki is so 2000 ..
if you want to contribute click here, edit and send a PR ;)
Maxime Garcia
@MaxUlysse
Mar 16 2018 09:26
I was just about to say that the documentation should be somewhere on github, we just need to make PRs
Luca Cozzuto
@lucacozzuto
Mar 16 2018 09:26
there were so many good things in 2000 ;)
Paolo Di Tommaso
@pditommaso
Mar 16 2018 09:27
> I was just about to say, that the documentation should be somewhere on github
it is !
> there were so many good things in 2000
it looks like a collection of "sfighe", i.e. bad luck
Maxime Garcia
@MaxUlysse
Mar 16 2018 09:29
@pditommaso The blog is on github, so of course the documentation is there too
Luca Cozzuto
@lucacozzuto
Mar 16 2018 09:30
yeah but you found it on wikipedia and not in gitHub :)))))
Paolo Di Tommaso
@pditommaso
Mar 16 2018 09:30
Maxime, it's the docs folder in the main NF repo
Maxime Garcia
@MaxUlysse
Mar 16 2018 09:30
Oh, that's so nice and fancy
Paolo Di Tommaso
@pditommaso
Mar 16 2018 09:31
> yeah but you found it on wikipedia and not in gitHub
fair enough :satisfied:
Luca Cozzuto
@lucacozzuto
Mar 16 2018 09:33
ehm... the problem is still there
Missing output file(s) `*.r` expected by process
Luca Cozzuto
@lucacozzuto
Mar 16 2018 09:37
I'll try with a -self-update... maybe it is something new you introduced recently?
Paolo Di Tommaso
@pditommaso
Mar 16 2018 09:40
no
try the example, then see what's different in your code
Luca Cozzuto
@lucacozzuto
Mar 16 2018 09:41
well... it worked :)
Paolo Di Tommaso
@pditommaso
Mar 16 2018 09:41
:+1:
Luca Cozzuto
@lucacozzuto
Mar 16 2018 09:42
so -self-update is often the solution
Paolo Di Tommaso
@pditommaso
Mar 16 2018 09:42
:)
Luca Cozzuto
@lucacozzuto
Mar 16 2018 09:42
and this optional is really really useful stuff. Good job. We are at 3 beers now (or 10 coffees :) )
Vladimir Kiselev
@wikiselev
Mar 16 2018 09:44
how to find a size of a channel?
.size() throws an error
Paolo Di Tommaso
@pditommaso
Mar 16 2018 09:45
.count()
take into consideration that it consumes the channel ..
Vladimir Kiselev
@wikiselev
Mar 16 2018 09:46
Thanks! So the channel will disappear after I call this method?
Paolo Di Tommaso
@pditommaso
Mar 16 2018 10:51
Yes, usually it's not a good pattern to rely on the number of items emitted by a channel with NF
Tiffany Delhomme
@tdelhomme
Mar 16 2018 10:56
I agree, here for example we duplicate the channel to use one for counting, not so clean
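(A minimal sketch of the duplicate-then-count approach mentioned here, assuming the `into` operator is used to split the channel. The channel names and items are hypothetical.)

```groovy
// Split one source channel into two copies
Channel
    .from('a', 'b', 'c')
    .into { items_for_count; items_for_work }

// count() consumes this copy and emits a single value (3 for these items)
items_for_count
    .count()
    .subscribe { println "number of items: $it" }

// The second copy is still available for the real work
items_for_work
    .subscribe { println "item: $it" }
```

The count only becomes available once the source channel has completed, which is one reason relying on it is discouraged above.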
chdem
@chdem
Mar 16 2018 13:14
Hello ! Is it possible to generate a DAG graph without running a script ? My script is quite difficult to launch (many input files, many parameters), so it would save me a lot of time. :smile:
Vladimir Kiselev
@wikiselev
Mar 16 2018 13:26
@pditommaso @tdelhomme thanks!
Paolo Di Tommaso
@pditommaso
Mar 16 2018 13:47
@chdem nope, and I would strongly advise having a small dataset that allows you to run your pipeline
That would save you a lot of time, in the future
Maxime Garcia
@MaxUlysse
Mar 16 2018 14:49
Definitely
That way you can set up CI tests too
chdem
@chdem
Mar 16 2018 14:51
OK, this is good practice, I know, I was playing the lazy here... :(
Maxime Garcia
@MaxUlysse
Mar 16 2018 14:54
Using a workflow is lazy, we could do everything by hand
I think you're just not lazy enough
Vladimir Kiselev
@wikiselev
Mar 16 2018 14:56

sorry, similar question to the one I had already before… I have this pipeline:

process pr1 {
    …
    output:
        set val(sample), file('*.cram') into cram_files
    …
}

def sample_ind = 1
cram_files_inds = cram_files
    .map{ [sample_ind++, it[0], it[1]] }
    .transpose()

process pr2 {
    input:
        set val(ind), val(sample), val(cram) from cram_files_inds
    ...
}

the problem is that the ind value is not properly picked up by NF in the pr2 process. It’s either the same for all jobs or just missing...

Paolo Di Tommaso
@pditommaso
Mar 16 2018 15:04
put a .view() after .transpose() to see what it is producing and then paste it here
Luca Cozzuto
@lucacozzuto
Mar 16 2018 15:12
hi @pditommaso There are some completed processes that keep re-running when I use -resume,
without my touching them (so I am not invalidating the cache)...
Netsanet Gebremedhin
@gnetsanet
Mar 16 2018 15:34

Hello! I have posted this question on the google group as well and Paolo has been graciously helping but I figured gitter is quicker :-)

I have the following process:

lanes = Channel.from(1..4)

process processA {

    input:
    val lane from lanes

    output:
    file("${workDir}/${lane}/hello_${lane}*.txt") into welcomeFiles

    script:
    """
    create_files.sh ${lane} ${workDir}
    """

}

Where create_files.sh is as follows

#!/bin/bash

for i in {1..10};
do
    mkdir -p ${2}/${1}
    echo "Hello" > ${2}/${1}/hello_${1}_${i}.txt;
done

Can anyone tell me what naive things I am doing here? I am coming across errors such as ‘missing output file’

'File X is out of the scope of process working dir:'

Maxime Garcia
@MaxUlysse
Mar 16 2018 15:37
Is workdir defined somewhere?
I can see lane but no definition of workdir
that would explain why nothing matches your output
Tiffany Delhomme
@tdelhomme
Mar 16 2018 15:54
maybe this corresponds to the metadata workDir, but it should be accessed through $workflow.workDir
Maxime Garcia
@MaxUlysse
Mar 16 2018 16:02
@tdelhomme I think that would be a good explanation
Tiffany Delhomme
@tdelhomme
Mar 16 2018 16:02
:smile:
Luca Cozzuto
@lucacozzuto
Mar 16 2018 16:06
no help for my problem? :(
Netsanet Gebremedhin
@gnetsanet
Mar 16 2018 16:15
Thank you @tdelhomme and @MaxUlysse. I tried with ${workflow.workDir} and the issue still crops up.
Paolo Di Tommaso
@pditommaso
Mar 16 2018 16:16
The correct answer is do-not-use-absolute-path
Netsanet Gebremedhin
@gnetsanet
Mar 16 2018 16:18
Same @tdelhomme

@pditommaso, which line/absolute path are you referring to ? In the output directive when output files are 'collected'?

Changing this

file("${workDir}/${lane}/hello_${lane}*.txt") into welcomeFiles

to this

file("${lane}/hello_${lane}*.txt") into welcomeFiles

does not make a difference, at least that is what I am observing.

Kevin Sayers
@KevinSayers
Mar 16 2018 16:55
@gnetsanet your path in your shell script is also absolute.
Vladimir Kiselev
@wikiselev
Mar 16 2018 17:07

Hi @pditommaso , here is my script:

sample_list = Channel.fromPath('samples.txt')

process irods {
    input:
        val sample from sample_list.flatMap{ it.readLines() }
    output:
        set val(sample), file('*.cram') into cram_files
    """
    echo test > test1.cram
    echo test > test2.cram
    echo test > test3.cram
    """
}

def sample_ind = 0
cram_files_inds = cram_files
    .map{ [sample_ind++, it[0], it[1]] }
    .transpose()
    .view()

where samples.txt is just this:

sample1
sample2
sample3

The output is this:

WARN: Process `irods` is defined two or more times
[warm up] executor > local
[8e/344b33] Submitted process > irods (1)
[7c/4a3425] Submitted process > irods (3)
[24/64a1d6] Submitted process > irods (2)
[3, sample1, /Users/vk6/work/8e/344b3380829e52608c2969ef0cfa6e/test1.cram]
[3, sample1, /Users/vk6/work/8e/344b3380829e52608c2969ef0cfa6e/test2.cram]
[3, sample1, /Users/vk6/work/8e/344b3380829e52608c2969ef0cfa6e/test3.cram]
[2, sample2, /Users/vk6/work/24/64a1d68b5fa90145784bc7a19bfad6/test1.cram]
[2, sample2, /Users/vk6/work/24/64a1d68b5fa90145784bc7a19bfad6/test2.cram]
[2, sample2, /Users/vk6/work/24/64a1d68b5fa90145784bc7a19bfad6/test3.cram]
[0, sample3, /Users/vk6/work/7c/4a342503974a059de71665659d5786/test1.cram]
[0, sample3, /Users/vk6/work/7c/4a342503974a059de71665659d5786/test2.cram]
[0, sample3, /Users/vk6/work/7c/4a342503974a059de71665659d5786/test3.cram]

So, the first question is why the first elements of the resulting lists are not consecutive? I have 0,2,3 and not 0,1,2.

Also if I change the initial value of sample_ind to 3 instead of 0 I get this:
[3, sample1, /Users/vk6/work/95/f76d40f8c484d453a1606500b5b317/test1.cram]
[3, sample1, /Users/vk6/work/95/f76d40f8c484d453a1606500b5b317/test2.cram]
[3, sample1, /Users/vk6/work/95/f76d40f8c484d453a1606500b5b317/test3.cram]
[3, sample2, /Users/vk6/work/8e/a4d2e39c9208b9b5b092d17f3f2439/test1.cram]
[3, sample2, /Users/vk6/work/8e/a4d2e39c9208b9b5b092d17f3f2439/test2.cram]
[3, sample2, /Users/vk6/work/8e/a4d2e39c9208b9b5b092d17f3f2439/test3.cram]
[3, sample3, /Users/vk6/work/89/b6818a32087165b8643b9e0467cd14/test1.cram]
[3, sample3, /Users/vk6/work/89/b6818a32087165b8643b9e0467cd14/test2.cram]
[3, sample3, /Users/vk6/work/89/b6818a32087165b8643b9e0467cd14/test3.cram]
so I suppose I don’t understand how the iterator works in .map{ [sample_ind++, it[0], it[1]] }
sorry, don’t mean to bother you on Friday night :smile: , but when you have time
maybe it’s something very simple
Paolo Di Tommaso
@pditommaso
Mar 16 2018 17:30
but that's the semantics of transpose
I guess you want
[0, sample1, /Users/pditommaso/projects/nextflow/work/37/96f22538a27c47e0725f2cc8c6dad3/test1.cram]
[1, sample1, /Users/pditommaso/projects/nextflow/work/37/96f22538a27c47e0725f2cc8c6dad3/test2.cram]
[2, sample1, /Users/pditommaso/projects/nextflow/work/37/96f22538a27c47e0725f2cc8c6dad3/test3.cram]
[3, sample2, /Users/pditommaso/projects/nextflow/work/a8/5a61f94bed91f0657764a2a1e4d2ed/test1.cram]
[4, sample2, /Users/pditommaso/projects/nextflow/work/a8/5a61f94bed91f0657764a2a1e4d2ed/test2.cram]
[5, sample2, /Users/pditommaso/projects/nextflow/work/a8/5a61f94bed91f0657764a2a1e4d2ed/test3.cram]
[6, sample3, /Users/pditommaso/projects/nextflow/work/d4/a7d945ab4a80f72b2f179c3b746924/test1.cram]
[7, sample3, /Users/pditommaso/projects/nextflow/work/d4/a7d945ab4a80f72b2f179c3b746924/test2.cram]
[8, sample3, /Users/pditommaso/projects/nextflow/work/d4/a7d945ab4a80f72b2f179c3b746924/test3.cram]
you need to transpose first and then map ..
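(A minimal sketch of that reordering. The `cram_files` channel here is a hypothetical stand-in built with `Channel.from`, and the file names are placeholder strings rather than real paths.)

```groovy
// Stand-in for the `cram_files` channel emitted by the process:
// each item is [sample, [list of files]]
cram_files = Channel.from(
    ['sample1', ['test1.cram', 'test2.cram', 'test3.cram']],
    ['sample2', ['test1.cram', 'test2.cram', 'test3.cram']]
)

def sample_ind = 0
cram_files_inds = cram_files
    .transpose()                            // first: one item per (sample, file) pair
    .map { [sample_ind++, it[0], it[1]] }   // then: number each pair
    .view()
```

Because the counter is incremented once per item that reaches `map`, transposing first gives every file its own index instead of one index per sample.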
Netsanet Gebremedhin
@gnetsanet
Mar 16 2018 18:13

@KevinSayers , Thank you!

the path in the shell script is obtained from a command-line argument. Once we are in the shell script, it would not matter whether the provided path is absolute or not, would it?

 create_files.sh ${lane} ${workDir}

Let us assume create_files.sh is a third-party tool whose code I cannot change, and I provide a relative path to where I want it to write outputs. What do you do in those kinds of situations?

Right now, I am creating additional files that I can use as process completion markers.

process processA {

    input:
    val lane from lanes

    output:
    //file("$workflow.workDir/${lane}/hello_${lane}*.txt") into welcomeFiles
    file('process_complete.txt') into processCompletionMarker

    script:
    """
    /mnt/home/ngebremedhin/novaseq_pipe/create_files.sh ${lane} $workflow.workDir
    touch 'process_complete.txt'
    """

}
Paolo Di Tommaso
@pditommaso
Mar 16 2018 19:24
again, NF is not designed to work in this model; if for some reason you manage to get that code working, it is supposed to be considered an anti-pattern and is not supported
> Let us assume create_files.sh is a third party tool whose code I cannot change and I provide a relative path
Paolo Di Tommaso
@pditommaso
Mar 16 2018 19:31
if you are in such a situation you can still specify the current working directory using the variable $PWD, as I replied in the google group. Please do not cross-post the same questions
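(A minimal sketch of the `$PWD` suggestion applied to the process posted earlier, assuming `create_files.sh` is on the PATH, for example in the pipeline's `bin/` directory. The output pattern is relative to the task directory.)

```groovy
lanes = Channel.from(1..4)

process processA {
    input:
    val lane from lanes

    output:
    // Relative path: resolved inside the task's own working directory
    file("${lane}/hello_${lane}_*.txt") into welcomeFiles

    script:
    """
    # \$PWD is escaped so that Bash, not Nextflow, expands it at run time
    create_files.sh ${lane} \$PWD
    """
}
```

Since the tool now writes under the task directory, the declared outputs stay within the scope Nextflow checks and no completion-marker file is needed.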
Vladimir Kiselev
@wikiselev
Mar 16 2018 20:10
@pditommaso saved my Friday night, thanks again!
Paolo Di Tommaso
@pditommaso
Mar 16 2018 20:11
That's the reason I've replied :joy:
Vladimir Kiselev
@wikiselev
Mar 16 2018 20:17
:clap:
Netsanet Gebremedhin
@gnetsanet
Mar 16 2018 21:10
Thank you @pditommaso. Apologies for cross-posting.
Vladimir Kiselev
@wikiselev
Mar 16 2018 21:24
"going to bed on Friday night" message: I've just managed to find out about .groupTuple() in the documentation, which solved my problem. Without asking you! Looks like I am starting to understand the operators ;-) Hope your karma feels good, sleep tight! :fire:
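(How `.groupTuple()` was used here is not shown in the chat. The following is only a generic sketch of the operator, with hypothetical sample and file names: it collects items that share the same first element back into one tuple.)

```groovy
// Items are [sample, file] pairs; file names are placeholder strings
Channel
    .from(
        ['sample1', 'test1.cram'],
        ['sample1', 'test2.cram'],
        ['sample2', 'test1.cram']
    )
    .groupTuple()   // groups by the first element of each tuple by default
    .view()         // one item per sample, with its files gathered in a list
```

In that sense it is roughly the inverse of the `transpose` step discussed earlier in the day.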
Vladimir Kiselev
@wikiselev
Mar 16 2018 21:37
and obviously nextflow console is indispensable!
Paolo Di Tommaso
@pditommaso
Mar 16 2018 22:21
:+1: