These are chat archives for nextflow-io/nextflow

21st
Mar 2018
jeremygnfsk
@jeremygnfsk
Mar 21 2018 13:24
I use Nextflow on SGE with a scratch storage; I use the -w option for the scratch, but there is a random error: "Caused by:
Underlying input stream returned zero bytes". There is also an access problem, because Nextflow doesn't create any work directory. Any ideas to help me? A Nextflow update? System compatibility? Many thanks in advance
Paolo Di Tommaso
@pditommaso
Mar 21 2018 13:26
I use the -w option for the scratch
is that a shared storage accessible by all cluster nodes?
jeremygnfsk
@jeremygnfsk
Mar 21 2018 13:29
yes, it is
Paolo Di Tommaso
@pditommaso
Mar 21 2018 13:30
what is the complete error message?
jeremygnfsk
@jeremygnfsk
Mar 21 2018 13:30

ERROR ~ Error executing process > 'recupsrr (41)'

Caused by:
Underlying input stream returned zero bytes

-- Check '.nextflow.log' file for details

Paolo Di Tommaso
@pditommaso
Mar 21 2018 13:31
that's too little, I need the complete log file; can you upload it somewhere?
jeremygnfsk
@jeremygnfsk
Mar 21 2018 13:33
is there an upload button, or can I give you a link?
Paolo Di Tommaso
@pditommaso
Mar 21 2018 13:34
A link please
Paolo Di Tommaso
@pditommaso
Mar 21 2018 13:42
if you look at that file you will find this

java.io.IOException: Underlying input stream returned zero bytes
    at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:288)
    at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
    at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
    at java.io.InputStreamReader.read(InputStreamReader.java:184)
    at java.io.BufferedReader.fill(BufferedReader.java:161)
    at java.io.BufferedReader.readLine(BufferedReader.java:324)
    at java.io.BufferedReader.readLine(BufferedReader.java:389)
    at org.codehaus.groovy.runtime.IOGroovyMethods.eachLine(IOGroovyMethods.java:483)
    at org.codehaus.groovy.runtime.IOGroovyMethods.eachLine(IOGroovyMethods.java:456)
so the crash happens when echoing the command output
jeremygnfsk
@jeremygnfsk
Mar 21 2018 13:45
so you think it's my command output that is the problem, am I right?
Paolo Di Tommaso
@pditommaso
Mar 21 2018 13:46
I don't know if the encoding of your command output is corrupted or if it's an NF bug
if you disable the process echo it should work
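For reference, echo is a per-process directive, so disabling it is a one-line change; a minimal sketch, where the process name and command are illustrative:

process myTask {
    // do not forward the task's stdout to the Nextflow console
    echo false

    """
    some_tool --input sample.fastq
    """
}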
jeremygnfsk
@jeremygnfsk
Mar 21 2018 13:48
I will try that, and I will also check with the IT guys, many thanks. By the way, Nextflow is very useful
Paolo Di Tommaso
@pditommaso
Mar 21 2018 13:48
thank you
what is the command you are executing?
jeremygnfsk
@jeremygnfsk
Mar 21 2018 13:50
./nextflow ropipe/src/pipnetoyage.nf -c ropipe/src/pipenet.config -w /scratch/jganofsk/
Paolo Di Tommaso
@pditommaso
Mar 21 2018 13:52
I meant the task command, but no problem
to help debug this, you can do the following:
make a tar or a zip file of the complete content of the directory
/scratch/jganofsk/a4/8efc22ee8e5faf905c58e81e5a26df
then upload it as before
jeremygnfsk
@jeremygnfsk
Mar 21 2018 14:02
I'm downloading it, but there are only two paired-end FASTA files in this directory
Paolo Di Tommaso
@pditommaso
Mar 21 2018 14:02
well, I'm not interested in the data, I need to check all the .command.* files
jeremygnfsk
@jeremygnfsk
Mar 21 2018 14:03
OK, sorry, I'll do it
Paolo Di Tommaso
@pditommaso
Mar 21 2018 14:04
no problem, you are welcome
Tim Diels
@timdiels
Mar 21 2018 14:09
In the following, does process1 use the first created channel and process2 the second? Or do both process 1 and 2 (try to) use the second created channel?
inputChannel = Channel.create()
process process1 {
  input:
  val x from inputChannel
}
inputChannel = Channel.create()
process process2 {
  input:
  val x from inputChannel
}
Maxime Garcia
@MaxUlysse
Mar 21 2018 14:15
@timdiels I'd say the first option might be the one, but I'd also say it'll probably produce a warning for a collision
Tim Diels
@timdiels
Mar 21 2018 14:16
I just tried it, turns out it's the first option indeed
no warnings, version 0.26.0
Maxime Garcia
@MaxUlysse
Mar 21 2018 14:17
IMHO you should have different channels for different processes
Tim Diels
@timdiels
Mar 21 2018 14:22
probably; I recall I once had a case where two processes did end up using the same Channel instance, but I'm not that adept at Groovy closures
Paolo Di Tommaso
@pditommaso
Mar 21 2018 14:23
@jeremygnfsk not sure I understand what the problem is; however, the error message is this:
srr
ERR1817270


Read 11579478 spots for ERR1817270
Written 11579478 spots for ERR1817270
2018-03-21T13:29:39 fastq-dump.2.8.1 err: libs/kfs/unix/sysfile.c:188:KSysFileDestroy_v1:  unknown while destroying file within file system module - unknown system error 'Resource temporarily unavailable(11)'
r730gpgpu5
@timdiels tricky example; in principle it's valid because they are two different channel instances
but it hangs because Channel.create() produces nothing
Tim Diels
@timdiels
Mar 21 2018 14:25
@pditommaso
I ended up testing with
inputChannel = Channel.from(1,2)
process p1 {
    tag "$x"

    input:
    val x from inputChannel

    """
    echo $x
    """
}
inputChannel = Channel.from(3,4)
process p2 {
    tag "$x"

    input:
    val x from inputChannel

    """
    echo $x
    """
}
this outputs
[c4/44f891] Submitted process > p2 (4)
[0f/efe7e5] Submitted process > p1 (2)
[00/b15e90] Submitted process > p1 (1)
[ca/3a5afe] Submitted process > p2 (3)
Paolo Di Tommaso
@pditommaso
Mar 21 2018 14:25
yes, that makes sense
Tim Diels
@timdiels
Mar 21 2018 14:25
but I'm not sure whether that's a coincidence
jeremygnfsk
@jeremygnfsk
Mar 21 2018 14:25
thank you, I will check with the IT guys
Paolo Di Tommaso
@pditommaso
Mar 21 2018 14:26
@timdiels that's fine, you are using the same variable name for two channels
Tim Diels
@timdiels
Mar 21 2018 14:27
E.g. I'm not sure how process works behind the scenes, but if it's like calling the closure
x = 1
f = { println x }
f()  // prints 1
x = 2
f()  // prints 2
then I'd expect it to depend on when it happens to call the closure
Paolo Di Tommaso
@pditommaso
Mar 21 2018 14:28
well, I wouldn't suggest using this pattern, but it's safe
because the channel reference is resolved sequentially, i.e. as the processes are executed
Tim Diels
@timdiels
Mar 21 2018 14:29
and by executed you mean when they can start or when I define them?
Paolo Di Tommaso
@pditommaso
Mar 21 2018 14:30
yes, defined is a better term here
Tim Diels
@timdiels
Mar 21 2018 14:32
ah good, and nowadays I prefix my channel variables to avoid reuse so no worries :)
Maxime Garcia
@MaxUlysse
Mar 21 2018 14:33
@timdiels Which prefix are you using?
Tim Diels
@timdiels
Mar 21 2018 14:35
@MaxUlysse First a prefix for the section of the pipeline, e.g. of for OrthoFinder, b2g for Blast2GO. Then for things such as an input set channel I use the whole process name as a prefix. E.g. b2gBlastRefSeqInput is the input channel of the b2gBlastRefSeq process.
Maxime Garcia
@MaxUlysse
Mar 21 2018 14:37
Makes sense. I'm currently using suffixes for my channels, like bamForMuTect1, but I'm trying to find a shorter and clearer convention
Tim Diels
@timdiels
Mar 21 2018 14:40
@pditommaso @MaxUlysse Would you recommend against reusing output channel variable names like this as well?
downloads
    .map { [name: it[0], file: it[1]] }
    .set { downloads }
Maxime Garcia
@MaxUlysse
Mar 21 2018 14:40
we've begun discussing such issues here: nf-core/nf-core.github.io#9, if you'd like to comment as well
Paolo Di Tommaso
@pditommaso
Mar 21 2018 14:42
I don't have a precise idea, maybe it's just too smart for me ;)
Maxime Garcia
@MaxUlysse
Mar 21 2018 14:43
I think it's worth looking into
Mike Smoot
@mes5k
Mar 21 2018 16:21
@pditommaso is there a unique value associated with each run of a process available for use as a variable within the process script? Maybe the hash value calculated for caching the process or even just a simple count? If so, I'd like to use it to generate a uniquely named directory that I can use in a later process where I aggregate results.
Paolo Di Tommaso
@pditommaso
Mar 21 2018 16:22
is there a unique value associated with each run of a process available for use as a variable within the process script?
task.index
the second part is a bit more scary :)
not sure I understand what your need is
Mike Smoot
@mes5k
Mar 21 2018 16:28
Wellll, it's all about working around slurm limitations and too many generated output files. Basically I have a process that outputs a bunch of gff files. If I were to just put those files as a list into a channel, e.g. file("*.gff"), then there's the possibility that there'd be so many files that the downstream .command.run would be so large that slurm would balk. So instead I put the files into a directory and put file("outdir") into the channel. My downstream process can't have duplicate inputs all named outdir, so what I'd like to do is use file("outdir.${task.index}") instead. Make sense?
There's probably a better way around this problem.
Paolo Di Tommaso
@pditommaso
Mar 21 2018 16:30
wait, too many outputs for a single task?
Mike Smoot
@mes5k
Mar 21 2018 17:17
(sorry, had a quick meeting) Yes, when a task generates too many output files downstream processes can have a hard time dealing with them because the generated .command.run scripts and occasionally .command.sh scripts get so large (20+ MB) that slurm refuses to handle them. This isn't a problem with nextflow, but with slurm. I work around this by generating directories of output and other times json or yaml file manifests that are then passed around.
Paolo Di Tommaso
@pditommaso
Mar 21 2018 17:18
ohhh
if so, the best thing is to create those outputs in a dir and pass that dir instead, no?
Mike Smoot
@mes5k
Mar 21 2018 17:23
Right and that's exactly what I'm doing. However, my downstream process needs to combine the outputs of all those directories. I can't have 20 directories all named outdir symlinked into the same process work dir. Hence the need for a unique id for the dirname.
Paolo Di Tommaso
@pditommaso
Mar 21 2018 17:25
well, if you declare an input file name pattern, NF gives each of them a unique name
Mike Smoot
@mes5k
Mar 21 2018 17:26
Can you point me to an example?
Paolo Di Tommaso
@pditommaso
Mar 21 2018 17:27
input: file('outdir_*') from dir_ch.collect()
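Put together, the pattern might look like this: the upstream task writes its many files into one directory, and the downstream task collects all the directories, which are staged under unique names by the outdir_* pattern. A minimal sketch, where the process, channel, and tool names are illustrative:

samples_ch = Channel.from('sampleA', 'sampleB')

process makeGffs {
    input:
    val sample from samples_ch

    output:
    file('outdir') into dir_ch

    """
    mkdir outdir
    produce_gffs --sample $sample --out outdir    # hypothetical tool
    """
}

process aggregate {
    input:
    // each incoming directory is staged as outdir_1, outdir_2, ...
    file('outdir_*') from dir_ch.collect()

    """
    cat outdir_*/*.gff > merged.gff
    """
}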
Mike Smoot
@mes5k
Mar 21 2018 17:29
Cool, I'll experiment with that to see what happens. Thanks!
Edgar
@edgano
Mar 21 2018 17:45
hi!
Do I need to enable something to generate a file in onComplete? I want to call a shell script in onComplete, which will manage the results and produce other files
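For context, an onComplete handler is declared at the script level and runs once the workflow ends; a minimal sketch, where the file name and script path are hypothetical:

workflow.onComplete {
    // write a marker file, then hand off to an external shell script
    file('completion.txt').text = "run ${workflow.runName} finished: ${workflow.success ? 'OK' : 'failed'}\n"
    ['bash', '/path/to/manage_results.sh'].execute().waitFor()
}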
Paolo Di Tommaso
@pditommaso
Mar 21 2018 17:47
why don't you just use a process?
Edgar
@edgano
Mar 21 2018 17:48
because I need the trace, and all the processes have to be finished
Paolo Di Tommaso
@pditommaso
Mar 21 2018 17:49
mmm, make a bash wrapper that runs NF and then completes the execution with your stuff
Edgar
@edgano
Mar 21 2018 17:49
I see... I will give it a try! Thanks Paolo!
Paolo Di Tommaso
@pditommaso
Mar 21 2018 17:49
:+1:
Vladimir Kiselev
@wikiselev
Mar 21 2018 21:15
Hi @pditommaso ! Is there a way of not sending a job output to the output channel if some condition is not met? So, the outputs of some jobs will be sent to the output channel, and the others will not.
My output is defined like this:
output:
    set val(sample), file('*.cram') into cram_files
and in some cases there will be no cram files produced
because the sample is empty
Mike Smoot
@mes5k
Mar 21 2018 21:18
you can add optional true like this:
output:
    set val(sample), file('*.cram') optional true into cram_files
Vladimir Kiselev
@wikiselev
Mar 21 2018 21:21
oh, interesting, is it a Groovy command? I can't find it in the NF documentation. Many thanks, will try it now
Vladimir Kiselev
@wikiselev
Mar 21 2018 21:28
cool, it works! Thanks a lot! @mes5k
Mike Smoot
@mes5k
Mar 21 2018 21:32
no problem. While I'm sitting here waiting for my pipeline to finish, I'm going to create a pull request to document this feature.
Phil Ewels
@ewels
Mar 21 2018 22:24
I've been thinking of trying to do more PRs for missing docs too :+1:
e.g. -with-dag is one that springs to mind that has a config scope