These are chat archives for nextflow-io/nextflow

7th
Jun 2016
Rickard Hammarén
@Hammarn
Jun 07 2016 12:35

Hi! I'm having an issue with the latest version of Nextflow: in 19.4 I get the following error:

ERROR ~ Channel `*Log.final.out` has been used twice as an output by process `star` and process `trim_galore`

 -- Check script 'main.nf' at line: 213 or see '.nextflow.log' file for more details

But it runs just fine if I roll back to 18.3.

Paolo Di Tommaso
@pditommaso
Jun 07 2016 12:35
unfortunately, that means there's something wrong in your script
likely there's a declaration like the following:
process foo {
  output: 
  file '..' into channel_x
  file '..' into channel_x

'''
your script 
'''
}
i.e. a repetition of the same output channel; this syntax is not supported by Nextflow
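A sketch of the supported form, where each output declaration feeds its own channel (file and channel names are hypothetical):

```
process foo {
  output:
  file 'a.txt' into channel_a
  file 'b.txt' into channel_b

  '''
  your script
  '''
}
```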
wait, your case should be different, because the error message says that the same channel is used by two different processes
is your script accessible somewhere?
Rickard Hammarén
@Hammarn
Jun 07 2016 12:40
So I have to declare new names for all the different output files? Even if they aren't actually being used again? I.e. I have a lot of stuff just going into results; is that not possible anymore?
Paolo Di Tommaso
@pditommaso
Jun 07 2016 12:43
it has never been possible; the problem is that the error was not reported
Rickard Hammarén
@Hammarn
Jun 07 2016 12:43
haha ok
Paolo Di Tommaso
@pditommaso
Jun 07 2016 12:44
but if you had used the results channel it would have resulted in an error
however in your case, you only need to remove that ... into results
Rickard Hammarén
@Hammarn
Jun 07 2016 12:45
So just declare the output files but not send them into a channel?
Paolo Di Tommaso
@pditommaso
Jun 07 2016 12:45
exactly, because the publishDir will take care of copying the results
even easier, isn't it?
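A sketch of what that looks like, assuming a results directory (process and file names are hypothetical):

```
process star {
  publishDir 'results', mode: 'copy'

  output:
  file '*Log.final.out'   // published by publishDir; no `into` needed if unused downstream

  '''
  your script
  '''
}
```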
Rickard Hammarén
@Hammarn
Jun 07 2016 12:46
Great, that makes sense. Thanks!
Paolo Di Tommaso
@pditommaso
Jun 07 2016 12:46
welcome
Rickard Hammarén
@Hammarn
Jun 07 2016 13:35

So I ran into another problem, this time with using modules. My runs instantly crash. I found the following:

.command.env: line 3: $2: unbound variable
.command.run: line 35: COUT: unbound variable

The '.command.env' file looks different than in previous runs:

nxf_module_load(){
  local mod=$1
  local ver=$2
  local new_module="$mod/$ver"
  if [[ ! $(module list 2>&1 | grep -o "$new_module") ]]; then
    old_module=$(module list 2>&1 | grep -Eo "$mod\/[^\( \n]+" || true)
    if [[ $old_module ]]; then
      module switch $old_module $new_module
    else
      module load $new_module
    fi
  fi
}

nxf_module_load bioinfo-tools
nxf_module_load FastQC

instead of just, as previously:

module load bioinfo-tools
module load FastQC
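For what it's worth, the "unbound variable" comes from referencing `$2` when no version argument is passed while the script runs under `set -u`. A hypothetical defensive rewrite would default the version to empty with `${2:-}` (the `echo` here is a stand-in for the real `module load`/`module switch` logic):

```shell
set -u

nxf_module_load(){
  local mod=$1
  local ver=${2:-}                  # empty when no version is given; safe under `set -u`
  local new_module=$mod
  if [ -n "$ver" ]; then
    new_module="$mod/$ver"
  fi
  # stand-in for the real `module load`/`module switch` logic
  echo "module load $new_module"
}

nxf_module_load bioinfo-tools       # no version: still works
nxf_module_load FastQC 0.11.5       # version given
```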
Evan Floden
@evanfloden
Jun 07 2016 13:35

I have a situation where when running with a docker an executable in the PATH is not being found. The NF error is:

work/27/ef6aae8695f18f7577804cf496c165/.command.sh: line 3: cufflinks: command not found

However I can login to the docker image with docker run -ti <image> and run cufflinks and it works fine. I have other files I am adding to PATH and they seem to be working fine. Has anyone seen this before?

Paolo Di Tommaso
@pditommaso
Jun 07 2016 13:39
@Hammarn It looks like it is missing the version of the module you need to load
not sure, but isn't it mandatory?
@skptic weird, you need to debug that task; check whether the .command.env contains valid variable definitions
Paolo Di Tommaso
@pditommaso
Jun 07 2016 13:57
@Hammarn I mean, shouldn't it be bioinfo-tools/x.y.z?
Evan Floden
@evanfloden
Jun 07 2016 14:50
@pditommaso my mistake before. Cheers
Rickard Hammarén
@Hammarn
Jun 07 2016 15:05
@pditommaso bioinfo-tools doesn't have a version number - it's not a program. It's just a module that adds all the available bioinformatics-related modules to the list of available modules
Michael L Heuer
@heuermh
Jun 07 2016 15:08
@thejmazz @Hammarn thinking aloud here on the "forking of channels" issue... rather than separate named output channels, e.g. into referenceGenomesZipped1, referenceGenomesZipped2, referenceGenomesZipped3, perhaps write into one output channel, then copy it into n copies, and have the n downstream processes each read from one of those?
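For reference, Nextflow's `into` operator implements exactly this duplication pattern (a sketch, assuming a version where the multi-target form is available; channel names follow the example above):

```
referenceGenomesZipped.into { copy1; copy2; copy3 }
// copy1, copy2 and copy3 can each be consumed by a different downstream process
```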
Paolo Di Tommaso
@pditommaso
Jun 07 2016 15:50
@Hammarn I see. I will prepare a fix soon
Paolo Di Tommaso
@pditommaso
Jun 07 2016 16:46
@Hammarn I've uploaded a new snapshot fixing the problem. If you want to give it a try, define this variable in your env: export NXF_VER=0.20.0-SNAPSHOT, then run Nextflow as usual
@heuermh @thejmazz Good point. Tx
Mike Smoot
@mes5k
Jun 07 2016 19:50
I've got a process that generates two different output files, which I'd like to go into different output directories (via publishDir). One thought was to have two publishDir directives, but that doesn't work (only the last one is used). I know that I could create a couple dummy processes that just publish the files, but I'm wondering if there's a more elegant approach?
Paolo Di Tommaso
@pditommaso
Jun 07 2016 20:02
there was an idea to have a dynamically evaluated target file name, but it has not been implemented ..
the alternative is to manage the outputs as a plain channel
Mike Smoot
@mes5k
Jun 07 2016 20:04
How would I publish from a plain channel?
Paolo Di Tommaso
@pditommaso
Jun 07 2016 20:04
I mean
process foo {
  output: 
  file 'x' into results 
:
}

results.subscribe { it.copyTo('/some/where') }
surely more verbose
Mike Smoot
@mes5k
Jun 07 2016 20:06
I don't think that's more verbose than creating an extra process just to take advantage of publishDir!
Paolo Di Tommaso
@pditommaso
Jun 07 2016 20:07
of course not
Mike Smoot
@mes5k
Jun 07 2016 20:07
What do you think of the idea of allowing multiple publishDir directives? I can't tell if it's a good idea overall or whether it's just a solution that fits my use case.
Paolo Di Tommaso
@pditommaso
Jun 07 2016 20:09
I think it would be more useful to have the ability to specify target path dynamically using a closure
actually I think I've found the branch where I was trying an implementation for that
Mike Smoot
@mes5k
Jun 07 2016 20:11
Ok, sounds good. In any case I'm able to do what I want, I was just wondering. Thanks!
Paolo Di Tommaso
@pditommaso
Jun 07 2016 20:11
:+1:
Paolo Di Tommaso
@pditommaso
Jun 07 2016 20:25
basically I have that code; I've not merged it because I was not happy with the syntax
what about?
publishDir saveAs: { it == 'foo' ? "/some/path/foo" : "/some/where/else"  }
if the closure returns null, the file is skipped
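A slightly fuller sketch of that idea, routing outputs by name and skipping some (directory names are hypothetical):

```
publishDir path: 'results', saveAs: { filename ->
    if( filename.endsWith('.bam') ) return "alignments/$filename"
    if( filename.endsWith('.log') ) return null     // null: skip publishing this file
    return "other/$filename"
}
```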
Mike Smoot
@mes5k
Jun 07 2016 20:29
So saveAs iterates over the list of outputs generated by the process?
Paolo Di Tommaso
@pditommaso
Jun 07 2016 20:29
yep, it is invoked for each output file
Mike Smoot
@mes5k
Jun 07 2016 20:30
That would work in my case.
Gotta run to a meeting...
Paolo Di Tommaso
@pditommaso
Jun 07 2016 20:30
sure
Hugues Fontenelle
@huguesfontenelle
Jun 07 2016 20:59
At https://github.com/nextflow-io/nextflow/issues/172#issuecomment-224343380 do you mean 0.20.0-SNAPSHOT instead of v0.20.0?
Paolo Di Tommaso
@pditommaso
Jun 07 2016 20:59
oops, yes sorry
updated
Julian Mazzitelli
@thejmazz
Jun 07 2016 21:00
I get java.lang.OutOfMemoryError: Java heap space when using a stdout/stdin channel pair for bwa mem | samtools view. Also, both processes use their own container. Probably an issue with my Mac's RAM?
can I give the JVM more memory?
Paolo Di Tommaso
@pditommaso
Jun 07 2016 21:02
yes, but that's definitely not a good idea. Much better to redirect bwa mem | etc to a file and use that one
Julian Mazzitelli
@thejmazz
Jun 07 2016 21:02
sounds good. or I can make a custom container that has bwa and samtools
Paolo Di Tommaso
@pditommaso
Jun 07 2016 21:02
stdout is meant to capture a small output
I can make a custom container that has bwa and samtools
simpler is better
Julian Mazzitelli
@thejmazz
Jun 07 2016 21:20

you find it simpler to have a polyglot container?

also, in general, are commands like bwa | samtools slower when writing to files instead? perhaps tool specific?

Paolo Di Tommaso
@pditommaso
Jun 07 2016 21:20
well no, I find it simpler to just redirect to a file
what do you mean by polyglot container ?
also, in general, are commands like bwa | samtools slower when writing to files instead? perhaps tool specific?
no, Docker has no impact on tool performance
Julian Mazzitelli
@thejmazz
Jun 07 2016 21:31

polyglot as in a ton of tools stuffed into one container, i.e. everything needed for the whole pipeline, versus a unique container for each process

I didn't mean with respect to Docker, just in general

tool1 data1 | tool2 -o data2

instead of

tool1 data1 > tmp; tool2 -i tmp -o data2

my guess is that it's negligible; the first will split the CPU. Just wondering if someone knows

Paolo Di Tommaso
@pditommaso
Jun 07 2016 21:32
I see. I tend to prefer a single-container approach because it's easier to maintain, even if the image can become big
regarding the second question, the pipe version is surely better and faster
Julian Mazzitelli
@thejmazz
Jun 07 2016 21:35
b/c of overhead from reading and writing files?
Paolo Di Tommaso
@pditommaso
Jun 07 2016 21:35
because in the first case the two tools run in parallel; also, you don't need to write the whole file to disk and read it again
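The equivalence is easy to demonstrate with generic tools (tr/rev stand in for bwa/samtools here); the pipe runs both concurrently and never materialises the intermediate on disk:

```shell
printf 'hello\n' > input.txt

# streaming: tool1 | tool2
tr 'a-z' 'A-Z' < input.txt | rev > piped.txt

# staged: tool1 > tmp; tool2 < tmp
tr 'a-z' 'A-Z' < input.txt > tmp.txt
rev < tmp.txt > staged.txt

# same result either way
cmp -s piped.txt staged.txt && echo identical
```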
Julian Mazzitelli
@thejmazz
Jun 07 2016 21:35
hm
Paolo Di Tommaso
@pditommaso
Jun 07 2016 21:37
I guess you are wondering why I suggested saving the bwa output to a file, if so :)
Julian Mazzitelli
@thejmazz
Jun 07 2016 21:39
I am just at a crossroads now, haha. I felt that piping is more performant, but I also want to try to use one container for each process. With unique containers I can't pipe, and stdout is too big. I prefer that approach because you can "plug and play" with existing containers (BioDocker is pretty good) without having to create, push, and maintain your own. There are also advantages to the single Dockerfile, though :)