These are chat archives for nextflow-io/nextflow

3rd
Aug 2018
Thomas Zichner
@zichner
Aug 03 2018 07:37
Good morning!
Does anybody know of a simple way to rename the standard profile?
I have three profiles: local, cluster, and aws. Since the local profile should be used when the user does not specify a profile, I created another profile named standard with the same content as local. Is there a good way to avoid this code duplication?
Thank you very much!
Paolo Di Tommaso
@pditommaso
Aug 03 2018 07:38
Does anybody know of a simple way to rename the standard profile?
what do you mean to rename ?
you mean to extend ?
Thomas Zichner
@zichner
Aug 03 2018 07:39
I mean to specify which profile to use in case the user does not select one
Paolo Di Tommaso
@pditommaso
Aug 03 2018 07:39
ah, no
Thomas Zichner
@zichner
Aug 03 2018 07:41

Could something like

profiles {
    standard, local {
        ...
    }
    cluster {
        ...
    }
}

work?

Or

profiles {
    standard {
        profile = 'local'
    }
    local {
        ...
    }
}

?

Paolo Di Tommaso
@pditommaso
Aug 03 2018 07:42
you can put the the snippet you want to reuse in a separate file and include it in different profiles
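A sketch of what this suggestion could look like, using Nextflow's includeConfig statement (the file name local.config and its contents are made up for illustration):

```groovy
// local.config -- the shared settings, kept in one place
// process.executor = 'local'

// nextflow.config -- both profiles include the same snippet
profiles {
    standard {
        includeConfig 'local.config'
    }
    local {
        includeConfig 'local.config'
    }
    cluster {
        // cluster-specific settings
    }
}
```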
Thomas Zichner
@zichner
Aug 03 2018 07:44
Ok. Thank you very much!
Paolo Di Tommaso
@pditommaso
Aug 03 2018 07:45
:ok_hand:
Raoul J.P. Bonnal
@helios
Aug 03 2018 08:21
Do you know if it is possible to use nextflow in an interactive way (aka a repl) to test commands, objects, etc.? Using nextflow console I get "No X11 DISPLAY variable was set, but this program performed an operation which requires it.", but I cannot open an X11 display
Paolo Di Tommaso
@pditommaso
Aug 03 2018 08:23
without X11 you cannot run it remotely, there's no graphical screen ..
run it in your laptop/workstation
Phil Ewels
@ewels
Aug 03 2018 08:26
@zichner - we always have a config file called base.config which we import into every config profile
Lorenz Gerber
@lorenzgerber
Aug 03 2018 08:26
@helios if you interactively want to test commands/objects you can start nextflow in debug mode and attach jdb
Raoul J.P. Bonnal
@helios
Aug 03 2018 08:29
@pditommaso ok, @lorenzgerber I will try thanks
Vladimir Kiselev
@wikiselev
Aug 03 2018 09:51
Oh man, we’ve spent ~4 hours recording information about the bug we found yesterday. I almost pressed the Submit new issue button on GitHub, but then we decided to update NF to the latest version and test again… All worked… Oh man, I need to put a reminder somewhere to update NF before filing an issue… Thanks, Paolo!
Can I still submit it and close for the record? The issue description looks so nice!
Paolo Di Tommaso
@pditommaso
Aug 03 2018 09:52
eventually, at this point I'm curious :smile:
Vladimir Kiselev
@wikiselev
Aug 03 2018 09:52
ok, will do now )
Paolo Di Tommaso
@pditommaso
Aug 03 2018 09:54
best bug ever :wink:
Vladimir Kiselev
@wikiselev
Aug 03 2018 09:55
Thanks @micans for working on it and collecting information!
Paolo Di Tommaso
@pditommaso
Aug 03 2018 09:55
thanks!
micans
@micans
Aug 03 2018 09:57
Hehe, my pleasure. Most importantly I triggered the upgrade, I had a feeling it might make our week!
Paolo Di Tommaso
@pditommaso
Aug 03 2018 09:58
:ok_hand:
Phil Ewels
@ewels
Aug 03 2018 10:32
:laughing:
Alexander Peltzer
@apeltzer
Aug 03 2018 13:00

Quick question about the -params-file parameter. The thread here wasn't as informative as I wanted it to be:
nextflow-io/nextflow#208
I want to use a pipeline on AWS and specify the reads in a map-like structure (like the one created by the Channel.fromFilePairs... method).

Attempting it like this:

readPaths: 
    - Mus1
      - s3://test-rnaseq-mouse-testrun/RAW/Mus1_1.fastq.gz 
      - s3://test-rnaseq-mouse-testrun/RAW/Mus1_2.fastq.gz
    - Mus2
     - s3://test-rnaseq-mouse-testrun/RAW/Mus2_1.fastq.gz
     - s3://test-rnaseq-mouse-testrun/RAW/Mus2_2.fastq.gz

didn't work. Feature is undocumented (yet!), but I'll send a PR once I figure out how to do this...

Paolo Di Tommaso
@pditommaso
Aug 03 2018 13:16
I'm waiting for a willing user who wants to document all CLI commands :smile:
regarding the specific point, you don't need Channel.fromFilePairs if you are defining the params with a yaml
they are already structured as expected
the problem however is that it's still required to convert the string paths to file objects
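That conversion could look something like the sketch below (assumptions: params.readPaths parses from the yaml into a list of [sampleName, [read1, read2]] entries, and the channel name read_pairs_ch is made up):

```groovy
// Turn the string paths from the params file into file objects
// (sketch, not the pipeline's actual code)
Channel
    .from(params.readPaths)
    .map { sample, reads -> [ sample, reads.collect { file(it) } ] }
    .set { read_pairs_ch }
```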
Alexander Peltzer
@apeltzer
Aug 03 2018 13:19
so if I specify this as -params-file params.yaml this should suffice I guess? https://github.com/nf-core/rnaseq/blob/b32a25d08ee89891a508ada816d37135dafeb7f4/main.nf#L191
Paolo Di Tommaso
@pditommaso
Aug 03 2018 13:21
that or something like that, I mean I don't know if the code matches the data structure
Vladimir Kiselev
@wikiselev
Aug 03 2018 14:47
ok, this time we are at least running the latest version of NF ;-)
nextflow-io/nextflow#824
looks like it’s related to this one, sorry if I duplicated it: nextflow-io/nextflow#773
Paolo Di Tommaso
@pditommaso
Aug 03 2018 14:51
k8s is not so smart .. :/
Vladimir Kiselev
@wikiselev
Aug 03 2018 15:07
If it’s too complicated on the k8s side, @micans has one idea. If the user can define the maximum resources in the k8s config file (the user will know the size of the cluster), then NF can calculate how much it is using at the moment and whether it can schedule one more. If yes, then it schedules; if not, then it waits until at least one job is finished and then performs the calculation again. Does it sound reasonable?
Paolo Di Tommaso
@pditommaso
Aug 03 2018 15:08
yeah, but it should be managed by the cluster, like any other scheduler
for now I would suggest limiting the number of jobs with the queueSize setting
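The queueSize setting goes in the executor scope of the config; the value here is only illustrative:

```groovy
// nextflow.config -- cap how many tasks are in flight at once
executor {
    queueSize = 25
}
```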
Vladimir Kiselev
@wikiselev
Aug 03 2018 15:11
Ok, I see, thanks! Will be happy to help with/test the more principled solution.
Paolo Di Tommaso
@pditommaso
Aug 03 2018 15:13
I need to review the resource allocation/management in k8s for the issue mentioned and others
tho, not sure I will be able in the short term
Timur Shtatland
@tshtatland
Aug 03 2018 15:26

How can I execute a script for all pairwise combinations of elements from 2 channels? See the toy example below (in the actual case, the channels are not simple lists but the outputs of other processes). I expected, based on the docs, that the "task is executed for each pair ... that are received.". In this case, I expected 2 * 3 = 6 output files. I got only 2, see below. Changing the order of the channels in the input, of course, has no effect. Apparently for multiple input channels with different numbers of elements, the process is executed for all elements in the smallest channel, and the rest of the elements in the larger channels are silently skipped...

I am using NF 0.31.0.4885

#!/usr/bin/env nextflow

in1 = Channel.from '1', '2'
in2 = Channel.from 'a', 'b', 'c'


process find {

    input:
    // val val1 from in1
    // val val2 from in2
    val val2 from in2
    val val1 from in1

    output:
    file 'result.txt' into outFiles

    script:
    """
    echo $val1 $val2 > result.txt
    """
}

outFiles.subscribe {
    println "${ it } => ${ it.text }"
}

Output:

/Users/shtatland/test/work/d6/989cb3ac959e14ae129f16e5e8e7c1/result.txt => 1 a
/Users/shtatland/test/work/ed/bced098c512cdbba893a17bd2c11cb/result.txt => 2 b
Paolo Di Tommaso
@pditommaso
Aug 03 2018 15:27
you don't need channels to do that
use 2 input repeaters
(unless those values are coming from other processes)
Timur Shtatland
@tshtatland
Aug 03 2018 15:29
They would be coming from other processes usually - what should I do then? Sorry, this was just a self-contained tiny toy example.
Paolo Di Tommaso
@pditommaso
Aug 03 2018 15:30
use combine to merge them into a single channel holding those values
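Applied to the toy example above, the combine operator builds the cartesian product of the two channels, which can then be consumed as a single input channel (a sketch, reusing the names from the toy script):

```groovy
// combine emits every pairwise combination: 2 * 3 = 6 tasks
in1 = Channel.from('1', '2')
in2 = Channel.from('a', 'b', 'c')

process find {

    input:
    set val(val1), val(val2) from in1.combine(in2)

    output:
    file 'result.txt' into outFiles

    script:
    """
    echo $val1 $val2 > result.txt
    """
}
```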
Timur Shtatland
@tshtatland
Aug 03 2018 15:43
Thank you, this will work for my purpose! Now I am wondering why one would use multiple input channels as shown in the toy example. It seems like unsafe usage, because (a) the unused elements from the larger channel are silently skipped, and (b) the skipped elements are apparently random when I test the code with a large list of files like so: in1 = Channel.fromPath "fasta/*.fa"
Paolo Di Tommaso
@pditommaso
Aug 03 2018 15:47
they are not skipped, it's the semantics that should be documented better
in a nutshell, a process waits until there's a complete input configuration from all input channels
when this condition is verified it consumes an element from each input channel and processes the task, and so on
until one or more channels have no more content (i.e. a hidden termination element aka poison pill)
therefore if you want to use two or more input channels you have to make sure they will produce the same number of elements
Paolo Di Tommaso
@pditommaso
Aug 03 2018 15:54
HOWEVER, if you use a value channel aka singleton channel together with a queue channel, you will see that the first (even providing a single value) won't stop the execution
to understand compare
process foo {
  echo true
  input: 
  val x from Channel.from(1,2)
  val y from Channel.from('a','b','c')
  script:
   """
   echo $x and $y
   """
}
and
process foo {
  echo true
  input: 
  val x from Channel.value(1)
  val y from Channel.from('a','b','c')
  script:
   """
   echo $x and $y
   """
}
Timur Shtatland
@tshtatland
Aug 03 2018 16:06
OK, got it now. Thank you very much for the explanation and the examples!
Paolo Di Tommaso
@pditommaso
Aug 03 2018 16:09
:ok_hand:
micans
@micans
Aug 03 2018 16:33
Always learn a lot of useful stuff when lurking here. Have a great weekend everyone!
Paolo Di Tommaso
@pditommaso
Aug 03 2018 16:33
me too !
:smile:
same there, enjoy the weekend
Tim Dudgeon
@tdudgeon
Aug 03 2018 17:09

I'm having a problem with using the -resume option. My workflow follows a typical split and execute in parallel pattern.
If the execution is terminated part way through and then re-executed with the -resume I would expect that the tasks that have already completed would not be re-executed. But this is not happening. Only the initial split step uses the cached results (and thus we know the inputs to the subsequent parallel tasks will be identical).
Notice that the hash values for the second execution are quite different from the first.
The workflow definition can be seen here: https://github.com/InformaticsMatters/dls-fragalysis-stack-openshift/blob/master/s2g-processor/nextflow/graph.nf
Here it is in action:

[timbo@xps analysis]$ nextflow run graph.nf -with-docker busybox -with-trace
N E X T F L O W  ~  version 0.29.0
Launching `graph.nf` [compassionate_pike] - revision: d24a7330a0
[warm up] executor > local
[04/d99358] Submitted process > headShred
[1a/1a87d0] Submitted process > cgd (2)
[a5/f03280] Submitted process > cgd (5)
[c9/99f220] Submitted process > cgd (4)
[7b/5c7321] Submitted process > cgd (8)
[34/ce2441] Submitted process > cgd (6)
[fb/ab8f85] Submitted process > cgd (1)
[98/ea65a1] Submitted process > cgd (7)
[e8/974f63] Submitted process > cgd (3)
[74/816d3c] Submitted process > cgd (11)
[be/b6723c] Submitted process > cgd (9)
[8b/8b951c] Submitted process > cgd (12)
[5e/f89ebd] Submitted process > cgd (10)
[0f/6d270b] Submitted process > cgd (13)
^C
WARN: Killing pending tasks (8)
[timbo@xps analysis]$ nextflow run graph.nf -with-docker busybox -with-trace -resume
N E X T F L O W  ~  version 0.29.0
Launching `graph.nf` [pedantic_ritchie] - revision: d24a7330a0
[warm up] executor > local
[04/d99358] Cached process > headShred
[38/d8847d] Submitted process > cgd (8)
[f2/aaae8b] Submitted process > cgd (4)
[84/dde626] Submitted process > cgd (5)
[d6/f48ced] Submitted process > cgd (9)
[fa/606c85] Submitted process > cgd (11)
[d8/304b3b] Submitted process > cgd (10)
[98/312a78] Submitted process > cgd (6)
[9c/5b261c] Submitted process > cgd (7)
[a7/b71921] Submitted process > cgd (3)
[97/8f5621] Submitted process > cgd (2)
[36/c946ac] Submitted process > cgd (1)
[cb/9aa154] Submitted process > cgd (13)
[87/ec1f55] Submitted process > cgd (12)
[89/eece79] Submitted process > cgd (14)
[d7/9fb750] Submitted process > cgd (16)
[cd/86f0a5] Submitted process > cgd (15)
[timbo@xps analysis]$

When the initial execution was Ctrl-C'd, tasks cgd (1,7,6,2,4) had completed.

Shawn Rynearson
@srynobio
Aug 03 2018 17:10
I was wondering if anyone else has encountered an issue similar to the one I am seeing on aws-batch. I have a simple samtools merge job running (but I'm running it on ~2500 samples) and once a job completes the docker run is not killed, which leaves all remaining data and fills up the EBS volume, making the EC2 images unusable. I've checked the EC2 docker agent and it's set correctly. One issue that might be creating this is how the agent is run: it "checks for Docker images that are not referenced by running or stopped container", so if I have a large number of nextflow jobs it could possibly see the reference and keep the docker images live. Does anyone have an idea of a post-process step I could run to kill the running image of a job which has completed?
Shawn Rynearson
@srynobio
Aug 03 2018 17:18
Based on the nextflow docs it looks like the images should be removed by default. Is this also true when running on aws-batch?
Shawn Rynearson
@srynobio
Aug 03 2018 21:09

I think I've debugged the issue. It looks like the Amazon ECS container agent (which cleans up unused docker runs/images) is run the following way:

By default, the Amazon ECS container agent automatically cleans up stopped tasks and Docker images that are not being used by any tasks on your container instances.

but by default the nextflow Task definition uses the same definition id for all launched jobs, which never allows the agent to clean up old docker containers.

Evan Floden
@evanfloden
Aug 03 2018 21:31
@tdudgeon Did you try replacing mode flatten in the output of the first process with a flatMap operator outside the process as is done in the documentation example here?
Another idea could be to collect and sort the elements in the channel to confirm the downstream process is receiving the exact same information on resume.
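The first suggestion might look something like the sketch below (the process name is taken from the log above; the channel names, glob pattern, and script body are made up):

```groovy
// Instead of declaring the output with "mode flatten" inside the process,
// emit the grouped files and flatten them outside with flatMap (sketch):
process headShred {

    output:
    file 'chunk_*' into chunks_ch   // all chunks of a task emitted together

    script:
    """
    touch chunk_01 chunk_02 chunk_03
    """
}

// one channel item per chunk file
chunks_ch.flatMap { it }.set { single_chunk_ch }
```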
Evan Floden
@evanfloden
Aug 03 2018 21:38
You could also name the tasks (add a tag) then run a diff to compare the work directory of the same task across runs to spot any differences.
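Naming tasks is done with the tag directive; a sketch using the process name from the log (the input channel and variable are assumptions):

```groovy
// The tag appears next to the task in the log, e.g. "cgd (chunk_01)",
// which makes it easy to match the same task across two runs
process cgd {
    tag "${chunk.baseName}"

    input:
    file chunk from single_chunk_ch

    script:
    """
    echo processing $chunk
    """
}
```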
Brad Langhorst
@bwlang
Aug 03 2018 21:54
It seems that nextflow is not happy to be run within the same folder as another concurrent nextflow process… even with -w set to a unique location. Will disabling cache help? Or do I need mktemp for each execution?
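The mktemp approach mentioned could be sketched like this (assumption: the clash comes from the .nextflow metadata/history files that each run writes into its launch directory, so giving every concurrent run its own launch directory avoids it; the pipeline path and work dir are placeholders):

```shell
# Give each concurrent run its own launch directory so the .nextflow
# history/metadata files do not collide (the -w work dir can still point
# wherever you like; these paths are illustrative, not prescribed).
run_dir=$(mktemp -d)
cd "$run_dir"
# nextflow run main.nf -w /shared/scratch/work   # launched from the fresh dir
echo "$run_dir"
```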