These are chat archives for nextflow-io/nextflow

4th
Sep 2018
rfenouil
@rfenouil
Sep 04 2018 09:14 UTC

Hello, sorry if this question has already been asked somewhere. I don't understand how memory limits are specified in config files. From what I've seen in the docs and tried, when I do

    params.maxMem  = 32.GB

does it call a member function on the integer that converts it to a 'memory' object? Or to a string '32 GB'?

I noticed I can do operations like 2.GB*(2), so I guess there is a 'memory' object, but I would appreciate confirmation to understand what happens here.
rfenouil
@rfenouil
Sep 04 2018 09:21 UTC
The final goal is to divide task.memory by task.cpus and get a result in megabytes (for a tool that requires memory to be specified per thread instead of globally).
Paolo Di Tommaso
@pditommaso
Sep 04 2018 09:31 UTC
mem can be specified either as a string 'n GB' or as a number using the dot syntax, i.e. n.GB
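For example, both forms below work in a config file (a minimal sketch; the bigMem label is just a hypothetical example):

    process {
        memory = '32 GB'          // string form
        withLabel: bigMem {
            memory = 32.GB        // dot syntax; evaluates to a MemoryUnit object
        }
    }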
rfenouil
@rfenouil
Sep 04 2018 09:32 UTC
Ok thank you, is the dot syntax a Groovy thing or a Nextflow addition?
Paolo Di Tommaso
@pditommaso
Sep 04 2018 09:32 UTC
it's an NF DSL extension
to get the mem in megabytes use task.memory.mega
rfenouil
@rfenouil
Sep 04 2018 09:33 UTC
ok great, then I can just divide it by the number of cpus I guess?
Paolo Di Tommaso
@pditommaso
Sep 04 2018 09:34 UTC
yep
rfenouil
@rfenouil
Sep 04 2018 09:35 UTC
what type does task.memory.mega give me, a mem object or a classic int?
(not a string I guess)
Paolo Di Tommaso
@pditommaso
Sep 04 2018 09:35 UTC
just a long
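Putting the thread together, a minimal sketch of the per-thread memory calculation (the process name, tool, flag and input file below are hypothetical placeholders):

    process align {
        cpus 8
        memory 32.GB

        script:
        // task.memory is a MemoryUnit; .mega returns a long number of megabytes
        def memPerThread = task.memory.mega.intdiv(task.cpus)
        """
        sometool --threads ${task.cpus} --mem-per-thread ${memPerThread}M reads.fq
        """
    }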
rfenouil
@rfenouil
Sep 04 2018 09:35 UTC
ok sounds good
Ah great, thank you for the link too
Appreciate your help, hopefully you're rid of me now ;)
Paolo Di Tommaso
@pditommaso
Sep 04 2018 09:36 UTC
you are welcome :)
Luca Cozzuto
@lucacozzuto
Sep 04 2018 11:24 UTC

Hi! I have this in my configuration file

process.container = 'lucacozzuto/vectorqc:latest'
docker.enabled = true

but I get this error when trying to run it:

Command output:
  Unable to run job: job 26321708 soft requests for "docker_images" but no "docker" request.
  Exiting.
Paolo Di Tommaso
@pditommaso
Sep 04 2018 11:25 UTC
this is a UGE problem, talk with Gabriel
Luca Cozzuto
@lucacozzuto
Sep 04 2018 11:26 UTC
uge or HUGE?
:)
Paolo Di Tommaso
@pditommaso
Sep 04 2018 11:26 UTC
Univa Grid Engine .. :)
Luca Cozzuto
@lucacozzuto
Sep 04 2018 11:26 UTC
he just went to have lunch :) thanks anyway
Paolo Di Tommaso
@pditommaso
Sep 04 2018 11:27 UTC
he is smart :wink:
Shellfishgene
@Shellfishgene
Sep 04 2018 11:27 UTC
WARNING: Skipping user bind, non existent bind point (directory) in container: '/work_beegfs/smomw240/flowcraft/work/bc/5a12c081895f07c9f8788ea31df006'
Is this a NF, Singularity, or Flowcraft problem?
Paolo Di Tommaso
@pditommaso
Sep 04 2018 11:28 UTC
it looks like a Singularity warning
Shellfishgene
@Shellfishgene
Sep 04 2018 11:28 UTC
Yes, but I have no idea what causes it...
it's followed by
/bin/bash: line 0: cd: /work_beegfs/smomw240/flowcraft/work/bc/5a12c081895f07c9f8788ea31df006: No such file or directory
/bin/bash: /work_beegfs/smomw240/flowcraft/work/bc/5a12c081895f07c9f8788ea31df006/.command.stub: No such file or directory
Paolo Di Tommaso
@pditommaso
Sep 04 2018 11:31 UTC
NF needs to mount that (host) path in the container, but for some reason it's not allowed
likely the sysadmins do not allow it
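If binding is permitted, something along these lines in the config may help (a sketch; it assumes user bind control is enabled in the Singularity installation, otherwise the sysadmins have to enable it):

    singularity {
        enabled    = true
        autoMounts = true                    // let NF auto-bind the host paths it needs
        // or bind the host filesystem explicitly, e.g.:
        // runOptions = '-B /work_beegfs'
    }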
Shawn Rynearson
@srynobio
Sep 04 2018 16:18 UTC
General question. I've discovered numerous rate-limiting steps when using AWS Batch, and was wondering if there's a method I could use to limit the total number of processes running at a given time. E.g. I have 5000 individual samples to complete, but want to run 100 to completion at a time.
Mike Smoot
@mes5k
Sep 04 2018 16:24 UTC
@srynobio would the maxForks directive accomplish what you want, i.e. only run 100 instances of a process at once?
Shawn Rynearson
@srynobio
Sep 04 2018 16:25 UTC
@mes5k would it run your nextflow script to completion, or is it per-process based?
Mike Smoot
@mes5k
Sep 04 2018 16:26 UTC
maxForks is per process.
Stijn van Dongen
@micans
Sep 04 2018 16:26 UTC
queueSize is useful
executor {
    queueSize = 20
}
Shawn Rynearson
@srynobio
Sep 04 2018 16:28 UTC
@micans I've played around with queueSize. My guess is that it parallels maxForks in that it will keep your queue at a certain limit regardless of process.
Mike Smoot
@mes5k
Sep 04 2018 16:29 UTC
you can set different maxForks for different processes, so it gives you fairly fine control of what runs at once.
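For instance (a sketch; the process and channel names are hypothetical):

    process callVariants {
        maxForks 100              // run at most 100 instances of this process at once

        input:
        file(sample) from samples_ch

        script:
        """
        echo "processing ${sample}"
        """
    }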
Shawn Rynearson
@srynobio
Sep 04 2018 16:29 UTC
Guess what I was hoping for is something similar to:
5000 jobs.

500 processes to completion | 500 more processes to completion | ... finish all 5000.
Stijn van Dongen
@micans
Sep 04 2018 16:29 UTC
queueSize is the total number of processes NF will run concurrently, right? You mention 'to completion'. Set both to the same value? Or what is it that you want?
Shawn Rynearson
@srynobio
Sep 04 2018 16:30 UTC
might be a big ask, but I wanted to check before I rewrite code.
Stijn van Dongen
@micans
Sep 04 2018 16:30 UTC
is there a synchronisation step at the | ? So you want everything completed, then start a new batch?
Mike Smoot
@mes5k
Sep 04 2018 16:31 UTC
Do you want to wait for all 500 to finish before you start the next 500?
Stijn van Dongen
@micans
Sep 04 2018 16:31 UTC
there is a synchronisation step in groupTuple() ... I wonder if that would work
maybe a bit hacky
Shawn Rynearson
@srynobio
Sep 04 2018 16:32 UTC
Yes, it's one possible approach I'm thinking of to get around all the rate limits AWS places on users.
Mike Smoot
@mes5k
Sep 04 2018 16:32 UTC
or the buffer or collate operators so that one process will run 500 samples.
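For example, a sketch using collate (channel and process names are hypothetical):

    // group the flat sample channel into batches of 500 samples
    samples_ch
        .collate( 500 )
        .set { batches_ch }

    process processBatch {
        input:
        file(samples) from batches_ch

        script:
        """
        echo "processing one batch of samples: ${samples}"
        """
    }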
Shawn Rynearson
@srynobio
Sep 04 2018 16:45 UTC
Thanks for the insight and for letting me think out loud here. I'll review my code and processes.
Paolo Di Tommaso
@pditommaso
Sep 04 2018 18:19 UTC
@srynobio what rate limit are you referring to?
Shawn Rynearson
@srynobio
Sep 04 2018 21:15 UTC

@pditommaso AWS.

FYI for anyone else planning on running in the cloud, AWS has a few different rate limits.

  1. EC2 limits: for each EC2 instance type you are initially limited in how many instances you can spin up, typically between 5 and 20.
  2. EC2 spot instances: a limit on how many of your EC2 instances can be spot instances.
  3. S3 put/get/pull request limits, which can be a burden given that a "prefix" is an S3 bucket name.
  4. A limit on the total number of buckets you are allowed to create. Default limit: 100.

Many of these limits (1, 2, 4) can be increased by request through AWS Support.

The issue I was running into earlier was due to the S3 put/get/pull limit. At large scale I was easily hitting my rate limit, which caused the aws s3 cli to exit non-zero with a "SlowDown" error.

Just something to think about when considering running large scale WES/WGS data on AWS.
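On the NF side, one hedged mitigation for those transient "SlowDown" failures is simply to retry the affected tasks, e.g.:

    process {
        errorStrategy = 'retry'    // re-run tasks that fail, e.g. on transient S3 errors
        maxRetries    = 3
    }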

Paolo Di Tommaso
@pditommaso
Sep 04 2018 22:06 UTC
yes, of course, you need to request an upgrade of your ec2 limits
regarding s3 my understanding is that a prefix is a path, not the bucket name
even if it were the bucket name, 5,500 GET requests per second is very high; you are much more likely to be capped by the Batch API limit, which in my experience is ~20 requests per second