These are chat archives for nextflow-io/nextflow

4th
Sep 2018
rfenouil
@rfenouil
Sep 04 2018 09:14

Hello, sorry if this question has already been asked somewhere. I don't understand how memory limits are specified in config files. From what I see in the docs and what I've tried, when I do

    params.maxMem  = 32.GB

Does it call a member function on the integer that converts it to a 'memory' object? Or to the string '32 GB'?

I noticed I can do operations like 2.GB*(2), so I guess there is a 'memory' object, but I would appreciate confirmation to understand what happens here.
rfenouil
@rfenouil
Sep 04 2018 09:21
My final goal is to divide task.memory by task.cpus and get a result in megabytes as a string (for a tool that requires memory to be specified per thread instead of globally).
Paolo Di Tommaso
@pditommaso
Sep 04 2018 09:31
memory can be specified either as a string 'n GB' or as a number using the dot syntax, i.e. n.GB
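For illustration, a minimal sketch of the two forms in a config file (the params name is just an example); both should resolve to the same memory value:

    // nextflow.config
    params.maxMem = 32.GB       // NF DSL dot syntax on a number
    // params.maxMem = '32 GB'  // equivalent string form
    process.memory = params.maxMem
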
rfenouil
@rfenouil
Sep 04 2018 09:32
Ok thank you, is the dot syntax a Groovy thing or a Nextflow addition?
Paolo Di Tommaso
@pditommaso
Sep 04 2018 09:32
it's an NF DSL extension
to get the memory in megabytes use task.memory.mega
rfenouil
@rfenouil
Sep 04 2018 09:33
ok great, then I can just divide it by the number of CPUs I guess?
Paolo Di Tommaso
@pditommaso
Sep 04 2018 09:34
yep
rfenouil
@rfenouil
Sep 04 2018 09:35
what type does task.memory.mega give me, a memory object or a classic int?
(not a string I guess)
Paolo Di Tommaso
@pditommaso
Sep 04 2018 09:35
just a long
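Putting the pieces together, a sketch of the per-thread memory idea discussed above (the process name, the tool and its --mem-per-thread option are hypothetical):

    process align {
        cpus 4
        memory 8.GB

        script:
        // task.memory.mega returns a long; intdiv keeps the result a whole number of MB
        def memPerThread = task.memory.mega.intdiv(task.cpus)
        """
        some_tool --threads ${task.cpus} --mem-per-thread ${memPerThread}M
        """
    }
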
rfenouil
@rfenouil
Sep 04 2018 09:35
ok sounds good
Ah great, thank you for the link too
Appreciate your help, hopefully you're rid of me now ;)
Paolo Di Tommaso
@pditommaso
Sep 04 2018 09:36
you are welcome :)
Luca Cozzuto
@lucacozzuto
Sep 04 2018 11:24

Hi! I have in my configuration file

process.container = 'lucacozzuto/vectorqc:latest'
docker.enabled = true

but I got this error when trying to run it

Command output:
  Unable to run job: job 26321708 soft requests for "docker_images" but no "docker" request.
  Exiting.
Paolo Di Tommaso
@pditommaso
Sep 04 2018 11:25
this is a UGE problem, talk with Gabriel
Luca Cozzuto
@lucacozzuto
Sep 04 2018 11:26
uge or HUGE?
:)
Paolo Di Tommaso
@pditommaso
Sep 04 2018 11:26
Univa Grid Engine .. :)
Luca Cozzuto
@lucacozzuto
Sep 04 2018 11:26
he just went to have lunch :) thanks anyway
Paolo Di Tommaso
@pditommaso
Sep 04 2018 11:27
he is smart :wink:
Shellfishgene
@Shellfishgene
Sep 04 2018 11:27
WARNING: Skipping user bind, non existent bind point (directory) in container: '/work_beegfs/smomw240/flowcraft/work/bc/5a12c081895f07c9f8788ea31df006'
Is this an NF, Singularity, or Flowcraft problem?
Paolo Di Tommaso
@pditommaso
Sep 04 2018 11:28
it looks like a Singularity warning
Shellfishgene
@Shellfishgene
Sep 04 2018 11:28
Yes, but I have no idea what causes it...
it's followed by
/bin/bash: line 0: cd: /work_beegfs/smomw240/flowcraft/work/bc/5a12c081895f07c9f8788ea31df006: No such file or directory
/bin/bash: /work_beegfs/smomw240/flowcraft/work/bc/5a12c081895f07c9f8788ea31df006/.command.stub: No such file or directory
Paolo Di Tommaso
@pditommaso
Sep 04 2018 11:31
NF needs to mount that (host) path in the container, but for some reason it's not allowed
likely the sysadmins do not allow it
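If the site does permit user binds, the host filesystem holding the work directory can be mounted explicitly in the config; a sketch, assuming the cluster allows user-defined bind paths (the path is just an example):

    // nextflow.config
    singularity {
        enabled    = true
        runOptions = '--bind /work_beegfs'  // bind the host filesystem that holds the work dir
    }
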
Shawn Rynearson
@srynobio
Sep 04 2018 16:18
General question: I've discovered numerous rate-limiting steps when using AWS Batch, and was wondering if there's a method I could use to limit the total number of processes running at a given time. E.g. I have 5000 individual samples to process, but want to run 100 to completion at a time.
Mike Smoot
@mes5k
Sep 04 2018 16:24
@srynobio would the maxForks directive accomplish what you want, i.e. only run 100 instances of a process at once?
Shawn Rynearson
@srynobio
Sep 04 2018 16:25
@mes5k would it apply to the whole Nextflow script, or is it per process?
Mike Smoot
@mes5k
Sep 04 2018 16:26
maxForks is per process.
micans
@micans
Sep 04 2018 16:26
queueSize is useful:

    executor {
        queueSize = 20
    }
Shawn Rynearson
@srynobio
Sep 04 2018 16:28
@micans I've played around with queueSize. My guess is that it parallels maxForks in that it will keep your queue at a certain limit regardless of the process.
Mike Smoot
@mes5k
Sep 04 2018 16:29
you can set different maxForks for different processes, so it gives you fairly fine control of what runs at once.
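A sketch of that per-process throttling with maxForks (process, channel and script names are placeholders, DSL1 style):

    process perSample {
        maxForks 100   // at most 100 concurrent task instances of this process

        input:
        file(sample) from samples_ch

        script:
        """
        do_work.sh ${sample}
        """
    }
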
Shawn Rynearson
@srynobio
Sep 04 2018 16:29
I guess what I was hoping for is something similar to:
5000 jobs.

500 processes to completion | 500 more processes to completion | ... finish all 5000.
micans
@micans
Sep 04 2018 16:29
queueSize is the total number of processes NF will run concurrently, right? You mention 'to completion'. Set both to the same value? Or what is it that you want?
Shawn Rynearson
@srynobio
Sep 04 2018 16:30
might be a big ask, but I wanted to check before I rewrite code.
micans
@micans
Sep 04 2018 16:30
Is there a synchronisation step at the | ? So you want everything completed, then start a new batch?
Mike Smoot
@mes5k
Sep 04 2018 16:31
Do you want to wait for all 500 to finish before you start the next 500?
micans
@micans
Sep 04 2018 16:31
there is a synchronisation step in groupTuple() ... I wonder if that would work
maybe a bit hacky
Shawn Rynearson
@srynobio
Sep 04 2018 16:32
Yes, it's one possible approach I'm thinking of to get around all the rate-limiting steps AWS places on users.
Mike Smoot
@mes5k
Sep 04 2018 16:32
or the buffer or collate operators, so that one process runs 500 samples.
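A sketch of that batching idea using collate (the channel, process and script names are made up, DSL1 style):

    // group incoming samples into batches of 500 files each
    batches_ch = Channel.fromPath('samples/*.fastq.gz').collate(500)

    process runBatch {
        input:
        file(batch) from batches_ch   // 'batch' is staged as the whole list of files

        script:
        """
        process_batch.sh ${batch}
        """
    }
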
Shawn Rynearson
@srynobio
Sep 04 2018 16:45
Thanks for the insight and for letting me think out loud here. I'll review my code and processes.
Paolo Di Tommaso
@pditommaso
Sep 04 2018 18:19
@srynobio what rate limit are you referring to?
Shawn Rynearson
@srynobio
Sep 04 2018 21:15

@pditommaso AWS.

F.Y.I. for anyone else planning on running in the cloud, AWS has a few different rate limits:

  1. EC2 limits: for each EC2 instance type you are initially limited in how many instances you can spin up, typically between 5 and 20.
  2. EC2 spot instances: a limit on how many of your EC2 instances can be spot instances.
  3. S3 put/get request limits, which can be a burden given that a "prefix" is an S3 bucket name.
  4. A limit on the total number of buckets you are allowed to create (default: 100).

Several of these limits (1, 2, 4) can be increased by request through AWS Support.

The issue I was running into earlier was due to the S3 put/get limit. At large scale I was easily hitting my rate limit, which caused non-zero exit statuses ("slowdown" errors) from the aws s3 CLI.

Just something to think about when considering running large-scale WES/WGS data on AWS.

Paolo Di Tommaso
@pditommaso
Sep 04 2018 22:06
yes, of course, you need to request an increase of your EC2 limits
regarding S3, my understanding is that a prefix is a path, not the bucket name
even if it were the bucket name, 5,500 GET requests per second is very high; you are much more likely to be capped by the Batch API limit, which in my experience is ~20 requests per second