These are chat archives for nextflow-io/nextflow

12th
Feb 2015
Andrew Stewart
@andrewcstewart
Feb 12 2015 17:14
@pditommaso ElastiCluster is still pretty rough and I'm not sure it's being maintained
StarCluster is also lacking in maintenance, and its AMIs are pretty inflexible w/o really getting one's hands dirty.
Paolo Di Tommaso
@pditommaso
Feb 12 2015 17:15
I see
Andrew Stewart
@andrewcstewart
Feb 12 2015 17:15
They really need to upgrade their AMIs to Ubuntu 14.04, and keep their CentOS version upgraded, in order to support Docker.
Paolo Di Tommaso
@pditommaso
Feb 12 2015 17:16
That's why I gave up on SC as well
Andrew Stewart
@andrewcstewart
Feb 12 2015 17:19
I like the idea of EC, I just haven't had the time to figure out how to use it
Quick question while you're here: to grab the latest version of a pipeline from github/bitbucket, I'd do "-r latest" ?
Paolo Di Tommaso
@pditommaso
Feb 12 2015 17:21
um, no
-latest
Andrew Stewart
@andrewcstewart
Feb 12 2015 17:21
ah
Paolo Di Tommaso
@pditommaso
Feb 12 2015 17:21
-r switches to a different revision, i.e. a branch or tag
Andrew Stewart
@andrewcstewart
Feb 12 2015 17:21
so... nextflow run namespace/repository -hub myprovider -latest
Paolo Di Tommaso
@pditommaso
Feb 12 2015 17:22
yep
Andrew Stewart
@andrewcstewart
Feb 12 2015 17:22
Gotcha.
sweet!
Paolo Di Tommaso
@pditommaso
Feb 12 2015 17:22
Enjoy!
Andrew Stewart
@andrewcstewart
Feb 12 2015 17:22
Can I specify those options in nextflow.config?
Paolo Di Tommaso
@pditommaso
Feb 12 2015 17:23
unfortunately no, that was an idea but it's a mess to implement
Andrew Stewart
@andrewcstewart
Feb 12 2015 17:54
Is there any way to dynamically set config options like "executor.$sge.queueSize = 100" based on, say, the number of nodes in the cluster?
Paolo Di Tommaso
@pditommaso
Feb 12 2015 17:55
no, not that
but, nodes in a cluster are almost the same .. no?
ah .. ok SC
Andrew Stewart
@andrewcstewart
Feb 12 2015 18:03
I'm just wondering if I need to add fancy job regulation at the SGE level or if I can do it through nextflow
SGE isn't by default aware of memory consumption, so several memory-hog processes on the same node can hose it
Paolo Di Tommaso
@pditommaso
Feb 12 2015 18:05
well, I think it depends on the config of your installation
AFAIK in our cluster there's a default mem limit per job
Andrew Stewart
@andrewcstewart
Feb 12 2015 18:08
I need to set the limit by node
for this scenario
It's a common scenario whenever bioinformatics meets SGE
I've received lots of angry emails over the years from cluster admins used to dealing with physics computing :D
Paolo Di Tommaso
@pditommaso
Feb 12 2015 18:09
:)
Andrew Stewart
@andrewcstewart
Feb 12 2015 18:09
"WTF DO YOU NEED ALL THAT MEMORY FOR! FIX YOUR MEMORY LEAK IDIOT!"
Paolo Di Tommaso
@pditommaso
Feb 12 2015 18:09
ahah
but why not set a mem limit per job?
Andrew Stewart
@andrewcstewart
Feb 12 2015 18:10
Because I know how much memory the job will use
the issue is with running multiple such jobs on the same node
SGE is smart enough to limit 1 job per cpu
but it doesn't have a native concept of memory utilization
Paolo Di Tommaso
@pditommaso
Feb 12 2015 18:11
no?
You end up having to use 'resource quotas'
basically you have SGE track available memory as metadata on each node
and then each job has to specifically request how much memory it will use (and a memory limit to enforce that assumption)
then SGE knows not to schedule more high memory jobs than a particular node can handle
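
Seen from the Nextflow side, the memory request described above could look roughly like the sketch below. It is only an illustration: the process name `assembleGenome`, the 8G figure, and the placeholder command are hypothetical. `-l h_vmem=8G` only works as a per-node budget if the cluster administrator has set up `h_vmem` as a consumable resource with a capacity on each host. The `clusterOptions` directive hands its string to the scheduler's submit command unchanged.

```groovy
// Hypothetical process; the name, memory value and command are placeholders
process assembleGenome {
    executor 'sge'

    // Passed as-is to qsub: each job requests 8 GB of h_vmem
    // (assumes the cluster defines h_vmem as a consumable resource)
    clusterOptions '-l h_vmem=8G'

    """
    echo "memory-hungry command goes here"
    """
}
```

With that in place the scheduler, not Nextflow, decides how many of these jobs fit on a node.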
Paolo Di Tommaso
@pditommaso
Feb 12 2015 18:13
exactly
Andrew Stewart
@andrewcstewart
Feb 12 2015 18:13
It's a very nice system when in place, but requires getting into the guts of SGE administration
Paolo Di Tommaso
@pditommaso
Feb 12 2015 18:13
:)
Andrew Stewart
@andrewcstewart
Feb 12 2015 18:14
But a simple alternative could be to just add a queueSize limit in nextflow to any high-memory process
Paolo Di Tommaso
@pditommaso
Feb 12 2015 18:15
um, I see
Andrew Stewart
@andrewcstewart
Feb 12 2015 18:15
not perfect, but it could work
Paolo Di Tommaso
@pditommaso
Feb 12 2015 18:16
well, not exactly the same but the maxForks directive does almost the same
no?
Andrew Stewart
@andrewcstewart
Feb 12 2015 18:18
Ah, that might be doing what I'm thinking queueSize would do
Paolo Di Tommaso
@pditommaso
Feb 12 2015 18:18
yes, thinking about it more, it's exactly what you are looking for
:)
Andrew Stewart
@andrewcstewart
Feb 12 2015 18:20
so what does queueSize do then?
or how is it different from maxForks?
Paolo Di Tommaso
@pditommaso
Feb 12 2015 18:20
it limits the number of jobs that can be submitted to the nextflow internal queue
maxForks does the same at the process level (because it does not distinguish between submitted and running jobs)
Andrew Stewart
@andrewcstewart
Feb 12 2015 18:23
ah
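
To make the distinction concrete, here is a minimal sketch of the `maxForks` workaround discussed above. The process name `highMemStep`, the limit of 2, and the placeholder command are assumptions for illustration. The `executor.$sge.queueSize = 100` setting quoted earlier would stay in `nextflow.config` as the overall cap, while `maxForks` throttles just this one process.

```groovy
// Hypothetical memory-hungry process; name, limit and command are placeholders
process highMemStep {
    executor 'sge'

    // At most 2 instances of this process are executed in parallel,
    // regardless of how many other jobs the executor queue allows
    maxForks 2

    """
    echo "memory-hungry command goes here"
    """
}
```

Note that `maxForks` caps the total number of parallel instances across the whole run, not the number per node, so it only approximates a per-node memory limit.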