These are chat archives for nextflow-io/nextflow

5th
Oct 2016
Mike Smoot
@mes5k
Oct 05 2016 16:40
@pditommaso Hi Paolo, I've got a question about tuning pipelines. Here is a screenshot of the CPU usage of a machine that's running a pipeline: http://uploadpie.com/3lQjb. This is one pipeline running for the entire X axis of the plot and the pipeline is running correctly. The first half of the plot shows blast jobs running with task.cpus = 8 whereas the second half shows a different command using the default task.cpus (which I understand is 1). I'm wondering if there is a way to tell nextflow that it can spawn more tasks at a time, but just for specific tasks? I realize that I could increase queueSize for the whole pipeline, but I worry that would trigger too many concurrent blast tasks. Maybe something like task.cpus = 0.25?
Paolo Di Tommaso
@pditommaso
Oct 05 2016 17:35
@mes5k are u setting queueSize?
Mike Smoot
@mes5k
Oct 05 2016 17:35
I'm not
Paolo Di Tommaso
@pditommaso
Oct 05 2016 17:36
I'm confused, the second half of the chart is running a different task or always the same?
Mike Smoot
@mes5k
Oct 05 2016 17:39
many, many tasks of the same type
Paolo Di Tommaso
@pditommaso
Oct 05 2016 17:39
:)
I guess they are I/O bound, otherwise how are they spending their time?
however, CPU fractions are not currently supported, but it could be a nice improvement
Mike Smoot
@mes5k
Oct 05 2016 17:42
Yeah, I didn't think cpu fractions were supported. It does seem like it might be an elegant way to say "this task is small and fast"
Paolo Di Tommaso
@pditommaso
Oct 05 2016 17:43
I see. The problem is that batch schedulers do not support non-integer values.
u are using the local executor, right?
Mike Smoot
@mes5k
Oct 05 2016 17:46
Yeah, I can see that being a problem. And yes, it's the local executor, although I'd expect that the problem would exist with other executors too.
For the local executor is queueSize set by default to the number of CPUs on the machine or does it use the default 100?
Paolo Di Tommaso
@pditommaso
Oct 05 2016 17:47
the number of CPUs
but if you increase it then u will have too many blast tasks launched in parallel
Mike Smoot
@mes5k
Oct 05 2016 17:49
Right, which I definitely want to avoid!
Paolo Di Tommaso
@pditommaso
Oct 05 2016 17:49
unless u specify maxForks for the blast task
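
A minimal sketch of this suggestion as a `nextflow.config` fragment. The process name `blast` and all numeric values are assumptions chosen to match the 64-core machine discussed here, and the `withName` selector is the syntax used by recent Nextflow releases (releases from the time of this chat used a different per-process form). Depending on the Nextflow version, the local executor may also cap concurrency by the total CPUs requested, so treat this as a starting point rather than a guaranteed fix.

```groovy
// nextflow.config (sketch; process name and values are hypothetical)

executor {
    // Allow more tasks in flight than the number of cores,
    // so that many short single-CPU tasks can overlap.
    queueSize = 128
}

process {
    // Cap only the heavy process so the larger queue
    // does not launch too many blast tasks at once.
    withName: 'blast' {
        cpus     = 8
        maxForks = 8   // at most 8 concurrent blast tasks (8 x 8 = 64 cores)
    }
}
```

The design choice is to raise the global limit while restricting only the expensive process, instead of lowering the limit for everything.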
Mike Smoot
@mes5k
Oct 05 2016 17:49
Ah, that's interesting!
Paolo Di Tommaso
@pditommaso
Oct 05 2016 17:50
are u sure u don't have a problem with the task not consuming its CPU time?
Mike Smoot
@mes5k
Oct 05 2016 17:50
I'm not sure I understand.
Paolo Di Tommaso
@pditommaso
Oct 05 2016 17:51
how many cores does this computer have?
Mike Smoot
@mes5k
Oct 05 2016 17:51
64
Paolo Di Tommaso
@pditommaso
Oct 05 2016 17:52
thus in the second half of the chart you should have 64 parallel processes running, right?
Mike Smoot
@mes5k
Oct 05 2016 17:53
As best I can tell. I know all the tasks are running as they're producing results. Whether there are actually 64 running at once is a bit tricky to tell because they often run in less than a second
Paolo Di Tommaso
@pditommaso
Oct 05 2016 17:54
well, so trying to increase the number of parallel tasks won't solve the problem
couldn't you merge the short tasks into the blast one?
Mike Smoot
@mes5k
Oct 05 2016 17:57
Yeah, I've been thinking about batching the shorter tasks so that maybe 200 run as one task.
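
A rough sketch of manual batching with the `collate` operator, written in current DSL2 syntax rather than the syntax available when this chat took place. The process name `smallTaskBatch`, the `params.inputs` glob, and the `short_command` tool are all hypothetical placeholders, and the shell loop assumes file names without spaces.

```groovy
// main.nf (DSL2 sketch; names, paths and the command are hypothetical)

params.inputs = 'data/*.txt'

process smallTaskBatch {
    input:
    path chunk            // up to 200 files staged into one task directory

    output:
    path '*.out'

    script:
    """
    # Run the short command once per staged file, all inside a single task
    for f in ${chunk}; do
        short_command "\$f" > "\$f.out"
    done
    """
}

workflow {
    // Group the input files into lists of 200; the last list may be smaller
    batches = Channel.fromPath(params.inputs).collate(200)
    smallTaskBatch(batches)
}
```

Each task then pays the scheduling and staging overhead once per 200 inputs instead of once per input, at the cost of coarser caching and resume granularity.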
Paolo Di Tommaso
@pditommaso
Oct 05 2016 17:57
this would be a great NF feature
Mike Smoot
@mes5k
Oct 05 2016 17:57
That just makes the associated code much more complicated.
To automatically batch jobs?
Paolo Di Tommaso
@pditommaso
Oct 05 2016 17:58
yes, I've thought about it many times but it's a bit tricky
Mike Smoot
@mes5k
Oct 05 2016 17:59
Create a special kind of process that runs the same script multiple times within the same unix process?
I agree, that would be really cool.
Paolo Di Tommaso
@pditommaso
Oct 05 2016 17:59
yep
Mike Smoot
@mes5k
Oct 05 2016 18:01
What are the tricky bits that you've found?
Paolo Di Tommaso
@pditommaso
Oct 05 2016 18:03
the idea is to batch task execution by using a wrapper of wrappers, a kind of composite executor
the problem is that the code behind was not designed for that so it's not easy to implement
I don't remember a concrete problem now
Mike Smoot
@mes5k
Oct 05 2016 18:04
When you say wrapper, do you mean the .command.run.1 wrapper?
Paolo Di Tommaso
@pditommaso
Oct 05 2016 18:05
not exactly, that is generated when using the trace file
it should be another wrapper invoking the .command.run for each task in the batch
(which eventually could call .command.run.1 ... )
Mike Smoot
@mes5k
Oct 05 2016 18:09
Yeah, I think that's the direction I was thinking too. I was imagining tweaking .command.run.1 and maybe .command.sh to basically embed a for loop. I'd guess getting all of the files and symlinks into place might be a challenge.
In any case, that's for the future. For my immediate problem it probably sounds like I should be looking at batching things manually.
Paolo Di Tommaso
@pditommaso
Oct 05 2016 18:10
yep
Mike Smoot
@mes5k
Oct 05 2016 18:10
Sounds good.
Paolo Di Tommaso
@pditommaso
Oct 05 2016 18:10
but in any case I will try to give it a try later
I will keep u updated
Mike Smoot
@mes5k
Oct 05 2016 18:11
That sounds great. In other news, it looks like I may finally be getting the aws credentials I need to begin experimenting with nextflow cloud. Very excited to see where this goes.
Paolo Di Tommaso
@pditommaso
Oct 05 2016 18:12
good, I'm very interested in your feedback
Mike Smoot
@mes5k
Oct 05 2016 18:14
will definitely keep you posted