These are chat archives for nextflow-io/nextflow

13th
Oct 2016
Rickard Hammarén
@Hammarn
Oct 13 2016 07:02
Hi! I have a question about the Nextflow daemon. It seems that when I run our pipeline, https://github.com/SciLifeLab/NGI-RNAseq, I get memory usage in the 30-40 GB range. Is this expected? And is there something I can do to reduce it? I've been running with the -with-timeline option; does that contribute in any significant way?
Johan Viklund
@viklund
Oct 13 2016 07:04
how big is the stringtie_log?
Paolo Di Tommaso
@pditommaso
Oct 13 2016 07:18
the java heap - by default - can grow up to 60%-70% of the available physical memory
so I guess you have 64GB of ram
Rickard Hammarén
@Hammarn
Oct 13 2016 07:20
I see, I think it's 256 GB of RAM actually. Can I limit this usage, and will that affect the pipeline execution?
Paolo Di Tommaso
@pditommaso
Oct 13 2016 07:20
so it can grow much more ! :)
Rickard Hammarén
@Hammarn
Oct 13 2016 07:20
That's not reassuring
Paolo Di Tommaso
@pditommaso
Oct 13 2016 07:20
but you can set an upper limit for that, 4 GB should be more than enough
define the following var
NXF_OPTS='-Xms1g -Xmx4g'
Rickard Hammarén
@Hammarn
Oct 13 2016 07:22
Great! Thanks!
Paolo Di Tommaso
@pditommaso
Oct 13 2016 07:22
it constrains the heap to between 1 and 4 GB
I think you can shrink it down to 2 GB, but this could impact performance, so you should profile it a bit
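A minimal sketch of the setting discussed above, assuming a POSIX shell. The `NXF_OPTS` variable and the heap flags come from the conversation; where the export lives (a login profile, a job script) is left open.

```shell
# Cap the JVM heap of the Nextflow launcher process:
# -Xms1g sets the initial heap size, -Xmx4g the maximum heap size.
export NXF_OPTS='-Xms1g -Xmx4g'

# Any Nextflow run started from this shell afterwards inherits the setting.
echo "NXF_OPTS=$NXF_OPTS"
```

Lowering `-Xmx` further is possible, but as noted above it is worth profiling the run before settling on a value.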
Rickard Hammarén
@Hammarn
Oct 13 2016 07:29
Alright! On a similar note, I have not seen it myself, but Nextflow seems to use lots of threads in bursts when creating jobs, which can affect the performance on the node. I guess there is a similar parameter to limit this?
Paolo Di Tommaso
@pditommaso
Oct 13 2016 07:30
java threads? or posix processes?
Rickard Hammarén
@Hammarn
Oct 13 2016 07:31
the latter, I would assume
From what I can see myself, the only thing that happens is that the main Java process spikes to 100% CPU usage upon job submission, but that's only one process so that does not matter
Paolo Di Tommaso
@pditommaso
Oct 13 2016 07:36
if you are not using the local executor, nextflow does not launch other POSIX processes
however, java threads are mapped onto OS threads, so I think that's what you are referring to
the thread pool size management is a kind of black art
you can control it with the system properties nxf.pool.type and nxf.pool.maxThreads
refer to this code
Rickard Hammarén
@Hammarn
Oct 13 2016 07:42
Thanks! I don't think it's an issue but it's good to know where to start looking
Paolo Di Tommaso
@pditommaso
Oct 13 2016 07:42
I would be interested as well in this optimisation ;)
you may want to try nxf.pool.type=bounded or nxf.pool.type=unbounded
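A hedged sketch of how those properties might be set. It assumes that `-D` flags placed in `NXF_OPTS` reach the JVM as system properties; the property names come from the messages above, while the thread count of 8 is an arbitrary illustration rather than a recommended value.

```shell
# Assumption: -D flags in NXF_OPTS are passed to the JVM as system properties.
# nxf.pool.type and nxf.pool.maxThreads are the names mentioned in the chat;
# the value 8 is only an example and should be tuned by profiling.
export NXF_OPTS='-Xms1g -Xmx4g -Dnxf.pool.type=bounded -Dnxf.pool.maxThreads=8'

echo "NXF_OPTS=$NXF_OPTS"
```

Switching `bounded` to `unbounded` gives the other pool type suggested above, so the two can be compared on the same pipeline.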
Lukas Jelonek
@lukasjelonek
Oct 13 2016 11:57
@pditommaso optional works :) Thanks, now I don't need any workaround code anymore :)
Paolo Di Tommaso
@pditommaso
Oct 13 2016 11:57
great!
Félix C. Morency
@fmorency
Oct 13 2016 14:01
morning.
Paolo Di Tommaso
@pditommaso
Oct 13 2016 14:01
almost goodnight here ;)
Félix C. Morency
@fmorency
Oct 13 2016 14:08
@pditommaso do you know how mounts are propagated in a DinD setup?
i'm wondering if I will have the same issue with any kind of mount
Paolo Di Tommaso
@pditommaso
Oct 13 2016 14:10
it depends on what you mean by propagated
I don't think they are
Félix C. Morency
@fmorency
Oct 13 2016 14:10
docker inspect shows that the mount propagation is set to rprivate
Paolo Di Tommaso
@pditommaso
Oct 13 2016 14:11
in principle there should not be any difference
Félix C. Morency
@fmorency
Oct 13 2016 14:11
docker documentation shows that you can make a special kind of mount using --make-shared and use -v path:path:shared to actually propagate the mount
Paolo Di Tommaso
@pditommaso
Oct 13 2016 14:12
it looks like a new feature
do you have the link?
Paolo Di Tommaso
@pditommaso
Oct 13 2016 14:17
it doesn't explain so much ..
Félix C. Morency
@fmorency
Oct 13 2016 14:18
iiuc, using -v the way nextflow is using it atm will not work correctly in a DinD setup because -v doesn't pass the mount, but the folder that you were mounting to
except if you use a shared mount and :shared (not tested here)
Paolo Di Tommaso
@pditommaso
Oct 13 2016 14:20
if that's the only problem, I could add an option to append :shared to the mount
Félix C. Morency
@fmorency
Oct 13 2016 14:22
thing is, it won't work if you didn't use the special bind/--make-shared flags to make your mount (see https://github.com/docker/docker/issues/4213#issuecomment-195241948)
i tested it with my current setup and it complains it's not a shared mount.
however, having an option to use named volumes instead of inline volumes would be great
or support both alternatives. that would make nextflow more DinD-friendly
i have the setup to test the named volume thing if you want me to try
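For reference, a dry-run sketch of the two alternatives being discussed, written as plain Docker commands rather than anything Nextflow does. The path, volume name and image are hypothetical placeholders, and the script only prints the commands because the mount steps need root and change host state.

```shell
# Hypothetical names, used only for illustration.
DATA_DIR=/data
VOLUME=nf-work
IMAGE=busybox

# Option 1: turn the host directory into a shared mount, then request
# shared propagation through the third field of the -v flag.
shared_mount_cmds() {
  echo "mount --bind $DATA_DIR $DATA_DIR"
  echo "mount --make-shared $DATA_DIR"
  echo "docker run --rm -v $DATA_DIR:$DATA_DIR:shared $IMAGE ls $DATA_DIR"
}

# Option 2: a named volume. Docker creates it on first use and resolves it
# by name, so no host path has to be visible inside the outer container.
named_volume_cmds() {
  echo "docker run --rm -v $VOLUME:$DATA_DIR $IMAGE ls $DATA_DIR"
}

shared_mount_cmds
named_volume_cmds
```

Whether either variant fixes the DinD case is untested here, in line with the caveats in the messages above.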
Paolo Di Tommaso
@pditommaso
Oct 13 2016 14:27
if you can provide me the exact command lines to replicate the configuration in a no-brainer way, I can try to integrate it
you may want to open a feature request for that
Félix C. Morency
@fmorency
Oct 13 2016 14:28
sure, will do
Paolo Di Tommaso
@pditommaso
Oct 13 2016 14:29
:+1:
Félix C. Morency
@fmorency
Oct 13 2016 15:01
when did you start the nextflow development?
Paolo Di Tommaso
@pditommaso
Oct 13 2016 15:01
4 years ago more or less
Félix C. Morency
@fmorency
Oct 13 2016 15:02
wish i had discovered it before
Paolo Di Tommaso
@pditommaso
Oct 13 2016 15:02
:)
how did you find it ?
Félix C. Morency
@fmorency
Oct 13 2016 15:03
i was getting tired of our current (borked) architecture and was looking for a DSL to describe a graph that would autogenerate the DAG from the i/o of each node
i had plans for docker integration and stuff like that for our software and... it's all in nextflow
Paolo Di Tommaso
@pditommaso
Oct 13 2016 15:05
what's your organisation ?
Félix C. Morency
@fmorency
Oct 13 2016 15:06
we're doing state-of-the-art dMRI processing and visualization
Paolo Di Tommaso
@pditommaso
Oct 13 2016 15:06
ah cool !
we do a lot of bioinformatics, would be interesting to have some neuroimaging use cases
Félix C. Morency
@fmorency
Oct 13 2016 15:13
i'm working on more tests to see how nextflow handles our type of processing/scheduling and will plan porting our pipeline accordingly
Paolo Di Tommaso
@pditommaso
Oct 13 2016 15:13
what is your current system based on?
Paolo Di Tommaso
@pditommaso
Oct 13 2016 15:14
ah luigi
quite popular, but I think it does not fit scientific workflows well
Félix C. Morency
@fmorency
Oct 13 2016 15:15
that. exactly.
we keep hacking around its limitations.
Paolo Di Tommaso
@pditommaso
Oct 13 2016 15:16
I guess so
Félix C. Morency
@fmorency
Oct 13 2016 15:16
it made sense to use it back in the day but i don't want to keep building on it
Paolo Di Tommaso
@pditommaso
Oct 13 2016 15:16
to test nextflow you can use just a single workstation
without getting crazy with the slurm + docker thing
Félix C. Morency
@fmorency
Oct 13 2016 15:17
yeah will do that for the pipeline. slurm + docker was more to test the nextflow capabilities and how easy it is to switch scheduler
Félix C. Morency
@fmorency
Oct 13 2016 15:26
would you recommend to test it without docker (local scheduler) first? are there corner cases to be aware of?
Paolo Di Tommaso
@pditommaso
Oct 13 2016 15:26
well, docker is useful to share your workflow in a consolidated manner
but I guess you have already installed all the software you need, so I would start without it
Félix C. Morency
@fmorency
Oct 13 2016 15:28
not sure i understand that last sentence
Paolo Di Tommaso
@pditommaso
Oct 13 2016 15:28
yes, there are some corner cases, you may want to have a look here
Félix C. Morency
@fmorency
Oct 13 2016 15:28
cool
Paolo Di Tommaso
@pditommaso
Oct 13 2016 15:29
I mean usually bioinformatics pipelines contain a large number of different tools, so it's easier to have them in a container than to install them one by one
but I think you should have already installed the tools you want to test with NF
thus you won't need docker, at least to run some tests
Félix C. Morency
@fmorency
Oct 13 2016 15:30
right