These are chat archives for ManageIQ/manageiq/performance

5th
Jan 2018
Joe Rafaniello
@jrafanie
Jan 05 2018 15:39
@dmetzger57 do you have a graph of the miq server memory leak by version? I'm curious if they're all using ruby 2.3.x
Dennis Metzger
@dmetzger57
Jan 05 2018 15:41
Had data/charts, but they weren't apples to apples runs, so I started 5.7/5.8/5.9 runs yesterday against equivalent environments with fresh appliances
Joe Rafaniello
@jrafanie
Jan 05 2018 15:43
Have we found a faster way to recreate it?
I'd like to run it on my local appliance with systemtap or other utilities but it takes so long to even know it's happening
Dennis Metzger
@dmetzger57
Jan 05 2018 15:44
just let it run, @NickLaMuro has it narrowed down to a method (at least it appears that way) and has been stripping logic there - I'll leave the real story to @NickLaMuro
Nick LaMuro
@NickLaMuro
Jan 05 2018 15:48
I mean, no, not really a faster way to recreate it at this point
I have narrowed it down to it being something in MiqServer.monitor_workers, and am slowly commenting out code in that
but that is slow going as well
One thing we thought it might have been was the DRb and Timeout monkey patching that we have done, but as I mentioned over the break, it doesn't look like that was the case: https://gitter.im/ManageIQ/manageiq/performance?at=5a451341edd2230811e914ea

Also mentioned, it doesn't look like this section of code is the issue either:

https://github.com/ManageIQ/manageiq/blob/ed4af9d/app/models/miq_server/worker_management/monitor.rb#L26-L35

(of note: I only started with commenting out that code because it was something I hadn't recreated a small script for)
Joe Rafaniello
@jrafanie
Jan 05 2018 19:07
hey @NickLaMuro, have you tried commenting out sync_workers?
I wonder if anything works if you do that ;-)
Joe Rafaniello
@jrafanie
Jan 05 2018 19:18
I guess you could disable some of the monitor class names in bunches
Nick LaMuro
@NickLaMuro
Jan 05 2018 19:20
i did do sync_workers, but didn't post it here
it has the side effect of only booting up three workers when you do that, regardless of what is configured, and i have found that the number of workers being booted affects how fast the leak is
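Since each confirmation run takes days, disabling candidates "in bunches" is essentially a bisection. A minimal sketch of that idea follows, assuming a single culprit; `bisect_culprit` and the candidate names are hypothetical and not part of ManageIQ, and the block stands in for a manual "run it and see whether it still leaks" step. As noted above, disabling workers also changes how fast the leak shows up, so each step may need a longer run.

```ruby
# Hypothetical helper: narrow a list of candidates down to one culprit.
# The block receives a subset that is left ENABLED and must return true
# if the leak still reproduces with only that subset enabled.
# Assumes exactly one candidate is responsible.
def bisect_culprit(candidates)
  pool = candidates.dup
  while pool.size > 1
    first_half = pool.first(pool.size / 2)
    # Keep the half that still leaks; otherwise the culprit is in the rest.
    pool = yield(first_half) ? first_half : pool - first_half
  end
  pool.first
end

# Example with placeholder names; the block simulates a leak caused by "d".
culprit = bisect_culprit(%w[a b c d e]) { |enabled| enabled.include?("d") }
puts culprit
```

With n candidates this needs about log2(n) runs instead of n, which matters when every run costs days.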
Joe Rafaniello
@jrafanie
Jan 05 2018 19:24
yeah
that makes sense
Joe Rafaniello
@jrafanie
Jan 05 2018 21:12
@NickLaMuro I'm looking at your graphs :point_up: December 28, 2017 10:52 AM, would you say the typical memory "leak" is around 120MB/day or 5 MB/hour, is that roughly correct?
Nick LaMuro
@NickLaMuro
Jan 05 2018 21:17
@jrafanie yeah, that sounds right... but for 14 workers
slower when you have fewer
also, random update, I am able to replicate the leak on a vagrant VM
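The rate quoted above (roughly 120 MB/day, about 5 MB/hour, with 14 workers) also explains why confirmation is slow. A small sketch of the arithmetic, where the 50 MB "noise band" is purely an assumed figure for illustration and the helper names are hypothetical:

```ruby
# Rate quoted in the chat: roughly 120 MB/day, observed with 14 workers.
def mb_per_hour(mb_per_day)
  mb_per_day / 24.0
end

# Hypothetical helper: hours of runtime before the growth exceeds
# a given band of normal memory fluctuation (noise_mb).
def hours_to_exceed(noise_mb, mb_per_day)
  noise_mb / mb_per_hour(mb_per_day)
end

puts mb_per_hour(120)          # 5.0
puts hours_to_exceed(50, 120)  # 10.0
```

With fewer workers the rate drops, so the time needed to see the same growth goes up accordingly.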
Joe Rafaniello
@jrafanie
Jan 05 2018 21:18
when do you know you've confirmed it? I'm trying something to make it grow faster and I don't know when it's "confirmed" :laughing:
Can you DM me an ip address of a "confirmed leaker"?
I want to look at the /proc :neckbeard:
Nick LaMuro
@NickLaMuro
Jan 05 2018 21:23
@jrafanie well, here is a graph of the vagrant VM:
20171207_12560.png
but yes, I can also get you some creds
Joe Rafaniello
@jrafanie
Jan 05 2018 21:23
ugh, 3 days
Nick LaMuro
@NickLaMuro
Jan 05 2018 21:24
technically a month, because when I shut the laptop lid the VM retains its current time from before being shut down in virtualbox
Joe Rafaniello
@jrafanie
Jan 05 2018 21:24
for "fun", I made the server be really sure the workers are sync'd
diff --git a/app/models/miq_server/worker_management/monitor.rb b/app/models/miq_server/worker_management/monitor.rb
index 3974a3d..e6af408 100644
--- a/app/models/miq_server/worker_management/monitor.rb
+++ b/app/models/miq_server/worker_management/monitor.rb
@@ -19,7 +19,7 @@ module MiqServer::WorkerManagement::Monitor
     resync_needed, sync_message = sync_needed?

     # Sync the workers after sync'ing the child worker settings
-    sync_workers
+    20.times { sync_workers }

     MiqWorker.status_update_all
Nick LaMuro
@NickLaMuro
Jan 05 2018 21:24
heh
Joe Rafaniello
@jrafanie
Jan 05 2018 21:25
It's growing over a few hours but I can't tell if it's just reaching an equilibrium or actually leaking
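One way to tell "reaching an equilibrium" from "actually leaking" is to sample the process RSS from /proc and compare the growth rate of the earlier samples with that of the later ones. The following is only a sketch under assumptions: it is Linux-only, the helper names are hypothetical, and the split-in-half comparison is a simple heuristic rather than anything the participants used.

```ruby
# Read the resident set size (VmRSS) of a process in kB from /proc.
# Linux only; returns nil if the file or the field is not available.
def rss_kb(pid)
  path = "/proc/#{pid}/status"
  return nil unless File.readable?(path)
  line = File.foreach(path).find { |l| l.start_with?("VmRSS:") }
  line && line.split[1].to_i
end

# Ordinary least-squares slope of ys over xs (here: kB per second).
def slope(xs, ys)
  n = xs.size.to_f
  mean_x = xs.sum / n
  mean_y = ys.sum / n
  num = xs.zip(ys).sum { |x, y| (x - mean_x) * (y - mean_y) }
  den = xs.sum { |x| (x - mean_x)**2 }
  den.zero? ? 0.0 : num / den
end

# Compare growth in the first and second half of the samples.
# A leak keeps a similar positive slope; warm-up growth flattens out.
def growth_report(times, rss)
  half = times.size / 2
  {
    early_kb_per_hour: slope(times[0...half], rss[0...half]) * 3600,
    late_kb_per_hour:  slope(times[half..-1], rss[half..-1]) * 3600
  }
end
```

Samples could be collected by calling `rss_kb(pid)` on a timer and recording `Time.now.to_i` alongside each value; a late slope that stays well above zero over many hours points toward a leak rather than a plateau.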
Nick LaMuro
@NickLaMuro
Jan 05 2018 21:25
I am looking into setting the :monitor_poll setting and the :worker_monitor_frequency to a second apiece
on another box, just to see what happens
Joe Rafaniello
@jrafanie
Jan 05 2018 21:27
you forgot to say "for fun" ;-)
Nick LaMuro
@NickLaMuro
Jan 05 2018 21:27
well, it is more for speeding up the turn around time
Joe Rafaniello
@jrafanie
Jan 05 2018 21:28
yeah, that's my thinking too
Nick LaMuro
@NickLaMuro
Jan 05 2018 21:28
if it was "fun", then I have been having a blast these past couple of months ;)