These are chat archives for ManageIQ/manageiq/performance

6th
Jan 2018
Joe Rafaniello
@jrafanie
Jan 06 2018 01:18
@NickLaMuro my test run with extra checks for synced workers grew the server by about 90 MB in 4 hours
 -    sync_workers
+    40.times { sync_workers }
top - 17:03:08 up 1 min,  0 users,  load average: 1.40, 0.40, 0.14
 1672 root      20   0  716240 325428   8032 S  48.9  5.5   0:29.34 ruby
top - 17:04:08 up 2 min,  0 users,  load average: 0.71, 0.38, 0.15
 1672 root      20   0  716240 325488   8032 R  31.3  5.5   0:48.14 ruby

top - 20:12:17 up  3:10,  1 user,  load average: 0.36, 0.38, 0.36
 1672 root      20   0  740248 411132   8100 S  35.4  6.9  61:39.43 ruby
top - 20:13:17 up  3:11,  1 user,  load average: 0.39, 0.37, 0.36
 1672 root      20   0  740728 411660   8100 R  35.7  6.9  62:00.90 ruby
or maybe that's 5 hours. Time is hard
I dumped the heap at the beginning and will try to take a few more dumps as it grows. hehe, :poop:
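For reference, the growth can be read straight off the RES column of the `top` lines above. This is a minimal sketch, assuming the default `top` column order and that RES is reported in KiB; the helper names are made up for illustration.

```ruby
# First and last process lines copied from the `top` output above.
TOP_LINES = [
  " 1672 root      20   0  716240 325428   8032 S  48.9  5.5   0:29.34 ruby",
  " 1672 root      20   0  740728 411660   8100 R  35.7  6.9  62:00.90 ruby"
].freeze

# Assumed default column order:
# PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
def res_kib(top_line)
  Integer(top_line.split[5])
end

# Growth in resident memory between two samples, in MiB.
def growth_mib(first_line, last_line)
  (res_kib(last_line) - res_kib(first_line)) / 1024.0
end

puts format("RES grew by %.1f MiB", growth_mib(TOP_LINES.first, TOP_LINES.last))
# prints: RES grew by 84.2 MiB
```

That is 86232 KiB, or roughly 88 MB in decimal units, which matches the "90 MB" estimate.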
Nick LaMuro
@NickLaMuro
Jan 06 2018 01:19
@jrafanie is that the Ruby heap, or the "C heap"?
Joe Rafaniello
@jrafanie
Jan 06 2018 01:20
ruby
Nick LaMuro
@NickLaMuro
Jan 06 2018 01:20
but aren't we pretty sure it is a C heap issue that is outside of what ObjectSpace.dump knows about?
Joe Rafaniello
@jrafanie
Jan 06 2018 01:21
rbtrace in gemfile
require 'rbtrace' in an initializer
then, periodically doing rbtrace -p 1672 -e 'Thread.new{GC.start;require "objspace";io=File.open("/tmp/ruby-heap2.dump", "w"); ObjectSpace.dump_all(output: io); io.close}'
yeah, I'm hoping it's in ruby and it's been a while since I looked at a nice heap dump
have you already done that?
It feels like it's in sync_workers
oh well, it's late even for off by one timezone peeps, have a good weekend ;-)
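One way to look for "something that changes over time" is to compare two of these dumps by object type. The sketch below assumes only that each line of an `ObjectSpace.dump_all` file is a JSON document with a `"type"` key and, for most objects, a `"memsize"` key. The function names are hypothetical, and the dump paths are supplied on the command line rather than assumed.

```ruby
require "json"

# Count objects and total memsize per "type" in an ObjectSpace.dump_all file
# (one JSON document per line).
def summarize_heap_dump(path)
  summary = Hash.new { |hash, type| hash[type] = { count: 0, bytes: 0 } }
  File.foreach(path) do |line|
    next if line.strip.empty?
    obj = JSON.parse(line)
    entry = summary[obj["type"]]
    entry[:count] += 1
    entry[:bytes] += obj["memsize"].to_i # entries without memsize count as 0
  end
  summary
end

# Per-type change between two summaries, largest byte growth first.
def heap_growth(before, after)
  empty = { count: 0, bytes: 0 }
  rows = (before.keys | after.keys).map do |type|
    b = before.fetch(type, empty)
    a = after.fetch(type, empty)
    { type: type, count: a[:count] - b[:count], bytes: a[:bytes] - b[:bytes] }
  end
  rows.sort_by { |row| -row[:bytes] }
end

# Usage: ruby heap_diff.rb EARLIER.dump LATER.dump
if __FILE__ == $PROGRAM_NAME && ARGV.size == 2
  growth = heap_growth(summarize_heap_dump(ARGV[0]), summarize_heap_dump(ARGV[1]))
  growth.each do |row|
    puts format("%-10s %+10d objects %+12d bytes", row[:type], row[:count], row[:bytes])
  end
end
```

If the per-type totals stay flat while RSS keeps climbing, that would point away from the Ruby heap, since `memsize` only covers what the VM itself accounts for.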
Nick LaMuro
@NickLaMuro
Jan 06 2018 01:23

> have you already done that?

yeah, though I just added it on a periodic basis in the config/initializer that monkey patched MiqServer.monitor_poll, which we did to add extra metrics to the logs
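A periodic dump of that kind could look roughly like the following. This is only a sketch, not the actual initializer: it assumes `monitor_poll` is an instance method that is called regularly, and `PeriodicHeapDump`, `DUMP_INTERVAL`, the output file name, and `StandInServer` are all hypothetical names used so the example runs on its own.

```ruby
require "objspace"
require "tmpdir"

# Wraps monitor_poll and writes a heap dump at most once per DUMP_INTERVAL.
module PeriodicHeapDump
  DUMP_INTERVAL = 3600 # seconds between dumps (assumed value)

  def monitor_poll
    result = super
    dump_heap_if_due
    result
  end

  private

  def dump_heap_if_due(now = Time.now)
    return if @last_heap_dump_at && now - @last_heap_dump_at < DUMP_INTERVAL

    @last_heap_dump_at = now
    path = File.join(Dir.tmpdir, "ruby-heap-#{Process.pid}-#{now.to_i}.dump")
    GC.start
    File.open(path, "w") { |io| ObjectSpace.dump_all(output: io) }
    path
  end
end

# Stand-in for the real server class; in an initializer the module would be
# prepended to the application's own class instead.
class StandInServer
  def monitor_poll
    :polled
  end
end

StandInServer.prepend(PeriodicHeapDump)
```

Using `prepend` keeps the original method reachable through `super`, so the return value of the wrapped method is passed through unchanged.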

Joe Rafaniello
@jrafanie
Jan 06 2018 01:24
so, you've looked at a heap dump? I'm just hoping there's something in there that changes over time that might give us an indication. Seriously, have a good weekend, bye :wave:
Nick LaMuro
@NickLaMuro
Jan 06 2018 01:24

> It feels like it's in sync_workers

@jrafanie do you have a baseline of what it looks like without the change?

(not trying to keep you here, just asking questions that I have. Respond Monday)
Joe Rafaniello
@jrafanie
Jan 06 2018 01:25
yeah, that's easy to do, it never grew this quickly for me so I'm hopeful
Nick LaMuro
@NickLaMuro
Jan 06 2018 01:25
mmk, cool
I did have some graphs where I commented out sync_workers (the inverse of what you are doing), and the leak seemed to still be present
but I only let it run for a couple of days, so I would love to be wrong
also... one of the MiqServer processes was restarted... then segfaulted for reasons unknown... so it could have been a bad test
Joe Rafaniello
@jrafanie
Jan 06 2018 01:28
interesting
well, I'll run a baseline too
Nick LaMuro
@NickLaMuro
Jan 06 2018 01:30
One random theory that I had is that it could be something to do with calling a method from another module (well, module that is namespaced under a class in this case), and it is some weirdness in C there that I haven't really tested for in isolation. It would explain the lack of seeing anything definitive in the ruby heap, but I have no proof it is even a thing.
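Testing that theory in isolation might start from something like the sketch below: a class that calls a method defined in a module namespaced under it, run in a loop while watching the process RSS. Everything here is a hypothetical stand-in (`Server`, `WorkerManagement`, the body of `sync_workers`), and the RSS reading assumes either Linux's `/proc/self/status` or a `ps` that supports `-o rss=`. It does not reproduce the real code path, so a flat result would not rule the theory out.

```ruby
# Hypothetical stand-ins that mimic the shape described above: a module
# namespaced under a class, whose method the class calls repeatedly.
class Server
  module WorkerManagement
    def sync_workers
      Array.new(10) { |i| "worker-#{i}" }.size
    end
  end

  include WorkerManagement

  def monitor
    sync_workers
  end
end

# Resident set size of this process in KiB.
def rss_kib
  status = "/proc/self/status"
  if File.readable?(status)
    File.read(status)[/^VmRSS:\s+(\d+)\s+kB/, 1].to_i
  else
    `ps -o rss= -p #{Process.pid}`.to_i
  end
end

# Call the method `iterations` times and report RSS before and after,
# forcing a GC on both sides so Ruby-heap garbage is not counted.
def measure_growth(iterations)
  server = Server.new
  GC.start
  before = rss_kib
  iterations.times { server.monitor }
  GC.start
  { before: before, after: rss_kib }
end

if __FILE__ == $PROGRAM_NAME && ARGV.first
  result = measure_growth(Integer(ARGV.first))
  puts "RSS before: #{result[:before]} KiB, after: #{result[:after]} KiB"
end
```

Running it with increasing iteration counts and comparing the results against the `ObjectSpace` totals would show whether any growth happens outside the Ruby heap.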