These are chat archives for ManageIQ/manageiq/performance

14th
Oct 2015
Jason Frey
@Fryguy
Oct 14 2015 14:15
@dmetzger57 How much can be put into a vcsim ? That is, does it cover all objects?
I'd love to be able to have an environment builder for the vmware specs
the refresh in particular is enormous, and we could save a lot of time with a smaller simpler spec
Dennis Metzger
@dmetzger57
Oct 14 2015 14:15
it's a full VC, so everything is there
Jason Frey
@Fryguy
Oct 14 2015 14:15
! :D
Dennis Metzger
@dmetzger57
Oct 14 2015 15:02
First thing I looked at was comparing appliance memory used between 5.3/5.4/Master on small VMware and RHEV environments. This chart https://drive.google.com/a/redhat.com/file/d/0B_3dlq4um_MucnYxX1ZzNUFCSFk/view?usp=sharing (sorry data still in Excel, will get it to Google Sheets) shows the real problem lies in the base system. The chart is from boot, idle for 120 seconds, then initiate the initial inventory refresh (provider added). As you can see, we have a memory growth issue at startup.
Jason Frey
@Fryguy
Oct 14 2015 15:23
@tenderlove @matthewd Here is the current set of charts that @jrafanie has: https://docs.google.com/spreadsheets/d/1LH9JpLJPoWSlpWhQmEK-jT7HYqhy8glo6gsIvZNMat4/edit#gid=2089394390
basically, using Aman Gupta's RUBY_GC_* numbers we are close to Ruby 2.0.0 levels
The numbers @dmetzger57 found are more concerning to me
(his numbers don't include the RUBY_GC_* tweaks though, so I'd want to see that)
in his chart, you can see that the refresh itself is using the same memory...it's the baseline that's moved up significantly
Jason Frey
@Fryguy
Oct 14 2015 15:28
also, I found some places in the vmware refresh where we can release data earlier. Hopefully this could keep the high water mark from getting too high, but we're still running into those objects being held somewhere
Matthew Draper
@matthewd
Oct 14 2015 15:29
What sorts of objects are they?
Jason Frey
@Fryguy
Oct 14 2015 15:29
VimHash objects
basically the data from the Vim inventory post unmarshal_response
it's being held by a proc :(
trying to understand why the proc is holding it...but we're also trying to see if there is a difference with a "real" broker vs what we were doing
Matthew Draper
@matthewd
Oct 14 2015 15:31
Which proc is it?
Jason Frey
@Fryguy
Oct 14 2015 15:31
because our "faking out" of the broker might be involved
you're gonna be surprised ;)
also, chasing this issue might not be a "for right now" problem, if the baselines are all out of whack
matthewd @matthewd is surprised
Matthew Draper
@matthewd
Oct 14 2015 15:37
@dmetzger57 just to confirm, this is total memory used in the appliance?
Dennis Metzger
@dmetzger57
Oct 14 2015 15:38
@matthewd correct, total memory used by all processes.
need to consider / identify what RHEL 6 -> CentOS 7 (and RHEL 7) costs, 'cause that is also in play
Matthew Draper
@matthewd
Oct 14 2015 15:41
Okay, so are you planning on a stacked graph of our individual ruby processes, so we can see distribution (and also compare to this, to check for change in OS overhead)?
Dennis Metzger
@dmetzger57
Oct 14 2015 15:44
think i'm going to get the memory usage on the two platforms with the starting of EVM disabled, this will be more "apples to apples" as it will be grabbing / comparing the same global data point.
we can then add / look at the individual evm processes
granted, the sum of evm processes "should" yield the same number :smile:
Jason Frey
@Fryguy
Oct 14 2015 15:48
I just remembered this old tuning we had on the appliance: https://github.com/ManageIQ/manageiq-appliance/blob/master/LINK/etc/default/evm#L10-L12
I wonder if that's contributing to the problem since it was done for RHEL6
(technically it was done for glibc)
Joe Rafaniello
@jrafanie
Oct 14 2015 16:02

I just remembered this old tuning we had on the appliance: https://github.com/ManageIQ/manageiq-appliance/blob/master/LINK/etc/default/evm#L10-L12

https://bugzilla.redhat.com/show_bug.cgi?id=1011630#c14 from git sha 247be16cd5c2e8c68c0dccac2364f0ddf7c81461 in manageiq repo with old history

Dennis Metzger
@dmetzger57
Oct 14 2015 18:09
@jrafanie here's the plot with Master and Ruby 2.0.0 https://drive.google.com/open?id=0B_3dlq4um_MuSzNQUEtJOGY1Mmc
Joe Rafaniello
@jrafanie
Oct 14 2015 18:11
i need permission
Dennis Metzger
@dmetzger57
Oct 14 2015 18:17
weird, I didn't mark the file as private
Matthew Draper
@matthewd
Oct 14 2015 18:28
@tenderlove I think @jrafanie's already running off 4-2-stable for my other fix, if you want to just commit that
Aaron Patterson
@tenderlove
Oct 14 2015 18:28
matthewd: I'm having jrafanie test it
Jason Frey
@Fryguy
Oct 14 2015 18:52
{
 "/Users/joerafaniello/.gem/ruby/2.2.3/bundler/gems/rails-52a9beb65940/activerecord/lib/active_record/connection_adapters/postgresql/database_statements.rb:168"=>
  251362,
 "/Users/joerafaniello/.gem/ruby/2.2.3/bundler/gems/rails-52a9beb65940/activerecord/lib/active_record/attribute_set/builder.rb:100"=>
  162467,
 "/Users/joerafaniello/.gem/ruby/2.2.3/bundler/gems/rails-52a9beb65940/activerecord/lib/active_record/attribute.rb:5"=>
  122170,
 "/Users/joerafaniello/.gem/ruby/2.2.3/bundler/gems/rails-52a9beb65940/activerecord/lib/active_record/attribute.rb:9"=>
  96727,
 "/Users/joerafaniello/.gem/ruby/2.2.3/bundler/gems/rails-52a9beb65940/activerecord/lib/active_record/attribute_methods.rb:359"=>
  56030
}
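The chat doesn't say how these per-location counts were collected. As a hedged sketch only, MRI's `objspace` extension can attribute allocations to a file and line while tracing is on. `build_rows` and `allocation_sites` are hypothetical names; `build_rows` stands in for whatever ActiveRecord work was actually being measured.

```ruby
require 'objspace'

# Hypothetical workload standing in for the code under measurement.
def build_rows(n)
  Array.new(n) { |i| "row-#{i}" }
end

# Count allocations per "file:line" for the objects the block returns.
# The lookups happen inside the trace block, while tracing is active.
def allocation_sites
  counts = Hash.new(0)
  ObjectSpace.trace_object_allocations do
    yield.each do |obj|
      file = ObjectSpace.allocation_sourcefile(obj)
      line = ObjectSpace.allocation_sourceline(obj)
      counts["#{file}:#{line}"] += 1
    end
  end
  # Highest counts first, like the hash pasted above.
  counts.sort_by { |_site, n| -n }.to_h
end

sites = allocation_sites { build_rows(1_000) }
sites.first(5).each { |site, n| puts "#{site} => #{n}" }
```

Only objects still reachable from the block's return value are counted here, so this is an illustration of the idea rather than a full allocation profiler.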
Jason Frey
@Fryguy
Oct 14 2015 19:26
@tenderlove Comparing dumps with and without the freeze change:
Before: 333180 STRING objects / After: 315688 STRING objects.
Granted that's with ObjectSpace.dump_all, so that's live strings, not allocation counts
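The distinction between live strings and allocation counts can be seen in a small sketch. It assumes MRI and uses `ObjectSpace.count_objects` rather than `dump_all` to keep it short; `live_string_count` and `allocated_vs_live` are hypothetical helper names.

```ruby
# Live T_STRING slots after a full GC.
def live_string_count
  GC.start
  ObjectSpace.count_objects[:T_STRING]
end

# Allocate n short-lived strings and report how many objects were
# allocated versus how many extra strings are still live afterwards.
def allocated_vs_live(n)
  live_before  = live_string_count
  alloc_before = GC.stat(:total_allocated_objects)
  n.times { |i| "temp-#{i}" } # each string becomes garbage right away
  allocated = GC.stat(:total_allocated_objects) - alloc_before
  [allocated, live_string_count - live_before]
end

allocated, live_delta = allocated_vs_live(10_000)
puts "allocated: #{allocated}, change in live strings: #{live_delta}"
```

A patch that avoids temporary strings lowers the first number a lot and the second only slightly, which is why a live-object dump can understate the effect of a change like the freeze patch.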
Joe Rafaniello
@jrafanie
Oct 14 2015 19:27
@tenderlove I'm seeing too much variation in the String allocations based solely on the class types against the "small" ems to confirm/deny any improvements in your freeze patch: https://gist.github.com/jrafanie/636b5e14790b338f1522
Jason Frey
@Fryguy
Oct 14 2015 19:53
Ok, this is going to sound stupid, but bear with me :)
Is there a way to get the reference of a key in a Hash or an element in a Set directly
without traversing...I want to leverage the Hash lookup
in my case, I want to use a Set as a String pool that would contain frozen strings.
As I get new strings from elsewhere, I want to swap it out for the frozen one in the StringPool
Matthew Draper
@matthewd
Oct 14 2015 19:55
I just used a k=v hash
Jason Frey
@Fryguy
Oct 14 2015 19:55
yeah, that's what I was coming down to
seems silly to have to do that, but oh well :)
I want like a Hash#fetch_key method or something or Set#[]
Matthew Draper
@matthewd
Oct 14 2015 19:56
I guess either of those would be a bit implementation-ish… if you already have something that's "the same", why would you want the 'other' one?
Jason Frey
@Fryguy
Oct 14 2015 19:56
yeah, it really is
Matthew Draper
@matthewd
Oct 14 2015 19:57
I mean.. I obviously understand why you do… but it's leaking an abstraction
Jason Frey
@Fryguy
Oct 14 2015 19:57
agreed
wouldn't other Ruby implementations have the same underlying problem though?
Matthew Draper
@matthewd
Oct 14 2015 19:57
Yeah, it's more a language detail than a runtime one, I suppose
Jason Frey
@Fryguy
Oct 14 2015 19:58
yeah
Matthew Draper
@matthewd
Oct 14 2015 19:58
Now I'm curious about whether you're putting this where I had it :)
Jason Frey
@Fryguy
Oct 14 2015 19:58
probably not...mine's in memory_analyzer
but it did get me thinking about using a StringPool in other places
like the VimInventory
particularly in the in-memory cache...we're keeping all the data anyway...and there are likely TONs of duplicated String objects
I wish .freeze would implement the StringPool idea for me :)
Matthew Draper
@matthewd
Oct 14 2015 19:59
That's called .to_sym ;)
Jason Frey
@Fryguy
Oct 14 2015 20:00
yeah :)
but I don't want to have to allocate a new string object every time I want to do something with it
The story around sym vs string has really started to blur nowadays
especially now that syms are GCed
* sym vs frozen string
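A small sketch of the trade-off being discussed, assuming MRI: a Symbol is already a single shared object, but getting String behaviour out of it via `to_s` allocates a fresh String each time, whereas two equal frozen Strings are still separate objects unless something pools them.

```ruby
# Equal symbols are the same object.
sym_a = "power_state".to_sym
sym_b = "power_state".to_sym
puts sym_a.equal?(sym_b)            # true

# Equal frozen strings are still two distinct objects.
str_a = String.new("power_state").freeze
str_b = String.new("power_state").freeze
puts str_a == str_b                 # true
puts str_a.equal?(str_b)            # false

# Using a symbol as a string allocates a new String on every call.
puts sym_a.to_s.equal?(sym_a.to_s)  # false
```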
Aaron Patterson
@tenderlove
Oct 14 2015 20:11
jrafanie: is that the same script you were using to generate the heap dump you gave me on Monday?
Joe Rafaniello
@jrafanie
Oct 14 2015 20:12
I've created so many dumps, I don't remember
Aaron Patterson
@tenderlove
Oct 14 2015 20:13
hah
Joe Rafaniello
@jrafanie
Oct 14 2015 20:13
typically I will dump things in 2 places: 1) after refresh before ems goes out of scope or 2) after refresh, nil out the ems, gc
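A rough sketch of those two dump points, assuming MRI's `objspace` extension. `strings_in_heap_dump` is a hypothetical helper, and the `ems` array is only a stand-in for a refreshed EMS, not the real refresh code.

```ruby
require 'objspace'
require 'json'
require 'tempfile'

# Dump the whole heap to a temp file and count the STRING entries.
# Each line of the dump is one JSON object; unparsable lines are skipped.
def strings_in_heap_dump
  Tempfile.create('heap_dump') do |f|
    ObjectSpace.dump_all(output: f)
    f.flush
    f.rewind
    f.each_line.count do |line|
      begin
        JSON.parse(line)['type'] == 'STRING'
      rescue JSON::ParserError
        false
      end
    end
  end
end

# Stand-in for "after refresh": lots of inventory strings held by ems.
ems = Array.new(20_000) { |i| "vm-#{i}" }

# 1) Dump while ems is still in scope.
held = strings_in_heap_dump

# 2) Nil out the ems, GC, then dump again.
ems = nil
GC.start
released = strings_in_heap_dump

puts "strings while held: #{held}, after release: #{released}"
```

If the second number stays close to the first, something else is still referencing the data, which is the situation described earlier with the proc holding the VimHash objects.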
Jason Frey
@Fryguy
Oct 14 2015 20:14
LOL:
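require 'set'

# Cheat: reuse Set's internal @hash (an implementation detail), storing each
# string as its own value so the pooled frozen instance can be looked up.
class StringPool < Set
  # Add o to the pool, freezing a copy first if it is not already frozen.
  def add(o)
    o = o.dup.freeze unless o.frozen?
    @hash[o] = o
    self
  end

  alias :<< :add

  # Return the pooled instance equal to o, or nil if it is not in the pool.
  def [](o)
    @hash[o]
  end
end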
I'm such a cheater - it works great though :)
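For comparison, the "k=v hash" approach mentioned earlier can be sketched without reaching into Set's internals. This is an illustration only; `HashStringPool` and `intern` are hypothetical names, not anything from memory_analyzer.

```ruby
# A string pool backed by a plain Hash whose keys and values are the
# same frozen String objects.
class HashStringPool
  def initialize
    @pool = {}
  end

  # Return the pooled frozen instance equal to str, adding it on first sight.
  def intern(str)
    @pool.fetch(str) do
      frozen = str.frozen? ? str : str.dup.freeze
      @pool[frozen] = frozen
    end
  end

  def size
    @pool.size
  end
end

pool = HashStringPool.new
a = pool.intern(String.new("host-1"))
b = pool.intern(String.new("host-1"))
puts a.equal?(b) # true: both callers now share one frozen String
puts pool.size   # 1
```

Callers swap each incoming string for the return value of `intern`, so duplicates become garbage and only one frozen copy per distinct value stays live.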
Aaron Patterson
@tenderlove
Oct 14 2015 20:41
jrafanie: I ran your script
I went from
9764680 T_STRING:String
to
9353147 T_STRING:String
Dennis Metzger
@dmetzger57
Oct 14 2015 20:47
Noticed yesterday that an idle (booted, never had a provider added) Master appliance constantly drops available memory - until I stop the app with 'rake evm:stop'. It's a slow drain, but consistent. This sheet https://docs.google.com/a/redhat.com/spreadsheets/d/1-qiyKxV6m1KuxMeWnx2S-U9vGAIPjTyaHw6HzGGp8WM/edit?usp=sharing shows a simple test, first chart is available memory as seen every second for 10 minutes of idle time after boot. The second is a 20 minute view, 10 idle minutes after boot, an evm:stop followed by 10 more idle minutes.
Jason Frey
@Fryguy
Oct 14 2015 20:48
@dmetzger57 That sounds like a leak to me and it has nothing to do with refresh
interesting that the first graph == the first half of the second graph
Joe Rafaniello
@jrafanie
Oct 14 2015 20:49
@tenderlove I had different results when I ran it twice... for the same test scenario
Aaron Patterson
@tenderlove
Oct 14 2015 20:49
:(
Joe Rafaniello
@jrafanie
Oct 14 2015 20:52
see if you get consistent results if you run it a few times
Jason Frey
@Fryguy
Oct 14 2015 20:55
@dmetzger57 The leak seems kind of slow, but you said you eventually ran out of memory?
Dennis Metzger
@dmetzger57
Oct 14 2015 20:56
i ran out of patience when the system got down to a few hundred MB, so i can't say if it gets to 0 or magically rebounds
Joe Rafaniello
@jrafanie
Oct 14 2015 22:21
@tenderlove we're tracking down a schedule worker leak and produced another dump_all that had a single malformed json line at the end of the file, just like I saw on Tuesday
Aaron Patterson
@tenderlove
Oct 14 2015 22:22
fun
Joe Rafaniello
@jrafanie
Oct 14 2015 22:22
is that a known thing or one that only happens on my machine?
Aaron Patterson
@tenderlove
Oct 14 2015 23:04
jrafanie: I've never seen it happen before
jrafanie: I think the dump might be getting too large
is what I'm guessing
but...
ugh
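Whatever the cause of the malformed last line turns out to be, a dump like that can still be analyzed by skipping lines that fail to parse. A minimal sketch, assuming the usual one-JSON-object-per-line format of `ObjectSpace.dump_all`; `parse_heap_dump` is a hypothetical helper name.

```ruby
require 'json'
require 'stringio'

# Parse a heap dump line by line, collecting parsed objects and the
# line numbers of any lines that are not valid JSON.
def parse_heap_dump(io)
  objects = []
  bad_lines = []
  io.each_line.with_index(1) do |line, lineno|
    begin
      objects << JSON.parse(line)
    rescue JSON::ParserError
      bad_lines << lineno
    end
  end
  [objects, bad_lines]
end

# Made-up sample data with a truncated final line.
sample = StringIO.new(<<~DUMP)
  {"address":"0x1", "type":"STRING", "value":"a"}
  {"address":"0x2", "type":"HASH", "size":0}
  {"address":"0x3", "type":"STR
DUMP

objects, bad_lines = parse_heap_dump(sample)
puts "parsed #{objects.size} objects, malformed lines: #{bad_lines.inspect}"
```

Reporting the bad line numbers, rather than silently dropping them, makes it easy to confirm whether the damage is always confined to the end of the file.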