These are chat archives for ManageIQ/manageiq/performance

24th
May 2016
Daniel Berger
@djberg96
May 24 2016 05:42
pfft, Madonna did it first ;)
Joe Rafaniello
@jrafanie
May 24 2016 14:15
@Fryguy I have a theory RE: increased memory usage and was wondering if you had ideas on how to test the theory
Given, all objects in the ruby heap are blindly placed in the OS heap based on where space is available
Given, all objects have different lifetimes
Given, you need to free or malloc to a location on the OS heap all the time, so all neighboring objects in the OS heap will need to be copied on either a free or a malloc...
Keenan Brock
@kbrock
May 24 2016 14:20
so you're basically saying "Don't have a COW man"
Joe Rafaniello
@jrafanie
May 24 2016 14:21
So, previously, we spawned a new process that only went through rails boot + our boot + that worker's startup code
Now, we fork from the server process (which was always populating its heap with all of its startup code) to create workers
the heap of workers at start now != heap of workers with spawn
So, now, if server startup code, let's say all of that seeding code... is in the forked worker's heap and neighboring locations on an OS page get a free/malloc, you now have private memory in the worker heap that was never there before
Dennis Metzger
@dmetzger57
May 24 2016 14:25
@jrafanie did forking workers start in 5.5.2.4-2, cause we see an initial ~30MiB bump across workers (compared to 5.5.0.13-2)
Joe Rafaniello
@jrafanie
May 24 2016 14:26
no, forking workers wasn't added to that release
so, it's 5.5 vs. 5.6
@dmetzger57 So, yes, a comparison of idle workers of 5.5.0 vs. 5.5.2 vs. 5.6.x by worker/server would be helpful
Joe Rafaniello
@jrafanie
May 24 2016 14:31
What I'm theorizing about is that eager loading code pre-fork when ruby is not CoW friendly can cause code that never existed on the heap of a "spawned worker" to be copied from shared memory from the server to private memory of the forked worker
I haven't measured this, it's only a theory
we're using nakayoshi_fork to help limit the CoW-unfriendliness of the age calculation, but we still have the issue that an OS heap object being copied on write forces the same for its neighboring objects on the heap
Keenan Brock
@kbrock
May 24 2016 14:36
this feels like mad COW disease...
Jason Frey
@Fryguy
May 24 2016 14:36
I'm not sure I follow...you'll have to show me with a picture :)
Alex Krzos
@akrzos
May 24 2016 14:37
@jrafanie I sent you links to the memory graphs of both versions idle with each worker graphed at 10s intervals
Joe Rafaniello
@jrafanie
May 24 2016 14:37
thanks
Keenan Brock
@kbrock
May 24 2016 14:37
@Fryguy does that picture make it more clear?
Jason Frey
@Fryguy
May 24 2016 14:37
LOL
Keenan Brock
@kbrock
May 24 2016 14:38
aah :boot: :clock1: ==> @jrafanie
@Fryguy sorry... a gripe/complaint... but please remind me why we need :total_time from an rbac query
Jason Frey
@Fryguy
May 24 2016 14:40
¯\_(ツ)_/¯
Keenan Brock
@kbrock
May 24 2016 14:40
real question - we display somewhere in the ui?
Joe Rafaniello
@jrafanie
May 24 2016 14:40
git spelunking time
Jason Frey
@Fryguy
May 24 2016 14:41
what is total_time anyway? time to execute the query?
Keenan Brock
@kbrock
May 24 2016 14:41
oops
total_count
helps if I ask the actual question :(
Jason Frey
@Fryguy
May 24 2016 14:41
haha
Dennis Metzger
@dmetzger57
May 24 2016 14:41
the bump in usage between 5.5.0.13-2 and 5.5.2.4-2 has my attention, don’t see a good reason for growth there
Keenan Brock
@kbrock
May 24 2016 14:41
auth_count = # to display for page numbering and stuff - that makes sense
Jason Frey
@Fryguy
May 24 2016 14:41
so there are 2 numbers auth_count and total_count...I have no idea why total_count was ever needed
Keenan Brock
@kbrock
May 24 2016 14:41
that is a ui room question I suppose - thanks
Jason Frey
@Fryguy
May 24 2016 14:41
right auth_count is used in the page counting widget
I think the only purpose of total_count is when no records are returned the message shown is different
that is, when auth_count == 0 and total_count != 0 we show "No authorized records" vs if both are 0, then we show "No records found"....or something like that
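The rule described above can be sketched as a small Ruby helper. This is only an illustration: `empty_list_message` is a hypothetical name, and the two strings are the approximate phrases from the message, not confirmed UI text.

```ruby
# Hypothetical helper (not ManageIQ code): pick the "empty list" message from
# the two counts discussed above.
#   auth_count  - records the current user is authorized to see
#   total_count - records that exist regardless of authorization
def empty_list_message(auth_count, total_count)
  # Something is visible, so no "empty" message is needed.
  return nil unless auth_count.zero?

  # Nothing visible: distinguish "nothing exists" from "nothing you may see".
  total_count.zero? ? "No records found" : "No authorized records"
end

puts empty_list_message(0, 12)
puts empty_list_message(0, 0)
```

If this is the only consumer, `total_count` is needed only when `auth_count` is zero, which is the case being questioned in the chat.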
Joe Rafaniello
@jrafanie
May 24 2016 14:43
@Fryguy summary: EvmDatabase.seed is run in server both before(spawn) and after(fork) so any code run here is in the heap of the server. This code is never in the worker heap in the before(spawn) mode. In forked workers, it does exist as shared memory with the server process. Because it exists in the worker heap, there's opportunity for neighboring objects to cause it to be copied on write in the worker process as private memory.
Jason Frey
@Fryguy
May 24 2016 14:43
those are not the exact phrases, but you get the idea...I think it's pointless, TBH
Keenan Brock
@kbrock
May 24 2016 14:43
joe
can you test with seeding turned off?
does it make much of a difference?
Jason Frey
@Fryguy
May 24 2016 14:44
@jrafanie But with spawned processes you get complete copies vs with forking you get partial copies
even if you happen to copy all of the memory, at most you should be at the same memory usage as spawned workers
Joe Rafaniello
@jrafanie
May 24 2016 14:44
@kbrock seeding is just an example of code run by the server that's never run by the worker
Keenan Brock
@kbrock
May 24 2016 14:44
thanks
are you sure it is not run by the worker?
Joe Rafaniello
@jrafanie
May 24 2016 14:45
@kbrock It doesn't matter. There is definitely code loaded in the server that is never loaded by the worker
Jason Frey
@Fryguy
May 24 2016 14:45
oh i see...you are saying that evm_server.rb code would never have been loaded by the worker, and thus it's copied overhead?
Joe Rafaniello
@jrafanie
May 24 2016 14:45
but is now because fork is a duplicate of the heap
Jason Frey
@Fryguy
May 24 2016 14:45
interesting...how much memory are we talking about though?
like, how big is evm_server.rb
Joe Rafaniello
@jrafanie
May 24 2016 14:46
I don't know... but think about things like server monitor code... worker monitor code...
Jason Frey
@Fryguy
May 24 2016 14:46
hmmm
Joe Rafaniello
@jrafanie
May 24 2016 14:47
It's just a theory. I don't know how to really test it
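One possible way to probe the theory on Linux, as a rough sketch: read `/proc/<pid>/smaps` for a worker shortly after the fork and again after it has done some work, and watch whether shared kB moves to private kB. The helper name `smaps_totals` and the choice of fields are assumptions for illustration; nothing here was run in the chat.

```ruby
# Hypothetical helper (not ManageIQ code): sum selected per-mapping fields,
# in kB, from the text of a /proc/<pid>/smaps file.
SMAPS_FIELDS = %w[Pss Shared_Clean Shared_Dirty Private_Clean Private_Dirty].freeze

def smaps_totals(smaps_text)
  totals = Hash.new(0)
  smaps_text.each_line do |line|
    # Lines of interest look like "Private_Dirty:        12 kB".
    match = line.match(/\A(\w+):\s+(\d+) kB/)
    next unless match && SMAPS_FIELDS.include?(match[1])

    totals[match[1]] += match[2].to_i
  end
  totals
end

# Linux only: print the totals for the current process.
if File.readable?("/proc/self/smaps")
  p smaps_totals(File.read("/proc/self/smaps"))
end
```

Growth in the `Private_*` totals with a matching drop in `Shared_*` for the same worker would be consistent with pages being copied on write, though it would not say which objects caused it.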
Jason Frey
@Fryguy
May 24 2016 14:47
actually, do we have worse problems where the workers are running the worker monitoring loops as well?
when you fork, do you get all the threads as well, or just the memory (/me is always confused by fork)
Joe Rafaniello
@jrafanie
May 24 2016 14:48
I would assume all Thread objects are in the heap since they exist in ObjectSpace
So, I'm not really pointing out specific things that are in the heap that never were before... but we definitely have some, because the worker vs. server startup is different
Dennis Metzger
@dmetzger57
May 24 2016 14:49
only the thread running the fork gets cloned
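A small Ruby check of that behavior, as a sketch: start an extra thread in the parent, fork, and report how many threads each side sees. `thread_counts_across_fork` is a hypothetical name, and the sleeping thread only stands in for something like a monitoring loop. It needs a platform where `fork` is available.

```ruby
# Hypothetical demo (not ManageIQ code): compare Thread.list in a parent that
# has an extra thread with Thread.list in a child forked from it.
def thread_counts_across_fork
  reader, writer = IO.pipe
  background = Thread.new { sleep }   # stand-in for a monitoring loop
  parent_count = Thread.list.size

  pid = fork do
    reader.close
    writer.write(Thread.list.size.to_s)   # threads that exist in the child
    writer.close
    exit!(0)                               # leave without running at_exit hooks
  end

  writer.close
  child_count = reader.read.to_i          # read until the child closes its end
  reader.close
  Process.wait(pid)
  background.kill

  { parent: parent_count, child: child_count }
end

counts = thread_counts_across_fork
puts "parent threads: #{counts[:parent]}, child threads: #{counts[:child]}"
```

The memory that belonged to the other threads is still part of the copied address space; only their execution does not continue in the child.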
Jason Frey
@Fryguy
May 24 2016 14:49
right...thanks @dmetzger57 (too early for my brain :P )
Dennis Metzger
@dmetzger57
May 24 2016 14:49
more caffeine
so I’m seeing two issues / topics: the possibility of an increased usage because of forking and why 5.5.0 to 5.5.2 saw an increase - I’m focusing on the 2nd
Jason Frey
@Fryguy
May 24 2016 14:51
5.5.0 to 5.5.2...that's weird
Dennis Metzger
@dmetzger57
May 24 2016 14:51
agreed
Joe Rafaniello
@jrafanie
May 24 2016 14:51
yes, can we have separate bugs on very specific changes like that?
that should be much easier to track down
@akrzos So, your spreadsheet https://docs.google.com/spreadsheets/d/1CWYKPJBMQHsTNnk8n1wM9yQvDy7v6sJ57mhc_zKdzEI/edit#gid=616139042, it's read-only... it's geared at total appliance view. Can we have a comparison of Generic Workers on all of those versions?
ReportingWorker, etc.
Alex Krzos
@akrzos
May 24 2016 14:53
just switched it to edit
Joe Rafaniello
@jrafanie
May 24 2016 14:53
thanks
Alex Krzos
@akrzos
May 24 2016 14:54
comparison tab
and per ruby process rss and pss i think is what you're looking for
Keenan Brock
@kbrock
May 24 2016 14:55
thanks @Fryguy - I just heard you say "you say it is a performance hit. why not test it and see how much..." (not sure why I'm hearing voices but anyway)
Jason Frey
@Fryguy
May 24 2016 14:56
sort of :) I was thinking it's a performance hit for a pointless "feature"
in fact, it's worse when you get to tenancy, because two tenants should (possibly) have no concept of each other
Joe Rafaniello
@jrafanie
May 24 2016 14:57

> comparison tab
> and per ruby process rss and pss i think is what you're looking for

Thanks @akrzos for showing me HOW TO SPREADSHEET

Oleg Barenboim
@chessbyte
May 24 2016 14:57
@Fryguy @kbrock are you guys discussing the RBAC total count or another feature?
Keenan Brock
@kbrock
May 24 2016 14:57
@chessbyte yes, RBAC :total_count
right now, I have a bug in my refactoring
Oleg Barenboim
@chessbyte
May 24 2016 14:57
I thought you moved that discussion to UI room
Alex Krzos
@akrzos
May 24 2016 14:58
no worries, sorry if you saw that already didn't mean to point it out if you did already
Keenan Brock
@kbrock
May 24 2016 14:58
ooh, I included the thanks for performance in this room - but yes
Alex Krzos
@akrzos
May 24 2016 14:58
I tend to make too many tabs
Keenan Brock
@kbrock
May 24 2016 14:58
thanks @chessbyte
Joe Rafaniello
@jrafanie
May 24 2016 14:59

> no worries, sorry if you saw that already didn't mean to point it out if you did already

Nope, I went through a few tabs, not that one and didn't see it

thanks
Jason Frey
@Fryguy
May 24 2016 14:59
oh LOL...I saw the thanks from @kbrock in here and thought it was the UI room :)
Oleg Barenboim
@chessbyte
May 24 2016 14:59
@Fryguy rub your eyes :-)
Alex Krzos
@akrzos
May 24 2016 15:00
ok cool
Jason Frey
@Fryguy
May 24 2016 15:00
haha seriously
Joe Rafaniello
@jrafanie
May 24 2016 15:02
@akrzos is your PSS value coming from smem?
Alex Krzos
@akrzos
May 24 2016 15:04
@jrafanie Yes
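For readers comparing the RSS and PSS columns: PSS (proportional set size) charges each process its private memory plus an equal share of every shared region. A minimal arithmetic sketch follows; `pss_kb` is a hypothetical name and the numbers are made up for illustration, not taken from the spreadsheet.

```ruby
# Hypothetical helper: proportional set size in kB.
#   private_kb     - memory mapped only by this process
#   shared_regions - [size_kb, number_of_processes_sharing_it] pairs
def pss_kb(private_kb, shared_regions)
  private_kb + shared_regions.sum { |size_kb, sharers| size_kb.to_f / sharers }
end

# A worker with 100 MB private and 300 MB shared with two other processes
# is charged a third of the shared part.
puts pss_kb(100_000, [[300_000, 3]])

# The same worker after all 300 MB has been copied on write: it is all private.
puts pss_kb(400_000, [])
```

This is why copy-on-write activity in a forked worker shows up as rising PSS even when RSS barely changes.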
Chris Arcand
@chrisarcand
May 24 2016 15:07
it's worse when you get to tenancy.
^ Added a period for you, @Fryguy
Jason Frey
@Fryguy
May 24 2016 15:08
hahaha
Joe Rafaniello
@jrafanie
May 24 2016 15:11
Thanks @akrzos, from your spreadsheets, my theory may be true but doesn't appear to have much impact on us
5.6.0.7 PSS by worker looks a lot like 5.5.4.1
Alex Krzos
@akrzos
May 24 2016 15:13
Yeah I'm putting together a spreadsheet on C&U compared over many versions with what should be mostly the same workload to quantify how much the memory has changed with a workload as well
then i'd like to do the same for provisioning/ssa as well
Joe Rafaniello
@jrafanie
May 24 2016 15:16
Ok, yeah, by worker would be great in case we changed the number and types of workers that start by default
Oleg Barenboim
@chessbyte
May 24 2016 15:17
@dmetzger57 who is looking at https://bugzilla.redhat.com/show_bug.cgi?id=1296695 ? is that what @jrafanie is investigating here?
Joe Rafaniello
@jrafanie
May 24 2016 15:17
@akrzos I'd be curious if 5.6 with singular workers (1 where we set 2) could do the same amount of work in the same time as 5.4.x with defaults of 2 workers by type
(but that's not for today)
@chessbyte yeah, that's the general problem I'm trying to get specific about
Jason Frey
@Fryguy
May 24 2016 15:18
@jrafanie But are you looking at the 5.5.0-5.5.2 memory bump or the 5.6 memory bump?
(or do you think they are interrelated)
Joe Rafaniello
@jrafanie
May 24 2016 15:21
I'm looking at 5.5.0-5.5.2 now... I don't see a 5.6 bump at all
Dennis Metzger
@dmetzger57
May 24 2016 15:22
@chessbyte yes @jrafanie is looking at increased usage as am I
Oleg Barenboim
@chessbyte
May 24 2016 15:22
@dmetzger57 thanks
Alex Krzos
@akrzos
May 24 2016 15:29
I think part of the 5.6 memory bump is in reality an additional worker (websocketsworker)
I turn him off on the idle baselines since we never had that worker in older releases
at least in my latest baselines
I marked which baselines had that worker on/off
Joe Rafaniello
@jrafanie
May 24 2016 15:31
@akrzos do you have an appliance we can get the Gemfile.lock for your graphs of 5.5.2.0 and 5.5.0.13-2?
Alex Krzos
@akrzos
May 24 2016 15:31
let me check
I might have deleted those but I could redeploy a new one pretty quickly (less than 5 minutes)
yeah they are deleted
let me redeploy those two builds so we can get the gemlock file on the ones i baselined
Joe Rafaniello
@jrafanie
May 24 2016 15:33
thanks!
we'd like to inspect OS versions, package differences too
the Gemfile changes between them are minimal
Alex Krzos
@akrzos
May 24 2016 15:34
ok deploying them now just a straight deploy vm from template, no vmdb init-ed or anything special
Dennis Metzger
@dmetzger57
May 24 2016 15:34
for visual comparison, the following chart shows the average (5 runs each) PSS for workers between 5.5.0 and 5.5.2 after adding a medium VMware provider
Joe Rafaniello
@jrafanie
May 24 2016 15:34
FYI, I believe 5.4 to 5.5 is ruby 2.0 to 2.2 so I'd expect a memory jump
Dennis Metzger
@dmetzger57
May 24 2016 15:34
PSS Comparison.jpg
Alex Krzos
@akrzos
May 24 2016 15:35
agreed, but the memory jump between the two is pretty large
Joe Rafaniello
@jrafanie
May 24 2016 15:35
it's probably more beneficial to look at jumps with 5.5+
Jason Frey
@Fryguy
May 24 2016 15:35
For the Ruby bump from Ruby 2.0 to Ruby 2.2 they changed the garbage collector...so I'd expect a memory bump...they traded off memory for performance
Joe Rafaniello
@jrafanie
May 24 2016 15:36
yeah, that's why I say, we should have made workers 1 when moving to ruby 2.2 and only increase the number for workers we were not able to keep up with
we basically traded cpu for memory but never decreased the number of processes so yes, it will always cost memory
Keenan Brock
@kbrock
May 24 2016 15:36
@Fryguy I need to revisit a number of virtual attributes with sql. do you want me to change those from virtual_attribute to virtual_column?
Alex Krzos
@akrzos
May 24 2016 15:36
agreed for our ability to react to memory changes but any customer migrating would probably benefit from knowing how much more memory they need for 5.5
Joe Rafaniello
@jrafanie
May 24 2016 15:36
@akrzos Yes, I agree
Jason Frey
@Fryguy
May 24 2016 15:37
that is, the GC is generational, and only runs on young objects more frequently...thus, old objects stick around longer (more memory), but it is done so as not to have to traverse the object tree as frequently (faster)
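The generational behavior described above can be observed through `GC.stat`. The sketch below assumes CRuby 2.2 or later, where the `:minor_gc_count`, `:major_gc_count`, and `:old_objects` keys exist; `gc_snapshot` is a hypothetical helper name, and the exact numbers will vary from run to run.

```ruby
# Hypothetical helper: snapshot a few generational-GC counters from GC.stat.
def gc_snapshot
  stat = GC.stat
  {
    minor_gc_count: stat[:minor_gc_count],  # collections that skip old objects
    major_gc_count: stat[:major_gc_count],  # full collections
    old_objects:    stat[:old_objects]      # objects promoted to the old generation
  }
end

before = gc_snapshot
retained = Array.new(50_000) { |i| "string #{i}" }  # objects we keep alive
3.times { GC.start(full_mark: false) }              # ask for minor collections
after = gc_snapshot

puts "retained #{retained.size} strings"
puts "before: #{before}"
puts "after:  #{after}"
```

Objects that survive several collections are promoted and are then skipped by minor collections, which is the memory-for-speed trade mentioned in the chat.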
Joe Rafaniello
@jrafanie
May 24 2016 15:37
I'm talking about identifying reasons for memory jumps: when comparing 5.4 to 5.5 they're really hard to pinpoint
Jason Frey
@Fryguy
May 24 2016 15:37
yeah, I think looking at 5.4 to 5.5 is an exercise in futility because of the Ruby version bump
Joe Rafaniello
@jrafanie
May 24 2016 15:38
Although, if you pinpoint specific items of work that grow a worker much more from 5.4 to 5.5, then yes, we can tackle that
Jason Frey
@Fryguy
May 24 2016 15:38
@kbrock I'm not sure....to be honest, I don't understand the new virtual attribute stuff well enough to know the trade offs
@akrzos Can we get 5.5.1 numbers as well?
Dennis Metzger
@dmetzger57
May 24 2016 15:39
i’m running off to a dental chair - i’m betting on us dropping worker counts to 1 (after we test if performance is satisfactory with that config, which I believe it will be for “POC” usage model) for the 5.6 release
Jason Frey
@Fryguy
May 24 2016 15:39
(not sure how complicated it is to get these numbers)
Alex Krzos
@akrzos
May 24 2016 15:40
I don't have a 5.5.1 template
not sure why I don't, if I missed it or if there never was one
Jason Frey
@Fryguy
May 24 2016 15:40
hmmm
Dennis Metzger
@dmetzger57
May 24 2016 15:42
I don’t believe there really is a 5.5.1, there was some magic about that, not really a code change
Jason Frey
@Fryguy
May 24 2016 15:42
ohhhh riiiight
that silly release
right I just checked and 5.5.1.0 has the same tag as 5.5.0.13
"text only errata" or whatever :)
Alex Krzos
@akrzos
May 24 2016 15:47
5.5.2.0 - gem "sprockets-rails", "< 3.0.0" in Gemfile
Jason Frey
@Fryguy
May 24 2016 15:48
yeah, but that's a restriction (we always had sprockets-rails)
Alex Krzos
@akrzos
May 24 2016 15:48
yeah that's the only difference in the gemfile
Jason Frey
@Fryguy
May 24 2016 15:48
yup...also linux_admin and ovirt gems, but I looked at the changes and they are minor
Alex Krzos
@akrzos
May 24 2016 15:48
I messaged @jrafanie the appliance IPs; if you want them I can send them to you too @Fryguy
Jason Frey
@Fryguy
May 24 2016 15:49
I am pairing with jrafanie, so I'm good :)
Joe Rafaniello
@jrafanie
May 24 2016 15:50
FYI, 5.5.0.13 -> 5.5.2.0... (RHEL 7.1 -> 7.2)
both are ruby 2.2p95, rubygems 2.4.5
glibc 2.17-78 vs. 2.17-106
Jason Frey
@Fryguy
May 24 2016 15:52
glibc might be important because of the MALLOC_ARENA_MAX number we play with
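For context, `MALLOC_ARENA_MAX` is a glibc environment variable that caps the number of malloc arenas, so it is normally set in the environment a process is launched with. A minimal sketch of passing it to a child Ruby process follows; `arena_max_seen_by_child` is a hypothetical helper and the value `1` is only an example, not the setting the appliance uses.

```ruby
require "rbconfig"

# Hypothetical helper: start a child Ruby with MALLOC_ARENA_MAX set and return
# the value the child sees in its own environment.
def arena_max_seen_by_child(value)
  env = { "MALLOC_ARENA_MAX" => value.to_s }
  command = [RbConfig.ruby, "-e", 'print ENV["MALLOC_ARENA_MAX"]']
  IO.popen(env, command, &:read)
end

puts arena_max_seen_by_child(1)
```

When comparing memory across two glibc builds, it is worth confirming that both appliances launch their Ruby processes with the same value.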
Joe Rafaniello
@jrafanie
May 24 2016 15:53
kernel: 3.10.0-229.20.1 -> 3.10.0-327.3.1
Alex Krzos
@akrzos
May 24 2016 15:54
that's wild
cat /etc/redhat-release
so rhel7.2 we went back in kernel versions
Jason Frey
@Fryguy
May 24 2016 15:54
wai-wha?
Joe Rafaniello
@jrafanie
May 24 2016 15:55
229 -> 327
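Release strings like these are easy to misread by eye. One way to compare them, sketched with a hypothetical `version_parts` helper, is to split out the numeric fields and compare them left to right as integers:

```ruby
# Hypothetical helper: turn "3.10.0-229.20.1" into [3, 10, 0, 229, 20, 1].
def version_parts(version)
  version.scan(/\d+/).map(&:to_i)
end

old_kernel = "3.10.0-229.20.1"   # from the 5.5.0.13 appliance
new_kernel = "3.10.0-327.3.1"    # from the 5.5.2.0 appliance

# Array#<=> compares element by element, so 229 vs. 327 decides the result
# before the trailing 20.1 vs. 3.1 is ever looked at.
comparison = version_parts(old_kernel) <=> version_parts(new_kernel)
puts comparison.negative? ? "kernel went forward" : "kernel did not go forward"
```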
Alex Krzos
@akrzos
May 24 2016 15:55
phew
missed that
Jason Frey
@Fryguy
May 24 2016 15:55
oh ok haha