These are chat archives for ManageIQ/manageiq/performance

28th
Oct 2015
Jason Frey
@Fryguy
Oct 28 2015 14:10
@/all I want to talk about the performance gains we made in the past sprint but I need some numbers
@jrafanie @dmetzger57 Do you have numbers offhand?
Joe Rafaniello
@jrafanie
Oct 28 2015 14:12
hmm, we'd have to dig up @dmetzger57's master tests initially vs. now
Jason Frey
@Fryguy
Oct 28 2015 14:13
I know my PRs were 200MB savings in the refresh worker and about 200-300 MB savings in the Broker
but I don't know the denominator :)
Joe Rafaniello
@jrafanie
Oct 28 2015 14:13
there's also this: ManageIQ/manageiq-appliance#36
Jason Frey
@Fryguy
Oct 28 2015 14:14
oh yeah
Oleg Barenboim
@chessbyte
Oct 28 2015 14:14
saw @dmetzger57 in the Mahwah office, but cannot find him now
Joe Rafaniello
@jrafanie
Oct 28 2015 14:15
i wish gitter had a way to list attachments... I don't see it
Jason Frey
@Fryguy
Oct 28 2015 14:15
:point_up: October 27, 2015 4:48 PM This?
Joe Rafaniello
@jrafanie
Oct 28 2015 14:15
that's vs. 5.4
Jason Frey
@Fryguy
Oct 28 2015 14:15
:point_up: October 27, 2015 12:17 PM More?
Joe Rafaniello
@jrafanie
Oct 28 2015 14:16
yeah, we'd need to look at initial 5.5 vs. now
Jason Frey
@Fryguy
Oct 28 2015 14:17
The slide says "compared to Botvinnik", so I'm ok with the 5.4 comparison
it's all rough anyway to give the community a general idea
Depends whether it's about "what we did this sprint — now, compared to botvinnik it's… ", or "FYI, compared to botvinnik ..."
Jason Frey
@Fryguy
Oct 28 2015 14:18
true...I'm not sure we ever talked about performance in previous sprint reviews though
Matthew Draper
@matthewd
Oct 28 2015 14:18
I was assuming it was the former, so we'd want a 3-way comparison of botvinnik, beginning of sprint, end of sprint
Jason Frey
@Fryguy
Oct 28 2015 14:19
it's a little late :sweat_smile:
Matthew Draper
@matthewd
Oct 28 2015 14:19
Yeah :)
Jason Frey
@Fryguy
Oct 28 2015 14:20
so what are the numbers I should speak to?
Joe Rafaniello
@jrafanie
Oct 28 2015 14:20
yes, that would be ideal for a few environments
Jason Frey
@Fryguy
Oct 28 2015 14:20
ETOOMUCHINFORMATION :)
I was thinking of just giving overall numbers with respect to https://files.gitter.im/ManageIQ/manageiq/performance/r6xP/Large-Env-Sizes.png
and just allude to the fact that in between the numbers were much higher and they've been brought back down closer to botvinnik levels
Dennis Metzger
@dmetzger57
Oct 28 2015 14:22
@Fryguy looking to see if I have my old master numbers handy
Joe Rafaniello
@jrafanie
Oct 28 2015 14:22
I would assume we want to classify the types of memory reductions: all processes (GC settings, columnar mime-types), specific processes (broker, vmware refresh worker), systemwide (no more automate worker)
any more detail and it's too much
Keenan Brock
@kbrock
Oct 28 2015 14:24
I had thought the numbers were lower after we were done. they are a tad higher?
Jason Frey
@Fryguy
Oct 28 2015 14:24
higher than 5.4
much lower than the number from master after we upgraded to Ruby 2.2
Matthew Draper
@matthewd
Oct 28 2015 14:24
The spreadsheet I linked to seems to contain the relevant numbers… but I don't think it has the matching summary info
Jason Frey
@Fryguy
Oct 28 2015 14:24
@jrafanie That's more or less what I'm going to say
I just want to give some overall weight to the statements...e.g. memory was improved by X%
Dennis Metzger
@dmetzger57
Oct 28 2015 14:25
So I ended up aborting all my initial Master runs because with 10GB of RAM on the appliance it swapped so much that I gave up after letting the provider add refresh run for 75 minutes.
Matthew Draper
@matthewd
Oct 28 2015 14:26
@kbrock I think we've ended up lower on the after-refresh number for vmware small, but still a bit higher on large
Dennis Metzger
@dmetzger57
Oct 28 2015 14:26
@matthewd that is correct
Joe Rafaniello
@jrafanie
Oct 28 2015 14:26
It's hard to give numbers because it's based on the size of the environment... it's important to mention how much faster expensive operations are with relatively similar (slightly increased) memory usage
Jason Frey
@Fryguy
Oct 28 2015 14:27
yeah, the benefits of generational garbage collection
Matthew Draper
@matthewd
Oct 28 2015 14:28
I think we can skip real numbers and just say something like "was up to a 50% increase, and we've now got it down to a more manageable 10%"
Jason Frey
@Fryguy
Oct 28 2015 14:28
yeah, that's what I was thinking
Matthew Draper
@matthewd
Oct 28 2015 14:29
(I'm pretty sure that 50% is in the right ball park — not just a placeholder)
Jason Frey
@Fryguy
Oct 28 2015 14:29
yeah, it might have even been higher
Joe Rafaniello
@jrafanie
Oct 28 2015 14:29
Do we have @akrzos's numbers as to how long a large or x-large refresh took on vmware on 5.4 vs. master? I believe it never finished before and now it finishes reasonably
either way, too much information... general information is enough
Jason Frey
@Fryguy
Oct 28 2015 14:30
I did not put columnar mime types on the slide
is that ok? or were there heavy benefits there?
(ran out of room on the slide :sweat_smile: )
Matthew Draper
@matthewd
Oct 28 2015 14:30
I think it's fine… there were a number of large wins and a number of smaller ones ¯\_(ツ)_/¯
Dennis Metzger
@dmetzger57
Oct 28 2015 14:31
I saw an 80% size increase and those runs never completed because I could not add enough RAM to the appliance
Joe Rafaniello
@jrafanie
Oct 28 2015 14:31
that's fine, details, it's in the "reduced memory on all rails processes" bucket
Alex Krzos
@akrzos
Oct 28 2015 14:31

@jrafanie From my benchmarks:

RSS Memory change (End of Benchmark RSS Utilization - After Start of Ruby Console) (Maximum of 4 runs)
5.4.2.0 - 607MiB
5.5.0.1 - 1248MiB
5.5.0.3 - 1152MiB
5.5.0.7 - 901MiB

Timing (99th percentile of 4 runs):
5.4.2.0 - 719s
5.5.0.1 - 462s
5.5.0.3 - 382s
5.5.0.7 - 274s

Master should be better than 5.5.0.7
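
To put the figures above on a common scale, here is a small Ruby sketch that expresses each release as a percent change against 5.4.2.0. The numbers are copied from the message; the helper name `percent_change_vs` and the constant names are illustrative and not part of any ManageIQ tooling.

```ruby
# Figures copied from the benchmark message above.
RSS_MIB = {
  "5.4.2.0" => 607,
  "5.5.0.1" => 1248,
  "5.5.0.3" => 1152,
  "5.5.0.7" => 901
}.freeze

TIMING_S = {
  "5.4.2.0" => 719,
  "5.5.0.1" => 462,
  "5.5.0.3" => 382,
  "5.5.0.7" => 274
}.freeze

# Percent change of every entry relative to the baseline version,
# rounded to one decimal place. (Hypothetical helper for illustration.)
def percent_change_vs(baseline, table)
  base = table.fetch(baseline).to_f
  table.transform_values { |value| ((value - base) / base * 100).round(1) }
end

RSS_CHANGE = percent_change_vs("5.4.2.0", RSS_MIB)
TIMING_CHANGE = percent_change_vs("5.4.2.0", TIMING_S)

RSS_MIB.each_key do |version|
  puts format("%-8s RSS %+7.1f%%   time %+7.1f%%",
              version, RSS_CHANGE[version], TIMING_CHANGE[version])
end
```

Read this way, 5.5.0.7 shows roughly 48% more RSS growth than 5.4.2.0 while the timing is roughly 62% shorter. These are single summary values (maximum and 99th percentile of 4 runs), so the percentages are rough.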
Joe Rafaniello
@jrafanie
Oct 28 2015 14:32
wow, @akrzos, thanks
Jason Frey
@Fryguy
Oct 28 2015 14:32
@akrzos Is that a specific worker or the whole appliance?
Alex Krzos
@akrzos
Oct 28 2015 14:32
I'd run my benchmarks against master right now, but there are changes to the ui again :frowning:
Jason Frey
@Fryguy
Oct 28 2015 14:33
:)
Alex Krzos
@akrzos
Oct 28 2015 14:33
@Fryguy That's through rails console
Jason Frey
@Fryguy
Oct 28 2015 14:33
k thanks
Alex Krzos
@akrzos
Oct 28 2015 14:33
I'm developing a whole appliance benchmark too, just going to take some time to get it right
Jason Frey
@Fryguy
Oct 28 2015 14:34
is that for large?
Alex Krzos
@akrzos
Oct 28 2015 14:34
Also note that between 5.5.0.1 and 5.5.0.3 I did change the benchmark code to include GC.start before/after benchmark
so perhaps the before GC.start could have added some memory savings that showed up
More impressive from a timing perspective would be the xlarge refresh environment
@dmetzger57 Saved us more than 90m IIRC
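
The `GC.start` before/after change mentioned above can be sketched roughly as follows. This is not the actual benchmark code, which is not shown in this chat; the helper names `rss_kib` and `measure` and the sample workload are hypothetical, and the RSS reading assumes Linux `/proc` (with a `ps` fallback).

```ruby
# Resident set size of the current process in KiB.
# Assumption: Linux /proc is available; otherwise fall back to `ps`.
def rss_kib
  status = "/proc/self/status"
  if File.readable?(status)
    line = File.foreach(status).find { |l| l.start_with?("VmRSS:") }
    return line.split[1].to_i if line
  end
  `ps -o rss= -p #{Process.pid}`.to_i
end

# Runs the block and reports elapsed seconds and RSS growth.
# GC.start before the first reading clears garbage left by earlier work;
# GC.start after the block collects garbage the block itself produced
# before the second reading is taken.
def measure
  GC.start
  rss_before = rss_kib
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  GC.start
  rss_after = rss_kib
  { seconds: elapsed, rss_delta_kib: rss_after - rss_before }
end

# Hypothetical workload standing in for a refresh.
SAMPLE = measure { Array.new(200_000) { |i| "vm-#{i}" }.map(&:upcase) }
puts SAMPLE
```

Note that a collection does not necessarily return freed pages to the operating system, so RSS can stay high even after `GC.start`; that is one reason figures taken before and after such a benchmark change are hard to compare directly.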
Jason Frey
@Fryguy
Oct 28 2015 14:36
minutes?! o_O
Alex Krzos
@akrzos
Oct 28 2015 14:36
Yes
Jason Frey
@Fryguy
Oct 28 2015 14:37
wow
Joe Rafaniello
@jrafanie
Oct 28 2015 14:37
x-large is how many vms? large is 3000, right?
Jason Frey
@Fryguy
Oct 28 2015 14:37
large is 3000
Joe Rafaniello
@jrafanie
Oct 28 2015 14:37
thanks for the confirmation ;-)
Alex Krzos
@akrzos
Oct 28 2015 14:40
x-large is 10,000 vms
Jason Frey
@Fryguy
Oct 28 2015 15:35
HARDER BETTER FASTER STRONGER :D
Alex Krzos
@akrzos
Oct 28 2015 17:12
@dmetzger57 I ran through the same scenario as you have pretty much with master and I can see some good changes, though still heavier on memory compared to 5.4
I did the test with vmware-medium environment
If I were to use vmware-large, we know the collectors probably can not keep up with the number of vms to collect on unless collection timing has decreased
So I'm seeing VIMBroker is about the same as 5.4 now
actually 1.9MiB less on master than 5.4
refresh core worker has reduced to only 42MiB more than 5.4. that's 51MiB less than 5.5.0.1
Oleg Barenboim
@chessbyte
Oct 28 2015 17:16
@akrzos sounds like significant progress
Alex Krzos
@akrzos
Oct 28 2015 17:18
@chessbyte Yes, with this test I can confirm some nice improvements with master over 5.5.0.1, which were the last results I have for this same style of test. Still not parity with 5.4 but getting closer
Jason Frey
@Fryguy
Oct 28 2015 17:19
speed should be much better though despite (or perhaps because of) the higher memory
that's the 2.2 tradeoff
Alex Krzos
@akrzos
Oct 28 2015 17:22
agreed, definitely seen the speed improvements, still need to run against master once I have working ui automation again, qe is fixing their automation for the changes in upstream so hopefully i can get that rolling against master soon.
@dmetzger57 Are you sure you turned on C&U for your test results with the VMware large env? I see much more collector growth than you have
~74MiB over 5.4
Alex Krzos
@akrzos
Oct 28 2015 17:28
I am also seeing about 20MiB more for the processors
most of the results seem to match, though also gotta remember I attached to the medium env and you did the large env
also the ui worker and webservice worker are not on the chart
i have around 40MiB more for the ui worker on master over 5.4
for the "same" browsing I did to setup each env
Dennis Metzger
@dmetzger57
Oct 28 2015 17:31
@akrzos yep I'm sure I enabled C&U, granted I only let it run for 5 minutes after enabling it
Alex Krzos
@akrzos
Oct 28 2015 17:31
@dmetzger57 That might be it, I left it running a bit longer
also I am on the master (10/26 build) did not git pull anything so something could have changed
Dennis Metzger
@dmetzger57
Oct 28 2015 17:32
@akrzos yea, the next big thing to look at is memory consumption over time, I think we have a slow leak, perhaps a dribble
Alex Krzos
@akrzos
Oct 28 2015 17:33
@dmetzger57 gotcha, that will suck to debug