These are chat archives for ManageIQ/manageiq/performance

6th
Dec 2017
Nick LaMuro
@NickLaMuro
Dec 06 2017 00:07
I don't agree with everything in here, but man... I thought I was snarky: https://www.kernel.org/doc/Documentation/process/coding-style.rst
However, if you have a complex function, and you suspect that a
less-than-gifted first-year high-school student might not even
understand what the function is all about, you should adhere to the
maximum limits all the more closely.
Keenan Brock
@kbrock
Dec 06 2017 03:54
lol
Keenan Brock
@kbrock
Dec 06 2017 16:03
if you need more than 3 levels of indentation, you're screwed anyway

Get a decent editor and don't leave whitespace at the end of lines

wow - snark-o-meter is hight with this one

Adam Grare
@agrare
Dec 06 2017 16:14
Linus' snark level is legendary
Nick LaMuro
@NickLaMuro
Dec 06 2017 16:17

So, here are some findings regarding the appliance I am working on:

After removing MiqServer#populate_queue_messages from the MiqServer#monitor block, we still have a leak:

20171204_5923.png
By contrast, this is what it looked like without commenting out that line
Jason Frey
@Fryguy
Dec 06 2017 16:17
oh good idea to whittle it away
Nick LaMuro
@NickLaMuro
Dec 06 2017 16:17
20171121_9523.png
Note: The change in angle of the growth is most likely do to the date range difference
Jason Frey
@Fryguy
Dec 06 2017 16:18
so it's the same, or not? (You wrote "by contrast" which implies there's a difference)
Nick LaMuro
@NickLaMuro
Dec 06 2017 16:19
I will have to check with the logs directly, but maybe "by contrast" was a poor choice of words
basically, all I can say is the only for sure difference between the two is the amount of time run, and the changes to the codebase
the first graph is basically we are in a state where the rest of the workers are doing nothing, since we aren't passing messages over DRB
yet we still seem to be leaking
I also am unsure about the spike at the beginning of the first graph
Jason Frey
@Fryguy
Dec 06 2017 16:22
it's a good approach though...do you think you can comment it line-by-line?
Joe Rafaniello
@jrafanie
Dec 06 2017 16:27
yeah, it's worth eliminating as many things as possible
Nick LaMuro
@NickLaMuro
Dec 06 2017 16:28

do you think you can comment it line-by-line?

I think so, but I have some ideas at what it might be just from this test:

  1. What I introduced with the initializer: gist
  2. One of the other methods in in MiqServer#monitor
  3. Possibly Benchmark.realtime_block (would explain why other workers are also leaking slowly)?
I am leaning towards either removing the initializer (to see if the heisenberg principle is at play), or whittling things down (number 2)
Jason Frey
@Fryguy
Dec 06 2017 16:31
Removing Benchmarking should be simple
Nick LaMuro
@NickLaMuro
Dec 06 2017 16:31
3 is there just because it was something that would also explain leaks in other workers, and it is the only thing that I noticed that might be related in both (probably going to wait on messing with this, just putting it out there as food for thought)
@Fryguy yeah, easy, just whatever we choose, we have to wait about 12-24 hours to see the affects
Jason Frey
@Fryguy
Dec 06 2017 16:32
ahhhh
Nick LaMuro
@NickLaMuro
Dec 06 2017 16:33
though, if @jrafanie is satisfied with the logs that we have currently that the heap isn't growing in ruby on that appliance, removing 1 might be the best first step
those graphs are generated from top_output, so we already have something that can measure the affects of RSS/VIRT/etc.
Jason Frey
@Fryguy
Dec 06 2017 16:37
ah right....removing 3 would imply a Ruby leak, which we don't think it is
Nick LaMuro
@NickLaMuro
Dec 06 2017 16:39
yeah, unsure where it is, but if we get down to heartbeat being the last method not commented, then that might be a possible culprit
Jason Frey
@Fryguy
Dec 06 2017 16:39
comment out the entirety of monitor_poll ;)
Nick LaMuro
@NickLaMuro
Dec 06 2017 16:40
heh
you mean #monitor right?
Jason Frey
@Fryguy
Dec 06 2017 16:40
yeah
Joe Rafaniello
@jrafanie
Dec 06 2017 16:41
For more data points...
Nick LaMuro
@NickLaMuro
Dec 06 2017 16:41
#monitor_poll would make it go quicker though, but that is probably not another variable we want to deal with
Joe Rafaniello
@jrafanie
Dec 06 2017 16:41

So, I'm not sure what's different but @NickLaMuro's appliance seems to be leaking a little faster than @dmetzger57's...

For Dennis's appliance:

[----] I, [2017-12-06T03:27:11.137418 #5248:b9d138]  INFO -- : MIQ(MiqServer#monitor_poll) Memory Info XXXX => {RSS: 368912000, PSS: 289559000, thread_count: 24, gc_stats: {:count=>2267, :heap_allocated_pages=>4284, :heap_sorted_length=>4285, :heap_allocatable_pages=>0, :heap_available_slots=>1746159, :heap_live_slots=>1745436, :heap_free_slots=>723, :heap_final_slots=>0, :heap_marked_slots=>1194162, :heap_swept_slots=>81064, :heap_eden_pages=>4284, :heap_tomb_pages=>0, :total_allocated_pages=>4284, :total_freed_pages=>0, :total_allocated_objects=>1291070308, :total_freed_objects=>1289324872, :malloc_increase_bytes=>838968, :malloc_increase_bytes_limit=>16777216, :minor_gc_count=>2219, :major_gc_count=>48, :remembered_wb_unprotected_objects=>60850, :remembered_wb_unprotected_objects_limit=>114182, :old_objects=>1090297, :old_objects_limit=>2047722, :oldmalloc_increase_bytes=>4565680, :oldmalloc_increase_bytes_limit=>16777216}, objspace_count: {:TOTAL=>1746159, :FREE=>670, :T_OBJECT=>197732, :T_CLASS=>18495, :T_MODULE=>2530, :T_FLOAT=>12, :T_STRING=>610593, :T_REGEXP=>2363, :T_ARRAY=>274330, :T_HASH=>78847, :T_STRUCT=>14544, :T_BIGNUM=>363, :T_FILE=>205, :T_DATA=>110419, :T_MATCH=>43827, :T_COMPLEX=>1, :T_RATIONAL=>3904, :T_SYMBOL=>4626, :T_IMEMO=>376098, :T_ICLASS=>6600}}
[----] I, [2017-12-06T11:21:57.119583 #5248:b9d138]  INFO -- : MIQ(MiqServer#monitor_poll) Memory Info XXXX => {RSS: 414292000, PSS: 334940000, thread_count: 24, gc_stats: {:count=>3379, :heap_allocated_pages=>4284, :heap_sorted_length=>4285, :heap_allocatable_pages=>0, :heap_available_slots=>1746159, :heap_live_slots=>1745482, :heap_free_slots=>677, :heap_final_slots=>0, :heap_marked_slots=>1209781, :heap_swept_slots=>311048, :heap_eden_pages=>4284, :heap_tomb_pages=>0, :total_allocated_pages=>4284, :total_freed_pages=>0, :total_allocated_objects=>1927089623, :total_freed_objects=>1925344141, :malloc_increase_bytes=>779184, :malloc_increase_bytes_limit=>16777216, :minor_gc_count=>3310, :major_gc_count=>69, :remembered_wb_unprotected_objects=>62474, :remembered_wb_unprotected_objects_limit=>114206, :old_objects=>1105086, :old_objects_limit=>2048304, :oldmalloc_increase_bytes=>5276384, :oldmalloc_increase_bytes_limit=>16777216}, objspace_count: {:TOTAL=>1746159, :FREE=>624, :T_OBJECT=>205910, :T_CLASS=>18516, :T_MODULE=>2530, :T_FLOAT=>12, :T_STRING=>610680, :T_REGEXP=>2363, :T_ARRAY=>269067, :T_HASH=>78825, :T_STRUCT=>14477, :T_BIGNUM=>351, :T_FILE=>208, :T_DATA=>109289, :T_MATCH=>43797, :T_COMPLEX=>1, :T_RATIONAL=>3351, :T_SYMBOL=>4587, :T_IMEMO=>374971, :T_ICLASS=>6600}}

~8 hours
RSS +45 MB
PSS +45 MB
thread count: no change
heap_sorted_length: no change
heap_live_slots: +46 slots (basically nothing)

For NickL's appliance:

[----] I, [2017-12-06T03:22:12.923382 #5923:129713c]  INFO -- : MIQ(MiqServer#monitor_poll) Memory Info XXXX => {RSS: 741952000, PSS: 585824000, thread_count: 30, gc_stats: {:count=>10480, :heap_allocated_pages=>4700, :heap_sorted_length=>5099, :heap_allocatable_pages=>0, :heap_available_slots=>1915728, :heap_live_slots=>1530577, :heap_free_slots=>385151, :heap_final_slots=>0, :heap_marked_slots=>1319595, :heap_swept_slots=>286346, :heap_eden_pages=>4474, :heap_tomb_pages=>226, :total_allocated_pages=>4700, :total_freed_pages=>0, :total_allocated_objects=>3063834210, :total_freed_objects=>3062303633, :malloc_increase_bytes=>10484176, :malloc_increase_bytes_limit=>33554432, :minor_gc_count=>9943, :major_gc_count=>537, :remembered_wb_unprotected_objects=>54739, :remembered_wb_unprotected_objects_limit=>109460, :old_objects=>1222317, :old_objects_limit=>2444412, :oldmalloc_increase_bytes=>35144208, :oldmalloc_increase_bytes_limit=>70727545}, objspace_count: {:TOTAL=>1915728, :FREE=>385098, :T_OBJECT=>159315, :T_CLASS=>19163, :T_MODULE=>2631, :T_FLOAT=>12, :T_STRING=>559164, :T_REGEXP=>2458, :T_ARRAY=>243886, :T_HASH=>73062, :T_STRUCT=>13526, :T_BIGNUM=>136, :T_FILE=>128, :T_DATA=>92059, :T_MATCH=>17742, :T_COMPLEX=>1, :T_RATIONAL=>2634, :T_SYMBOL=>3726, :T_IMEMO=>333875, :T_ICLASS=>7112}}
[----] I, [2017-12-06T11:34:12.820998 #5923:129713c]  INFO -- : MIQ(MiqServer#monitor_poll) Memory Info XXXX => {RSS: 796092000, PSS: 614641000, thread_count: 30, gc_stats: {:count=>13111, :heap_allocated_pages=>4700, :heap_sorted_length=>5099, :heap_allocatable_pages=>0, :heap_available_slots=>1915728, :heap_live_slots=>1744613, :heap_free_slots=>171115, :heap_final_slots=>0, :heap_marked_slots=>1325536, :heap_swept_slots=>319202, :heap_eden_pages=>4490, :heap_tomb_pages=>210, :total_allocated_pages=>4700, :total_freed_pages=>0, :total_allocated_objects=>3829146031, :total_freed_objects=>3827401418, :malloc_increase_bytes=>12700016, :malloc_increase_bytes_limit=>33554432, :minor_gc_count=>12420, :major_gc_count=>691, :remembered_wb_unprotected_objects=>55046, :remembered_wb_unprotected_objects_limit=>109346, :old_objects=>1227807, :old_objects_limit=>2442046, :oldmalloc_increase_bytes=>72646944, :oldmalloc_increase_bytes_limit=>74343475}, objspace_count: {:TOTAL=>1915728, :FREE=>171062, :T_OBJECT=>165015, :T_CLASS=>19175, :T_MODULE=>2631, :T_FLOAT=>12, :T_STRING=>666079, :T_REGEXP=>2458, :T_ARRAY=>262742, :T_HASH=>81650, :T_STRUCT=>13914, :T_BIGNUM=>267, :T_FILE=>192, :T_DATA=>97860, :T_MATCH=>34969, :T_COMPLEX=>1, :T_RATIONAL=>2816, :T_SYMBOL=>3726, :T_IMEMO=>384047, :T_ICLASS=>7112}}

~8 hours
RSS: +54 MB
PSS: +28 MB
thread count: no change
heap_sorted_length: no change
heap_live_slots: +214,036 slots (+14%)

Nick LaMuro
@NickLaMuro
Dec 06 2017 16:42
@dmetzger57 's appliance might be an idle workload, where mine has 6 C&U workers
3 Processors, and 3 Collectors
or maybe it is just 2 (one of each)
let me check
Joe Rafaniello
@jrafanie
Dec 06 2017 16:43
But anyway... they both grew in spite of the "inside" ruby heap not growing
Jason Frey
@Fryguy
Dec 06 2017 16:44
@jrafanie Is this only reproducible on an appliance?
Joe Rafaniello
@jrafanie
Dec 06 2017 16:44
so far, yes
Nick LaMuro
@NickLaMuro
Dec 06 2017 16:46
alright, I am going to delete that initializer and restart the appliance if no one has any objections
Dennis Metzger
@dmetzger57
Dec 06 2017 16:47
@NickLaMuro if it’s the appliance I pointed @jrafanie to yesterday it is an idle workload, no providers under management
Joe Rafaniello
@jrafanie
Dec 06 2017 16:47
yeah, that's the one I was looking at @dmetzger57
Keenan Brock
@kbrock
Dec 06 2017 16:49
@NickLaMuro do you need the ObjectSpace portion?
(as you know) I am always wary of that stuff
Joe Rafaniello
@jrafanie
Dec 06 2017 16:50
A little birdy suggested that we can clone your vm @NickLaMuro, and test various amounts of commenting out of the server process code
Brandon Dunne
@bdunne
Dec 06 2017 16:51
:wink:
Nick LaMuro
@NickLaMuro
Dec 06 2017 16:51
if that birdy rhymes with "Menace", then sure, I am confident that makes sense
or maybe it rhymes with... okay, I got nothing if it was @bdunne (I just assumed it was @dmetzger57 )
Joe Rafaniello
@jrafanie
Dec 06 2017 17:04
Yeah, nothing funny rhymes with Brandon. Believe me, I tried.
Dennis Metzger
@dmetzger57
Dec 06 2017 17:06
“Dennis the Menace” haven’t heard that one in decades
Jason Frey
@Fryguy
Dec 06 2017 17:09
Dunne is ripe for puns though
Keenan Brock
@kbrock
Dec 06 2017 17:09
random kinda rhymes with brandon
Jason Frey
@Fryguy
Dec 06 2017 17:09
and on that note, I'm dunne here... :running:
Keenan Brock
@kbrock
Dec 06 2017 17:09
heh
Dennis Metzger
@dmetzger57
Dec 06 2017 17:09
I almost spit up my coffee, thanks @Fryguy
Joe Rafaniello
@jrafanie
Dec 06 2017 17:09
yeah, leave @Fryguy
:-1:
Fryguy @Fryguy exits the room :running:
NickLaMuro @NickLaMuro thought he heard the theme to "Law and Order" just now...
Nick LaMuro
@NickLaMuro
Dec 06 2017 17:17
Dunne Dunne
Jason Frey
@Fryguy
Dec 06 2017 17:17
HAHA
Nick LaMuro
@NickLaMuro
Dec 06 2017 17:17
:musical_note:
I think he might have found his theme music for when he has just reviewed a PR