These are chat archives for ManageIQ/manageiq/performance

14th Dec 2017
Nick LaMuro
@NickLaMuro
Dec 14 2017 00:12
welp... looks like I was testing all of my loop scripts incorrectly, since I was just ctrl-z'ing them and not backgrounding them (with bg) afterwards, so I was leaving them in a suspended state...
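(For context, the job-control sequence in question, with a made-up script name — Ctrl-Z only stops the job, and it stays suspended until it's explicitly resumed with bg or fg:)
$ ruby my_loop_script.rb        # hypothetical name for one of the loop scripts
^Z
[1]+  Stopped                 ruby my_loop_script.rb
$ bg %1                         # resume job 1 in the background; without this it never runs
$ jobs                          # should now show the job as Running, not Stopped
(or just launch it backgrounded in the first place: ruby my_loop_script.rb &)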
NickLaMuro @NickLaMuro feels like a moron...
Joe Rafaniello
@jrafanie
Dec 14 2017 00:13
Been there. Step away and come back with a clearer mind. @NickLaMuro
Nick LaMuro
@NickLaMuro
Dec 14 2017 00:20
so something I noticed when monitoring the smaps is that at the beginning of the process, there are zero AnonHugePages, but on both my local vagrant VM, and on an appliance, after about 10 min or so, half of the mem gets converted to AnonHugePages... or at least reported as such
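(A quick way to sanity-check that, assuming a stock RHEL/CentOS kernel on the appliance: the THP mode and khugepaged counters live under /sys/kernel/mm/transparent_hugepage, and khugepaged collapsing existing small pages in the background would line up with memory only being reported as AnonHugePages after ~10 minutes.)
# cat /sys/kernel/mm/transparent_hugepage/enabled
# cat /sys/kernel/mm/transparent_hugepage/khugepaged/pages_collapsed
# cat /sys/kernel/mm/transparent_hugepage/khugepaged/full_scans
# grep AnonHugePages /proc/meminfo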
Keenan Brock
@kbrock
Dec 14 2017 00:22
@NickLaMuro memory pages go from parent to child?
Nick LaMuro
@NickLaMuro
Dec 14 2017 00:23
@kbrock
[vagrant@localhost vmdb]$ date; sudo wc -l /proc/4680/smaps; sudo grep -A 10 "\[heap\]" /proc/4680/smaps; echo; grep VmRSS /proc/4680/status | cut -d " " -f3,4
Sat Dec  9 05:34:25 EST 2017
5920 /proc/4680/smaps
0202d000-0d862000 rw-p 00000000 00:00 0                                  [heap]
Size:             188628 kB
Rss:              188412 kB
Pss:              188412 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:    188412 kB
Referenced:       188412 kB
Anonymous:        188412 kB
AnonHugePages:         0 kB

197348 kB
[vagrant@localhost vmdb]$ date; sudo wc -l /proc/4680/smaps; sudo grep -A 10 "\[heap\]" /proc/4680/smaps; echo; grep VmRSS /proc/4680/status | cut -d " " -f3,4
Sat Dec  9 05:35:56 EST 2017
5920 /proc/4680/smaps
0202d000-0d862000 rw-p 00000000 00:00 0                                  [heap]
Size:             188628 kB
Rss:              188444 kB
Pss:              188444 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:    188444 kB
Referenced:       188444 kB
Anonymous:        188444 kB
AnonHugePages:     12288 kB

197348 kB
[vagrant@localhost vmdb]$ date; sudo wc -l /proc/4680/smaps; sudo grep -A 10 "\[heap\]" /proc/4680/smaps; echo; grep VmRSS /proc/4680/status | cut -d " " -f3,4
Sat Dec  9 05:37:34 EST 2017
5920 /proc/4680/smaps
0202d000-0d862000 rw-p 00000000 00:00 0                                  [heap]
Size:             188628 kB
Rss:              188504 kB
Pss:              188504 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:    188504 kB
Referenced:       188504 kB
Anonymous:        188504 kB
AnonHugePages:    108544 kB

197404 kB
(probably should have gist'd that...)
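(For next time, the same check is easy to wrap in a loop and tee to a file for gisting — smaps_samples.log is just a made-up name:)
[vagrant@localhost vmdb]$ while true; do date; sudo grep -A 10 "\[heap\]" /proc/4680/smaps | grep -E "Rss|AnonHugePages"; grep VmRSS /proc/4680/status; sleep 60; done | tee -a smaps_samples.log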
Keenan Brock
@kbrock
Dec 14 2017 00:24
as a process gets older it becomes less CoW / shares fewer pages between parent and child
Nick LaMuro
@NickLaMuro
Dec 14 2017 00:24
this is just a script
no forking
Keenan Brock
@kbrock
Dec 14 2017 00:24
wonder if at that time, the pages for the child process get ugly
ooooh
thanks
that answers that
Nick LaMuro
@NickLaMuro
Dec 14 2017 00:25
specifically, this is the script:
puts "PID: #{Process.pid}"

require 'config/environment.rb'

MiqWorker.find_current # just autoload now to get it out of the way

puts "starting loop..."
loop do
  MiqWorker.status_update_all
  (GC.start; do_gc = false)   if do_gc  && Time.now.min % 2 == 0
  do_gc = true                if !do_gc && Time.now.min % 2 != 0
  sleep 0.5
end
basically this just runs MiqWorker.status_update_all continually and GC's periodically
Keenan Brock
@kbrock
Dec 14 2017 00:46
showing a leak?
Nick LaMuro
@NickLaMuro
Dec 14 2017 01:35
unsure yet
but I did step away as LJ suggested, so I haven't checked back in over an hour
Keenan Brock
@kbrock
Dec 14 2017 02:12
lol
so you took a vacation today ;)
you know, a whole hour
Joe Rafaniello
@jrafanie
Dec 14 2017 03:12
@NickLaMuro yeah, that's the same thing I'm finding in regards to the anon huge pages, and what got me looking at disabling transparent huge pages. @dmetzger57's testing with it disabled didn't look great yet, but I'd like to see what smaps looks like after more than 24 hours.
If we're asking for a page and getting huge pages after a few hours of warming up the server process, it's unlikely we're creating enough churn to do more than partially fill those huge pages. That's my thinking.
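(One way to test that theory, assuming the appliance kernel exposes the usual THP counters: /proc/vmstat separates huge pages allocated directly at fault time (thp_fault_alloc) from small pages khugepaged collapsed after the fact (thp_collapse_alloc), so sampling it while the server warms up should show which path is producing the AnonHugePages seen in smaps.)
# grep ^thp_ /proc/vmstat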
Joe Rafaniello
@jrafanie
Dec 14 2017 03:18
Or even worse, if the memory manager is moving our existing OS pages to huge pages dynamically and not compacting them on the way in, there has to be lots of wasted space
Joe Rafaniello
@jrafanie
Dec 14 2017 03:38
For our server process, it's possible CoW faults could be causing copies to new huge pages from regular pages.
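(If CoW between the server and its forked workers is a factor, it should show up in the Shared_* vs Private_* fields of smaps; a rough tally for the server, using 2144 as a stand-in for its PID:)
# awk '/^Shared_(Clean|Dirty)/ {s+=$2} /^Private_(Clean|Dirty)/ {p+=$2} END {print "shared:", s, "kB  private:", p, "kB"}' /proc/2144/smaps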
Keenan Brock
@kbrock
Dec 14 2017 14:34
my money is on CoW faults
the cow is at fault
Joe Rafaniello
@jrafanie
Dec 14 2017 14:36
@NickLaMuro nice graphs by the way (from yesterday afternoon, didn't hit "enter")
Dennis Metzger
@dmetzger57
Dec 14 2017 14:37
@jrafanie the "transparent" & "never" settings for hugepage both "leak" in the Idle workload test, about 100MB in the first 12 hours - this is consistent with what I saw without tweaking this param
mem_usage_Miq_Server_13631.png
mem_usage_Miq_Server_2144.png
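(For anyone following along, the "never" run presumably amounts to something like the following; it doesn't persist across reboots unless it's also set on the kernel command line or in an init/systemd unit:)
# echo never > /sys/kernel/mm/transparent_hugepage/enabled
# echo never > /sys/kernel/mm/transparent_hugepage/defrag
# cat /sys/kernel/mm/transparent_hugepage/enabled
always madvise [never]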
Joe Rafaniello
@jrafanie
Dec 14 2017 14:38
Same appliances @dmetzger57? I'd like to look at what smaps says
Dennis Metzger
@dmetzger57
Dec 14 2017 14:38
2 new appliances I spun up for the test, I’ll PM the IPs
Joe Rafaniello
@jrafanie
Dec 14 2017 18:07
@dmetzger57 @NickLaMuro is it surprising to you that we have so many non-drb threads in the server process (11?)
11 workers, 11 drb client threads, and 1 drb server thread...
# ps -ef |grep -E "MIQ.+Worker" |wc -l
11

# grep "Name" /proc/2144/task/*/status | grep drb
/proc/2144/task/2554/status:Name: drb.rb:1498
/proc/2144/task/2603/status:Name: drb.rb:1645
/proc/2144/task/2607/status:Name: drb.rb:1645
/proc/2144/task/2608/status:Name: drb.rb:1645
/proc/2144/task/2612/status:Name: drb.rb:1645
/proc/2144/task/2620/status:Name: drb.rb:1645
/proc/2144/task/2696/status:Name: drb.rb:1645
/proc/2144/task/2697/status:Name: drb.rb:1645
/proc/2144/task/2699/status:Name: drb.rb:1645
/proc/2144/task/2705/status:Name: drb.rb:1645
/proc/2144/task/2708/status:Name: drb.rb:1645
/proc/2144/task/2715/status:Name: drb.rb:1645

# grep "Name" /proc/2144/task/*/status | grep drb | wc -l
12
12 other "ruby" threads
# grep "Name" /proc/2144/task/*/status | grep -E "ruby$"
/proc/2144/task/2144/status:Name: ruby
/proc/2144/task/2559/status:Name: ruby
/proc/2144/task/2564/status:Name: ruby
/proc/2144/task/2577/status:Name: ruby
/proc/2144/task/2584/status:Name: ruby
/proc/2144/task/2595/status:Name: ruby
/proc/2144/task/2639/status:Name: ruby
/proc/2144/task/2648/status:Name: ruby
/proc/2144/task/2656/status:Name: ruby
/proc/2144/task/2666/status:Name: ruby
/proc/2144/task/2675/status:Name: ruby
/proc/2144/task/2685/status:Name: ruby

# grep "Name" /proc/2144/task/*/status | grep -E "ruby$" |wc -l
12
I tried to get a gdb ruby backtrace but killed my local server process two times in a row
I'll keep trying to get a list of all thread backtraces
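(For reference, the usual gdb approach, assuming the ruby binary has debug symbols and with <pid> standing in for the target server PID; both commands can hang or kill the target if it's stopped at an unlucky point, which is presumably what happened above:)
# gdb -p <pid>
(gdb) thread apply all bt        # C-level stacks for every thread
(gdb) call rb_backtrace()        # Ruby-level stack of the current thread, printed to the process's stderr
(gdb) detach
(gdb) quit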
Dennis Metzger
@dmetzger57
Dec 14 2017 18:11
@jrafanie that’s a lot of other threads. @NickLaMuro is on PTO today / tomorrow.
Joe Rafaniello
@jrafanie
Dec 14 2017 18:17
ah, cool, sorry for the ping @NickLaMuro ;-)
Joe Rafaniello
@jrafanie
Dec 14 2017 18:29
I'll try running a few fewer workers to see if I can get the ruby backtrace from all threads without killing my server process. Don't worry @dmetzger57, I'm not doing this on your appliances, yet...
Dennis Metzger
@dmetzger57
Dec 14 2017 18:30
fairly warned :smile: