These are chat archives for ManageIQ/manageiq/performance

8th
Jan 2016
Keenan Brock
@kbrock
Jan 08 2016 12:42
@akrzos I can confirm that I broke the thresholds in November
but I can't confirm whether it worked before that. still looking into that ==> 1296638
Alex Krzos
@akrzos
Jan 08 2016 14:05
@kbrock Thanks for looking into it
Dennis Metzger
@dmetzger57
Jan 08 2016 14:24
@akrzos how’s your 24 hour test doing?
Alex Krzos
@akrzos
Jan 08 2016 14:29
@dmetzger57 Still running, and unfortunately I have no view into the results while it runs, other than sshing into the box and gleaning information from the logs about recycled workers. I also set up some 8 hour tests and two of those have completed
and they aren't looking good in comparison to 5.5.0.13
5.5.0.13-appliance_memory.png
5.5.2.0-appliance_memory.png
5.5.0.13-2 is the top
5.5.2.0 is the bottom
managing a medium vmware provider
Dennis Metzger
@dmetzger57
Jan 08 2016 14:33
glad your testing is catching this, on a positive note
Alex Krzos
@akrzos
Jan 08 2016 14:34
Thanks
the 5.5.2.0 recycled a generic worker which on 5.5.0.13 did not occur
the generic worker hit a threshold above 400MiB
Joe Rafaniello
@jrafanie
Jan 08 2016 14:35
@akrzos were you able to try 5.5.0.13 upgraded to 7.2?
Alex Krzos
@akrzos
Jan 08 2016 14:35
got that appliance cooking now
Joe Rafaniello
@jrafanie
Jan 08 2016 14:35
great
Alex Krzos
@akrzos
Jan 08 2016 14:35
just upgraded to rhel 7.2
going to kick off the 40-45 minute tests against it
Joe Rafaniello
@jrafanie
Jan 08 2016 14:36
if we have the exact same rpm list or very close as your 5.5.2.0 testing, then it's only the ruby environment at play
we know that the sprockets change brought in concurrent-ruby and we had a few other gem changes that will bump memory
@dmetzger57 I'm still looking at the forking workers, not really actively looking at the worker memory growth yet, cc @gtanzillo
@djberg96 I should have a PR on sys-proctable shortly for your review to add PSS, USS, swap... going to see if the RSS number I get for the process is the one it currently retrieves
Dennis Metzger
@dmetzger57
Jan 08 2016 14:40
@jrafanie that's a good priority order, I’m going to take a look at the growth and/or ways to automate catching growth
Daniel Berger
@djberg96
Jan 08 2016 14:40
@jrafanie alrighty, will keep a lookout
Joe Rafaniello
@jrafanie
Jan 08 2016 14:41
@djberg96 it looks like a pure ruby gem on linux... is the compilation on linux a no-op or skipped entirely?
I'm wondering if the linux RPM will be a noarch or x86_64
does anyone know what smaps even means? google can't find anything... maps is clearly mappings... no idea what the s prefix means
@djberg96 I fail at rdoc, will need your help to get one of the methods doc'd
Daniel Berger
@djberg96
Jan 08 2016 14:45
Don't worry about it
yes, pure ruby on linux
there's no compilation
only OSX and HP-UX are still using C
Joe Rafaniello
@jrafanie
Jan 08 2016 14:46
ok
Daniel Berger
@djberg96
Jan 08 2016 14:46
(and if anyone wants to volunteer to finish converting OSX to FFI, please do!)
Joe Rafaniello
@jrafanie
Jan 08 2016 14:47
;-)
I'll fix a typo I saw elsewhere, how about that?
:laughing:
Daniel Berger
@djberg96
Jan 08 2016 14:48
hah, sure
Matthew Draper
@matthewd
Jan 08 2016 14:49
I'd guess s = space, as in address space… but that's just a guess.
Joe Rafaniello
@jrafanie
Jan 08 2016 14:50
thanks @matthewd that's better than my guess... the final letter in process ;-)
Matthew Draper
@matthewd
Jan 08 2016 14:51
hah, that's even worse than my backup guess of share/sharing :P
Joe Rafaniello
@jrafanie
Jan 08 2016 14:52
nice
Daniel Berger
@djberg96
Jan 08 2016 14:58
map series? map scan?
Dennis Metzger
@dmetzger57
Jan 08 2016 15:03
i think the ’s’ may be ‘segment’, segment maps. old memory management terminology
Matthew Draper
@matthewd
Jan 08 2016 15:04
Looks like it's size
smaps Extension based on maps, presenting the rss size for each mapped file
When in doubt, find the commit that added it :grinning:
Joe Rafaniello
@jrafanie
Jan 08 2016 15:05
lol
thanks
my other guess was set
resident set size, proportional set size, etc.
guess what: naming is hard
Joe Rafaniello
@jrafanie
Jan 08 2016 15:08
ah, it's probably show_map
Matthew Draper
@matthewd
Jan 08 2016 15:10
No, that implements /maps
The difference between /maps and /smaps is that the latter includes the size values
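A minimal sketch of what those per-mapping size values look like when totalled, assuming the usual `Name:   N kB` line layout of `/proc/<pid>/smaps`. The function names and the sample text are made up for illustration, and USS is taken here as `Private_Clean + Private_Dirty`, which is the conventional definition rather than anything confirmed in this chat.

```python
import re

# Matches smaps value lines such as "Rss:                  40 kB".
# Mapping header lines and "VmFlags:" lines do not match and are skipped.
FIELD_RE = re.compile(r"^(\w+):\s+(\d+) kB$")


def smaps_totals(smaps_text):
    """Sum every kB-valued field across all mappings in smaps-formatted text."""
    totals = {}
    for line in smaps_text.splitlines():
        match = FIELD_RE.match(line.strip())
        if match:
            name, kb = match.group(1), int(match.group(2))
            totals[name] = totals.get(name, 0) + kb
    return totals


def uss_kb(totals):
    """Unique set size: private pages only (assumed definition)."""
    return totals.get("Private_Clean", 0) + totals.get("Private_Dirty", 0)


# Hypothetical two-mapping sample, not taken from a real process.
SAMPLE = """\
00400000-0040b000 r-xp 00000000 fd:00 1234 /usr/bin/example
Size:                 44 kB
Rss:                  40 kB
Pss:                  20 kB
Shared_Clean:         40 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Swap:                  0 kB
7f0000000000-7f0000021000 rw-p 00000000 00:00 0
Size:                132 kB
Rss:                  12 kB
Pss:                  12 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:        12 kB
Swap:                  4 kB
"""

totals = smaps_totals(SAMPLE)
print(totals["Rss"], totals["Pss"], totals["Swap"], uss_kb(totals))
```

On a Linux box the same function can be fed `open("/proc/self/smaps").read()`. Fields that are not really additive (for example `KernelPageSize`) get summed too, so only the memory fields of the result are meaningful.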
Joe Rafaniello
@jrafanie
Jan 08 2016 15:12
Ok, smaps: "Process memory size for each mapped file"
better?
Matthew Draper
@matthewd
Jan 08 2016 15:14
Sounds plausible
Joe Rafaniello
@jrafanie
Jan 08 2016 15:15
:tada:
Daniel Berger
@djberg96
Jan 08 2016 15:17
smaps, they're not just for breakfast any more
Alex Krzos
@akrzos
Jan 08 2016 15:20
@kbrock / @dmetzger57 did you get a chance to look at this one? https://bugzilla.redhat.com/show_bug.cgi?id=1227008
I'm hitting it bad in my newly deployed 5.5 scale
Keenan Brock
@kbrock
Jan 08 2016 15:21
@akrzos it is in my top 5 - I'll bump to #2 (after the setting worker memory threshold bug)
Alex Krzos
@akrzos
Jan 08 2016 15:22
I'll continue to see what might be causing it on my 5.5 scale
Keenan Brock
@kbrock
Jan 08 2016 16:15
@akrzos I want this for our metrics :)
I know. I know. I know - slightly different - but think about it :)
Alex Krzos
@akrzos
Jan 08 2016 16:18
Interesting
akrzos @akrzos face palms
Joe Rafaniello
@jrafanie
Jan 08 2016 17:07
@djberg96 looks like rss already in sys-proctable is wrong, opened the PR: djberg96/sys-proctable#49
I'm going to try it on rhel 7
on rhel 6, the already existing rss is way different than what top or smaps reports
Daniel Berger
@djberg96
Jan 08 2016 17:08
wrong? hm
as in my code is wrong, or the kernel info is giving bogus info?
Joe Rafaniello
@jrafanie
Jan 08 2016 17:09
we're either using the value from /proc/PID/stat or the kernel is putting bad data there
I'm getting 2352 kb in top, 2352000 bytes in smaps, and 588 using the existing rss method
Daniel Berger
@djberg96
Jan 08 2016 17:10
oh, i think i read that it's not "wrong" per se, just not what most people want
anyhoo, i'll take a look
Joe Rafaniello
@jrafanie
Jan 08 2016 17:11
Note, my smaps.vss matches the existing vsize
so stat matches there, but for rss either the two are measuring different things, we're using the wrong field from /stat, or the value is wrong for that kernel
or I'm losing it :confused:
anyway, feel free to take a look and let me know what you think
Daniel Berger
@djberg96
Jan 08 2016 17:28
rss doesn't include swapped out memory it looks like
Joe Rafaniello
@jrafanie
Jan 08 2016 17:28
Right
that's what smaps rss is measuring too
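One reading that fits the numbers quoted above: the `rss` field (field 24) of `/proc/<pid>/stat` is a count of pages, not kilobytes, and 588 pages at 4096 bytes each is 2352 KiB, which is what top showed. A small sketch of that conversion follows. The helper names and the sample stat line are hypothetical, and the 4096-byte page size is an assumption about that machine.

```python
import os


def rss_pages_from_stat(stat_line):
    """Return field 24 (rss, counted in pages) from the text of /proc/<pid>/stat."""
    # Field 2 (comm) is wrapped in parentheses and may contain spaces, so split
    # after the last ")". The remainder starts at field 3 (state).
    rest = stat_line.rsplit(")", 1)[1].split()
    return int(rest[24 - 3])


def pages_to_kib(pages, page_size=None):
    """Convert a page count to KiB, defaulting to the running system's page size."""
    if page_size is None:
        page_size = os.sysconf("SC_PAGE_SIZE")
    return pages * page_size // 1024


# Hypothetical stat line with rss = 588 pages in field 24.
SAMPLE_STAT = (
    "1234 (ruby worker) S 1 1234 1234 0 -1 4194560 100 0 0 0 "
    "5 3 0 0 20 0 1 0 5000 10000000 588"
)

pages = rss_pages_from_stat(SAMPLE_STAT)
print(pages, pages_to_kib(pages, page_size=4096))
```

Passing `page_size` explicitly keeps the arithmetic reproducible; leaving it out uses whatever page size the current host reports.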
Alex Krzos
@akrzos
Jan 08 2016 17:42
5.5.2.0 vs 5.5.0.13-2 R7.1,R7.2.png
@jrafanie That's my C&U scenario for ~40 minutes, 10s samples of memory usage managing the small vmware environment
Joe Rafaniello
@jrafanie
Jan 08 2016 17:44
Ok, so 7.2 is only very slightly higher memory
so it's all ruby / gems / code on our side
Alex Krzos
@akrzos
Jan 08 2016 17:45
Around 100MiB at the end
Joe Rafaniello
@jrafanie
Jan 08 2016 17:45
@akrzos are the rpm lists identical between the two 7.2 setups?
Alex Krzos
@akrzos
Jan 08 2016 17:46
I'm unsure
let me run a diff
I believe there are probably a few more updated packages on the 5.5.0.13-2 R7.2
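A rough sketch of that kind of diff, assuming each appliance's package list has been saved as one package per line (for example from `rpm -qa`). The function name and the package strings below are invented for illustration and are not the actual lists from these appliances.

```python
def diff_package_lists(left, right):
    """Return (only_in_left, only_in_right) for two iterables of package strings."""
    left_set, right_set = set(left), set(right)
    return sorted(left_set - right_set), sorted(right_set - left_set)


# Hypothetical package lists for two appliances.
appliance_a = ["examplelib-1.0-1.el7", "exampletool-2.3-4.el7"]
appliance_b = ["examplelib-1.0-2.el7", "exampletool-2.3-4.el7"]

only_a, only_b = diff_package_lists(appliance_a, appliance_b)
print(only_a)
print(only_b)
```

Comparing full name-version-release strings means an upgraded package shows up on both sides, once with each version.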
Joe Rafaniello
@jrafanie
Jan 08 2016 17:47
ok
Alex Krzos
@akrzos
Jan 08 2016 17:49
Alex Krzos
@akrzos
Jan 08 2016 18:36
@Fryguy You had a patch for 5.4 to change proctitles too right?
Jason Frey
@Fryguy
Jan 08 2016 18:38
i did not
but we can put one together if you want it
@jrafanie were you going to cherry-pick to 5.5?
or was I supposed to do that?
Alex Krzos
@akrzos
Jan 08 2016 18:40
Was just curious, to make it a bit easier to pick apart per-worker metrics when analyzing my 5.4 scale lab
Jason Frey
@Fryguy
Jan 08 2016 18:40
yup
Alex Krzos
@akrzos
Jan 08 2016 18:50
@jrafanie Also I wanted to point out on that graph: I see ~350MiB more used on 5.5.2.0 during the portion where all three appliances are idling (without a provider). This occurs for 2 minutes during the first "flat line" area of the memory chart. After those 2 minutes, the provider is added and refreshed, and the test waits 10 minutes for the refresh to complete. Once the refresh completes, 5.5.2.0 ends up about 360-370MiB higher than the original 5.5.0.13-2 appliance
Keenan Brock
@kbrock
Jan 08 2016 18:51
@Fryguy / @jrafanie blocked by something dumb: I don't know if the place where we are reading :memory_threshold is actually working.
I remember being in that code, but it didn't seem testable. and I keep losing it. can't remember where it is.
Joe Rafaniello
@jrafanie
Jan 08 2016 18:51
@Fryguy I don't believe the cherry-pick would be clean since it was a move... I haven't been looking at that, been trying to get forking workers + pss measurements working

for a complete list here: https://gist.github.com/akrzos/e17a5cc944ffe6962d3c

@akrzos yeah, not too many rpms that are different

Jason Frey
@Fryguy
Jan 08 2016 18:52
@kbrock Is the test passing?
Keenan Brock
@kbrock
Jan 08 2016 18:53
this test does
it didn't before
I fixed a bug
BUT
that is not what is actually called to get the real value
I want to make sure it is placed in the right place and read from the same place
Jason Frey
@Fryguy
Jan 08 2016 18:53
oh i see it's a new test on your branch
Keenan Brock
@kbrock
Jan 08 2016 18:53
+1
Jason Frey
@Fryguy
Jan 08 2016 18:54

I want to make sure it is placed in the right place and read from the same place

yes

that is not what is actually called to get the real value

I don't understand that

does that mean the caller doesn't call set_worker_setting! ?
Keenan Brock
@kbrock
Jan 08 2016 18:54
the code that answers the question "what is my threshold" is not tested. and I don't know that it is using the same key path/array
caller doesn't call get_worker_setting
we do a hierarchical loop kind of thing
aah - coming back to me. think I found it
@fryguy - found it --> miq_worker.rb. I cannot say for sure that the two of those work together
Joe Rafaniello
@jrafanie
Jan 08 2016 22:27
@djberg96 thanks for merging the smaps / PSS PR... I'll test against the master branch with MIQ master and my forking workers branch for comparison and let you know if I find any bugs... maybe you can then cut a new release on sys-proctable, sound good?
Daniel Berger
@djberg96
Jan 08 2016 22:28
sounds good
Alex Krzos
@akrzos
Jan 08 2016 22:28
5.5.2.0-appliance_memory.png
25 hour test with the small provider
The schedule worker did not recycle
Jason Frey
@Fryguy
Jan 08 2016 22:30
I'm confused about ManageIQ/manageiq#6086
Isn't that already there (aside from the addition of evm_server)?
Joe Rafaniello
@jrafanie
Jan 08 2016 22:31
yeah
Jason Frey
@Fryguy
Jan 08 2016 22:32
yeah