These are chat archives for ManageIQ/manageiq/performance

15th
Jan 2016
Alex Krzos
@akrzos
Jan 15 2016 14:40
@jrafanie / @dmetzger57 With regards to the replication worker I fixed my test and there doesn't appear to be memory growth.
The issue was that replication had not been correctly uninstalled (due to the same region name) on the master db
Dennis Metzger
@dmetzger57
Jan 15 2016 14:40
that’s good news
Alex Krzos
@akrzos
Jan 15 2016 14:40
This is what the memory usage of a replication worker looks like for ~ 2hours
MiqReplicationWorker-59878.png
so if you have errors with replication the worker can grow until it is recycled
Joe Rafaniello
@jrafanie
Jan 15 2016 14:41
good to know @akrzos
Alex Krzos
@akrzos
Jan 15 2016 14:49
Also, reloading the apache configuration recycles the httpd processes
so something in my workload is most likely forcing apache to reload its configuration
not a tremendous problem, but tracking httpd shows a boatload of recycled httpd processes
Alex Krzos
@akrzos
Jan 15 2016 15:20
Is there an expected sub process to the replication worker? ruby /opt/rh/cfme-gemset/bin/rake evm:dbsync:replicate
Jason Frey
@Fryguy
Jan 15 2016 15:20
yes
that one
Alex Krzos
@akrzos
Jan 15 2016 15:20
ok, looks like it doubles the replication worker memory usage
Alex Krzos
@akrzos
Jan 15 2016 15:30
By double I mean it doesn't raise the actual worker's memory usage; just enabling replication means we pay memory for the worker + that replicate process (according to my measurements, ~150-170MiB for the worker and another 170MiB for the replication process)
Jason Frey
@Fryguy
Jan 15 2016 15:43
yeah, you end up with "double"
the worker itself is really dumb...just responsible for monitoring the sub-process and heartbeating
I think that one will benefit GREATLY from forking workers
Keenan Brock
@kbrock
Jan 15 2016 15:44
+1
Joe Rafaniello
@jrafanie
Jan 15 2016 15:44
@akrzos what will you do when forked workers solves world peace?
:laughing:
Keenan Brock
@kbrock
Jan 15 2016 15:44
@jrafanie duh! go to disneyworld!
Alex Krzos
@akrzos
Jan 15 2016 15:45
@jrafanie give my lab a break
Joe Rafaniello
@jrafanie
Jan 15 2016 15:47
lol
Alex Krzos
@akrzos
Jan 15 2016 15:49
So the question is, should I bz the memory penalty of replication so we can track it with forked workers (the world peace solution)?
Joe Rafaniello
@jrafanie
Jan 15 2016 15:51
actually, I haven't touched the replication worker's spawn process yet, so we'll have to do that with fork too if it makes sense
Alex Krzos
@akrzos
Jan 15 2016 16:01
understood, but I assume we would want to bz it so it gets visibility and that it's understood
Jason Frey
@Fryguy
Jan 15 2016 16:04
you guys kill me with your matching shirts :)
I keep mixin up who's who
Alex Krzos
@akrzos
Jan 15 2016 16:05
lol
akrzos @akrzos :heart: orange
Joe Rafaniello
@jrafanie
Jan 15 2016 16:17
@Fryguy Sorry that we're not hiding who we are behind a video game character :laughing:
Jason Frey
@Fryguy
Jan 15 2016 16:18
haha
not just any video game character
I only wish more places would allow animated avatars
Fryguy-48x48-Anim.gif
2 frame animation FTW
Daniel Berger
@djberg96
Jan 15 2016 16:42
Might be time to switch to my BGG avatar
Daniel Berger
@djberg96
Jan 15 2016 16:48
@Fryguy how did you post that?
Jason Frey
@Fryguy
Jan 15 2016 16:48
drag and drop into the app?
from a local file
Daniel Berger
@djberg96
Jan 15 2016 16:49
bgg_avatar.gif
Joe Rafaniello
@jrafanie
Jan 15 2016 16:50
lol
Jason Frey
@Fryguy
Jan 15 2016 16:50
hahaha...I think I've seen you post that before
oh, also, that's your twitter avatar...hilarious
Daniel Berger
@djberg96
Jan 15 2016 16:51
:)
Chris Arcand
@chrisarcand
Jan 15 2016 17:41
Wait whut. I did not know that.
gazelle-nomnoms.gif
Nice.
Chris Arcand
@chrisarcand
Jan 15 2016 17:44
happy-cat.gif
Keenan Brock
@kbrock
Jan 15 2016 18:18
has anyone used rack-speedtracer? it looks very slick, but out of date. confused why rack-mini-profiler doesn't just use the chrome plugin and developed its own ui.
thought speedtracer was merged into chrome dev tools, but todd irish looks to have a recent github repo with it
ooh - you can't download it anymore. huh
Joe Rafaniello
@jrafanie
Jan 15 2016 19:53
ping @akrzos, I want to bump the schedule worker and base worker memory (replication) from 250 and 200 to 300, is that good for you? I don't think either value gave workers much cushion to do some work pre-generational GC...
did you have a bug associated with the schedule worker one?
Alex Krzos
@akrzos
Jan 15 2016 20:03
@jrafanie I don't have a bz opened with the replication worker yet. For the schedule worker I have: https://bugzilla.redhat.com/show_bug.cgi?id=1296192
Joe Rafaniello
@jrafanie
Jan 15 2016 20:04
ok, thanks @akrzos
I'm debating just making my forking workers PR use PSS instead so we can reconfigure the thresholds based on that
Jason Frey
@Fryguy
Jan 15 2016 20:06
I'm cool with that if you think it's required
Alex Krzos
@akrzos
Jan 15 2016 20:07
I am torn between best options for memory threshold honestly. The absolute best solution is to reduce overall memory usage. I fear bumping the threshold will allow an appliance to spawn many workers and hit swap
Keenan Brock
@kbrock
Jan 15 2016 20:07
something fragged my local ruby version - so I'm going to start running 2.2.4
Alex Krzos
@akrzos
Jan 15 2016 20:07
a higher threshold means harder to hit the rss thresholds we currently have
and a customer or user could configure enough roles or add enough providers to cause the appliance to swap, in which case the threshold could be invalid (no per-process swap measurement yet, right?)
Ultimately the best use of raising the threshold is to prevent the sawtooth memory usage, which I would say should be our goal
with that in mind I'd say 250-300 should be the base threshold for 5.5.2.0
might want to get tomh's opinion as well
Jason Frey
@Fryguy
Jan 15 2016 20:11
@akrzos I don't think LJ wants to keep it high forever
just temporarily while we introduce forking workers
once that's in, we can move things to preloading in the server, and the numbers should start coming down quickly
Alex Krzos
@akrzos
Jan 15 2016 20:12
@Fryguy understood, what's the timeline for forking workers?
Jason Frey
@Fryguy
Jan 15 2016 20:12
If @jrafanie can fix the last issue he found, I'd merge today
unfortunately, it results in like a +30MB per process right now, which puts a few over the threshold
that will be fixed in follow up PRs
Joe Rafaniello
@jrafanie
Jan 15 2016 20:14
actually, schedule worker is not getting restarted due to memory usage with fork now
only thing that changed was @matthewd's suggestion to close the parent process's PG socket fd in the forked child
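For readers following along, the idea of closing the parent's connection fd in the forked child can be sketched roughly as below. This is a Python illustration only, not the ManageIQ (Ruby) change itself: `socket.socketpair()` stands in for the PG connection, and the function name `fork_with_fresh_connection` is hypothetical.

```python
import os
import socket


def fork_with_fresh_connection():
    """Fork a child that drops the inherited socket and opens its own.

    Assumption: socket.socketpair() is only a stand-in for a real database
    connection; nothing here talks to PostgreSQL.
    """
    parent_sock, peer = socket.socketpair()  # stand-in for the parent's PG socket

    pid = os.fork()
    if pid == 0:
        # Child: close the inherited copies so the child never shares the
        # parent's connection, then open a connection of its own.
        try:
            parent_sock.close()
            peer.close()
            own_a, own_b = socket.socketpair()
            own_a.sendall(b"ok")
            ok = own_b.recv(2) == b"ok"
            os._exit(0 if ok else 1)
        except Exception:
            os._exit(2)

    # Parent: wait for the child, then confirm its own socket still works.
    _, status = os.waitpid(pid, 0)
    parent_sock.sendall(b"ping")
    parent_alive = peer.recv(4) == b"ping"
    parent_sock.close()
    peer.close()
    return os.WEXITSTATUS(status), parent_alive
```

Closing the fd in the child only releases the child's copy; the parent's descriptor stays open, which is why the parent can keep using its connection afterwards.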
Jason Frey
@Fryguy
Jan 15 2016 20:15
is that pushed?
Joe Rafaniello
@jrafanie
Jan 15 2016 20:16
and if I switch the threshold validation to use PSS, I have enough room to start them without changing the thresholds
@Fryguy yeah, I pushed to my PR
just doing the "use PSS if available for the threshold validation" now
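A minimal sketch of what "use PSS if available for the threshold validation" could look like, written in Python for illustration. The function names and the example figures are assumptions, not taken from the PR.

```python
from typing import Optional

MIB = 1024 * 1024


def memory_for_threshold(rss_bytes: int, pss_bytes: Optional[int] = None) -> int:
    """Pick the number to compare against the threshold.

    PSS is preferred when the platform reports it; otherwise fall back to RSS.
    """
    return pss_bytes if pss_bytes is not None else rss_bytes


def exceeds_threshold(threshold_bytes: int, rss_bytes: int,
                      pss_bytes: Optional[int] = None) -> bool:
    """True when the worker should be considered over its memory threshold."""
    return memory_for_threshold(rss_bytes, pss_bytes) > threshold_bytes


# Illustrative figures only: a forked worker whose RSS counts pages it shares
# with the parent can look over the limit on RSS but under it on PSS.
over_by_rss = exceeds_threshold(250 * MIB, rss_bytes=280 * MIB)
over_by_pss = exceeds_threshold(250 * MIB, rss_bytes=280 * MIB, pss_bytes=190 * MIB)
```

With these made-up numbers the worker is over the threshold when judged by RSS and under it when judged by PSS, which is the effect described above.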
Jason Frey
@Fryguy
Jan 15 2016 20:17
I still think PSS is weird, but I guess it's the best number we got
yeah
Joe Rafaniello
@jrafanie
Jan 15 2016 20:17
yeah
Jason Frey
@Fryguy
Jan 15 2016 20:17
I'm for it...probably better than changing the threshold
Joe Rafaniello
@jrafanie
Jan 15 2016 20:17
that's the negative of fork, you can't really tell what's being used
will probably add smem rpm to the upstream appliance so you can watch pss locally
Jason Frey
@Fryguy
Jan 15 2016 20:18
you can't get the "this memory is only from me" separate from the "this memory is shared" amounts?
Alex Krzos
@akrzos
Jan 15 2016 20:19
So the replication worker was also part of the debate. I have yet to be able to push the replication worker to a point where it recycles. The initial issues I ran into with the replication worker were due to it being incorrectly configured. The current issue I have with the replication worker is that by enabling replication you pay for the memory of the worker + a replication process, which I have seen above 200MiB RSS, so combined, replication has taken ~370ish MiB of RSS memory
Joe Rafaniello
@jrafanie
Jan 15 2016 20:19
you can get private, I didn't do it in sys-proctable but it's simple
Alex Krzos
@akrzos
Jan 15 2016 20:21
I was not aware of the separate replication process until today and just added it to my memory measurement code, so I will get better numbers by next week. (RDU lab is going down for at least 12 hours starting tonight.) Provided I can kick off my jobs tomorrow, we will have some solid data on its memory usage Monday.
pss is probably the best number we can use for now until someone comes up with something better
Joe Rafaniello
@jrafanie
Jan 15 2016 20:37
@Fryguy testing pss checking now: jrafanie/manageiq@7fccdd3
Jason Frey
@Fryguy
Jan 15 2016 20:38
Nice...after that, what's left?
Joe Rafaniello
@jrafanie
Jan 15 2016 20:38
send an email/talk discussion
maybe add smem rpm to appliance-build repo
Jason Frey
@Fryguy
Jan 15 2016 20:39
ok, I want to coordinate merging of ManageIQ/manageiq#6086 as well
@akrzos Can you comment in there if you are good with the wording?
Joe Rafaniello
@jrafanie
Jan 15 2016 20:40
yeah, i'll need to change the MiqProcess.is_worker? method to match whatever happens there
or maybe I can ask that in the PR
Jason Frey
@Fryguy
Jan 15 2016 20:40
though that one is the server process only now
Joe Rafaniello
@jrafanie
Jan 15 2016 20:41
oh ok
Alex Krzos
@akrzos
Jan 15 2016 20:41
Will that change the name in systemd? Will that still be evmserverd?
Jason Frey
@Fryguy
Jan 15 2016 20:43
I believe the service name stays the same
@carbonin ?
because we set the process title inside the process
Matthew Draper
@matthewd
Jan 15 2016 20:44
Talking of forks, I need to blow the dust off ManageIQ/manageiq#4211 too, though that's just [narrowly focused] perf, no memory impact
Nick Carboni
@carbonin
Jan 15 2016 20:44
Ah nice that they allow me to join when I'm mentioned
Jason Frey
@Fryguy
Jan 15 2016 20:44
and I don't think systemd looks at what it sees in top in order to manage it
Alex Krzos
@akrzos
Jan 15 2016 20:44
Well actually it should have a long and short name right
the short name will probably still be ruby
akrzos @akrzos checks one of his appliances
Jason Frey
@Fryguy
Jan 15 2016 20:44
not sure how that works
Nick Carboni
@carbonin
Jan 15 2016 20:45
The systemd unit name is determined by the unit file name
Alex Krzos
@akrzos
Jan 15 2016 20:45
yeah the short name for "/var/www/miq/vmdb/lib/workers/bin/evm_server.rb" is ruby
Nick Carboni
@carbonin
Jan 15 2016 20:45
Was that the question?
Alex Krzos
@akrzos
Jan 15 2016 20:46
well from a making it easier for the end user perspective, would we want the name in top to be that much different than the "daemon" name in systemd?
Jason Frey
@Fryguy
Jan 15 2016 20:46
haha...so ManageIQ/manageiq#6086 changes the proctitle of the server process
@akrzos just wants to make sure that systemd won't have a fit from that
Nick Carboni
@carbonin
Jan 15 2016 20:47
Nah, it uses cgroups to track the forked processes. The name of the process itself doesn't matter
Alex Krzos
@akrzos
Jan 15 2016 20:49
So I'd like a shorter name than "ManageIQ Server process". I think "MiqServer" would be good enough, and heck I'd like the systemd daemon name to match, but that's just an opinion, not based on any sort of hard reason other than that it makes it easy for an end user to identify what it is and how to manage it.
matthewd @matthewd likes the style & patterns of the PG process naming
Matthew Draper
@matthewd
Jan 15 2016 20:51
It manages to be an informative non-path string, yet still feels unixy, and not out of place in ps output
Oleg Barenboim
@chessbyte
Jan 15 2016 20:55
I see nothing wrong with using the model names MiqServer, MiqGenericWorker, etc to make work easier for @akrzos
Jason Frey
@Fryguy
Jan 15 2016 21:05
sounds good to me. @akrzos can you comment in that thread?
Alex Krzos
@akrzos
Jan 15 2016 21:23
@Fryguy commenting