These are chat archives for ManageIQ/manageiq/performance

13th
Jan 2016
Alex Krzos
@akrzos
Jan 13 2016 13:51
@jrafanie So I started collecting data on replication worker memory usage and yup for 5.5.2.0 200MiB is too low of a threshold
Joe Rafaniello
@jrafanie
Jan 13 2016 14:06
ok @akrzos, so are we confident in blindly increasing the 200 and 250 MB thresholds to 300 or even 325 in while specific BZs are opened to investigate the memory increase for them?
Alex Krzos
@akrzos
Jan 13 2016 14:07
I'm not sure what a good threshold is yet
Here's what I gathered
Here is the life of a single replication worker during an 1hr 22m test with 2 vmware providers
MiqReplicationWorker-33374.png
Now get this, here is all of them:
MiqReplicationWorker-all.png
I truncated the legend on that one
so it was occurring rapidly during the hour or so
Also bad news for the schedule worker
I have one scenario where it recycled several times over the course of 2 hours
(Separate issue)
So I can bump the replication worker to something much higher so it doesn't recycle and re-run that test, and we can see where it levels off
Joe Rafaniello
@jrafanie
Jan 13 2016 14:19
yeah, can you bump both and re-run the test, we need to know what's it's stable memory usage is?
and @akrzos we just merged PSS logging and storage in the miq_servers and miq_workers tables: ManageIQ/manageiq#6112
since PSS is not exactly easy to remember, it's the proportional_set_size column and log line
I need this for the forking workers work
will probably need to change the worker threshold checks to use PSS instead of RSS in the future
Alex Krzos
@akrzos
Jan 13 2016 14:21
Nice :thumbsup:
Joe Rafaniello
@jrafanie
Jan 13 2016 14:23
starting to unravel the start/stop worker code a bit since I need to start workers slightly differently with forked processes... need to do stuff before and after fork...
I might get into the territory of your existing BZs regarding starting a worker before another exits
but hopefully, memory issues should greatly decrease once I get the preloading before the forks working
Alex Krzos
@akrzos
Jan 13 2016 14:25
Yeah I would hope that forking workers and sharing memory really cuts down on the cost of having another worker
the more workers we add right now we pay full price with memory
Joe Rafaniello
@jrafanie
Jan 13 2016 14:26
but to be honest, the most obvious improvement is the speed of starting workers
lots of IO contention is just gone
Alex Krzos
@akrzos
Jan 13 2016 14:26
That would be interesting to measure
Joe Rafaniello
@jrafanie
Jan 13 2016 14:27
I'm using appliances in vmware fusion... vanilla evm:start with no providers changes worker start phase from around 40 seconds to around 15
but that's before I rework sync_workers to separate the "determine how many workers we need for this class" from the "start/stop workers for this class"
so that may change a little
the overhead of starting workers maybe even small enough later on that we could scale worker counts
Alex Krzos
@akrzos
Jan 13 2016 14:29
so that speed gain will make the number of workers running concurrently > configured count a potentially greater issue
Joe Rafaniello
@jrafanie
Jan 13 2016 14:29
but tbh, I'm hesitant to add even more specialization to our workers
yeah, I expect some fallout of code expecting worker startup to be slow
I think I'm about 1 week from having the major PR in
Dennis Metzger
@dmetzger57
Jan 13 2016 14:32
:+1:
Joe Rafaniello
@jrafanie
Jan 13 2016 14:33
I need to make these other changes in as they will make the forking easier
thankfully some of this is completely compatible with our existing Process.spawn way of creating processes
so the PR isn't getting too huge
Alex Krzos
@akrzos
Jan 13 2016 14:40
@jrafanie Also I will run the replication worker test against 5.5.0.13 and 5.4.4.2 so we can see its behavior back then as well
Joe Rafaniello
@jrafanie
Jan 13 2016 14:42
ok
I imagine it will exhibit a similar ~25 MB growth within the first minute due to code changes, gem changes, etc.
Keenan Brock
@kbrock
Jan 13 2016 14:45
@akrzos that smaller VM - think something is missing on it. it doesn't blow up like the larger one.
But I think I cracked the nut: https://github.com/ManageIQ/manageiq/pull/6162#issue-126338504 minor tweaks and serious differences /cc @jrafanie
Alex Krzos
@akrzos
Jan 13 2016 14:45
@kbrock hmm, let me take a look at it
Keenan Brock
@kbrock
Jan 13 2016 14:46
@akrzos look at those numbers in that PR
Alex Krzos
@akrzos
Jan 13 2016 14:46
@kbrock Could we patch my 5.5 scale environment with this?
Keenan Brock
@kbrock
Jan 13 2016 14:46
the changes are pretty minor
there is another patch that I really like, but this one seems to have the most impact
I do remember that the stuff has a multiplier effect
merging the two is much better than each individually
is your 5.5 scale using git?
or you want a x.patch?
Alex Krzos
@akrzos
Jan 13 2016 14:47
5.5 isn't
Keenan Brock
@kbrock
Jan 13 2016 14:48
hmm
Keenan Brock
@kbrock
Jan 13 2016 14:48
dude
wtf
Alex Krzos
@akrzos
Jan 13 2016 14:48
and apply it and then restart evmserverd
Keenan Brock
@kbrock
Jan 13 2016 14:48
I guess if you snapshot it first, then that will be great
how old is your 5.5? nightly?
Alex Krzos
@akrzos
Jan 13 2016 14:49
it's what we shipped
5.5.0.13-2
Keenan Brock
@kbrock
Jan 13 2016 14:49
aah - cool (ish)
Alex Krzos
@akrzos
Jan 13 2016 14:50
The patch only affects the code that runs perf_capture_timer right? thus I'd only have to patch that specific appliance right?
Keenan Brock
@kbrock
Jan 13 2016 14:50
correct
not sure if this patch would apply correctly, but may be worth a try too
Alex Krzos
@akrzos
Jan 13 2016 14:51
oh so does this rely on multiple commits?
Keenan Brock
@kbrock
Jan 13 2016 14:52
they are separate
probably won't affect eachother
the first patch is less intrusive
the second one (already on master) is the other half
Alex Krzos
@akrzos
Jan 13 2016 14:52
gotcha
let me try one at a time
Keenan Brock
@kbrock
Jan 13 2016 14:52
yes
try the one you pointed to first
Alex Krzos
@akrzos
Jan 13 2016 14:55
Random question for anyone, does httpd recycle processes when running any sort of smartstate workload?
Joe Rafaniello
@jrafanie
Jan 13 2016 15:07
nice @kbrock, added comment to the PR
@akrzos httpd doesn't restart processes that I know of unless you configure httpd.conf for specifics regarding it's own worker sizing
we do restart apache after adding UI/WS worker ports to the load balancer
we cut a log line about restarting it
look for MiqApache or something
Keenan Brock
@kbrock
Jan 13 2016 15:09
@jrafanie I think with THAT pr, the my_region shows up with all the advantages
I hasn't realized that the inverse_of made such an impact
Alex Krzos
@akrzos
Jan 13 2016 15:10
@jrafanie I'm seeing a lot of httpd processes that were short lived during my smart state analysis workload, not a casulty here but just curious as to what would cause the turnover in httpd processes, I think there is some cycling of processes when we reload the httpd config during log rotate but i haven't investigated just yet
Joe Rafaniello
@jrafanie
Jan 13 2016 15:11
we queue a message to restart apache
(when we add new ports to the balancer config)
since apache's graceful restart doesn't pick up balancer config changes

I hasn't realized that the inverse_of made such an impact

yeah, you only see an impact when you navigate back and forth from the same association and build relations off of the duplicated objects

Alex Krzos
@akrzos
Jan 13 2016 15:12
@kbrock I was able to apply that patch however during shutdowning evmserverd, one of the priority workers has that message and is still running with it, can I just kill that worker?
Joe Rafaniello
@jrafanie
Jan 13 2016 15:13
it's really easy to see by looking at ObjectSpace.each_object(ActiveRecord::Base) {} counts like I have in that merged PR
Keenan Brock
@kbrock
Jan 13 2016 15:29
@akrzos I guess so? that or just click the red button on the front of the server ;)
Alex Krzos
@akrzos
Jan 13 2016 15:51
@kbrock it eventually expired, when it realized it was the only worker left on the appliance (and finished his message it was processing)
Alex Krzos
@akrzos
Jan 13 2016 16:55
@kbrock running with the first patch I'm not seeing an noticeable gain yet
but the message ran with a generic worker instead of a priority worker for the first one processed
Keenan Brock
@kbrock
Jan 13 2016 18:09
@akrzos teasing those patches apart may not work so well
Alex Krzos
@akrzos
Jan 13 2016 18:09
@kbrock So I need to run both together?
Keenan Brock
@kbrock
Jan 13 2016 18:10
I'm wondering if you need the second patch
I had some good results for caching region
but when I ran it separately, it gained me nothing
but after those other patches it gets rid of 20 queries (15%)
Alex Krzos
@akrzos
Jan 13 2016 18:11
gotcha
adding the second patch now
Keenan Brock
@kbrock
Jan 13 2016 18:11
I was hoping it wasn't necessary
sorry
Keenan Brock
@kbrock
Jan 13 2016 19:39
there was a presentation a while back from a new relic guy who showed how to rewrite new relic rpm in 100 lines of code
anyone remember this? (probably 5 years ago :( )
we need better profiling tools. mini-rack-profiler is great - but we need a way to call from the command line and put the results into a db / not html
Keenan Brock
@kbrock
Jan 13 2016 20:13
@matthewd rails/rails@26d19b4 to automatically detect :inverse_of
@matthew rails 4 doesn't have has_inverse?
Jason Frey
@Fryguy
Jan 13 2016 20:14
that PR you cited was in 4.1.0 beta
Joe Rafaniello
@jrafanie
Jan 13 2016 20:16
by the way, these are the conditions that determine if it can or cannot automatically determine inverse_of
VALID_AUTOMATIC_INVERSE_MACROS = [:has_many, :has_one, :belongs_to]
INVALID_AUTOMATIC_INVERSE_OPTIONS = [:conditions, :through, :polymorphic, :foreign_key]
Matthew Draper
@matthewd
Jan 13 2016 20:16
@kbrock seems to be a has_inverse?.. though automatic_inverse_of may also be relevant
I suppose if it can't find the calculated association name due to funny names, that could also make it fail
Keenan Brock
@kbrock
Jan 13 2016 20:18
@matthewd I've found all of them on master
@matthewd but they are not on our rails - looking
thanks for the thought - having fun in here
@matthewd ooh - there is a @automatic_inverse_of - COOL.
Matthew Draper
@matthewd
Jan 13 2016 20:23
I'd rather you use the method than the ivar, but yeah, that should work too
Keenan Brock
@kbrock
Jan 13 2016 20:23
not there yet. just saying got it from here - yes. use methods (even if private)
Jason Frey
@Fryguy
Jan 13 2016 20:32
ok, so if you keep ems (because of the foreign_key) and vm_or/and_template because of the name
then you still have the reset which look like they should be automatically inversed
Is there a simple way in console or something to ask if it can_be_inversed?
I think that would just answer my questions, quite quickly
Matthew Draper
@matthewd
Jan 13 2016 20:36
User.reflections['taggings'].inverse_of / User.reflections['taggings'].send(:automatic_inverse_of) sound like they might be answering that question
For a more specifically relevant example: EmsCluster.reflections['hosts'].send(:automatic_inverse_of) => :ems_cluster
Keenan Brock
@kbrock
Jan 13 2016 20:52
I've got it
Have a script that tells you all the reflections and whether they have an inverse or not
Jason Frey
@Fryguy
Jan 13 2016 20:59
gist?
Jason Frey
@Fryguy
Jan 13 2016 21:20
964378
515631
LOL I got a DOUBLE Yubikey dump
Keenan Brock
@kbrock
Jan 13 2016 21:21
nice
Jason Frey
@Fryguy
Jan 13 2016 21:21
545746
684289
dAMMIT
I'm hugging my laptop :D
Joe Rafaniello
@jrafanie
Jan 13 2016 21:21
unsubscribe
Jason Frey
@Fryguy
Jan 13 2016 21:22
haha
Greg McCullough
@gmcculloug
Jan 13 2016 21:25
@Fryguy glad to see your yubikey finally join the conversation.
Keenan Brock
@kbrock
Jan 13 2016 21:30
So here are the results I'm getting for inverse_of - they don't look so good:
https://gist.github.com/kbrock/b764ff3fc1cb63c18d0d
Matthew Draper
@matthewd
Jan 13 2016 21:36
@kbrock hmm.. can you expand that with some of the logic from automatic_.., to work out why they're missing?
Joe Rafaniello
@jrafanie
Jan 13 2016 21:52
and just because inverse_of isn't defined doesn't mean rails won't figure it out...
maybe the title of the gist is misleading classes that have no inverse_of defined
@jrafanie no - this is saying "tell me the inverse_of, either the way I stated it or the way you figured it out"
Joe Rafaniello
@jrafanie
Jan 13 2016 23:45
Right, I'm just saying you can specify inverse_of on a AR model, it's optional. What you're saying gives it another meaning... this is the inverse_of that you told me to use or what I figured out...
anyway, good night, good luck :-)
Keenan Brock
@kbrock
Jan 13 2016 23:45
night
I'm printing out the logic on how rails figures it out on their own
they seem to be missing a number of cases
night