These are chat archives for ManageIQ/manageiq/performance

10th
Feb 2016
Keenan Brock
@kbrock
Feb 10 2016 15:04
@jrafanie in your worker/cow/fork work - did you load primordial seeds before you forked? think this would have a significant impact since it loads a bunch of models and it blocks others from booting.
(or I guess you could test with primordial seeds turned off to get an idea how much it is blocking you) but not ready to have that debate right now.
Joe Rafaniello
@jrafanie
Feb 10 2016 15:07
@kbrock yes, we do seed before we fork workers
Keenan Brock
@kbrock
Feb 10 2016 15:07
cool - thanks
Joe Rafaniello
@jrafanie
Feb 10 2016 15:08
MiqServer#start, which follows the seed_last, calls the initial sync_workers which forks the workers
I took a break from analyzing the CoW unfriendly of our app with systemtap... but even after the nagayoshi_fork gem which does the magical GC a few times before we fork, we still quickly transition shared pages to private pages in the forked children
Keenan Brock
@kbrock
Feb 10 2016 15:13
sounds good
Oleg Barenboim
@chessbyte
Feb 10 2016 15:38
@jrafanie so workers sharing memory with parent process is not real? then what was the point?
Joe Rafaniello
@jrafanie
Feb 10 2016 15:43
yeah, @chessbyte the PR stated there wasn't much sharing memory... the initial value is the speed with starting workers... 45 second -> 15 second on the initial sync_workers
Jason Frey
@Fryguy
Feb 10 2016 15:44
@chessbyte That's only because we uncovered a bug in Ruby
we expected the memory to stay shared
Joe Rafaniello
@jrafanie
Feb 10 2016 15:45
we found the age in the header of the object was being modified, making it not CoW friendly :cow:
even with a workaround of making as many objects old as possible, we're still seeing shared pages go to private pages very quickly
So, initial benefit is fast launching of workers... if we can figure out what's making our app not CoW friendly, we could possibly share a lot more memory
Oleg Barenboim
@chessbyte
Feb 10 2016 15:51
right, right
and @tenderlove is pushing to address the Ruby bug, IIRC
Aaron Patterson
@tenderlove
Feb 10 2016 15:52
ya, that's why I'm trying to move the age to a new table
risk is that the new table will just get copied too
:(
Oleg Barenboim
@chessbyte
Feb 10 2016 15:52
@tenderlove is an actuary
Jason Frey
@Fryguy
Feb 10 2016 15:53
it's true...he likes the word "amortizes" :D
@tenderlove How's that going, incidentally?
Oleg Barenboim
@chessbyte
Feb 10 2016 15:53
@tenderlove there is some grist for you pun mill
Aaron Patterson
@tenderlove
Feb 10 2016 15:54
it's going OK. I've added the table, but I'm having trouble flipping bits / replacing the age lookup
it's tedious
Oleg Barenboim
@chessbyte
Feb 10 2016 15:55
@tenderlove want someone to pair with to talk through it?
Jason Frey
@Fryguy
Feb 10 2016 15:55
yeah, was going to say that I can help out if needed
Aaron Patterson
@tenderlove
Feb 10 2016 15:55
ya, in a bit
/play rimshot
Jason Frey
@Fryguy
Feb 10 2016 15:56
hahaha
Aaron Patterson
@tenderlove
Feb 10 2016 15:56
but seriously, maybe in an hour or two
I'm still working it out
Jason Frey
@Fryguy
Feb 10 2016 15:56
k cool
Joe Rafaniello
@jrafanie
Feb 10 2016 15:58
/play groan
Even with the age bit flipping fixed, it appears we're still modifying shared pages... I'd like to spend a bit more time here and there finding better ways to identify where/why we're modifying shared pages, leading to copied private pages...
Jason Frey
@Fryguy
Feb 10 2016 16:02
@tenderlove If you get stuck, ping me anyway...I can be the cardboard cutout developer
Aaron Patterson
@tenderlove
Feb 10 2016 16:02
I will, thank you!!! <3
Jason Frey
@Fryguy
Feb 10 2016 16:03
@jrafanie Any object we modify will obviously come over
Joe Rafaniello
@jrafanie
Feb 10 2016 16:03
get out your pear
@Fryguy last time I looked it was in IO#read in resizing the buffer I believe
but I need to confirm it
Jason Frey
@Fryguy
Feb 10 2016 16:04
reading what though?
Joe Rafaniello
@jrafanie
Feb 10 2016 16:04
systemtap gives more info than you want so you have to filter it

reading what though?

Yes

good question
Jason Frey
@Fryguy
Feb 10 2016 16:04
well, I guess it doesn't matter...any new data you bring into the buffer will modify an object and copy, so that's expected (to me)
what's unexpected are the copying of objects that aren't touched (i.e. the age in the header bug)
Joe Rafaniello
@jrafanie
Feb 10 2016 16:05
yeah, I'm afraid I'm getting noise (normal things we expect to be copied) in my results
Jason Frey
@Fryguy
Feb 10 2016 16:05
yeah, that's tough
Joe Rafaniello
@jrafanie
Feb 10 2016 16:05
so, I'll have to try a better way to figure it out
Keenan Brock
@kbrock
Feb 10 2016 18:48
@jrafanie I know ManageIQ/manageiq#6599 doesn't look like much of a change, but as I was trying to say for the past few days, loading the configuration data is very expensive
I jumped through hoops to isolate this data. Is there some other output you would prefer to see?
Jason Frey
@Fryguy
Feb 10 2016 18:49
the changes you made cannot possibly have any change, is what we're saying
you went from MiqServer.my_server(true) to MiqServer.my_server(true)
Keenan Brock
@kbrock
Feb 10 2016 18:50
yes, you've been telling me this for the past week
ooh
you're right, the code I ran defaulted that to false
what numbers do you need to see so I can prove that it makes a difference
I'm so tired of running the same things over and over and just hearing "yea, we don't belive you"
and thus making mistakes like this one
Jason Frey
@Fryguy
Feb 10 2016 18:51
none...I'm sure it makes a difference if it's not defaulted to "true" :)
Keenan Brock
@kbrock
Feb 10 2016 18:52
but, you are saying you don't like the change
even if it is 10%
Jason Frey
@Fryguy
Feb 10 2016 18:52
I didn't say that, did I?
oh that
I think you are missing the juggling question...it's a matter of risk vs reward
Keenan Brock
@kbrock
Feb 10 2016 18:52
we cache the config in so many places all over the code
Jason Frey
@Fryguy
Feb 10 2016 18:52
and time-of-effort vs amount of impact
Keenan Brock
@kbrock
Feb 10 2016 18:52
ok
I have charts showing the impact
Jason Frey
@Fryguy
Feb 10 2016 18:53
in this case, a 15% for a one-liner that introduces very little risk is a no brainer
Keenan Brock
@kbrock
Feb 10 2016 18:53
the effort here has been 2 weeks proving that this 1 line makes an impact
Jason Frey
@Fryguy
Feb 10 2016 18:54
when one is looking at code that takes 50 seconds and 5 days are spent focusing on a 5 second piece vs the other 45 seconds, then it doesn't make sense
sorry...but I guess I missed the proof elsewhere
Keenan Brock
@kbrock
Feb 10 2016 18:54
I ran this for the whole operation
not the individual components
Jason Frey
@Fryguy
Feb 10 2016 18:55
or got mixed in with all the other changes we were investigating and got lost in the mire
Keenan Brock
@kbrock
Feb 10 2016 18:55
well, I had ran for the individual components before, but got the feedback that the top level number is all that matters
Jason Frey
@Fryguy
Feb 10 2016 18:55
sorry if that's how it was conveyed :/
I mean, I did a PR for the ems_id and it was only 1%...but the PR took me all of 5 minutes to put together
Keenan Brock
@kbrock
Feb 10 2016 18:56
I have 3 changes that take off 50% of the runtime
I did it in all of 2 hours
but I've spent 2 weeks proving that I'm not in the weeds
Jason Frey
@Fryguy
Feb 10 2016 18:56
3 changes for 50% is awesome
Keenan Brock
@kbrock
Feb 10 2016 18:56
sorry
I'll revisit this pr
Jason Frey
@Fryguy
Feb 10 2016 18:57
for ManageIQ/manageiq#6599 I think we might need a test on the other line I commented on (if we don't have one already)
basically, my thinking is that the reason we even did my_server(true) vs just my_server was to reload the relation cache for .configurations
so if you default to my_server(false), then we have to be sure we are getting the latest updated_on for the configurations call 2 lines below
Keenan Brock
@kbrock
Feb 10 2016 18:59
agreed
but we are just caching that config all over the place
@vmdb_config ||= VMDB::Config.new("vmdb").config
Jason Frey
@Fryguy
Feb 10 2016 19:01
usually just for the life of a request
Keenan Brock
@kbrock
Feb 10 2016 19:02
there is one in MiqServer
Jason Frey
@Fryguy
Feb 10 2016 19:06
link?
Jason Frey
@Fryguy
Feb 10 2016 19:08
ah right...that one is specifically for the worker management
Joe Rafaniello
@jrafanie
Feb 10 2016 19:09
I'm so confused, are we still talking about the ManageIQ/manageiq#6599?
Keenan Brock
@kbrock
Feb 10 2016 19:09
good to know the main theme
Jason Frey
@Fryguy
Feb 10 2016 19:09
and its lifetime is managed elsewhere (in miq_server/configuration_management.rb) I think
I really want to break that crap out of MiqServer, but that's going to be follow up to the config
Keenan Brock
@kbrock
Feb 10 2016 19:09
ok, I'll look for another way to get rid of the 150 queries / 12%
Jason Frey
@Fryguy
Feb 10 2016 19:10
I'm confused...nothing here stated you should give up on ManageIQ/manageiq#6599
Joe Rafaniello
@jrafanie
Feb 10 2016 19:11
ah, you changed the default to not reload
wow, I was going crazy trying to figure out how that was going to change things before ;-)
Keenan Brock
@kbrock
Feb 10 2016 19:11
I was running the two numbers
and I pushed the wrong one :(
Joe Rafaniello
@jrafanie
Feb 10 2016 19:11
ok
Keenan Brock
@kbrock
Feb 10 2016 19:11
lol
Joe Rafaniello
@jrafanie
Feb 10 2016 19:12
So, yes, that will change things a lot
Keenan Brock
@kbrock
Feb 10 2016 19:12
but looks like we want to ensure we are loading the MiqServer record every time. so this is not an option
Joe Rafaniello
@jrafanie
Feb 10 2016 19:12
yeah, I would think you'd have to do it higher
Keenan Brock
@kbrock
Feb 10 2016 19:12
so I need to reduce the number of calls to look up that parameter
as I showed, it is very deep
and it is mixed into every capturable object
Joe Rafaniello
@jrafanie
Feb 10 2016 19:13
for example, in the User model when it asks for the server timezone and hits VMDB::Config.new a bunch of times
Keenan Brock
@kbrock
Feb 10 2016 19:13
yes
Jason Frey
@Fryguy
Feb 10 2016 19:14

but looks like we want to ensure we are loading the MiqServer record every time. so this is not an option

What if you just change the MiqServer.my_server(true) to MiqServer.my_server and that's it

Keenan Brock
@kbrock
Feb 10 2016 19:14
that is not ensuring it is loading every time
Jason Frey
@Fryguy
Feb 10 2016 19:14
who said it has to load every time?
and then just verify that the call 2 lines after to server.configurations.select("updated_on") returns the right value
Keenan Brock
@kbrock
Feb 10 2016 19:15
We need the latest configurations record, not a cached one.
@Fryguy ref
Jason Frey
@Fryguy
Feb 10 2016 19:16
server.configurations, not server
Oleg Barenboim
@chessbyte
Feb 10 2016 19:16
@kbrock @Fryguy if typing in Gitter is not the best means to get to a common understanding quickly, then maybe you should try a different technique?
Jason Frey
@Fryguy
Feb 10 2016 19:16
the server object can stay the same as long as the server.configurations.select("updated_on") gives us the right value (which it should by default)
agreed...I think we are just crossing wires :smile:
Keenan Brock
@kbrock
Feb 10 2016 19:17
@chessbyte thank you