These are chat archives for ManageIQ/manageiq/performance

21st
Jun 2016
Keenan Brock
@kbrock
Jun 21 2016 12:45
For Cap&U - is it possible that we are spending most of our time logging out of vmware?
if you watch the messages go through the screen, there is activity then a pause
definite cadence
when I control-c from within that, it is always "parsing the logout"
Jason Frey
@Fryguy
Jun 21 2016 14:48
anything is possible, but in production we use the broker which handles connections
by keeping them open and pooling them
but I'd be surprised if the logout if that expensive... cc @agrare
Adam Grare
@agrare
Jun 21 2016 14:51
I would also be surprised
no return value so nothing to parse
Joe Rafaniello
@jrafanie
Jun 21 2016 14:52
@kbrock maybe turn on vim debug logging
Keenan Brock
@kbrock
Jun 21 2016 16:30
thanks all.
something else is there
Keenan Brock
@kbrock
Jun 21 2016 21:04
If MiqAlert.target_needs_realtime_capture? is slow. Do I need to speed up, or can I just drop it?
@Fryguy @gmcculloug ^
Jason Frey
@Fryguy
Jun 21 2016 21:05
do you know what it's for?
Keenan Brock
@kbrock
Jun 21 2016 21:05
it determines if a class has outstanding alerts?
Jason Frey
@Fryguy
Jun 21 2016 21:05
no :) It asks if alerts have been assigned to the target
if they have been, then we have to do realtime calculations for the alerts
if they haven't been, then we can skip that processing
Keenan Brock
@kbrock
Jun 21 2016 21:08

stubbed
FREED: 721565

ms bytes objects queries query (ms) rows comments
733.4 29,278,542 308,901 335 82.4 203 no check

present
FREED: 2983528

ms bytes objects queries query (ms) rows comments
6,556.7 2,005,219 3,562,126 1,115 4,127.9 11,206 check alerts
@Fryguy it just decides to run cap&u 10x more often if there are alerts for the target
@Fryguy so that means I have to keep it?
ok, I have some ideas - thanks
it is preventing cap&u from running on mid sized environments (not even alex's xlg)
Jason Frey
@Fryguy
Jun 21 2016 21:11

decides to run cap&u 10x more often if there are alerts for the target

No, it analyzes the realtime data for alerting trends

does alex have any alerts enabled?
Keenan Brock
@kbrock
Jun 21 2016 21:11
I guess
Jason Frey
@Fryguy
Jun 21 2016 21:12
then I'm not sure it's an apples to apples comparison
Keenan Brock
@kbrock
Jun 21 2016 21:12
25 in my local environment
100 vms
Jason Frey
@Fryguy
Jun 21 2016 21:12
25 what...alerts?
Keenan Brock
@kbrock
Jun 21 2016 21:12
ooh
Jason Frey
@Fryguy
Jun 21 2016 21:12
or vms with alerts enabled?
Keenan Brock
@kbrock
Jun 21 2016 21:12
MiqAlert.count == 25
I have no idea what this is. I just work here
Jason Frey
@Fryguy
Jun 21 2016 21:12
I don't think the alert count is meaningful, but I could be wrong
you need to look at the number of VMs that have alerts assigned
Keenan Brock
@kbrock
Jun 21 2016 21:13
    key, default = MiqAlert.target_needs_realtime_capture?(target) ? [:capture_threshold_with_alerts, 1] : [:capture_threshold, 10]
that line caused the difference
Jason Frey
@Fryguy
Jun 21 2016 21:13
yes I know the line
Keenan Brock
@kbrock
Jun 21 2016 21:13
ok
I found problems with that method call
Jason Frey
@Fryguy
Jun 21 2016 21:13
but what it does based on that is what I'm asking about
Keenan Brock
@kbrock
Jun 21 2016 21:13
I'll fix
Jason Frey
@Fryguy
Jun 21 2016 21:13
it's the method itself?
Keenan Brock
@kbrock
Jun 21 2016 21:13
ooh
Jason Frey
@Fryguy
Jun 21 2016 21:13
MiqAlert.target_needs_realtime_capture, that is?
Keenan Brock
@kbrock
Jun 21 2016 21:13
yes
Jason Frey
@Fryguy
Jun 21 2016 21:13
ok
Keenan Brock
@kbrock
Jun 21 2016 21:13
the method
at the end of that method
I say "yes, run capture"
Jason Frey
@Fryguy
Jun 21 2016 21:14
yeah, that can probably be sped up if possible would be great
Keenan Brock
@kbrock
Jun 21 2016 21:14
I no-opped the decision
I'm testing cap&U - so I need to run. (not every few minutes...)
so just asking the question "does it have alerts" = 700ms vs 6,556ms (100 vms)
Jason Frey
@Fryguy
Jun 21 2016 21:15
ok that's really expensive
Keenan Brock
@kbrock
Jun 21 2016 21:15
again, I have a few ideas
Jason Frey
@Fryguy
Jun 21 2016 21:15
but you can't just remove it...unless you want to break alerting :P
Keenan Brock
@kbrock
Jun 21 2016 21:15
but was asking if I could delete instead :)
eh
lol
thanks
Jason Frey
@Fryguy
Jun 21 2016 21:15
who needs it ¯\_(ツ)_/¯ haha
Keenan Brock
@kbrock
Jun 21 2016 21:15
fun thing - the first run is fast
so benchmarks show fast
it is only until the second / third time you run until problems arise
Jason Frey
@Fryguy
Jun 21 2016 21:16
oh weird
Keenan Brock
@kbrock
Jun 21 2016 21:17
huh
so MiqAlertSet.size == 0
Oleg Barenboim
@chessbyte
Jun 21 2016 21:17
@kbrock why are you fixing performance of alerts??
Keenan Brock
@kbrock
Jun 21 2016 21:17
5.5
cap & u
can't run a medium sized provider
times out
Oleg Barenboim
@chessbyte
Jun 21 2016 21:17
BZ?
Keenan Brock
@kbrock
Jun 21 2016 21:17
y
Oleg Barenboim
@chessbyte
Jun 21 2016 21:17
roger
Keenan Brock
@kbrock
Jun 21 2016 21:17
oleg
it is not alerts
well - it overlaps w/ them
we have a faulty "lookup this config value" entry
Oleg Barenboim
@chessbyte
Jun 21 2016 21:18
I was just trying to get context on the why
Keenan Brock
@kbrock
Jun 21 2016 21:18
+1
lol. I was surprised to find myself in alerts too :)
Oleg Barenboim
@chessbyte
Jun 21 2016 21:19
I mean I am sure they can be improved performance-wise, just don't think that many/any customers use them
Jason Frey
@Fryguy
Jun 21 2016 21:19

decides to run cap&u 10x more often if there are alerts for the target

I reread what you wrote here and I replied incorrectly...This is correct

We run "realtime" collections closer to realtime for targets with alerts
Keenan Brock
@kbrock
Jun 21 2016 21:19
@Fryguy cool - that is what I thought.... but who knows
so I decided to ask to make sure
thanks
yea - think we can get this much lower
Jason Frey
@Fryguy
Jun 21 2016 21:20

just don't think that many/any customers use them

even more reason to optimize the check in C&U...it's the part where we check if alerts are enabled during a C&U run that's slow

Keenan Brock
@kbrock
Jun 21 2016 21:21
we run this check for EVERY cap&u object
even if disabled for all
well, I'll speed up the check
then will optimize it all out if we can at a later time
want to get this to c / darga if possible (though think those ships may have sailed)
Jason Frey
@Fryguy
Jun 21 2016 21:22
:+1:
well it sailed for darga-1 , but no reason it can't make it to darga-2
Keenan Brock
@kbrock
Jun 21 2016 21:22
+1
this is from before "A"
Jason Frey
@Fryguy
Jun 21 2016 21:23
yeha, that has been around forever
there's a lot of those kind of issues :P
Keenan Brock
@kbrock
Jun 21 2016 21:24
omg - are you kidding me?
AssignmentMixin#get_assigned_for_target
Joe Rafaniello
@jrafanie
Jun 21 2016 21:27
so many words... @kbrock what page are you looking at?
Jason Frey
@Fryguy
Jun 21 2016 21:28
he's not...it's C&U
Joe Rafaniello
@jrafanie
Jun 21 2016 21:28
:confused: ok
Keenan Brock
@kbrock
Jun 21 2016 21:29
for 100 vms, we're asking "is there an alert on this object"
that is 8k extra queries, 6 million object allocations, and 11k rows brought back
and thousands of lines of code ;)
Oleg Barenboim
@chessbyte
Jun 21 2016 21:30
and you think that's more than it should be?
man, I love messing with @kbrock
Keenan Brock
@kbrock
Jun 21 2016 21:30
lol
Jason Frey
@Fryguy
Jun 21 2016 21:30
hahaha
Keenan Brock
@kbrock
Jun 21 2016 21:30
well, I was doing ok seeing the collect
Oleg Barenboim
@chessbyte
Jun 21 2016 21:30
what's an extra 8000 SQL queries among friends?
Keenan Brock
@kbrock
Jun 21 2016 21:31
but then I saw a get_assigned_for_target which looks up the profile in 60 lines of DENSE ruby
lol
@chessbyte and the fun thing is that the cap&u takes only 300 queries
Keenan Brock
@kbrock
Jun 21 2016 21:41
@Fryguy how much $ do I have to bribe you to reconsider letting me drop this?
can we just run all cap & U every 5 minutes (split the difference)?
Jason Frey
@Fryguy
Jun 21 2016 21:41
it's a real feature...you can't just drop it without an alternative :sweat_smile:
Keenan Brock
@kbrock
Jun 21 2016 21:42
if they have alerts, lower the frequency
if they don't leave at 10 minutes
Jason Frey
@Fryguy
Jun 21 2016 21:42
that's exactly what that line does
Keenan Brock
@kbrock
Jun 21 2016 21:42
no
we use vodoo to determine it
how about the implementers/SAs do it?
Jason Frey
@Fryguy
Jun 21 2016 21:43
 key, default = MiqAlert.target_needs_realtime_capture?(target) ? [:capture_threshold_with_alerts, 1] : [:capture_threshold, 10]
means "if they have alerts, increase the frequency (to one minute), else leave at 10 minutes"
Keenan Brock
@kbrock
Jun 21 2016 21:43
OR
anyway
ok
why are tags involved?
Jason Frey
@Fryguy
Jun 21 2016 21:44
because @gtanzillo likes to use tagging haha
Keenan Brock
@kbrock
Jun 21 2016 21:44
assigned_to_target is very involved
Jason Frey
@Fryguy
Jun 21 2016 21:44
RIGHT UNDER THE BUS :bus:
Keenan Brock
@kbrock
Jun 21 2016 21:44
hmm... he sure does
lol
looks like assigned_to_target is used with Chargebacks and Alerts
too many branches to grok
it is asking if something has a tag / property?
ok, so we have no AlertSet - so this code is going to do nothing in my environment
Keenan Brock
@kbrock
Jun 21 2016 22:02
@Fryguy how old is relationships? it sure rears it heads in a lot of stuff I want to change
Jason Frey
@Fryguy
Jun 21 2016 22:06
relationship in general or the ancestry based ones
in general it goes back to the primordial commit from like 2007
(FYI: git log --oneline --follow app/models/relationship.rb | tail -n 1)
Oleg Barenboim
@chessbyte
Jun 21 2016 22:07
@kbrock one shiny object at a time, please
Jason Frey
@Fryguy
Jun 21 2016 22:08
haha it existed before $log existed
If I have a SQL line in DEBUG in the log, is there a sneaky way to print the call stack at the moment?
I have an N+1 and want the log to tell me where it's being called from
Nick LaMuro
@NickLaMuro
Jun 21 2016 22:13
@Fryguy Kernel::caller.inspect?
Jason Frey
@Fryguy
Jun 21 2016 22:13
yeah but where do I put that
P.S. I am a puts debugger haha
Nick LaMuro
@NickLaMuro
Jun 21 2016 22:14
me too
Keenan Brock
@kbrock
Jun 21 2016 22:18
@chessbyte I'm only focusing on 8k shiny objects at a time
Gregg Tanzillo
@gtanzillo
Jun 21 2016 22:26
For the record, I don’t like tags. We got a little tag happy in the early days. We tagged all the things!
Keenan Brock
@kbrock
Jun 21 2016 22:27
well, I'm hating all the things we tagged right now.
Keenan Brock
@kbrock
Jun 21 2016 22:27
nice
Jason Frey
@Fryguy
Jun 21 2016 22:28
So, I found an N+1 of sorts with treebuilders
So far, I only see it specifically in the ops -> database accordion
but I haven't tried all the pages
basically, the TreeBuilder has been refactored a lot, and we end up calling .count once per node
I'm not sure how to avoid it though because it's like a recursive thing and this count is at the bottom
is there a way to "preload" a count?
Keenan Brock
@kbrock
Jun 21 2016 22:30
right now, anytime I'm seeing cache_with_timeouts, I'm reading: "I'm sorry, I did really bad ruby. I'm going to paint over this and hope it goes away"
jason - working on it
Dan Clarizio
@dclarizio
Jun 21 2016 22:30
@Fryguy so I think we used .count to determine if we showed a + on the tree node, indicating there were children . . . since you redid many of the trees to build fully, perhaps it's only left in some of them
Keenan Brock
@kbrock
Jun 21 2016 22:30
@Fryguy virtual_total is about that
I want to put into main query
Jason Frey
@Fryguy
Jun 21 2016 22:30

redid many of the trees to build full

only 1 :(

Keenan Brock
@kbrock
Jun 21 2016 22:31
and change virtual column's to load from a custom query that is in the main query
that will speed up a lot of pages
Dan Clarizio
@dclarizio
Jun 21 2016 22:31
@Fryguy which may have actually hurt us if there are 25K VMs, building a HUGE tree, LOL . . . we can't win sometimes
Jason Frey
@Fryguy
Jun 21 2016 22:31
@kbrock Is that in response to my question or something else?
@dclarizio Well, yeah, I have to verify on other pages
Keenan Brock
@kbrock
Jun 21 2016 22:31
@Fryguy it was responding to your question
but I was planning on doing this before
Jason Frey
@Fryguy
Jun 21 2016 22:31
@kbrock I don't see the connection?
Keenan Brock
@kbrock
Jun 21 2016 22:31
... time ...
preload a count?
preload a total?
Jason Frey
@Fryguy
Jun 21 2016 22:32
there is no cache_with_timeout with what I am looking at
Keenan Brock
@kbrock
Jun 21 2016 22:32
ooh
sorry
cache_with_timeout = different
Jason Frey
@Fryguy
Jun 21 2016 22:32
yeah
Keenan Brock
@kbrock
Jun 21 2016 22:32
now I see why my cap&u numbers were so sporadic
Jason Frey
@Fryguy
Jun 21 2016 22:33
oh right...there is a cache_with_timeout there

just to clarify:

right now, anytime I'm seeing cache_with_timeouts, I'm reading: "I'm sorry, I did really bad ruby. I'm going to paint over this and hope it goes away"

you should probably be reading that as "Oh man, it's 3AM and we have a customer POC tomorrow and there's no way this'll work...this will have to do for now" haha :D

Keenan Brock
@kbrock
Jun 21 2016 22:34
ha
Jason Frey
@Fryguy
Jun 21 2016 22:46
I figured out a way to avoid the N+1 for this particular screen...brings back more data, but ¯\_(ツ)_/¯ better than hitting the DB over and over
Nick LaMuro
@NickLaMuro
Jun 21 2016 22:48
is it possible to have the UI only return the top level leaf nodes, and make a request every time you need to open up further down the tree?
probably a huge refactoring effort, and I have no way to test, just curious if that is a more scalable possibility

based on

since you redid many of the trees to build fully

This may have already been considered... so yeah...

Jason Frey
@Fryguy
Jun 21 2016 22:50
@NickLaMuro That's how the code was originally written but there's this feature where we remember what nodes a user has opened
sort of...
so the problem is when they go back to a screen it slams the server with like an N+1 of requests
Nick LaMuro
@NickLaMuro
Jun 21 2016 22:51
ah, fun
yup, then I got nothin...
Jason Frey
@Fryguy
Jun 21 2016 22:51
haha
NickLaMuro @NickLaMuro crawls back into his hole
Jason Frey
@Fryguy
Jun 21 2016 22:51
I should pair with you on this so you can see what I'm doing
but not today :) I'm done for the day
Keenan Brock
@kbrock
Jun 21 2016 22:52
Jason - pulling back more data to avoid n+1 is good until it is not
Jason Frey
@Fryguy
Jun 21 2016 22:52
In this case it's VmdbIndex records, so it's a known quantity of relatively little
but yeah, I agree with you in general
Nick LaMuro
@NickLaMuro
Jun 21 2016 22:52
@Fryguy I would like that, and I would also see how you have things setup, but yeah, not today for sure
Jason Frey
@Fryguy
Jun 21 2016 22:53
and you have to deal with my DubStep in the background ;)
Keenan Brock
@kbrock
Jun 21 2016 22:53
Ok, I need to eat lunch - catch you all later
Nick LaMuro
@NickLaMuro
Jun 21 2016 22:54
wait... lunch?
Jason Frey
@Fryguy
Jun 21 2016 22:55
WAT