These are chat archives for ManageIQ/manageiq/performance

8th
Feb 2016
Jason Frey
@Fryguy
Feb 08 2016 15:00
@kbrock Not sure why you unWIPed my PR...it's still WIP
Keenan Brock
@kbrock
Feb 08 2016 15:00
ok
I'll wip it good again
Jason Frey
@Fryguy
Feb 08 2016 15:00
I think it doesn't handle certain cases like non rooted trees, which I think there aren't tests for
so I plan to get that verified today
Keenan Brock
@kbrock
Feb 08 2016 15:01
mutant FTW
Jason Frey
@Fryguy
Feb 08 2016 15:01
It's not a mutant case in this case, though...it's straight up code paths that are not tested at all
e.g. calling subtree.arranged on a node in the middle of the tree (as opposed to from the root of a tree)
Keenan Brock
@kbrock
Feb 08 2016 15:06
@Fryguy you have an opinion on stefankroes/ancestry#262 code wise?
Tons of requests for this (and making the id pattern configurable)
Jason Frey
@Fryguy
Feb 08 2016 15:13
The only thing I'm not sure about is the switch from a constant Regex to one that's dynamically computed every invocation
It can probably be memoized on access, and cleared on ancestry_delimeter=
otherwise, seems fine
also, not sure why you added a helper method get_ancestry_delimiter, but didn't use it everywhere you could (but that's minor)
Keenan Brock
@kbrock
Feb 08 2016 15:40
I'm very confused by the accessor and where that stuff is accessed
it is all over the place
I just copied the data access they modeled
BZ monday - amusing
there is a BZ for anything I can think of doing.
Jared Deubel
@jdeubel
Feb 08 2016 16:11
@dmetzger57 : Do you know how to deploy these diff files for intuit?
Alex Krzos
@akrzos
Feb 08 2016 16:13
@jdeubel can you just patch < file in the right directory to apply a patch?
Keenan Brock
@kbrock
Feb 08 2016 16:16
@akrzos yea, think it is vmdb ; patch -p2 < file
Jared Deubel
@jdeubel
Feb 08 2016 16:24
This message was deleted
Dennis Metzger
@dmetzger57
Feb 08 2016 16:34
@jdeubel did you get the changes applied?
Jared Deubel
@jdeubel
Feb 08 2016 16:40
@dmetzger57 not yet
Jason Frey
@Fryguy
Feb 08 2016 17:04
@kbrock I got a test and I definitely broke the arrangement of a node in the middle of a tree...fixing now
Keenan Brock
@kbrock
Feb 08 2016 17:04
thanks jason
hmm
@Fryguy there is a fix out there around arrangement
Jason Frey
@Fryguy
Feb 08 2016 17:04
a fix?
Keenan Brock
@kbrock
Feb 08 2016 17:05
stefankroes/ancestry#263
not sure if this is related
but may have some inpact
it is an old PR (3 years old) I just brought back to life
but it is about updating the ancestry id in the middle of a tree
is that what you did?
Jason Frey
@Fryguy
Feb 08 2016 17:05
I don't see anything about arrangement in your PR there
Keenan Brock
@kbrock
Feb 08 2016 17:06
PR is from June 10,2011
I just updated it
don't know if it affects your stuff
but it seems like it will affect (fix) updates in the middle
Jason Frey
@Fryguy
Feb 08 2016 17:07
nope..completely unrealted
Keenan Brock
@kbrock
Feb 08 2016 17:07
ok - cool
Jason Frey
@Fryguy
Feb 08 2016 17:07
my problem is not updates in the middle...it's arranged_nodes when called on a node in the middle
Keenan Brock
@kbrock
Feb 08 2016 17:07
aah - ok
Keenan Brock
@kbrock
Feb 08 2016 19:00
what is Storage? it sure likes to run queries for rollups
Jason Frey
@Fryguy
Feb 08 2016 19:02
Storage is storage :sweat_smile:
like disks and shares and stuff
Keenan Brock
@kbrock
Feb 08 2016 19:02
the disk where the vm is located?
do we roll up performance by this a lot?
it has many possible Ems in it, so every storge we reference, it is looking up ems, zone, region - seems to
it is hundreds of queries for that stuff.
Jason Frey
@Fryguy
Feb 08 2016 19:04
i think it can hold disks but also shared storage like NFS shares and the like
Keenan Brock
@kbrock
Feb 08 2016 19:04
perf_capture_timer is spending most of its time looking up the zone / region for storage
when putting into the queue
Jason Frey
@Fryguy
Feb 08 2016 19:05
storage is cross EMS so it's a little different
Keenan Brock
@kbrock
Feb 08 2016 19:05
I only have 1 zone, 1 region, and 1 ems, and 13 storages
100 Vms, 10 Hosts
for rollups it needs to do 381 queries, most of which are trying to figure out which server / zone / region
and to some extent which ems
85% of the time is spent on that
they can't be lazy loaded because storage.has_many :ems
so we also get side effects like: cached_config_valid? is called 148 times
Keenan Brock
@kbrock
Feb 08 2016 19:13
ok - thanks. found a few things
Keenan Brock
@kbrock
Feb 08 2016 20:28
@Fryguy how is the configuration stuff going? I've noticed that looking up capture_threshold is taking 158 queries / or 30-50% of the total time
Jason Frey
@Fryguy
Feb 08 2016 20:50
It's going well
too bad I'm on sev 1 performance :P
Keenan Brock
@kbrock
Feb 08 2016 20:51
yea
otherwise we could get the performance benifit of that one :(
Jason Frey
@Fryguy
Feb 08 2016 20:54
well, we will...eventually
Keenan Brock
@kbrock
Feb 08 2016 20:56
@Fryguy ok - so I heard you say not within the week/month - so I need to fix this issue.
thanks
Jason Frey
@Fryguy
Feb 08 2016 20:57
I did not say that
If I can get back to it, and dedicate my time, maybe order of weeks (hopefully less)
but none of it can be backported anyway
Keenan Brock
@kbrock
Feb 08 2016 20:58
Config.new("VMDB").new / config_mtime_from_db are expensive especially when the values are read in a loop
Jason Frey
@Fryguy
Feb 08 2016 20:59
I woud ignore that completely for now, but I don't know if this is for a customer or something
Keenan Brock
@kbrock
Feb 08 2016 20:59
um
yea
perf_capture_timer is timing out
Jason Frey
@Fryguy
Feb 08 2016 20:59
and Config.new is the reason why?
Keenan Brock
@kbrock
Feb 08 2016 20:59
and in a spiral
Jason Frey
@Fryguy
Feb 08 2016 21:00
more likely it's something else in the querying/bringing back lots of records
Keenan Brock
@kbrock
Feb 08 2016 21:00
looking up that variable (for storages in specific) is causing serious issues
Jason Frey
@Fryguy
Feb 08 2016 21:00
so don't read it in a loop?
Keenan Brock
@kbrock
Feb 08 2016 21:00
easy
Jason Frey
@Fryguy
Feb 08 2016 21:00
cache_with_timeout at the class level? lots of choices
Keenan Brock
@kbrock
Feb 08 2016 21:00
I didn't create mixins that go everywhere
and make classes big spagetti
no
Jason Frey
@Fryguy
Feb 08 2016 21:00

and Config.new is the reason why?

what percent of time?

Keenan Brock
@kbrock
Feb 08 2016 21:01
20% of the total number of queries
I'll trim down to just this one change and run benchmarks again
Jason Frey
@Fryguy
Feb 08 2016 21:02
I don't understand why it would cause queries over and over
because the configuration is cached
Keenan Brock
@kbrock
Feb 08 2016 21:02
storage needs to load a different ems
Jason Frey
@Fryguy
Feb 08 2016 21:02
it's cached globally
Keenan Brock
@kbrock
Feb 08 2016 21:02
every time you load a miq_server - you load a new one
it passes in a (true)
Jason Frey
@Fryguy
Feb 08 2016 21:02
doesn't matter...it's cached globally
Keenan Brock
@kbrock
Feb 08 2016 21:03
I'll go back and revisit
Jason Frey
@Fryguy
Feb 08 2016 21:03
unless it's not doing VMDB::Config.new("vmdb")
like if it's doing MiqServer#get_config, that's different
Keenan Brock
@kbrock
Feb 08 2016 21:03
it is doing VMDB::Config.new("vmdb")
Jason Frey
@Fryguy
Feb 08 2016 21:04
I'm not getting as much gains as I expected from the arrange_nodes change
Keenan Brock
@kbrock
Feb 08 2016 21:04
:(
Jason Frey
@Fryguy
Feb 08 2016 21:05
I expected about 20%...only getting about 9%
so UI code went from 10.2 seconds to 8.7 seconds in dev mode on my machine
going to have @jrafanie try in production mode
Oleg Barenboim
@chessbyte
Feb 08 2016 21:07
not terribly exciting numbers
Jason Frey
@Fryguy
Feb 08 2016 21:07
yeah, I don't understand why
maybe because we already got some gains from other stuff
I thought @jrafanie said that arrange_nodes was 40% of the time
I can profile before/after...I ran the UI without profiling
Keenan Brock
@kbrock
Feb 08 2016 21:08

always run in prod mode
@jrafanie

always ignore first run
@kbrock

Jason Frey
@Fryguy
Feb 08 2016 21:09
curious how much dev mode makes a difference percentage-wise
I didn't take first run...I took average of 3rd-5th run
Keenan Brock
@kbrock
Feb 08 2016 21:09
yea
Jason Frey
@Fryguy
Feb 08 2016 21:10
I came late to the game...I wonder what the total saving is with all 3 changes
I guess I can move my git HEAD back before we changed stuff
Keenan Brock
@kbrock
Feb 08 2016 21:21

@Fryguy I may be making a mistake, but I've run this dozens of times

diff --git a/app/models/metric/capture.rb b/app/models/metric/capture.rb
index 51e65aa..d148386 100644
--- a/app/models/metric/capture.rb
+++ b/app/models/metric/capture.rb
@@ -71,6 +71,7 @@ module Metric::Capture
   def self.capture_threshold(target)
+    return 50.seconds.ago
     key, default = MiqAlert.target_needs_realtime_capture?(target) ? [:capture_threshold_with_alerts, 1] : [:capture_threshold, 10]

869ms => 391ms

oops
Jason Frey
@Fryguy
Feb 08 2016 21:21
LOL
Keenan Brock
@kbrock
Feb 08 2016 21:21
fail
lol
guest pasting tables doesn't work
tired of reformatting these results :)
Jason Frey
@Fryguy
Feb 08 2016 21:22
that's what @jrafanie was trying to do when you sent him to blocks.io
Keenan Brock
@kbrock
Feb 08 2016 21:22
lol
right
Keenan Brock
@kbrock
Feb 08 2016 21:29
name self self+child offset sql sql time after self self+child offset sql sql time
executing perf_capture_timer 1.3 838.3 +0.0 1 0.2 + 1.4 391.5 +0.0 1 0.3
+executing p_c_health_check 5.0 5.0 +2.0 1 0.4 + 5.0 5.0 +1.0 1 0.5
+executing c_t 0.1 41.3 +7.0 + 0.1 42.9 +7.0
++executing c_in_t 23.9 41.1 +7.0 9 2.4 + 25.6 42.7 +7.0 9 2.5
+++executing c_vm_t 15.3 15.3 +32.0 1 0.9 + 15.3 15.3 +34.0 1 1.2
+executing c_trg_by_parent 33.8 33.8 +48.0 20 2.7 +
+executing c_tsk_by_parent 10.5 10.5 +82.0 8 1.2 + 10.0 10.0 +50.0 8 1.2
+executing queue_captures 740.3 740.3 +92.0 403 66.8 + 324.9 324.9 +60.0 265 47.2
+executing delete_older 6.0 6.0 +833.0 5 0.9 + 6.7 6.7 +385.0 4 1.0
that 1 line did all that
not sure why c_trg_by_parent is not showing up, so there may be something else
Keenan Brock
@kbrock
Feb 08 2016 21:35
I put a puts in there, and it is called, but it must leave so quickly that it doesn't have the chance to register.
Joe Rafaniello
@jrafanie
Feb 08 2016 21:53
Jason tested a 9% savings with the arrange_nodes changes on 5.4, I'm seeing nearly 7% on master
Keenan Brock
@kbrock
Feb 08 2016 21:53
well, that is something?
Joe Rafaniello
@jrafanie
Feb 08 2016 21:53
on master, we're spending about 2.3 seconds in that method, @Fryguy's change saves 0.7 seconds of that time
yeah, draper change next
at least on master
not much else we can do on 5.4
Keenan Brock
@kbrock
Feb 08 2016 21:54
yea
Jason Frey
@Fryguy
Feb 08 2016 21:54
This message was deleted
Keenan Brock
@kbrock
Feb 08 2016 21:54
I'm showing looking up "is perf capture enabled" and "how far back should we go" to be a majority of time. for 100 VMs, it is like 75% of the time
hmm. maybe closer to 50%
anyone object to me moving ci_capture mixin out of all objects? - it only really using self.class.name from the object. Seems it could live nicely in the capture logic itself. and we could group things better to not look up zone / ems so often
Jason Frey
@Fryguy
Feb 08 2016 21:57
doesn't it provide the perf_capture method?
Keenan Brock
@kbrock
Feb 08 2016 21:59
guess I just care about perf_capture_queue
and queue_name_for_metrics_collection
Jason Frey
@Fryguy
Feb 08 2016 21:59
well, perf_capture_queue is the async version of perf_capture, so that should live side-by-side with it
Keenan Brock
@kbrock
Feb 08 2016 22:00
ok, I'll pass in the vars that I don't want it to look up again
Jason Frey
@Fryguy
Feb 08 2016 22:51
Updated my PR with a TON of specs: stefankroes/ancestry#261
Keenan Brock
@kbrock
Feb 08 2016 22:57
thanks
Chris Arcand
@chrisarcand
Feb 08 2016 22:58
Nice :+1: :+1:
Jason Frey
@Fryguy
Feb 08 2016 22:59
actually, I made the tests nicer (readable)...let me just clean that up
Keenan Brock
@kbrock
Feb 08 2016 22:59
ok
so the nodes came back where the parents came after the children
this is because you brought them back without sorting them first?
Jason Frey
@Fryguy
Feb 08 2016 23:00
I did not change the sorting....but my code doesn't care about sorting
so that's a possible next optimization
but I don't think it will save much for our purposes
Keenan Brock
@kbrock
Feb 08 2016 23:00
yea
did you find a bug?
was it the parent thing that was a bug?
Jason Frey
@Fryguy
Feb 08 2016 23:01
in my original rewrite there was a but
it was not the parent thing you mentioned
Joe Rafaniello
@jrafanie
Feb 08 2016 23:01
these are not the parents you are looking for
Keenan Brock
@kbrock
Feb 08 2016 23:08
+1