These are chat archives for ManageIQ/manageiq/performance

1st
Jun 2016
Keenan Brock
@kbrock
Jun 01 2016 01:26

@dmetzger57 @gmcculloug So for every keypress, ver 5.5 to 5.6 went from 56ms to 250ms
In 5.6, set_user_time_zone and set_gettext_locale alone take 62.8ms
All filters are 5 queries / 82ms (45% longer) and the logic hasn't run yet

The logic is 27 queries / 258ms more

I have a few ideas to remove ~5 queries, but it is starting to get aggressive and possibly too invasive for 5.6

Alex Krzos
@akrzos
Jun 01 2016 01:37
so that's the source of the additional menu latency?
Keenan Brock
@kbrock
Jun 01 2016 01:37
It could be
for any "micro" transactions, these additional ms really add up
also, looks like we're binding multiple times
haven't nailed it down, but sometimes a keypress (in a provisioning form) will trigger 4 events
@akrzos I'm focusing on provisioning javascript "you pressed a button" work right now
Alex Krzos
@akrzos
Jun 01 2016 01:38
Maybe you could fix the Red Hat customer portal right now while you're at it :smirk:
Keenan Brock
@kbrock
Jun 01 2016 01:38
but those filters are for every query
sure thing
@akrzos I know we like dynamic and all
but the amount of time it takes to look up the timezone for every request really adds up
almost wish we could hard code it and just change / reboot on daylight savings
Alex Krzos
@akrzos
Jun 01 2016 01:40
hmm, wonder how we ended up checking the timezone every request
Keenan Brock
@kbrock
Jun 01 2016 01:40
we look it up in settings on every request and set it for the user
you know
if they haven't overridden it
... sorry, little snark snuck through :( ...
also on every request, we check if the user is an admin, so we know if the "server is ready?"
I had tried to remove that before, but it broke quite a few tests
Alex Krzos
@akrzos
Jun 01 2016 01:44
I'm watching GC pauses on my xlarge provider on 5.6 right now
WARN -- : EMS [......] as [administrator@vsphere.local] Garbage collection took 7.738481634994969 seconds
Keenan Brock
@kbrock
Jun 01 2016 02:22
heh
@akrzos that is not good :(
Alex Krzos
@akrzos
Jun 01 2016 02:24
Better than when it was 60% of the time
Keenan Brock
@kbrock
Jun 01 2016 02:43

@Fryguy it looks like User.get_timezone is very expensive
Logging in as "admin", it is 50-60% of the time for the provisioning dialog keypress entry
116-140ms VS 300ms-400ms

Now for a non-admin, I assume other factors will enter in here

Joe Rafaniello
@jrafanie
Jun 01 2016 02:50
which part is slow?
settings.fetch_path(:display, :timezone) || self.class.server_timezone
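(`fetch_path`, used throughout this discussion, comes from ManageIQ's more_core_extensions gem; for readers following along, a rough plain-Ruby approximation of what it does, not the gem's actual implementation:)

```ruby
# Rough plain-Ruby approximation of Hash#fetch_path (from the
# more_core_extensions gem): walk nested hashes, returning nil as
# soon as any key is missing, instead of raising.
def fetch_path(hash, *keys)
  keys.reduce(hash) { |h, k| h.is_a?(Hash) ? h[k] : nil }
end

settings = { :display => { :timezone => "Eastern Time (US & Canada)" } }

fetch_path(settings, :display, :timezone) # => "Eastern Time (US & Canada)"
fetch_path(settings, :display, :locale)   # => nil
fetch_path(settings, :missing, :timezone) # => nil
```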
Keenan Brock
@kbrock
Jun 01 2016 02:50
ok, server_timezone
I'm going to hack that a little
think I can get 20% back
Joe Rafaniello
@jrafanie
Jun 01 2016 02:50
didn't we talk about caching that?
Keenan Brock
@kbrock
Jun 01 2016 02:51

I think change:

MiqServer.my_server.server_timezone

to

get_config("vmdb").config.fetch_path(:server, :timezone) || "UTC"
I don't mention the word cache around you - you tend to be gunshy ;)
Joe Rafaniello
@jrafanie
Jun 01 2016 02:55
what's expensive? the my_server call or the server_timezone?
Keenan Brock
@kbrock
Jun 01 2016 02:55
still discovering
Joe Rafaniello
@jrafanie
Jun 01 2016 02:55
the former should already be cached after the first call
Keenan Brock
@kbrock
Jun 01 2016 02:55
took a while to get here
I guess
I think my_server is not cached...
Joe Rafaniello
@jrafanie
Jun 01 2016 02:56
Try hacks to test theories
diff --git a/app/models/miq_server.rb b/app/models/miq_server.rb
index 3ee8988..d43e2e1 100644
--- a/app/models/miq_server.rb
+++ b/app/models/miq_server.rb
@@ -625,7 +625,7 @@ class MiqServer < ApplicationRecord
   end

   def server_timezone
-    get_config("vmdb").config.fetch_path(:server, :timezone) || "UTC"
+    "UTC"
   end
if you're not sure if get_config is part of it
or instead, see if it's server_timezone that's the problem
diff --git a/app/models/user.rb b/app/models/user.rb
index 3249241..2eb9682 100644
--- a/app/models/user.rb
+++ b/app/models/user.rb
@@ -203,7 +203,7 @@ class User < ApplicationRecord
   end

   def get_timezone
-    settings.fetch_path(:display, :timezone) || self.class.server_timezone
+    settings.fetch_path(:display, :timezone) || "UTC"
   end
if it's server_timezone, try the first diff to hack it to see if it makes a difference removing some code
ok signing off, good luck @kbrock
Keenan Brock
@kbrock
Jun 01 2016 02:59
thanks
Joe Rafaniello
@jrafanie
Jun 01 2016 02:59
remember, hacks are ok when you're testing theories ;-)
Keenan Brock
@kbrock
Jun 01 2016 02:59
caching server_timezone
and it looks fast
Joe Rafaniello
@jrafanie
Jun 01 2016 02:59
eliminate code so you can narrow it down
Keenan Brock
@kbrock
Jun 01 2016 02:59
think it is the get_config... repeated
added @timezone ||= in MiqServer
looks much faster
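(The `@timezone ||=` hack Keenan describes is plain Ruby memoization; a standalone before/after sketch, where `FakeServer` and `expensive_config_lookup` are made-up stand-ins for MiqServer and `get_config("vmdb")`, not the real code:)

```ruby
# Standalone sketch of the @timezone ||= memoization described above.
# FakeServer and expensive_config_lookup are illustrative stand-ins
# for MiqServer and get_config("vmdb").
class FakeServer
  attr_reader :config_reads

  def initialize(timezone)
    @configured_timezone = timezone
    @config_reads = 0
  end

  # Stand-in for the expensive config fetch (parse + dup of the hash)
  def expensive_config_lookup
    @config_reads += 1
    @configured_timezone
  end

  # Before: re-reads the config on every call
  def server_timezone_uncached
    expensive_config_lookup || "UTC"
  end

  # After: memoized, so only the first call pays the lookup cost
  def server_timezone
    @timezone ||= (expensive_config_lookup || "UTC")
  end
end

server = FakeServer.new("Eastern Time (US & Canada)")
3.times { server.server_timezone }          # one config read total
3.times { server.server_timezone_uncached } # three more reads
server.config_reads # => 4
```

(One subtlety of `||=`: it never caches a nil/false result, but here a missing timezone falls through to `"UTC"`, which is truthy, so the memo always sticks.)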
Joe Rafaniello
@jrafanie
Jun 01 2016 02:59
ok
Keenan Brock
@kbrock
Jun 01 2016 02:59
we'll see
Joe Rafaniello
@jrafanie
Jun 01 2016 03:00
cool
Keenan Brock
@kbrock
Jun 01 2016 03:00
that doesn't feel so bad
we refresh MyServer very often
Joe Rafaniello
@jrafanie
Jun 01 2016 03:00
yeah, it's dumb, we ask for the timezone for the same server constantly
Keenan Brock
@kbrock
Jun 01 2016 03:00
I had thought we didn't cache my_server
forgot that it went in
or maybe that was something else
oh well
have a good night
Joe Rafaniello
@jrafanie
Jun 01 2016 03:00
Good night :moon:
Keenan Brock
@kbrock
Jun 01 2016 03:00
good night :dune:
Keenan Brock
@kbrock
Jun 01 2016 03:40
@gtanzillo @gmcculloug @dmetzger57 ==> ManageIQ/manageiq#9070
Got timing of provisioning keypresses down to 100ms.
@jrafanie thanks
I want that change in darga...
Keenan Brock
@kbrock
Jun 01 2016 11:53

@Fryguy Config is still expensive.

what do you think about:

Config.new("vmdb").fetch_path(:a, :b)
# vs
Config.fetch_path("vmdb", :a, :b)

Where there is a singleton instance of config that is thread local (reset on connect
AND/OR there is a config that times out every 1 second
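(A standalone sketch of the second idea, a cached config that times out every second, using a per-thread entry and a monotonic clock; the loader block stands in for the real, expensive config parse, and all names here are illustrative, not ManageIQ API:)

```ruby
# Illustrative sketch of a thread-local config cache with a ~1 second
# TTL. The loader block stands in for the expensive config parse/dup.
class TtlThreadLocalCache
  def initialize(ttl_seconds = 1.0, &loader)
    @ttl = ttl_seconds
    @loader = loader
    @key = :"ttl_cache_#{object_id}" # unique per-thread storage key
  end

  def fetch
    entry = Thread.current[@key]
    now = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    if entry.nil? || (now - entry[:loaded_at]) > @ttl
      entry = { :value => @loader.call, :loaded_at => now }
      Thread.current[@key] = entry
    end
    entry[:value]
  end
end

loads = 0
config_cache = TtlThreadLocalCache.new(1.0) do
  loads += 1
  { :server => { :timezone => "UTC" } } # pretend parsed vmdb config
end

100.times { config_cache.fetch } # all within the TTL: one real load
loads # => 1
```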

Jason Frey
@Fryguy
Jun 01 2016 12:17
Config is known to be expensive, particularly because it dups the whole hash
We have to go through the code base and start removing it... aka phase 2
But yeah, you found that bug also which I have to fix... That should make it a little less painful
Keenan Brock
@kbrock
Jun 01 2016 12:33
In the long term, do you like something like Config.instance.fetch_path("vmdb", :a, :b)
@Fryguy whoa, there are only 90 (app) + 15 (lib) instances of Config.new
That is totally changeable
Alex Krzos
@akrzos
Jun 01 2016 13:56
Guys, I did open a few more bugs last night - https://bugzilla.redhat.com/show_bug.cgi?id=1341402
https://bugzilla.redhat.com/show_bug.cgi?id=1341386 - This one will be helpful if we can figure out the source of increased memory for 5.6
Joe Rafaniello
@jrafanie
Jun 01 2016 14:23
@akrzos would love to see the results with a single message catalog ;-)
:pray:
Alex Krzos
@akrzos
Jun 01 2016 14:24
Thanks for the reminder, let me kick the tires on that
5.6.0.8 right?
Joe Rafaniello
@jrafanie
Jun 01 2016 14:24
any version of 5.6
just remove all the non-english catalogs
Jason Frey
@Fryguy
Jun 01 2016 14:24
@kbrock I'm on it
I don't prefer using Config at all in the future...but short term there is a simple fix
Joe Rafaniello
@jrafanie
Jun 01 2016 14:26
If the newer bundler caused a drop in overall memory from 5.6.0.4 -> 5.6.0.5, I'd imagine loading message catalogs lazily would also benefit the whole appliance setup in spite of fork and shared memory
Alex Krzos
@akrzos
Jun 01 2016 14:53
@jrafanie So I have an appliance deployed now without those .po files, running my idle memory baseline against it next
@dmetzger57 Have you checked the comparison dashboard for the single vs dual worker appliance setup? I'm seeing around 570MiB difference between the two appliances now
Joe Rafaniello
@jrafanie
Jun 01 2016 15:01

@jrafanie So I have an appliance deployed now without those .po files, running my idle memory baseline against it next

:+1:

Keenan Brock
@kbrock
Jun 01 2016 15:51
@Fryguy looks like swapping Settings in there looks pretty easy
also, notice a number of places do my_server.server_timezone - maybe we'd want to go directly to Settings
cap&u and refresh both (at least used to) have performance issues around looking up settings. Thought you had resolved it but in light of this current one, I wonder
Jason Frey
@Fryguy
Jun 01 2016 15:52
The trick there is you still want... MiqServer.find(other_id).server_timezone to work on the remote settings
The PR I pushed should handle all of those cases as well
ManageIQ/manageiq#9085
But yes, definitely where we literally call my_server.server_timezone, that can be straight replaced with Settings.server.timezone
@kbrock Can you try the change in my PR and see if you get the same performance benefits to your PR?
Keenan Brock
@kbrock
Jun 01 2016 15:55
+1 thanks
@Fryguy what do you want me to test?
this is not caching the config
so we'll still be spending time parsing / loading / duping it
you want me to test this WITHOUT my change?
Keenan Brock
@kbrock
Jun 01 2016 16:00
also, @Fryguy does Vmdb::Config#save look like a race condition to you?
Alex Krzos
@akrzos
Jun 01 2016 16:52
@jrafanie ~250MiB Saved with removing those .po files
during idle
Dennis Metzger
@dmetzger57
Jun 01 2016 16:53
winner!
Joe Rafaniello
@jrafanie
Jun 01 2016 16:53
is that with config/locales/en/manageiq.po or are all of the po files gone?
Alex Krzos
@akrzos
Jun 01 2016 16:55
Just these three:
.//config/locales/ja/manageiq.po
.//config/locales/nl/manageiq.po
.//config/locales/zh_CN/manageiq.po
Joe Rafaniello
@jrafanie
Jun 01 2016 16:55
wow, so that's exactly what the lazy mo/po PR should do
;-)
how many evm_server + worker processes?
Oleg Barenboim
@chessbyte
Jun 01 2016 16:56
when is that PR landing?
Joe Rafaniello
@jrafanie
Jun 01 2016 16:56
Milan and Martin H are helping me undo my hack that prevents force loading all of the catalogs ManageIQ/manageiq#8525
it can be merged now if we don't want the pulldown of locales to be human friendly ;-)
@akrzos is that pss or rss? how much saved per process?
Alex Krzos
@akrzos
Jun 01 2016 16:59
@jrafanie I updated the idle memory gdoc
I made it 5.6.0.8-NLP (No Language Pack, I thought that was what those were called)
Joe Rafaniello
@jrafanie
Jun 01 2016 17:00
naming is hard... I call them message catalogs...
awesome, the gdoc looks nicer now
Alex Krzos
@akrzos
Jun 01 2016 17:01
I moved the 5.6 entries with websocket worker to archived tab
since we can't really compare those
Keenan Brock
@kbrock
Jun 01 2016 17:03
@akrzos my report stats has 3 numbers, time in ruby, time in sql, time to convert sql to ruby (instantiation time)
Joe Rafaniello
@jrafanie
Jun 01 2016 17:03
only 130 MB to go to match 5.5.x
Keenan Brock
@kbrock
Jun 01 2016 17:03
@jrafanie yay
but I want to only show 2 numbers ms (ruby) and sqlms (in sql)
should the instantiation time be in sql?
Alex Krzos
@akrzos
Jun 01 2016 17:04
Not sure I understand
I prefer recording all data points and downsampling if needed such as the idle memory (Compare used appliance memory then dig into Per Worker RSS/PSS memory as needed)
Keenan Brock
@kbrock
Jun 01 2016 17:05
ms ms- sql sqlms sqlrows comments
12.2 1 0.4 rbac-vm-1
12.2 12.2 1 0.4 http://:
0.4 0.4 .SELECT COUNT(*) FROM "vms" WHERE "vms"."type" IN ('Vm', 'ManageIQ::Providers::CloudManager::Vm', 'ManageIQ::Providers::InfraManager::Vm', 'VmServer', 'ManageIQ::Providers::Azure::CloudManager::Vm', 'ManageIQ::Providers::Google::CloudManager::Vm', 'ManageIQ::Providers::Openstack::CloudManager::Vm', 'ManageIQ::Providers::Amazon::CloudManager::Vm', 'ManageIQ::Providers::Microsoft::InfraManager::Vm', 'ManageIQ::Providers::Redhat::InfraManager::Vm', 'ManageIQ::Providers::Vmware::InfraManager::Vm', 'VmXen') AND "vms"."template" = $1 [false]
I'm annoyed sqlrows is not 1 for that one (current bug)
so for that query, there is no time spent instantiating the active record objects
and looking at that, you wonder, where is that 12.2 ms?
ms ms- sql sqlms sqlrows comments
588.5 1 580.7 19 rbac-vm-1
588.5 588.5 1 580.7 19 http://:
580.7 0.6 19 .SELECT "vms".* FROM "vms" []
so the query takes 0.6 ms, but it takes 580.7 ms to fetch the data and instantiate the objects
should the sqlms show 580.7 or 0.6?
I had 580.7, but am unsure
bookend("rbac-vm", 1, "gen") { ActiveRecord::Base.connection.cache { puts VmOrTemplate.all.to_a.count ; VmOrTemplate.all.to_a ; VmOrTemplate.all.to_a ; nil } }
ms ms- sql sqlms sqlrows comments
8.3 3 5.2 57 rbac-vm-1
8.3 8.3 3 5.2 57 http://:
2.2 0.5 19 .SELECT "vms".* FROM "vms" []
1.5 19 .SELECT "vms".* FROM "vms" []
1.5 19 .SELECT "vms".* FROM "vms" []
because for loading from the cache, I want to show that it is still processing that data
and taking that time
right now we're punting on load from cache
(right now we're punting on a bunch of stuff :()
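(A standalone toy illustrating the accounting problem in this thread: a query cache removes the database round trip, so "sqlms" stays near zero, but every load still pays object instantiation, and the profiler has to decide which column that time lands in. Everything below is illustrative, not the real bookend code:)

```ruby
# Toy query cache illustrating the accounting question above: a cache
# hit skips the database round trip, but every load still pays
# instantiation cost. All names here are made up.
FakeVm = Struct.new(:id, :name)

class TinyQueryCache
  attr_reader :db_round_trips, :instantiations

  def initialize
    @results = {}
    @db_round_trips = 0
    @instantiations = 0
  end

  # Raw rows; only a cache miss hits the "database"
  def execute(sql)
    @results[sql] ||= begin
      @db_round_trips += 1
      [{ "id" => 1, "name" => "vm-a" }, { "id" => 2, "name" => "vm-b" }]
    end
  end

  # Building model objects happens on EVERY load, cached or not
  def load(sql)
    execute(sql).map do |row|
      @instantiations += 1
      FakeVm.new(row["id"], row["name"])
    end
  end
end

cache = TinyQueryCache.new
3.times { cache.load(%(SELECT "vms".* FROM "vms")) }
cache.db_round_trips # => 1
cache.instantiations # => 6
```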
Alex Krzos
@akrzos
Jun 01 2016 17:13
ok so even though things are cached it is still slow? (Is that the gist?)
Keenan Brock
@kbrock
Jun 01 2016 17:14
when I'm trying to track stuff down
I want to know what is spending the time
but if I show THAT - it looks like it made a query
but we don't know where the 8.3 came from
want to account for everything, but not show a 100 column table
Keenan Brock
@kbrock
Jun 01 2016 17:31
@akrzos did you get a chance to test a cap&u using that 1 liner change (connection.cache {})?
Alex Krzos
@akrzos
Jun 01 2016 17:32
@kbrock I thought that was this guy: https://github.com/ManageIQ/manageiq/pull/8948/files
Keenan Brock
@kbrock
Jun 01 2016 17:32
yes
Alex Krzos
@akrzos
Jun 01 2016 17:32
I haven't built an appliance with that patch yet
though
Keenan Brock
@kbrock
Jun 01 2016 17:32
busy guy
Alex Krzos
@akrzos
Jun 01 2016 17:32
maybe a good candidate would be my statsd monitored appliance
Well busy yes, but also making sure we build something comparable
I'm thinking we run that on the appliance i have statsd running and we can see the difference in deliver timings on perf_captures etc
and maybe all messages
if we compare before after patch
does that sound reasonable?
Joe Rafaniello
@jrafanie
Jun 01 2016 17:48
bundler/bundler#4618
@akrzos if the 3 po files give us back ~250 MB on appliances, then if we fix bundler to not always re-resolve when you have path-based gems, I'd expect similar savings
Alex Krzos
@akrzos
Jun 01 2016 19:14
Nice, so if we fix bundler it will give us similar savings - another ~250MiB per appliance in addition to not loading those .po files? (That would be fantastic)
Joe Rafaniello
@jrafanie
Jun 01 2016 19:53
@akrzos well, the bundler problem manifests as various things created in bundler/index.rb
m:     280,664 | c:   4,969 | Array /Users/joerafaniello/.gem/ruby/2.2.5/gems/bundler-1.12.5/lib/bundler/index.rb:158
m:     371,009 | c:   4,092 | String  /Users/joerafaniello/.gem/ruby/2.2.5/gems/bundler-1.12.5/lib/bundler.rb:395
m:     591,568 | c:   3,838 | Hash  /Users/joerafaniello/.gem/ruby/2.2.5/gems/bundler-1.12.5/lib/bundler/index.rb:23
m:     149,114 | c:   3,714 | String  /Users/joerafaniello/.gem/ruby/2.2.5/gems/bundler-1.12.5/lib/bundler/index.rb:23
m:     744,360 | c:   3,031 | Hash  /Users/joerafaniello/.gem/ruby/2.2.5/gems/bundler-1.12.5/lib/bundler/index.rb:155
it annoys me because it causes bundle commands to be 1 second slower... this also happens when loading Rails, since it requires bundler/setup
03:54:41 ~/Code/manageiq (master) (2.2.5) + time bundle check
Resolving dependencies....
The Gemfile's dependencies are satisfied
bundle check  1.18s user 0.07s system 99% cpu 1.263 total
03:54:57 ~/Code/manageiq (master) (2.2.5) +
03:54:58 ~/Code/manageiq (master) (2.2.5) + git diff
diff --git a/Gemfile b/Gemfile
index 9c8f0fc..9d7f5ec 100644
--- a/Gemfile
+++ b/Gemfile
@@ -12,10 +12,10 @@ gem "config",                          "~>1.1.0", :git => "git://github.com/Fryg
 gem "deep_merge",                      "~>1.0.1", :git => "git://github.com/Fryguy/deep_merge.git", :branch => "overwrite_arrays"

 # Local gems
-path "gems/" do
-  gem "manageiq_foreman", :require => false
-  gem "manageiq-providers-amazon"
-end
+# path "gems/" do
+#   gem "manageiq_foreman", :require => false
+#   gem "manageiq-providers-amazon"
+# end

 # Client-side dependencies
 gem "angular-ui-bootstrap-rails",     "~>0.13.0"

03:55:04 ~/Code/manageiq (master) (2.2.5) + time bundle check
The Gemfile's dependencies are satisfied
bundle check  0.32s user 0.05s system 98% cpu 0.380 total