These are chat archives for ManageIQ/manageiq/performance

17th Nov 2015
Keenan Brock
@kbrock
Nov 17 2015 13:21
@Fryguy @jrafanie you remember the rails app that displays the postgres statistics?
I also found https://github.com/heroku/heroku-pg-extras - but was looking for the rails app that does this
Alex Krzos
@akrzos
Nov 17 2015 13:22
@kbrock was it pghero?
Keenan Brock
@kbrock
Nov 17 2015 13:22
yes!
as always @akrzos you are my hero
Alex Krzos
@akrzos
Nov 17 2015 13:22
hah, Thanks!
Keenan Brock
@kbrock
Nov 17 2015 13:22
do you enable stats on your rails apps?
Alex Krzos
@akrzos
Nov 17 2015 13:22
:smiley:
Keenan Brock
@kbrock
Nov 17 2015 13:22
so we could point to a box and say "here are the outliers?"
Alex Krzos
@akrzos
Nov 17 2015 13:30
@kbrock I can add that to the scale setup to start seeing if things pop up easily
just gotta see how easy it will be to get stood up
Keenan Brock
@kbrock
Nov 17 2015 13:30
yea - not the tool in particular, but the stats
you have to modify the database itself
Alex Krzos
@akrzos
Nov 17 2015 13:30
oh
Keenan Brock
@kbrock
Nov 17 2015 13:30
not sure the performance implications
this is definitely a tool you want under your belt
not the rails app per se
the stats that are collected
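(For context, the stats being discussed most likely come from the pg_stat_statements extension, which pghero reads and which has to be preloaded into the server — hence "modify the database itself". A hedged sketch of the setup; the exact settings and database name are assumptions, not from this chat:)

```shell
# postgresql.conf (assumption: stock PostgreSQL 9.x; requires a server restart):
#   shared_preload_libraries = 'pg_stat_statements'
#
# Then, inside the target database (name assumed):
#   CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
#
# Top statements by cumulative time:
#   SELECT query, calls, total_time
#   FROM pg_stat_statements
#   ORDER BY total_time DESC LIMIT 5;
```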
Alex Krzos
@akrzos
Nov 17 2015 13:31
understood, well we have some vmdb metrics already captured
not sure if anyone does anything with it
Keenan Brock
@kbrock
Nov 17 2015 13:31
cool
I'm having a lot of fun with cap and u right now
taking me a while to capture the metrics like you guys were doing all last month. sorry it's taking time to get up to speed
Joe Rafaniello
@jrafanie
Nov 17 2015 18:37
hey @gtanzillo, remember how slow MiqReport#generate_table was yesterday?
I found a thing with Ruport
going to do a PR... tldr: Report::Data::Record.new allocates objects like nobody's business
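(The allocation pattern Joe is describing can be seen with a toy measurement. The dup-a-hash-per-record loop below illustrates the kind of hot spot involved; it is not Ruport's actual code:)

```ruby
# Count objects allocated by a block using GC.stat (plain Ruby, no gems).
# total_allocated_objects is monotonic, so GC runs don't skew the delta.
def allocated_by
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

keys = { "name" => 0, "department" => 1 }

# Hypothetical hot path: dup'ing the key hash once per record...
with_dup = allocated_by { 10_000.times { keys.dup } }
# ...versus sharing one hash across all records.
shared   = allocated_by { 10_000.times { keys } }

puts "dup per record: #{with_dup} allocations, shared: #{shared}"
```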
Alex Krzos
@akrzos
Nov 17 2015 18:39
So interestingly enough, I turned on only a single priority worker on the scale setup and the maximum memory I'm seeing now is at 3.3g rss
Dennis Metzger
@dmetzger57
Nov 17 2015 18:40
rather than the 6Gb?
Alex Krzos
@akrzos
Nov 17 2015 18:40
whereas with 1 generic and 1 priority, I saw 6g
yes
not sure if it needs more time but almost 24 hours and it's pretty leveled out
Joe Rafaniello
@jrafanie
Nov 17 2015 18:41
@gtanzillo with that "all departments" report and @akrzos's db... the time of rails r "r = MiqReport.find(104); r.generate_table" goes from around 31.5 -> 28.5 and drops memory on my machine by at least 50 MB
Alex Krzos
@akrzos
Nov 17 2015 18:42
Screen Shot 2015-11-17 at 13.41.54.png
Gregg Tanzillo
@gtanzillo
Nov 17 2015 18:45
Wow @jrafanie nice find! Ruport hasn’t changed in years afaik, so I’m guessing this has been around for a long time
Joe Rafaniello
@jrafanie
Nov 17 2015 18:59
yes, it has
Greg Blomquist
@blomquisg
Nov 17 2015 19:18
@Fryguy, @jrafanie, @dmetzger57 can you guys look at comment #2 here https://bugzilla.redhat.com/show_bug.cgi?id=1282819
Tom H doesn't believe this is related to the fix from @dmetzger57 for https://bugzilla.redhat.com/show_bug.cgi?id=1243938
don't know if the db locks Tom H is referring to rings any bells
Dennis Metzger
@dmetzger57
Nov 17 2015 19:20
other than the ringing in my ear, no bells here.
Greg Blomquist
@blomquisg
Nov 17 2015 19:21
haha
Joe Rafaniello
@jrafanie
Nov 17 2015 19:22
yeah, no idea what locks he's referring to
@gtanzillo @kbrock ruport/ruport#34
Matthew Draper
@matthewd
Nov 17 2015 19:27
process 10519 acquired RowExclusiveLock on relation 17033 of database 16386 after 279521.720 ms
Gregg Tanzillo
@gtanzillo
Nov 17 2015 19:29
Nice job on the ruport PR @jrafanie. Huge improvement for such a small and safe change.
Greg Blomquist
@blomquisg
Nov 17 2015 19:29
@matthewd most often it looks like that RowExclusiveLock is around the miq_queue table
oh wow, I misread that ... 279s, not 27.9s
all decimals matter
Joe Rafaniello
@jrafanie
Nov 17 2015 19:42
@gtanzillo we can monkey patch that initialize or fork ruport again if need be
Jason Frey
@Fryguy
Nov 17 2015 19:43
Where is the ruport PR?
Curious on the change but I don't see it
Gregg Tanzillo
@gtanzillo
Nov 17 2015 19:43
ruport/ruport#34
Jason Frey
@Fryguy
Nov 17 2015 19:44
Ohhhh on ruport itself
Keenan Brock
@kbrock
Nov 17 2015 20:08
the merge change is great, what about removing the dup - is that safe?
Jason Frey
@Fryguy
Nov 17 2015 20:09
@jrafanie you said that change saved 50 MB? I'm amazed
Keenan Brock
@kbrock
Nov 17 2015 20:09
I had thought about asking for us to maintain that gem, but we are trying to move away from it and only use it for the data structures. (their last commit was from February)
well, we point to a fork of that gem. so we can make it there. no?
I had wanted for us to point closer to their master. Pretty sure I got all of our changes into their master. Not sure if that means into their published gem though.
Joe Rafaniello
@jrafanie
Nov 17 2015 20:26
Weird, gitter on android wonks the URL to the ruport PR
looks fine here though
Yeah, we can do the change on our fork

you said that change saved 50 MB? I'm amazed

Yes, it's very random based on your current heap size, free slots, etc.

if GC is going to run or not
The "all departments" report with @akrzos's database allocated almost 13.1 million objects, of which the "inject... merge" line was one of the biggest offenders
661019    /Users/joerafaniello/.gem/ruby/2.2.3/bundler/gems/rails-a37d0e57571f/activerecord/lib/active_record/connection_adapters/postgresql/database_statements.rb:168    T_STRING    String
616595    /Users/joerafaniello/.gem/ruby/2.2.3/bundler/gems/rails-a37d0e57571f/activerecord/lib/active_record/relation/query_methods.rb:66    T_ARRAY    Array
448289    /Users/joerafaniello/.gem/ruby/2.2.3/bundler/gems/rails-a37d0e57571f/activerecord/lib/active_record/connection_adapters/postgresql_adapter.rb:434    T_ARRAY    Array

284571    /Users/joerafaniello/.gem/ruby/2.2.3/bundler/gems/ruport-3ba69305f42e/lib/ruport/data/record.rb:58    T_HASH    Hash
243186    /Users/joerafaniello/.gem/ruby/2.2.3/bundler/gems/rails-a37d0e57571f/activerecord/lib/active_record/result.rb:110    T_STRING    String
224143    /Users/joerafaniello/.gem/ruby/2.2.3/bundler/gems/rails-a37d0e57571f/activerecord/lib/active_record/connection_adapters/postgresql_adapter.rb:434    T_DATA    RubyVM::Env
224142    /Users/joerafaniello/.gem/ruby/2.2.3/bundler/gems/rails-a37d0e57571f/activerecord/lib/active_record/connection_adapters/postgresql_adapter.rb:434    T_DATA    Proc

223818    /Users/joerafaniello/.gem/ruby/2.2.3/bundler/gems/rails-a37d0e57571f/activerecord/lib/active_record/connection_adapters/postgresql/database_statements.rb:162    T_STRING    String
223713    /Users/joerafaniello/.gem/ruby/2.2.3/bundler/gems/rails-a37d0e57571f/activerecord/lib/active_record/connection_adapters/postgresql_adapter.rb:429    T_STRING    String
217420    /Users/joerafaniello/.gem/ruby/2.2.3/bundler/gems/rails-a37d0e57571f/activesupport/lib/active_support/time_with_zone.rb:139    T_STRING    String
177378    /Users/joerafaniello/.gem/ruby/2.2.3/gems/psych-2.0.15/lib/psych.rb:376    T_STRING    String
175616    /Users/joerafaniello/.gem/ruby/2.2.3/gems/psych-2.0.15/lib/psych/scalar_scanner.rb:83    T_STRING    String
173253    /Users/joerafaniello/.gem/ruby/2.2.3/gems/psych-2.0.15/lib/psych/tree_builder.rb:75    T_OBJECT    Psych::Nodes::Scalar
172506    /Users/joerafaniello/.gem/ruby/2.2.3/gems/psych-2.0.15/lib/psych/scalar_scanner.rb:40    T_NODE
139398    /Users/joerafaniello/.gem/ruby/2.2.3/bundler/gems/rails-a37d0e57571f/activesupport/lib/active_support/dependencies.rb:457    T_NODE
138522    /Users/joerafaniello/.gem/ruby/2.2.3/bundler/gems/ruport-3ba69305f42e/lib/ruport/data/record.rb:58    T_HASH    Hash
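(The per-file/line counts above are the kind of output you get from Ruby's built-in allocation tracing in the objspace library; a minimal sketch — the workload line is a stand-in for the report run:)

```ruby
require 'objspace'

ObjectSpace.trace_object_allocations_start
workload_results = Array.new(1_000) { |i| "row-#{i}" } # stand-in for generate_table
ObjectSpace.trace_object_allocations_stop

# Group live objects by allocation site, like the listing above.
counts = Hash.new(0)
ObjectSpace.each_object do |obj|
  file = ObjectSpace.allocation_sourcefile(obj)
  counts[[file, ObjectSpace.allocation_sourceline(obj)]] += 1 if file
end

# Print the top allocation sites, biggest first.
counts.sort_by { |_site, n| -n }.first(5).each do |(file, line), n|
  puts format("%8d  %s:%d", n, file, line)
end
```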
Keenan Brock
@kbrock
Nov 17 2015 20:31
@jrafanie did you put that PR into manageiq/ruport? that is the version we are using
Joe Rafaniello
@jrafanie
Nov 17 2015 20:32
and finally, down further is the hash key strings getting dup'd:
40509 /Users/joerafaniello/.gem/ruby/2.2.3/bundler/gems/ruport-3ba69305f42e/lib/ruport/data/record.rb:58 T_STRING String
No, I thought we were on released upstream or tracking a branch, I can do a PR on our fork
I suppose the hash key freeze is really the only part of the PR that's questionable since it's not really that big a deal... the 285k and 139k hashes created on record.rb:58 are the real win
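(On the hash key freeze: Ruby dups and freezes an unfrozen String key on every Hash insert, so pre-freezing the keys skips one String copy per insert. A quick demonstration of that behavior:)

```ruby
unfrozen = "department"
h = {}
h[unfrozen] = 1
# Ruby stored a frozen *copy* of the key, not our string.
puts h.keys.first.equal?(unfrozen)   # prints "false"

frozen = "department".freeze
h2 = {}
h2[frozen] = 1
# An already-frozen key is stored as-is, no extra allocation.
puts h2.keys.first.equal?(frozen)    # prints "true"
```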
Joe Rafaniello
@jrafanie
Nov 17 2015 20:44
ok, @kbrock @Fryguy opened a PR on our fork: ManageIQ/ruport#1
Keenan Brock
@kbrock
Nov 17 2015 20:45
I wouldn't mind us rebasing our repo to match upstream ruport... but again, I'm told we are going away from that.
Joe Rafaniello
@jrafanie
Nov 17 2015 20:45
yeah, I agree, this was just too big a problem to leave unfixed
Alex Krzos
@akrzos
Nov 17 2015 20:56
@Fryguy I got that memory usage + log lines for the priority worker for you; I'll add that to the bz. However, the worker hasn't grown as much as it had before (3Gb vs 6Gb). I can tune the number of workers back to 1 generic and 1 priority and see if it increases in memory again
Keenan Brock
@kbrock
Nov 17 2015 21:03
zmore serverlog.gz
# feed me z more
@blomquisg you have a grep regex to point to what TomH found in that file?
there is contention everywhere in that database
Greg Blomquist
@blomquisg
Nov 17 2015 21:13
@kbrock all I've searched for so far is grep -i -A1 lock postgresql.log | less
Keenan Brock
@kbrock
Nov 17 2015 21:13
the queries that are locked look like they should hit an index too
darn - thanks
Joe Rafaniello
@jrafanie
Nov 17 2015 21:45
Ok, I got more concrete numbers on ManageIQ/ruport#1
I did the same MiqReport#generate_table on a few slow reports before and after
before
Time elapsed |      RSS change |       Total RSS | Allocated Objects |                         Description
 22.481769 |       146337792 |       341491712 |        15235399 |           33 - Host Summary for VMs
 33.970649 |        29888512 |       371380224 |        21535819 |       53 - Folder VMs Relationships
 17.848895 |         1949696 |       373329920 |        11623846 |  60 - Summary - VMs migration ready
 15.259537 |        19046400 |       392376320 |         9986093 | 62 - Detailed - VMs migration ready
 19.187577 |            4096 |       392380416 |        13005036 |               88 - VM Relationships
 45.467005 |         2138112 |       394518528 |        29091805 |    89 - Folder to VMs Relationships
 22.587697 |       130605056 |       525123584 |        13494862 | 104 - All Departments with Performance
  11.07995 |       -31735808 |       493387776 |         8015645 |         107 - Top Storage Consumers
after
Time elapsed |      RSS change |       Total RSS | Allocated Objects |                         Description
 22.646629 |       121155584 |       315072512 |        15073316 |           33 - Host Summary for VMs
 34.076176 |        33177600 |       348250112 |        21451820 |       53 - Folder VMs Relationships
 18.487667 |         9924608 |       358174720 |        11515845 |  60 - Summary - VMs migration ready
 15.227479 |        29134848 |       387309568 |         9830093 | 62 - Detailed - VMs migration ready
 19.270113 |         -954368 |       386355200 |        12879036 |               88 - VM Relationships
 45.241741 |          835584 |       387190784 |        28923805 |    89 - Folder to VMs Relationships
 19.746829 |        81096704 |       468287488 |        13008670 | 104 - All Departments with Performance
 11.110456 |         9179136 |       477466624 |         7891707 |         107 - Top Storage Consumers
Look at the total rss
I was profiling "all departments..." which shows the most improvement
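(The table format above suggests a small measurement harness; a hedged reconstruction of the pattern — the RSS readers and the `profile` helper are assumptions, not Joe's actual script:)

```ruby
require 'benchmark'

# Resident set size of this process in bytes (Linux /proc, with `ps` fallback).
def rss_bytes
  if File.readable?("/proc/self/status")
    File.read("/proc/self/status")[/VmRSS:\s+(\d+)/, 1].to_i * 1024
  else
    `ps -o rss= -p #{Process.pid}`.to_i * 1024
  end
end

def profile(description)
  rss_before   = rss_bytes
  alloc_before = GC.stat(:total_allocated_objects)
  elapsed      = Benchmark.realtime { yield }
  row = format("%12.6f | %15d | %15d | %17d | %35s",
               elapsed, rss_bytes - rss_before, rss_bytes,
               GC.stat(:total_allocated_objects) - alloc_before, description)
  puts row
  row
end

# Usage against a report would look like (call from the chat, id from the table):
#   profile("104 - All Departments with Performance") do
#     MiqReport.find(104).generate_table
#   end
row = profile("toy workload") { Array.new(100_000) { |i| i.to_s } }
```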
Joe Rafaniello
@jrafanie
Nov 17 2015 21:53
29 million object allocations in "Folder to VMs Relationships" 😱
Jason Frey
@Fryguy
Nov 17 2015 21:54
yeah that sounds about right
Joe Rafaniello
@jrafanie
Nov 17 2015 21:54
ETOOMANYQUERIES
Jason Frey
@Fryguy
Nov 17 2015 21:54
:trollface:
@jrafanie After we release cap rc2, let's focus on that rewrite that we designed
Joe Rafaniello
@jrafanie
Nov 17 2015 21:55
lol, yes
Jason Frey
@Fryguy
Nov 17 2015 21:55
ETOOMANYTHINGSTODO
Joe Rafaniello
@jrafanie
Nov 17 2015 21:55
I knocked out 486192 allocations, but it's only < 4% of the 13.4 million in the all dept report 😢
Keenan Brock
@kbrock
Nov 17 2015 22:02
@Fryguy is there a reason we are doing all these queries just to delete a bunch of records?
I'm assuming it is transaction log centric? Is it ok if we purge in a slightly different order rather than ordered by id?
purging.rb
Jason Frey
@Fryguy
Nov 17 2015 22:02
yes, discussed that with @dmetzger57 some time back and we considered removing the sort by id
Keenan Brock
@kbrock
Nov 17 2015 22:02
is there a reason we are bringing the ids back?
Jason Frey
@Fryguy
Nov 17 2015 22:03
part of the problem is that if there is code that blindly gets the metrics without regard to purge windows, then you could end up with holes if things aren't purged in order
so a chart would have a weird graph in the "pre-purge" time window
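(Jason's "holes" concern can be sketched in plain Ruby: if a purge that should walk ids in order instead deletes out of order and is interrupted mid-way, a reader that scans the table blindly sees gaps. The in-memory "table" below is purely illustrative:)

```ruby
# Rows older than the purge cutoff interleave by id with retained rows.
rows       = (1..8).map { |id| { id: id, old: id.odd? } }
cutoff_ids = rows.select { |r| r[:old] }.map { |r| r[:id] }   # ids 1, 3, 5, 7

# Out-of-order purge interrupted after two deletions (say ids 5, then 1):
remaining    = rows.reject { |r| [5, 1].include?(r[:id]) }
leftover_old = remaining.select { |r| r[:old] }.map { |r| r[:id] }
# Old data survives at ids 3 and 7 with holes where 1 and 5 were,
# so a chart of the pre-purge window draws a weird, gappy graph.

# An in-id-order purge interrupted at the same point deletes ids 1 and 3,
# leaving old rows only as a contiguous tail (5, 7) - no mid-range hole.
```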
Oleg Barenboim
@chessbyte
Nov 17 2015 22:03
@kbrock I think @Fryguy is focusing on getting us to a place where we can start CFME build - do not disturb, unless critical
Jason Frey
@Fryguy
Nov 17 2015 22:04
removing the id would fix the purge spiral, btw
Keenan Brock
@kbrock
Nov 17 2015 22:04
ok, I'll throw together a spiral - thanks.
happy hacking