These are chat archives for ManageIQ/manageiq/performance

21st
Feb 2018
Ladislav Smola
@Ladas
Feb 21 2018 08:58
@kbrock nice
Keenan Brock
@kbrock
Feb 21 2018 08:59
ugh - is it morning already?
Ladislav Smola
@Ladas
Feb 21 2018 08:59
@kbrock were you coding through the night? :-)
Keenan Brock
@kbrock
Feb 21 2018 09:00
found a stupid programming game... TIS-100 (all about optimization)
ugh - I'm going to bed
but I'm so close... ;)
Jason Frey
@Fryguy
Feb 21 2018 15:31
@cben @kbrock @NickLaMuro I put together a PR documenting how and why metrics works: ManageIQ/manageiq#17034
In reviewing it though, and trying to understan why @kbrock EXPLAINs look the way they do, it looks like we never implemented constraint exclusion (https://www.postgresql.org/docs/10/static/ddl-partitioning.html#DDL-PARTITIONING-CONSTRAINT-EXCLUSION)
I could have SWORN we did that from day one, because that is where a lot of the performance benefits are
either that, or it's there and being ignored for some reason
Keenan Brock
@kbrock
Feb 21 2018 15:35
@Fryguy how do you "enable constraint exclusions"?
Jason Frey
@Fryguy
Feb 21 2018 15:35
did you click the link?
Keenan Brock
@kbrock
Feb 21 2018 15:35
it is a big page
I see "set constraint on"
but is it that simple?
or is it an alter table or something?
Jason Frey
@Fryguy
Feb 21 2018 15:35
No, I think that's about it
but we have to have constraints on the tables too
my guess is there are no constraints on the tables, so the constraint analyzer can't do anything
Keenan Brock
@kbrock
Feb 21 2018 15:37
  ALTER TABLE measurement DETACH PARTITION measurement_y2006m02;
yea, there is something different that we need to do
Jason Frey
@Fryguy
Feb 21 2018 15:38
but yeah, the whole point is that if the constraint analyzer can completely eliminate tables up front, it will...so you shouldn't see 24 tables in the EXPLAIN, instead only seeing the 1 that fits the where clause
Keenan Brock
@kbrock
Feb 21 2018 15:40
@Fryguy do you think the change to the metrics_xx sequence screwed things up?
Jason Frey
@Fryguy
Feb 21 2018 15:41
no, I think we need to add check constraints to the timestamp column
Keenan Brock
@kbrock
Feb 21 2018 15:41
so we have a where clause that doesn't have a constraint - so it does that ugly stuf
how the heck did this ever work?
Jason Frey
@Fryguy
Feb 21 2018 15:42
That's why I'm so surprised
I distinctly remember ths working at the beginning...maybe upgrading PG changed something?
Keenan Brock
@kbrock
Feb 21 2018 15:43
    (0..23).each do |n|
      s = subtable_name("metrics", n)
      create_metrics_table  s
      add_metrics_indexes   s
      add_table_inheritance s, "metrics", :conditions => ["capture_interval_name = ? AND EXTRACT(HOUR FROM timestamp) = ?", "realtime", n]
    end
well, it started botching up recently
maybe pg version
hmm
maybe if we specify the capture_interval_name
sweet that will be easy to try
Jason Frey
@Fryguy
Feb 21 2018 15:44
yeah, it's should be easy to add all the check constraints manually, then redo the EXPLAIN and EXPLAIN ANALYZE
Keenan Brock
@kbrock
Feb 21 2018 15:44
I think no schema changes
will change my batch query to include capture_interval_name
Jason Frey
@Fryguy
Feb 21 2018 15:44
ohhh I see what you mean
Keenan Brock
@kbrock
Feb 21 2018 15:45
wow
this could be big
Jason Frey
@Fryguy
Feb 21 2018 15:45
O_O
Keenan Brock
@kbrock
Feb 21 2018 15:45
I mean, I really like the truncate
the truncate is freaking sweet
but
for those risk adverse...
Jason Frey
@Fryguy
Feb 21 2018 15:45
yeah
Keenan Brock
@kbrock
Feb 21 2018 15:45
(I'm talking about you @NickLaMuro ;) )
I actually thought I was in twilight zone there, cause I KNEW I had added that haha
Keenan Brock
@kbrock
Feb 21 2018 15:47
heh. yea
Nick LaMuro
@NickLaMuro
Feb 21 2018 15:48

(I'm talking about you @NickLaMuro ;) )

Is still playing catchup

Jason Frey
@Fryguy
Feb 21 2018 15:49

I see "set constraint on"
but is it that simple?

Read into that setting a little and it's one of those three-way values... on/off/partition and partition is the default

partition means that it will only apply constraint exclusion on tables defined as partitioned (which ours is)
so yeah, I think adding capture_interval_name is really all that needs to happen
Keenan Brock
@kbrock
Feb 21 2018 15:51
I tried set exclusion_constraint = on
and I tried adding the field to the delete clause - no dice
something changed in the query optimizer
Jason Frey
@Fryguy
Feb 21 2018 15:52
hmmmm
I think this is definitely where the issue lies though...worth investigating
Keenan Brock
@kbrock
Feb 21 2018 15:54
neat:
explain analyze
DELETE FROM "metrics"
WHERE "metrics"."ctid" IN (
   SELECT  "metrics_00"."ctid"
   FROM "metrics_00"
   WHERE ("metrics_00"."timestamp" <= '2018-02-01 19:56:37.519979')
   and capture_interval_name = 'realtime' LIMIT 100
) and capture_interval_name = 'realtime';
notice that the subquery does NOT check all the tables
this is against an empty table
aah - I see mistake, trying again
Keenan Brock
@kbrock
Feb 21 2018 16:11

ok, this is quick:

explain analyze
DELETE FROM "metrics_00"
WHERE "id" IN (
   SELECT  "metrics"."id"
   FROM "metrics"
   WHERE ("metrics"."timestamp" <= '2018-02-01 19:56:37.519979')
   and capture_interval_name = 'realtime' 
   and date_part('hour'::text, "timestamp") = '0'
   LIMIT 100
);

Had to specify capture interval AND date_part (so it hit the constraint.
Note how it has select from metrics - it only selects from 1 table - so the constraint took effect

Jason Frey
@Fryguy
Feb 21 2018 16:21
cool!
Keenan Brock
@kbrock
Feb 21 2018 16:21
but that is still much slower than just hitting metrics_00 table
Keenan Brock
@kbrock
Feb 21 2018 16:46
@NickLaMuro you have time 3pm est today to pair?
Nick LaMuro
@NickLaMuro
Feb 21 2018 16:52
sure
Keenan Brock
@kbrock
Feb 21 2018 16:58
Thinking a truncate script that is run manually may be best here. Running out of time to get this to customer
tools/purge_metrics.rb enhancement or something
Daniel Berger
@djberg96
Feb 21 2018 17:04
for a quick and dirty, seems ok to me
rails job maybe?
Keenan Brock
@kbrock
Feb 21 2018 18:39
@gtanzillo would you like me to look into quick change to purge_metrics.rb, or do you think we'll have a chance with ManageIQ/manageiq#17017 ?
Gregg Tanzillo
@gtanzillo
Feb 21 2018 18:40
Yes, LJ and I are looking at it now. I’ll start a meeting
Nick LaMuro
@NickLaMuro
Feb 21 2018 19:15

2cents: @kbrock The more and more I think about it, I think that would be the best route to go as well, possibly make it a opt-in toggle in purge_metrics.rb for those that are still using it don't get blind-sided by the change.

That, or you could consider making the changes to 17017 an opt-in feature toggle/experimental feature. Basically, I feel that we haven't done our due-diligence with testing the truncation strategy, even though I think it definitely makes sense going forward. But with this being about deleting data, I tend to lean towards being as skeptical about changes as possible.

Keenan Brock
@kbrock
Feb 21 2018 19:56
I'll comment in ManageIQ/ManageIQ#17017 the findings from our call w/ @jrafanie and @gtanzillo TL;DR: be less clever and explicity generate the truncate statements
Ron Valente
@rvalente
Feb 21 2018 21:41
Regarding the metrics issues, I have done some tuning in our environment of PostgreSQL that greatly improved the operation and performance of VACUUM and ANALYZE. I have created a PR here: ManageIQ/manageiq-appliance#181