These are chat archives for ManageIQ/manageiq/performance

6th
Feb 2016
Jason Frey
@Fryguy
Feb 06 2016 01:01
So, I rewrote ancestry gem's arranged_nodes method
Keenan Brock
@kbrock
Feb 06 2016 01:01
k
I changed the gem to use ranges instead of like - much faster
Jason Frey
@Fryguy
Feb 06 2016 01:01
In Rails console loading the biggest EMS took 10 seconds
calling subtree_arranged
new code is 5 seconds
50% improvement
Keenan Brock
@kbrock
Feb 06 2016 01:02
is that in the gem, or our code?
Jason Frey
@Fryguy
Feb 06 2016 01:02
gem
have to test the end to end with the UI
Keenan Brock
@kbrock
Feb 06 2016 01:02
you put in a PR?
cool
let me know
Jason Frey
@Fryguy
Feb 06 2016 01:02
not yet...I just finished it locally
and made sure all the ancestry tests passed
Keenan Brock
@kbrock
Feb 06 2016 01:02
I'll have to get another one together
Jason Frey
@Fryguy
Feb 06 2016 01:03
how can you do ranges on a string column?
the thing about the LIKE that is nice is how it uses the index
Keenan Brock
@kbrock
Feb 06 2016 01:06
can I require a newer version of arel
is arel pretty tightly linked to AR?
ranges use indexes better than likes
Jason Frey
@Fryguy
Feb 06 2016 01:07
Ruby ranges?
Keenan Brock
@kbrock
Feb 06 2016 01:07
hmm. I did gt and lt, but ranges probably work great - thanks
will see how that works
Jason Frey
@Fryguy
Feb 06 2016 01:07
I'm so confused what you are trying to solve
Keenan Brock
@kbrock
Feb 06 2016 01:07
if you have a like
then you need a bunch of clauses
and the "OR" is not good
a range doesn't use the OR
Jason Frey
@Fryguy
Feb 06 2016 01:08
are you talking about ancestry gem, or something completely unrelated?
Keenan Brock
@kbrock
Feb 06 2016 01:08
ancestry gem
Jason Frey
@Fryguy
Feb 06 2016 01:08
ok
Keenan Brock
@kbrock
Feb 06 2016 01:08
maybe the like is fine, but I think string ranges may work better
maybe not
Jason Frey
@Fryguy
Feb 06 2016 01:08
the ancestry column is of the form "123/456/789"
Keenan Brock
@kbrock
Feb 06 2016 01:08
yes
Jason Frey
@Fryguy
Feb 06 2016 01:08
how can you turn that into a range?
parse it in SQL?
Keenan Brock
@kbrock
Feb 06 2016 01:09
"123:".."123z"
which becomes a BETWEEN
Jason Frey
@Fryguy
Feb 06 2016 01:10
o_O I just don't get it :sweat_smile:
Keenan Brock
@kbrock
Feb 06 2016 01:10
This message was deleted
Jason Frey
@Fryguy
Feb 06 2016 01:10
yes
Keenan Brock
@kbrock
Feb 06 2016 01:10
oops
huh
ok
Jason Frey
@Fryguy
Feb 06 2016 01:10
I get what a range is...I don't get how you can apply that to 3 disparate numbers separated by slashes
Keenan Brock
@kbrock
Feb 06 2016 01:11
ascii chart: '/0123456789:'
ooh
Jason Frey
@Fryguy
Feb 06 2016 01:11
right
Keenan Brock
@kbrock
Feb 06 2016 01:11
gotcha
if you do a range on the bottom one
just left to right
all children are of the form parent/[0-9]
Jason Frey
@Fryguy
Feb 06 2016 01:12
right, unless there's 1 or 0 (parents, that is)
Keenan Brock
@kbrock
Feb 06 2016 01:12
so you want to look for parent OR parent/0..parent/9
Jason Frey
@Fryguy
Feb 06 2016 01:12
oh i see
so you are only ranging on the last number
Keenan Brock
@kbrock
Feb 06 2016 01:12
so you want to look for 'parent[/$] but not [0-9]'
yea
because 'parent[0-9]' isn't really parent
design wise
having no entry for the parent itself - that was a mistake
so you have to do id = parent OR ancestry = "parent/%"
Jason Frey
@Fryguy
Feb 06 2016 01:14
you mean the root?
Keenan Brock
@kbrock
Feb 06 2016 01:14
yea
Jason Frey
@Fryguy
Feb 06 2016 01:14
well, the root has no parent, so...
Keenan Brock
@kbrock
Feb 06 2016 01:14
so if ancestry = "me/children" instead of "children"
Jason Frey
@Fryguy
Feb 06 2016 01:14
yeah
Keenan Brock
@kbrock
Feb 06 2016 01:14
you could find it all in 1 field
and that is a single hit, single field index
Jason Frey
@Fryguy
Feb 06 2016 01:15
yeah, though, I think there are some methods you can't handle with that approach
Keenan Brock
@kbrock
Feb 06 2016 01:15
rather than using an OR
probably
heh - "what are the roots"
that is tricky
Jason Frey
@Fryguy
Feb 06 2016 01:15
right
what kinds of gains are you seeing?
indexed with LIKE vs ranges
do the ranges handle non-integer primary keys?
because the current format does
like GUIDs
Keenan Brock
@kbrock
Feb 06 2016 01:16
this branch is from dec 2
Jason Frey
@Fryguy
Feb 06 2016 01:16
cool
Keenan Brock
@kbrock
Feb 06 2016 01:16
there have been a bunch of PRs aroudn guids
I got someone to put in a PR to allow you to change the matching algorithm
currently it is hardcoded in the gem the pattern for a "valid id"
useless
well
it doesn't really do anything
but good to prevent errors I guess
THAT puts a blocker in there
I wanted to fix the sorting
but was getting strange errors
ok
so most people are excited about ltrees
"1.2.3" vs "1/2/3"
Jason Frey
@Fryguy
Feb 06 2016 01:19
the ones built into PG?
Keenan Brock
@kbrock
Feb 06 2016 01:19
it is an index built into postgres
yea
Jason Frey
@Fryguy
Feb 06 2016 01:19
yeah
Keenan Brock
@kbrock
Feb 06 2016 01:19
I haven't had much chance
seems to have the most promise
I wanted to use arrays, but people seem scared of those
Keenan Brock
@kbrock
Feb 06 2016 01:25
@Fryguy thanks for reminding me I had a few things I wanted to push to ancestry
Jason Frey
@Fryguy
Feb 06 2016 01:33
putting kids to bed...i'll have a PR later
Keenan Brock
@kbrock
Feb 06 2016 01:33
nice
Jason Frey
@Fryguy
Feb 06 2016 13:29
I made the PR last night but forgot to post it: stefankroes/ancestry#261
It's marked as WIP because I'm not sure the test coverage is sufficient for degenerate cases, and non-rooted-subtree cases
Matthew Draper
@matthewd
Feb 06 2016 13:32
Talking of like queries: closure_tree all the things.. no string manipulation is better than no string manipulation
(no idea what its performance characteristics are in this particular case, of course)
Keenan Brock
@kbrock
Feb 06 2016 18:32
could you verify stefankroes/ancestry#260 -- looks like a no brainer, just want another eye
@Fryguy I'm good to go with that one. let me know when you are all set
@matthewd in ancesty, I want to use a more recent arel (to get case and coalesce) but don't know if that is valid.
arel version is dictaged by rails - right?
Matthew Draper
@matthewd
Feb 06 2016 18:36
Correct
Keenan Brock
@kbrock
Feb 06 2016 18:36
what does closure_tree mean?
Matthew Draper
@matthewd
Feb 06 2016 18:36
Best bet is probably to feature-detect the methods you want
Keenan Brock
@kbrock
Feb 06 2016 18:36
shouldn't you be asleep?
Matthew Draper
@matthewd
Feb 06 2016 18:36
It's a gem
Keenan Brock
@kbrock
Feb 06 2016 18:36
k
Matthew Draper
@matthewd
Feb 06 2016 18:36
No more than on any other day :P
Keenan Brock
@kbrock
Feb 06 2016 18:37
:)
is ancestry instead of closure_tree?
or are they complimentary?
(um - vice versa)
Matthew Draper
@matthewd
Feb 06 2016 18:37
Yes, they solve the same problem in different ways (using different data structures)
Keenan Brock
@kbrock
Feb 06 2016 18:38
we use ancestry, acts_as_tree, and acts_as_list, lets add a fourth?
Matthew Draper
@matthewd
Feb 06 2016 18:40
It'd be instead of ancestry… and at least some parts of our Relationship model
Not a small / short-term undertaking
Keenan Brock
@kbrock
Feb 06 2016 18:40
how is it different... reding
ancestry_path looks like ancestry
Matthew Draper
@matthewd
Feb 06 2016 18:41
Yeah, but it's derived, here
Keenan Brock
@kbrock
Feb 06 2016 18:41
aah
how do you know who all your children are?
Matthew Draper
@matthewd
Feb 06 2016 18:42
.children?
Or do you mean how does it work internally?
Keenan Brock
@kbrock
Feb 06 2016 18:42
conceptually
yea
internally

lol

.children?
um... did I miss something?
@matthewd

Matthew Draper
@matthewd
Feb 06 2016 18:43
It uses a table that consists of [ancestor_id, descendant_id, depth]
Keenan Brock
@kbrock
Feb 06 2016 18:43
which is kinda like our ancestry table
Matthew Draper
@matthewd
Feb 06 2016 18:43
So children is ancestor_id = self.id and depth = 1
Keenan Brock
@kbrock
Feb 06 2016 18:43
right
nice
Matthew Draper
@matthewd
Feb 06 2016 18:44
Children and grandchildren is ancestor_id = self.id and depth in (1,2)
Keenan Brock
@kbrock
Feb 06 2016 18:45
nice - ref
does this look right?
a -> b -> c
has records
a b 1
a c 2
b c 1
Matthew Draper
@matthewd
Feb 06 2016 18:46
The query to preload the tree is: ancestor_id in (descendant_id in (open node ids)) and depth in (0,1)
Yes, that's right
Keenan Brock
@kbrock
Feb 06 2016 18:46

what do you think about using arrays instead?

a []
b [a]
c [a,b]

you know, I always felt it should be:
a [a]
b [a,b]
c [a,b,c]
Matthew Draper
@matthewd
Feb 06 2016 18:47
^^ "children and self of all ancestors and selfs of open nodes"
Keenan Brock
@kbrock
Feb 06 2016 18:48
have you looked into ltree?
Matthew Draper
@matthewd
Feb 06 2016 18:48
I'm pretty sure the closure_tree docs explain why it's more efficient than ancestry style
Keenan Brock
@kbrock
Feb 06 2016 18:48
thanks
tangent:
what do you think about changing our string collation to be case insensitive, so we can get rid of all our lower(name)?
Matthew Draper
@matthewd
Feb 06 2016 18:51
Potentially worthwhile.. though we can presumably get the same performance gains by indexing lower(name), if we're not already
just feel like we do that lower(name) (inconsistently) all over the place
does ar have easy indexing functions yet? remember a pr...
Matthew Draper
@matthewd
Feb 06 2016 18:54
Nope. I think there's one in flight, actually.
Keenan Brock
@kbrock
Feb 06 2016 18:54
yea . think it was tangental, and I chimed in on it to try and get it to also work.
Matthew Draper
@matthewd
Feb 06 2016 18:54
Really gets into "complex enough that you should just write the SQL you really mean", IMO
Keenan Brock
@kbrock
Feb 06 2016 18:54
kinda like a group by that allows arbitrary sql
create_index :ancestry, "lower(name)", :name => :name
I dunno
I like that

I like this:

closure_tables
A node even connects to itself
slide 43

Matthew Draper
@matthewd
Feb 06 2016 18:59
To me, it's the most complex mechanism to implement — even moreso than nested sets — but given someone else has done the hard work, it has the best performance characteristics, and is particularly conducive to AR relationization
Keenan Brock
@kbrock
Feb 06 2016 19:01
ok, so I have a different thought
path enumeration using arrays
hmm. or using ltrees for path enumeration
query child = easy (or mid), and referential integrity = present
still not sure if you should have your own id in the list.
closure trees seems to say yes, and while ancestry said no, nothing preventing it from doing it (will make an extra query on insert)
you're a jerk. I totally want to create a project with the 4 (or more) implementations of this :(
/me doesn't have enough projects
Matthew Draper
@matthewd
Feb 06 2016 19:05
:wave: just use closure_tree :wave:
Keenan Brock
@kbrock
Feb 06 2016 19:05
slide 69
closure table is easy across the board
I think that is because they have their own id in there - not because of the data structure per say