These are chat archives for ManageIQ/manageiq/performance

12th
Aug 2015
Alex Krzos
@akrzos
Aug 12 2015 18:19
@chessbyte I can absolutely post results to performance/scalability findings here.
Keenan Brock
@kbrock
Aug 12 2015 18:20
thanks @akrzos
Alex Krzos
@akrzos
Aug 12 2015 18:40
vmware-large-refresh-without broker.png
vmware-large-refresh-with broker.png
From my recent testing of a Miq appliance, initial refreshes of a "large" VMware environment show some improvement when moving forward to Ruby 2.2. However, we can see that significant time is still spent in db_save_inventory, both with and without a VIM broker.
Dennis Metzger
@dmetzger57
Aug 12 2015 18:43
@akrzos what appliance version were the tests run on?
Alex Krzos
@akrzos
Aug 12 2015 18:44
1.9.3 = CFME 5.3, 2.0.0 = CFME 5.4, 2.1.6 = Miq Master (Before upgrade to Ruby 2.2), 2.2.2 = Miq Master as of 4-5 days ago
@dmetzger57 ^
Dennis Metzger
@dmetzger57
Aug 12 2015 18:45
thanks @akrzos
Alex Krzos
@akrzos
Aug 12 2015 18:45
During db_save_inventory, the majority of the time is spent between two log lines, with the Refresh worker pegged at 100% CPU usage
this was highly evident when I built the scale environment
the scale environment contains providers with 10,000 virtual machines each
I have this documented as an open bugzilla here: https://bugzilla.redhat.com/show_bug.cgi?id=1243938
Dennis Metzger
@dmetzger57
Aug 12 2015 18:48
I saw that ticket yesterday, I'll be speaking with @Fryguy about his ideas on this at some point in the near future
Alex Krzos
@akrzos
Aug 12 2015 18:52
@dmetzger57 understood, I'm concerned there is just some logic that turns out to be very time consuming when performed in Ruby, especially as the scale of environments is pushed higher and higher
in a medium environment (1,000 virtual machines), the time spent between those log lines is on the order of 1 minute, so scaling the number of VMs by 10x scales whatever portion of work happens there by about 90x
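As a rough check on those numbers, here is a minimal sketch that assumes the time between the two log lines follows a power law, t = c * n**k. That model is an assumption for illustration, not something measured here, and `scaling_exponent` is a hypothetical helper name.

```ruby
# Estimate the exponent k in t = c * n**k from two observations.
# The power-law model is an assumption; only the 10x and ~90x ratios
# come from the discussion above.
def scaling_exponent(size_ratio, time_ratio)
  Math.log(time_ratio) / Math.log(size_ratio)
end

vm_ratio   = 10_000.0 / 1_000.0 # 10x more VMs
time_ratio = 90.0               # roughly 90x more time between the log lines

k = scaling_exponent(vm_ratio, time_ratio)
puts format("estimated exponent k = %.2f", k)
```

Under that assumption k comes out just under 2, which would be consistent with work that grows roughly with the square of the VM count rather than linearly.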
Alex Krzos
@akrzos
Aug 12 2015 18:59
I have data on other various features and areas of the appliance and some gaps (Feedback on how to test/analyze is always welcome), but I think Refresh is a reasonable place to start addressing issues at scale
Dennis Metzger
@dmetzger57
Aug 12 2015 19:06
that does indeed sound like a reasonable place to start addressing issues of scale.
I’ve made no attempt to run that at all, let alone measure a resulting performance difference… but I was curious what lay between those two log entries, and that was the first thing that jumped out at me
Jason Frey
@Fryguy
Aug 12 2015 19:23
ooohhh I like the bulk connect idea
Matthew Draper
@matthewd
Aug 12 2015 19:23
Specifically, the thing that sounds most likely to cause an exponential slow-down is the fact we clear the relationship cache after adding every one of the 10000 VMs
.. yet the underlying ‘add’ method already expects to take a splat
Jason Frey
@Fryguy
Aug 12 2015 19:26
@matthewd Where do we clear the relationship_cache?
Or does that just happen because we call something like add_vm, and under the covers it clears the cache?
Matthew Draper
@matthewd
Aug 12 2015 19:27
Yeah, the latter
Jason Frey
@Fryguy
Aug 12 2015 19:27
ahhh
Matthew Draper
@matthewd
Aug 12 2015 19:27
So, I don’t know how expensive that actually is
Jason Frey
@Fryguy
Aug 12 2015 19:27
It might not be bad if it doesn't go back and read it again
in the loop
Matthew Draper
@matthewd
Aug 12 2015 19:27
But if you’re looking for a non-linear thing, it’s the thing that grabs my attention
Won’t self.child_ids do just that, though?
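To make the concern concrete, here is a toy sketch of the pattern under discussion: an add that clears a cache, followed by a read that rebuilds it, repeated once per VM. `ToyFolder`, `add_children`, and `reads` are hypothetical names invented for this illustration and are not ManageIQ's relationship API. Whether the real refresh code re-reads inside the loop is exactly the open question above.

```ruby
# Toy model only. ToyFolder is hypothetical and does not reflect
# ManageIQ's actual relationship code.
class ToyFolder
  attr_reader :reads

  def initialize
    @store = []  # stands in for relationship rows in the database
    @cache = nil # stands in for the relationship cache
    @reads = 0   # counts records loaded while rebuilding the cache
  end

  # Rebuilds the cache from the store whenever it has been cleared.
  def child_ids
    if @cache.nil?
      @reads += @store.size
      @cache = @store.dup
    end
    @cache
  end

  # Takes a splat, stores all new children, then clears the cache once.
  def add_children(*ids)
    @store.concat(ids)
    @cache = nil
  end
end

# One add per VM, each followed by a read that rebuilds the cache.
def one_at_a_time(n)
  folder = ToyFolder.new
  (1..n).each do |id|
    folder.add_children(id)
    folder.child_ids
  end
  folder.reads
end

# A single bulk add followed by a single read.
def bulk(n)
  folder = ToyFolder.new
  folder.add_children(*(1..n))
  folder.child_ids
  folder.reads
end

[1_000, 10_000].each do |n|
  puts "n=#{n}: one at a time reads #{one_at_a_time(n)}, bulk reads #{bulk(n)}"
end
```

In this toy model the per-VM loop loads n(n+1)/2 records in total (500,500 for n = 1,000), while the bulk call loads n. If the cache is cleared but never re-read inside the loop, the extra cost disappears, which is the distinction being drawn above.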
Keenan Brock
@kbrock
Aug 12 2015 20:03
I really don't like all our relationship(true) that clear those caches
but that is an old conversation