These are chat archives for ManageIQ/manageiq/performance

12th
Jan 2016
Keenan Brock
@kbrock
Jan 12 2016 05:04
@matthewd go to bed
(or are you taking advantage that travis is pretty wide open)
Matthew Draper
@matthewd
Jan 12 2016 05:04
haha
both? :)
Keenan Brock
@kbrock
Jan 12 2016 05:04
:)
Joe Rafaniello
@jrafanie
Jan 12 2016 13:53
I'm not sure when @matthewd or @kbrock sleep
Greg Blomquist
@blomquisg
Jan 12 2016 13:55
I'm sure the only answer is "never"
Alex Krzos
@akrzos
Jan 12 2016 13:56
me neither, @kbrock last night I attempted a perf_capture_timer on the 5.5 coordinator appliance and it took an atrocious >5k seconds
Jason Frey
@Fryguy
Jan 12 2016 13:59
@akrzos is this a new problem or the same problem we've had all along?
Alex Krzos
@akrzos
Jan 12 2016 14:03
@Fryguy The problem was bad in 5.3, 5.4 it seemed better, 5.5 seems as bad if not worse than 5.3, iirc
Jason Frey
@Fryguy
Jan 12 2016 14:04
Ugh OK
Alex Krzos
@akrzos
Jan 12 2016 14:06
also my evm.log doesn't appear to be rotating
could have been the power outage + moving the vms
Dennis Metzger
@dmetzger57
Jan 12 2016 14:08
@akrzos can you send me the IP of the appilance with the log rotate issue, i’d like to take a quick peek
Oleg Barenboim
@chessbyte
Jan 12 2016 14:18
@gtanzillo wasnt something recently fixed for log rotation of large files?
Gregg Tanzillo
@gtanzillo
Jan 12 2016 14:21
@chessbyte Yes, Nick made a fix. Let mefind the Pr
ManageIQ/manageiq-appliance#50
Keenan Brock
@kbrock
Jan 12 2016 14:24
ugh - sorry I'm late. battery died in car right as we were driving out to the bus... AAA/drop off kid :(
Daniel Berger
@djberg96
Jan 12 2016 14:24
Regarding miq_tasks, doesn't AR's count method do count(id)? Which would automatically be indexed by PG since it's the primary key?
Keenan Brock
@kbrock
Jan 12 2016 14:25
@djberg96 - I like count(*) better than count(id)
since id is not null, it probably doesn't make a difference but...
old habbits die young
if you do count(col) and that col is nullable, then it has to hit an index / scan the table. vs count(*) can sometimes just hit the stats / primary index
Matthew Draper
@matthewd
Jan 12 2016 14:27
It uses whatever index is appropriate (e.g. from the where clause) to help it, but it ultimately has to read the actual rows out of the table no matter what
Keenan Brock
@kbrock
Jan 12 2016 14:28
@matthewd is that a postgres thing? I remember mysql and sybase were smarter than that
Daniel Berger
@djberg96
Jan 12 2016 14:28
looks like the intertubes back you up kbrock
Matthew Draper
@matthewd
Jan 12 2016 14:28
Yep
Specifically, because of MVCC, the index doesn't know whether the row is dead (and should therefore not be counted)
Keenan Brock
@kbrock
Jan 12 2016 14:29
yea - that is confusing for so many people
Matthew Draper
@matthewd
Jan 12 2016 14:29
If you employ MySQL's traditional "just lock the table lol" strategy, it does make maintaining some counts easier
Keenan Brock
@kbrock
Jan 12 2016 14:29
they assume select count(*) from X where Y is quicker than select * from X where Y limit 10
Daniel Berger
@djberg96
Jan 12 2016 14:30
explain select count(*) from miq_tasks vs explain select count(id) from miq_tasks should confirm it
(or, at worst, the cost should be the same)
Keenan Brock
@kbrock
Jan 12 2016 14:31
@djberg96 do you know much about the size / frequency of change in MiqTask? I noticed that we have a few queries that search on identifier but there is no index on it
Matthew Draper
@matthewd
Jan 12 2016 14:31
It technically is, but only to the extent there's less data moving around: the same disk sectors need to be inspected
Daniel Berger
@djberg96
Jan 12 2016 14:32
@kbrock I know nothing about it specifically, no. I was just interested in the SQL bits, sorry.
I don't know if/how Rails or PG does query caching.
Dennis Metzger
@dmetzger57
Jan 12 2016 14:37
@gtanzillo @chessbyte it’s not the free space check issue causing the issue in this case
Gregg Tanzillo
@gtanzillo
Jan 12 2016 14:39
@dmetzger57 Do you or @akrzos have the details and an appliance so that we can look into it?
Dennis Metzger
@dmetzger57
Jan 12 2016 14:40
@gtanzillo @akrzos I’ll send you the IP (Alex gave it to me) privately, I’m on the appliance at the moment
Gregg Tanzillo
@gtanzillo
Jan 12 2016 14:41
Thanks @dmetzger57
Alex Krzos
@akrzos
Jan 12 2016 14:44
@gtanzillo Have a go at it, hopefully it wasn't just a result of the appliance having a hard life being shutdown, migrated and having a power outage
Gregg Tanzillo
@gtanzillo
Jan 12 2016 14:46
Will do @akrzos
Keenan Brock
@kbrock
Jan 12 2016 15:07
@jrafanie curiously we were not caching my_region ref - do you want me to change MiqEnterprise.my_enterprise? I'm feeling like @region ||= is not good in a class method
Joe Rafaniello
@jrafanie
Jan 12 2016 15:08
do we have a need to cache my_region?
For example, what is perf_capture_timer's query count before and after my_region caching?
we don't add caching until we need it generally
Keenan Brock
@kbrock
Jan 12 2016 15:09
it got rid of 200 queries
Joe Rafaniello
@jrafanie
Jan 12 2016 15:09
out of?
Keenan Brock
@kbrock
Jan 12 2016 15:09
looking
Joe Rafaniello
@jrafanie
Jan 12 2016 15:10
keep in mind, all other callers have been expecting an uncached value so it has to be worth it to make such a global change
Keenan Brock
@kbrock
Jan 12 2016 15:10
"my_region"
that is based upon a static region number
hardcoded in a file
Joe Rafaniello
@jrafanie
Jan 12 2016 15:11
yes, until it changes
Keenan Brock
@kbrock
Jan 12 2016 15:11
we don't re-read the file
do we?
ok
I'll revisit
Joe Rafaniello
@jrafanie
Jan 12 2016 15:11
you'd have to look at where we change it
Keenan Brock
@kbrock
Jan 12 2016 15:12
we change REGION?
Joe Rafaniello
@jrafanie
Jan 12 2016 15:12
we do that in appliance console I believe
we also sync it down when we connect to a remote database
Keenan Brock
@kbrock
Jan 12 2016 15:12
if you change the region, you have to point to a different database
Joe Rafaniello
@jrafanie
Jan 12 2016 15:12
so as long as those places clear the cache, it's fine
Keenan Brock
@kbrock
Jan 12 2016 15:12
wait
you are telling me we change the database in the middle of running a worker?
Joe Rafaniello
@jrafanie
Jan 12 2016 15:13
but even still measure the side effect to determine if it's even worth it
Keenan Brock
@kbrock
Jan 12 2016 15:13
you can't change the region
that sets the sequence number at rails startup
Joe Rafaniello
@jrafanie
Jan 12 2016 15:13
no, that's not what I'm saying
Keenan Brock
@kbrock
Jan 12 2016 15:13
all hell breaks loose
Joe Rafaniello
@jrafanie
Jan 12 2016 15:13
appliance console allows you to change the region when configuring the database
configuring the database isn't allowed if others are connected to the db
Keenan Brock
@kbrock
Jan 12 2016 15:14
in a rails app, I thought the region number was static
is this not right?
Jason Frey
@Fryguy
Jan 12 2016 15:14
yes, it's static
Joe Rafaniello
@jrafanie
Jan 12 2016 15:14
yes, except where I said
so, it was 200 queries saved out of ???
Keenan Brock
@kbrock
Jan 12 2016 15:15
ok, I'll research caching my_region - thought it was the same as my_server
Joe Rafaniello
@jrafanie
Jan 12 2016 15:15
200 out of 1000 is worth it
Keenan Brock
@kbrock
Jan 12 2016 15:15
700
well
hard to quantify
Joe Rafaniello
@jrafanie
Jan 12 2016 15:15
200 out of 10,000, not so much
Keenan Brock
@kbrock
Jan 12 2016 15:15
queue captures is the majority of work
the 233 out of 1023 (excluding queue_caputres)
the queue captures takes too long - so I have to piece meal this
Joe Rafaniello
@jrafanie
Jan 12 2016 15:16
ok, so then it's probably worth it
Keenan Brock
@kbrock
Jan 12 2016 15:17
it is the query cache in region that is the real savings
we are looking up the same ems / configuration because it keeps looking it up
because the region is a new record
ugh - too many numbers
too many queries - I may be conflating
Joe Rafaniello
@jrafanie
Jan 12 2016 15:18
Please do specific tests, with described environments (x host, x vms, etc.), y method before (1023, 233 after)
Keenan Brock
@kbrock
Jan 12 2016 15:18
+1
doing now
Joe Rafaniello
@jrafanie
Jan 12 2016 15:18
eliminate parts of the method if you need to... just make sure you're comparing the same thing with ONLY your change
Keenan Brock
@kbrock
Jan 12 2016 15:19
perf_capture_timer takes > 6 minutes to run
Joe Rafaniello
@jrafanie
Jan 12 2016 15:19
But yes, we started caching all the things at one point and that was causing us subtle bugs so we've been re-adding caching only when it's really worth it
Dennis Metzger
@dmetzger57
Jan 12 2016 15:19
@akrzos your appliance has a nicely sized evm log now. Combination of Nick’s fix and forcing the run even though it though it happily processed the file early this morning
Keenan Brock
@kbrock
Jan 12 2016 15:20
I was reacting to MiqEnterprise.my_enterprise cache
Joe Rafaniello
@jrafanie
Jan 12 2016 15:20
"really worth it" requires some documentation so we know that this cache is truly needed
Keenan Brock
@kbrock
Jan 12 2016 15:21
@akrzos the database I got off of you is great - shows problems that a clean database does not. do you have one with only 100vms though? but has lots of captures and other stuff - so these issues will still show? having to flub too many numbers because I can't run before/after on 6-10 minute runs
Keenan Brock
@kbrock
Jan 12 2016 15:27
@jrafanie for https://github.com/ManageIQ/manageiq/pull/6141/commits could you tell me which ones you want better numbers?
Alex Krzos
@akrzos
Jan 12 2016 15:27
@kbrock I don't have an "accumulated" run on my 100vm provider, but I can spin up a vm to start building a db for that
Keenan Brock
@kbrock
Jan 12 2016 15:27
I'm going to run reusing the zone again - think it has better savings now
@akrzos is 100 as small as you can get?
Alex Krzos
@akrzos
Jan 12 2016 15:28
@kbrock Did you have in mind about how much time we really need burned into that db?
Keenan Brock
@kbrock
Jan 12 2016 15:28
2 hours?
Alex Krzos
@akrzos
Jan 12 2016 15:28
@kbrock I can make a smaller one
Keenan Brock
@kbrock
Jan 12 2016 15:28
nah - you have 100. good enough
Jason Frey
@Fryguy
Jan 12 2016 15:28
@kbrock for those 6 minutes, do you know where the majority of time is being spent?
Keenan Brock
@kbrock
Jan 12 2016 15:28
queuing
Jason Frey
@Fryguy
Jan 12 2016 15:29
so adding things to the queue?
Keenan Brock
@kbrock
Jan 12 2016 15:29
@Fryguy putting into the queue. but a lot of stuff outside effects what goes into the queue
yes
stuff like looking up my zone and stuff
Joe Rafaniello
@jrafanie
Jan 12 2016 15:29
@kbrock I would say any caching or significant changes would warrant some benchmarks (before/after) numbers
Jason Frey
@Fryguy
Jan 12 2016 15:29
I guess, but no matter what changes you make, the same items will need to end up on the queue
Joe Rafaniello
@jrafanie
Jan 12 2016 15:29
collect + flatten -> flat_map, no
Keenan Brock
@kbrock
Jan 12 2016 15:29
@jrafanie I run benchmarks
Jason Frey
@Fryguy
Jan 12 2016 15:29
unless you end up wanting to completely change the code
Keenan Brock
@kbrock
Jan 12 2016 15:29
don't want to change code
@jrafanie but there are so many variables - so many changes
and when you introduce a change, you need to go back and run them all over again
Joe Rafaniello
@jrafanie
Jan 12 2016 15:30
@kbrock commits like this, I don't see the "WHY", I have to read the code and figure out the why: kbrock/manageiq@73703f5
Keenan Brock
@kbrock
Jan 12 2016 15:30
you know this
thnx
@jrafanie it split up a method into 3 smaller methods.
what is the best way to explain that?
Joe Rafaniello
@jrafanie
Jan 12 2016 15:31
why did you split it into 3 small methods
you have good reasons, just it's not in the commit message
Alex Krzos
@akrzos
Jan 12 2016 15:31
@dmetzger57 So was that another issue with the log rotate or just an effect of the harsh life the appliance has had? I have a few others that have >1G non-rotated log files I'd like to clean up so if you guys could share how to get log rotate running I can clean those up
Chris Arcand
@chrisarcand
Jan 12 2016 15:31
Big, nicely formatted commit messages :heart_eyes:
Keenan Brock
@kbrock
Jan 12 2016 15:32
I don't know how to add a commit message that isn't a gripe
cut up a 300 line method into 3 100 line methods
nope - it isn't that bad
sorry
Dennis Metzger
@dmetzger57
Jan 12 2016 15:34
@akrzos I’ll get back to you after our standup
Keenan Brock
@kbrock
Jan 12 2016 15:35
@akrzos log rotate has had issues in the past due to selinux privs. it is tricker than it should be
Alex Krzos
@akrzos
Jan 12 2016 15:44
@dmetzger57 / @kbrock ok cool, hopefully it wasn't a new bug there
Dennis Metzger
@dmetzger57
Jan 12 2016 15:52
@akrzos the issue with this machine was two fold, needed a recent fix from Nick in space check and it appears the file file system had some health issues in the recent past. See BZ 1294991 for the needed fix, once applied run logrotate —force
Alex Krzos
@akrzos
Jan 12 2016 15:54
@dmetzger57 checking it out now
Keenan Brock
@kbrock
Jan 12 2016 16:54
I like the selunix relable everything command. huh. forgot it. alberto probably remembers
Keenan Brock
@kbrock
Jan 12 2016 17:23
@jrafanie are you hesitant for inverse_of?
Joe Rafaniello
@jrafanie
Jan 12 2016 17:23
no, keep in mind there are cases you can't do it
I made a few changes to use inverse_of
i think on ExtManagementSystem
there's definitely more inverse_of's to do
Keenan Brock
@kbrock
Jan 12 2016 17:56
@Fryguy think that suggestion saved us another 200ms (trivial) but a few allocations too. thanks
Jason Frey
@Fryguy
Jan 12 2016 17:56
which one?
moving up into the includes?
oh yeah... (I just saw your comment come in on that PR)
Alex Krzos
@akrzos
Jan 12 2016 19:14
@kbrock So I swapped the the 5.5 from sandy bridge to haswell and now the perf_capture_timer times around not timing out at 3600s, looks to be around 2500s-2800s
going to run profiling now
Keenan Brock
@kbrock
Jan 12 2016 19:17
@akrzos don't bother - it stinks. we need to fix
@akrzos if it takes that long, then it is doing bad stuff to the db/worker that will cause other processes to slow down
Alex Krzos
@akrzos
Jan 12 2016 19:22
@kbrock nice, that will save me some pain
Keenan Brock
@kbrock
Jan 12 2016 19:35
@akrzos I mean - I want you to test. but AFTER I make most of my changes :)
Alex Krzos
@akrzos
Jan 12 2016 19:37
:smile: Hopefully I can just patch those appliances and we can see the improvement in real time on grafana
Keenan Brock
@kbrock
Jan 12 2016 20:04
@Fryguy if there is another way you wanted me to make those private, just holler ManageIQ/manageiq#6141 (last commit)
Jason Frey
@Fryguy
Jan 12 2016 20:34
merged @kbrock
Keenan Brock
@kbrock
Jan 12 2016 20:34
ooh - thanks
NOW the fun begins
Jason Frey
@Fryguy
Jan 12 2016 20:34
hah
I wish Ruby knew that if I say private it means I want all class and instance methods from that point on to be private
having to call private_class_method on each one or doing weird stuff like class << self; private; end is just weird
Keenan Brock
@kbrock
Jan 12 2016 20:35
+1
Jason Frey
@Fryguy
Jan 12 2016 20:35
poke @tenderlove :trollface:
Keenan Brock
@kbrock
Jan 12 2016 20:35
lol
Matthew Draper
@matthewd
Jan 12 2016 20:36
def self.foo is a special case of def expr.foo, and applying that rule to the latter seems.. confusing.
Jason Frey
@Fryguy
Jan 12 2016 20:36
def expr.foo is weird to begin with :grin:
but even the latter would make sense if expr evaluated to something, then I'd want private on that something
I guess I'd need to see an example that's confusing...at least moreso than the common case of creating private class methods
TBH, I actually prefer the new way where you can do private def blah (however you still have to do private_class_method def self.blah, which I think would look nicer as private def self.blah, but whatevs)
Daniel Berger
@djberg96
Jan 12 2016 20:51
@Fryguy sorry, what?
Jason Frey
@Fryguy
Jan 12 2016 20:51
LOL
just ranting :P
Daniel Berger
@djberg96
Jan 12 2016 20:51
If you say private, then all instance and class methods under it are private
Jason Frey
@Fryguy
Jan 12 2016 20:51
no, that is not true
Daniel Berger
@djberg96
Jan 12 2016 20:52
Have an example?
Jason Frey
@Fryguy
Jan 12 2016 20:52
yes, one sec
unless something changed recently in Ruby :D
Daniel Berger
@djberg96
Jan 12 2016 20:52
always been that way I thought
Jason Frey
@Fryguy
Jan 12 2016 20:53
[1] pry(main)> class Foo
[1] pry(main)*   private
[1] pry(main)*   def self.test
[1] pry(main)*     puts "Hi"
[1] pry(main)*   end  
[1] pry(main)*   
[1] pry(main)*   def test
[1] pry(main)*     puts "Hi2"
[1] pry(main)*   end  
[1] pry(main)* end  
=> :test
[2] pry(main)> Foo.new.test
NoMethodError: private method 'test' called for #<Foo:0x007fe5608b0680>
from (pry):11:in '__pry__'
[3] pry(main)> Foo.test
Hi
=> nil
Daniel Berger
@djberg96
Jan 12 2016 20:54
Oh, singleton methods....hrm
Jason Frey
@Fryguy
Jan 12 2016 20:54
yah
Daniel Berger
@djberg96
Jan 12 2016 20:54
maybe private should take a block :)
Jason Frey
@Fryguy
Jan 12 2016 20:54
the only way to make that class method private is to call private_class_method :test, or do that thing where you open up the singleton class class << self; end
Joe Rafaniello
@jrafanie
Jan 12 2016 20:55
good times ^
Jason Frey
@Fryguy
Jan 12 2016 20:55
POLS is broken in this case, but I guess that's just my opinion, man
Daniel Berger
@djberg96
Jan 12 2016 20:56
I think it's come up before ;)
Chris Arcand
@chrisarcand
Jan 12 2016 21:01
@Fryguy My previous employer started doing 2.1+ private symbol syntax after I argued it was better than the declaration of private :method after every method that was insisted previously. I actually really liked it. http://sportngin.github.io/styleguide/ruby.html#private--protected-methods
Jason Frey
@Fryguy
Jan 12 2016 21:06
@djberg96's reaction of surprise is exactly why I think it violates POLS, btw :D
Daniel Berger
@djberg96
Jan 12 2016 21:13
For FFI, I actually just made a wrapper, attach_pfunc
Forgot about that