These are chat archives for ManageIQ/manageiq/performance

19th Jan 2016
Joe Rafaniello
@jrafanie
Jan 19 2016 14:07
cool, @akrzos, let me know if you were able to set it up, I'd like to view the logs :smile:
Alex Krzos
@akrzos
Jan 19 2016 14:08
@jrafanie I did
I'll send you the ip
Joe Rafaniello
@jrafanie
Jan 19 2016 14:08
I'm not expecting system-wide memory savings yet, we don't preload much (basically just the rails app code is shared)
Alex Krzos
@akrzos
Jan 19 2016 14:09
I ran a few "idle" memory tests to compare it to 5.5.2.1
I had forgotten to run an idle memory test before checking out the forking code
so not sure if the two reference points can even be compared, but I just spawned another master appliance so I intend to run an idle memory test against that one as well
so I have a good reference to compare to
Joe Rafaniello
@jrafanie
Jan 19 2016 14:10
ok
Alex Krzos
@akrzos
Jan 19 2016 14:11
but I didn't see much savings - ~3.4GiB memory used at idle with all roles enabled
so I wasn't sure if I had done something wrong yet either
Joe Rafaniello
@jrafanie
Jan 19 2016 14:11
Right, you won't see much savings until we eager load the gems and more of the miq code pre-fork
pss is around 50 MB lower than rss though which is what I expect (basically the size of rails and required gems)
at this point, I'm concerned with any errors or workers restarting due to memory limits
I see you installed smem
I need to add that to the upstream appliance
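For context: PSS (proportional set size) charges each shared page fractionally to every process mapping it, so it reflects real per-process cost once workers share memory. A minimal sketch of deriving it on Linux, assuming /proc/&lt;pid&gt;/smaps is readable (not the appliance's actual logging code):

```ruby
# Sum the Pss lines in /proc/<pid>/smaps; values are reported in kB.
def proportional_set_size(pid = Process.pid)
  File.readlines("/proc/#{pid}/smaps")
      .grep(/^Pss:/)
      .map { |line| line.split[1].to_i }
      .inject(0, :+)
end

puts "PSS: #{proportional_set_size} kB"
```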
@akrzos are these expected in that environment: ERROR -- : MIQ(MiqReplicationWorker::Runner#start_replicate) MIQ(MiqReplicationWorker::Runner) Replication configuration is invalid
Alex Krzos
@akrzos
Jan 19 2016 14:15
@jrafanie Yes, there is no master host configured
My automation simply enables every role
and measures memory of an idle, no-provider appliance over the course of an hour
I was hoping to see the biggest impact on this test since no provider-specific data would be loaded or allocated
Joe Rafaniello
@jrafanie
Jan 19 2016 14:17
interesting...
oh wait, that's not the first one...
Alex Krzos
@akrzos
Jan 19 2016 14:19
I had to run a bundle install since there was a missing gem too, FYI
Joe Rafaniello
@jrafanie
Jan 19 2016 14:19
ok
ok
So, I don't see that error after the workers were killed and things were restarted
Alex Krzos
@akrzos
Jan 19 2016 14:21
I did reboot the entire box as well after the first run measuring idle memory
Joe Rafaniello
@jrafanie
Jan 19 2016 14:23
it looks ok, did you want to add a provider?
Alex Krzos
@akrzos
Jan 19 2016 14:23
Sure
I'll add the small provider to start with
Alex Krzos
@akrzos
Jan 19 2016 14:29
so provider added
I'll turn on c&u so it has some work to do as well
stuck on a spinning wheel for C&U collection
Joe Rafaniello
@jrafanie
Jan 19 2016 14:35
how about now? I don't see any problems in the logs
Alex Krzos
@akrzos
Jan 19 2016 14:36
Yeah no error appears just get spinning wheel
Joe Rafaniello
@jrafanie
Jan 19 2016 14:37
which screen? enabling it?
Alex Krzos
@akrzos
Jan 19 2016 14:37
tries firefox
Configure -> Configuration -> Region 0 on accordion -> C&U Collection tab to enable per cluster/datastore
spinning wheel in firefox as well
Anyone know what official repo I can get ruby222-ruby-devel from?
Alex Krzos
@akrzos
Jan 19 2016 14:42
It's not in rhel-7-server-rpms, rhel-server-rhscl-7-rpms or rhel-7-server-optional-rpms
Joe Rafaniello
@jrafanie
Jan 19 2016 14:44
hmm, I would have thought it would be in the same repo as the *-ruby package
Joe Rafaniello
@jrafanie
Jan 19 2016 14:50
@akrzos you might be able to find it in a copr repo, like this? https://copr.fedoraproject.org/coprs/rhscl/rh-ruby22-el7/
Alex Krzos
@akrzos
Jan 19 2016 15:18
Found it "rh-ruby22-ruby-devel"
Joe Rafaniello
@jrafanie
Jan 19 2016 15:36
makes sense ¯\_(ツ)_/¯
Keenan Brock
@kbrock
Jan 19 2016 15:39
aaah
@akrzos so your name is "Alex Cross"? /via standup @jrafanie
Joe Rafaniello
@jrafanie
Jan 19 2016 15:40
lol
Alex Krzos
@akrzos
Jan 19 2016 15:40
That's how you say it
Jason Frey
@Fryguy
Jan 19 2016 15:40
No, it's "Ah-Ker-Zohss"
;)
Joe Rafaniello
@jrafanie
Jan 19 2016 15:40
it's a silent Z ;-)
Keenan Brock
@kbrock
Jan 19 2016 15:40
:)
man - I have no idea how many different ways I've pronounced your name
Joe Rafaniello
@jrafanie
Jan 19 2016 15:40
words are fun
Keenan Brock
@kbrock
Jan 19 2016 15:41
but finally, your name just became "alex - the performance guy down in Raleigh"
Joe Rafaniello
@jrafanie
Jan 19 2016 15:41
names are more fun
Alex Krzos
@akrzos
Jan 19 2016 15:41
haha
Keenan Brock
@kbrock
Jan 19 2016 15:41
or "alex - the scarry smart performance guy"
Alex Krzos
@akrzos
Jan 19 2016 15:41
Alex Cross the detective
Joe Rafaniello
@jrafanie
Jan 19 2016 15:41
I have a silent I in my last name, at least that's how us American Italians pronounce it ;-)
Keenan Brock
@kbrock
Jan 19 2016 15:41
When I said "scarry smart" they seemed to know exactly who I was talking about
Oleg Barenboim
@chessbyte
Jan 19 2016 15:41
@kbrock - you mean scary, not scarry? Although one can be scary if they are scarry
Jason Frey
@Fryguy
Jan 19 2016 15:41
Now I picture you with a pipe and a magnifying glass
Keenan Brock
@kbrock
Jan 19 2016 15:42
heh
Joe Rafaniello
@jrafanie
Jan 19 2016 15:42
it's more like "don't cross Alex"
Keenan Brock
@kbrock
Jan 19 2016 15:42
ha
Joe Rafaniello
@jrafanie
Jan 19 2016 15:42
or you'll be cross
Keenan Brock
@kbrock
Jan 19 2016 15:42
@chessbyte spilling is hard
Alex Krzos
@akrzos
Jan 19 2016 15:52
Well I got stackprof running on 5.5.0.13-2
Joe Rafaniello
@jrafanie
Jan 19 2016 16:31
@akrzos I fixed a minor bug in my forking_workers branch, deleted a few more things and squashed some commits, can I stop your processes and update?
Alex Krzos
@akrzos
Jan 19 2016 16:34
@jrafanie got it
Joe Rafaniello
@jrafanie
Jan 19 2016 16:34
how about now?
I just re-ordered a commit's change... sorry
Keenan Brock
@kbrock
Jan 19 2016 16:35
github + reorder commit != fun
Alex Krzos
@akrzos
Jan 19 2016 16:35
Trying right now
if I can get evmserverd to stop
(I'll make it stop)
:hammer:
Joe Rafaniello
@jrafanie
Jan 19 2016 16:36
@kbrock yeah, I know ;-)
Keenan Brock
@kbrock
Jan 19 2016 16:36
either stop or i'll give you a reason to stop
Joe Rafaniello
@jrafanie
Jan 19 2016 16:36
I love how I keep finding stuff to delete once forking workers is the way...
Keenan Brock
@kbrock
Jan 19 2016 16:38
+1
I'm finding more and more region stuff to delete :)
Oleg Barenboim
@chessbyte
Jan 19 2016 16:38
@kbrock are those shiny objects or related to performance?
Keenan Brock
@kbrock
Jan 19 2016 16:39
shiny - well, shiny because they are preventing me from running performance tests -
but @chessbyte yes - thanks. point taken and true
Oleg Barenboim
@chessbyte
Jan 19 2016 16:39
@kbrock remember your current mission is to make ManageIQ more performant and scalable
Alex Krzos
@akrzos
Jan 19 2016 16:40
@jrafanie It's running now
Last commit in the git log is still from Monday
I assume you did a force update then
Joe Rafaniello
@jrafanie
Jan 19 2016 16:41
yes, you should be at 9959e8f2aa104d11ac93c7d15d6008708d297bf6
Alex Krzos
@akrzos
Jan 19 2016 16:41
yup there
Joe Rafaniello
@jrafanie
Jan 19 2016 16:41
Great, thanks
I changed the proctitle prefix to MIQ:
we can change that to whatever we finally decide
I needed to make the MiqWorker.is_worker? work properly for the provider workers
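Roughly, the idea looks like this (hypothetical names; Process.setproctitle is Ruby 2.1+, and this is not the actual MiqWorker.is_worker? implementation):

```ruby
# Tag every forked worker's process title with a known prefix so the
# server can tell its own workers apart from other ruby processes.
PROCTITLE_PREFIX = "MIQ:".freeze

Process.setproctitle("#{PROCTITLE_PREFIX} MiqGenericWorker id: 42")

# Hypothetical check against a title read back from ps or /proc:
def miq_worker_title?(title)
  title.start_with?(PROCTITLE_PREFIX)
end
```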
Oleg Barenboim
@chessbyte
Jan 19 2016 16:43
@jrafanie - is it possible to make that different on upstream vs downstream (MIQ vs CF)?
Joe Rafaniello
@jrafanie
Jan 19 2016 16:44
I'd rather not have a bikeshed on that in the forking workers PR... I just need to have the worker detection work in this PR
Oleg Barenboim
@chessbyte
Jan 19 2016 16:44
or maybe we should not do that
Joe Rafaniello
@jrafanie
Jan 19 2016 16:44
@chessbyte we should use something that works in both
Alex Krzos
@akrzos
Jan 19 2016 16:45
@jrafanie Ok, looking at the refresh worker, it looks like we have ~44MiB of shared memory
RSS - USS
Joe Rafaniello
@jrafanie
Jan 19 2016 16:45
@chessbyte I don't see what we gain for that complexity
@akrzos yes, that's basically the rails/gems/core miq code that's unchanged in the child processes
I have some code to eager load many more things later on
we have to have a strategy for eager loading
I'm assuming we don't want to blindly eager load everything as that will cause developers' consoles, tests, rails s, etc. to load slower
we probably want to do manually what eager_load does, in the before_fork hook for the workers
same with requiring the gems that are :require => false (in the before_fork hook)
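A rough sketch of that strategy, assuming some before_fork hook in the server (illustrative only, not the PR's code):

```ruby
# Load as much as possible in the server, pre-fork, so the code is
# shared copy-on-write by every worker instead of being loaded N times.
def before_fork
  # What config.eager_load would do, but done only in the server process
  # so consoles, tests, and rails s keep their fast lazy boot.
  Rails.application.eager_load!

  # Also require the Gemfile's :require => false gems up front
  # (naive: assumes each gem's require path matches its gem name).
  Bundler.load.dependencies.each do |dep|
    require dep.name if dep.autorequire == []
  end
end
```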
Keenan Brock
@kbrock
Jan 19 2016 16:49
+1
well, that is probably a PR in and of itself
Joe Rafaniello
@jrafanie
Jan 19 2016 16:50
yes
Keenan Brock
@kbrock
Jan 19 2016 16:50
@jrafanie if we show NO improvement
(because we didn't tune)
should we merge anyway?
if the answer is yes (which is where I am) - then that may make it easier for us to separate forking from tuning
Joe Rafaniello
@jrafanie
Jan 19 2016 16:51
the forking PR just does basic sharing of the app so the memory savings are probably very small, we'll get memory savings in the future via the above mechanisms
the forking PR drops worker start from 45 seconds to 15 locally in an appliance, that's probably the biggest PRO so far
Keenan Brock
@kbrock
Jan 19 2016 16:51
good - so then you said, we will go forward with forking if it works, even if we don't save anything in terms of memory
... getting rid of primordial seeds ...
can we move that to servers (or scrap it altogether?)
Oleg Barenboim
@chessbyte
Jan 19 2016 16:52
@kbrock - we are saving TIME on starting workers
Joe Rafaniello
@jrafanie
Jan 19 2016 16:52
that's a shiny object ;-)
Keenan Brock
@kbrock
Jan 19 2016 16:52
@jrafanie hmm - but a customer was saying it took a minute
@chessbyte cool
Oleg Barenboim
@chessbyte
Jan 19 2016 16:52
even with no memory changes
Joe Rafaniello
@jrafanie
Jan 19 2016 16:53
we'll get to that when it's a performance/logical problem we can measure and improve
Keenan Brock
@kbrock
Jan 19 2016 16:53
good - so why are we testing memory for this pr?
should we just be testing that it works?
Oleg Barenboim
@chessbyte
Jan 19 2016 16:53
I have no idea why we are testing memory for this PR
Joe Rafaniello
@jrafanie
Jan 19 2016 16:53
I think @akrzos is just curious as am I
Oleg Barenboim
@chessbyte
Jan 19 2016 16:54
@jrafanie what is preventing us from merging this PR?
Keenan Brock
@kbrock
Jan 19 2016 16:54
@jrafanie I'm excited too. so excited that I'm saying :shipit: and THEN lets do tuning
Joe Rafaniello
@jrafanie
Jan 19 2016 16:54
@gtanzillo and I wanted to have @akrzos vet that the provider-based workers work fine
Oleg Barenboim
@chessbyte
Jan 19 2016 16:54
@jrafanie thanks for the info
Keenan Brock
@kbrock
Jan 19 2016 16:55
@jrafanie thanks
Joe Rafaniello
@jrafanie
Jan 19 2016 16:55
I did local workers but didn't get a good run with the C&U, event, etc. workers
let me add the smem rpm to the base appliance... since top will be lying to us once we merge forking workers
Joe Rafaniello
@jrafanie
Jan 19 2016 18:59
hey @/all, care to throw :tomato: RE: forking workers talk.manageiq.org discussion...
https://github.com/ManageIQ/manageiq/pull/3593

Summary:

When launching rake evm:start (via evmserverd systemctl or directly):
* workers will be created via Process.fork instead of Process.spawn
  * bypassing all of the rails + vmdb initialization since it's already loaded in the parent
  * workers will share memory with the server process

Benefits:
* Faster worker startup including much less IO contention
* Potential for future memory savings by preloading code, gems, libraries, etc. before we fork

What do I have to do?
* git pull
* rake db:migrate

What changes should I expect?
* miq_workers command_line column is gone
* MiqEnvironment::Process is removed
* lib/vmdb/initializer.rb worker init code has been moved to the workers
* top and system utilities that measure RSS will report more process memory usage since they don't accurately report shared memory
  * use smem and our logging of the proportional_set_size instead
* the existing worker memory limits code now uses PSS instead of RSS (linux only)
did I miss anything?
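In rough terms, the launch path changes shape like this (hypothetical entry point; see the PR for the real code):

```ruby
# Old: spawn boots a brand-new ruby + rails + vmdb process per worker.
pid = Process.spawn("ruby", "lib/workers/bin/worker.rb")

# New: fork the server; the child inherits the already-initialized app,
# skips boot entirely, and shares the loaded code copy-on-write.
pid = Process.fork do
  MiqGenericWorker::Runner.start_worker # hypothetical entry point
end
Process.detach(pid)
```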
Daniel Berger
@djberg96
Jan 19 2016 19:06
Hm, did you add Windows support?
j/k
Joe Rafaniello
@jrafanie
Jan 19 2016 19:06
lolz
yes, when they implement fork in windows, it'll just work
you missed an opportunity to use :trollface:
Daniel Berger
@djberg96
Jan 19 2016 19:10
:trollface:
ok, now i know how to make that :)
Jason Frey
@Fryguy
Jan 19 2016 19:10
better late than never :trollface:
Joe Rafaniello
@jrafanie
Jan 19 2016 19:10
ok, now it's been used too much
Greg Blomquist
@blomquisg
Jan 19 2016 19:10
:trollface: what has?
Joe Rafaniello
@jrafanie
Jan 19 2016 19:11
(ಠ_ಠ)
Greg Blomquist
@blomquisg
Jan 19 2016 19:14
@jrafanie I like your summary of forking workers ... can you link to the talk article in the PR (preferably in the PR summary body)
Joe Rafaniello
@jrafanie
Jan 19 2016 19:14
ok
will do
thanks
Gregg Tanzillo
@gtanzillo
Jan 19 2016 19:37
@jrafanie Just looked at your summary and it looks good. Thanks :+1:
Joe Rafaniello
@jrafanie
Jan 19 2016 19:39
@kbrock what's that in regards to?
Keenan Brock
@kbrock
Jan 19 2016 19:39
how does postgres work at all
Joe Rafaniello
@jrafanie
Jan 19 2016 19:39
lol
Dennis Metzger
@dmetzger57
Jan 19 2016 20:48
@jrafanie that link was for my bed time reading
Joe Rafaniello
@jrafanie
Jan 19 2016 20:48
@dmetzger57 :laughing:
Joe Rafaniello
@jrafanie
Jan 19 2016 21:39
@akrzos what do you think of the forking workers in terms of "breakage" / stability?
I understand we'll have to improve shared memory... I'm trying not to boil the ocean with all the shiny things I want to do in this initial PR ;-)
Joe Rafaniello
@jrafanie
Jan 19 2016 21:51
@matthewd @Fryguy should I care about preload_for_worker_role being called more than once per worker class? I was going to cache it but it should be a no-op the second time if you require all the code the worker needs on the first call
I was going to add a class variable so it's only done once ever no matter how many times we start UI and web service workers... but caching this feels more complicated than needed
Jason Frey
@Fryguy
Jan 19 2016 21:53
I wouldn't worry about it getting called more than once
Joe Rafaniello
@jrafanie
Jan 19 2016 21:53
ok
Jason Frey
@Fryguy
Jan 19 2016 21:53
requires should be basically noops
only if you have something in there that really can't run twice
Joe Rafaniello
@jrafanie
Jan 19 2016 21:54
yeah, I'll check the other things in there, make them a no-op on a subsequent call
thanks
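One shape that repeat-safety could take (a sketch; requires are naturally idempotent, so only the genuinely one-time work needs a guard):

```ruby
class MiqWorker
  # May be called more than once per worker class; keep it repeat-safe.
  def self.preload_for_worker_role
    require "yaml" # stand-in for the worker's real requires; no-op after the 1st call
    @preloaded ||= begin
      # hypothetical genuinely-one-time setup would go here
      true
    end
  end
end

MiqWorker.preload_for_worker_role
MiqWorker.preload_for_worker_role # second call is a cheap no-op
```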
Matthew Draper
@matthewd
Jan 19 2016 21:55
Maybe just add a comment to the method, noting it will be called multiple times?
Joe Rafaniello
@jrafanie
Jan 19 2016 21:55
yeah, i will
Jason Frey
@Fryguy
Jan 19 2016 21:55
it's up to you
Richard Oliveri
@roliveri
Jan 19 2016 21:58
@jrafanie How do the workers share memory with the server?
Joe Rafaniello
@jrafanie
Jan 19 2016 21:59
using fork
Richard Oliveri
@roliveri
Jan 19 2016 21:59
Fork duplicates the caller’s memory, doesn’t it?
Jason Frey
@Fryguy
Jan 19 2016 21:59
no
Joe Rafaniello
@jrafanie
Jan 19 2016 21:59
no
Jason Frey
@Fryguy
Jan 19 2016 21:59
it's copy-on-write
Joe Rafaniello
@jrafanie
Jan 19 2016 22:00
at least with ruby 2.0+ it's CoW friendly
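A tiny demonstration of those semantics in plain Ruby (Linux/macOS):

```ruby
big = Array.new(1_000_000) { "x" } # allocated once, in the parent

pid = fork do
  big[0] = "child" # copy-on-write: only the touched pages get copied
  exit!(0)         # skip at_exit handlers in the child
end
Process.wait(pid)

puts big[0] # => "x" -- the child's write never reached the parent
```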
Richard Oliveri
@roliveri
Jan 19 2016 22:01
Ah, then it’s not really shared in the sense that one can read a value that the other writes.
Keenan Brock
@kbrock
Jan 19 2016 22:01
but ruby before 2.0 is a cow
Jason Frey
@Fryguy
Jan 19 2016 22:01
@roliveri correct
Keenan Brock
@kbrock
Jan 19 2016 22:01
@roliveri correct
Joe Rafaniello
@jrafanie
Jan 19 2016 22:01
Yes
@roliveri Good luck telling the kernel developers to not use the term shared everywhere though
Richard Oliveri
@roliveri
Jan 19 2016 22:02
The kernel developers I’ve worked with would never have called it shared.
But the COW memory is cool.
Keenan Brock
@kbrock
Jan 19 2016 22:03
I guess it is shared by definition of RSS (resident shared memory) - right?
Joe Rafaniello
@jrafanie
Jan 19 2016 22:04
it's even worse when you look at /proc/PID/smaps
resident set size
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Alex Krzos
@akrzos
Jan 19 2016 22:07
@jrafanie I can't really say in terms of stability, I'd like to run a workload or two against it. That will be difficult as running the qe testing harness against master probably won't end well
Joe Rafaniello
@jrafanie
Jan 19 2016 22:08
@akrzos so, I hear you saying MERGE IT and TEST more later :wink:
Richard Oliveri
@roliveri
Jan 19 2016 22:08
In terms of p/v memory consumption, yes it’s shared, but process shared memory has a specific meaning defined decades ago - it’s a form of IPC that can be used to pass data between processes. Calling this shared memory confuses the issue.
Keenan Brock
@kbrock
Jan 19 2016 22:08
@jrafanie +1
Matthew Draper
@matthewd
Jan 19 2016 22:08
+1 :shipit:
Jason Frey
@Fryguy
Jan 19 2016 22:08
haha :+1:
Alex Krzos
@akrzos
Jan 19 2016 22:10
@jrafanie :trollface:
Richard Oliveri
@roliveri
Jan 19 2016 22:13
@jrafanie Do we ever set or reset worker environment variables on the command line through the spawn call?
Joe Rafaniello
@jrafanie
Jan 19 2016 22:17
If you're speaking of the "current way", not that I recall @roliveri
With our current spawn technique, I believe we inherit the environment from the server and pass along options to the individual classes
Richard Oliveri
@roliveri
Jan 19 2016 22:21
@jrafanie - I know we do it with some processes, like the VDDK server, but that’s not a worker. If we never do it when starting workers, then it should be fine.
Joe Rafaniello
@jrafanie
Jan 19 2016 22:27
Yes, @roliveri sharing the same exact startup code (forked kids don't run the rails initializers or our vmdb initializer) has already required me to move things around to ensure the forked kids do the right thing
including closing the PG gem's underlying socket FD inherited from the parent
thankfully @matthewd showed me how to do that one
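The general shape of that problem, as a generic post-fork sketch (not the exact fix from the PR):

```ruby
pid = fork do
  # The child inherits the parent's PG socket FD. Calling disconnect!
  # here would send a terminate packet down that shared socket and kill
  # the parent's connection, so the inherited FD has to be closed
  # without a protocol goodbye -- then the child connects fresh.
  ActiveRecord::Base.establish_connection # child gets its own connection
  # ... worker runs here ...
end
Process.detach(pid)
```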
Keenan Brock
@kbrock
Jan 19 2016 22:56
@akrzos do you have a database with any "realtime" metrics in it?
Joe Rafaniello
@jrafanie
Jan 19 2016 23:31
So, @Fryguy @matthewd I'm good with merging forking workers anytime; there are some things I'll do followup PRs for but they're pretty minor so I'd rather not add more to this PR
Tomorrow morning is fine and I'll do the followup PRs
Joe Rafaniello
@jrafanie
Jan 19 2016 23:37
and since tomorrow morning is right now for @matthewd, now's good too
:wave: Good night :moon:
Matthew Draper
@matthewd
Jan 19 2016 23:40
I liked that logic, so I merged it :grinning:
Joe Rafaniello
@jrafanie
Jan 19 2016 23:43
looks like a perfect time to shut down