These are chat archives for ManageIQ/manageiq/performance

15th
Aug 2017
Felix Dewaleyne
@FDewaleyne
Aug 15 2017 12:26
heya, I'm investigating a possible memory leak in the report worker, and I'm wondering how to turn up the debugging in a useful way
Nick LaMuro
@NickLaMuro
Aug 15 2017 14:37
a loaded question that I can try and answer in a few after I am done with standup
Nick LaMuro
@NickLaMuro
Aug 15 2017 15:11

@FDewaleyne Okay, so to start off, I personally use something I have been writing over the past 6 months or so, https://github.com/ManageIQ/manageiq-performance , to get extended insight into what is being reported on.

In my case, where I was looking into a memory leak regarding the MetricsCollector, I was testing against an appliance (versus on my local machine), and also wanted to run with memory_profiler and stackprof (though I didn't actually end up using stackprof). Because of that, I needed two [WIP] PRs on that project for it to work:

  • ManageIQ/manageiq-performance#25
  • ManageIQ/manageiq-performance#33

The first PR (still finishing some things up with it) allows installing extra gems into the "appliance_installation_script", which was needed to allow me to install memory_profiler in addition to manageiq-performance on the appliance. This part of that script can already be generated from master on that project by doing the following:

$ rake generate_install_script

But with that PR, you can do things like this:

$ rake extra_gem[memory_profiler] generate_install_script

to install additional gems with the install script.

Felix Dewaleyne
@FDewaleyne
Aug 15 2017 15:12
@NickLaMuro thanks, I'm going to save this and use it when I can :)
Nick LaMuro
@NickLaMuro
Aug 15 2017 15:12
The generated install script is a single Ruby script that includes the packed gems as base64-encoded strings. It only needs to be run via bundle exec ruby [GENERATED_SCRIPT] (no HTTP access or git binary required), and it injects the gems into the appliance's gem bundle.
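The packing idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the script that manageiq-performance generates: the payload, the file names, and the `roundtrip` helper are all hypothetical, and the real script also installs the gems into the bundle, which is not shown here.

```ruby
require "base64"
require "rbconfig"
require "tmpdir"

# Hypothetical helper: embeds `payload` in a generated Ruby script as a
# Base64 string, runs that script, and returns the bytes it unpacked.
def roundtrip(payload)
  encoded = Base64.strict_encode64(payload)

  # Generator side: the payload travels inside the script itself.
  script = <<~RUBY
    require "base64"
    File.binwrite(ARGV.fetch(0), Base64.strict_decode64("#{encoded}"))
  RUBY

  Dir.mktmpdir do |dir|
    script_path = File.join(dir, "install.rb")        # hypothetical name
    gem_path    = File.join(dir, "example-0.1.0.gem") # hypothetical name
    File.write(script_path, script)

    # Target side: running the script needs no network access or git.
    system(RbConfig.ruby, script_path, gem_path) or raise "install script failed"
    File.binread(gem_path)
  end
end

# Stand-in for the bytes of a built .gem file.
payload = "pretend these are the bytes of a gem file".b
puts roundtrip(payload) == payload
```

Base64 keeps the binary payload safe inside a plain-text script, at the cost of roughly a third more bytes.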
The second PR then wraps the MiqQueue#deliver method (when configured), so that every job is profiled while it is being processed. I used it in conjunction with the memory middleware plugin from manageiq-performance to see if any jobs were causing the leak.
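A minimal sketch of wrapping a method so that every call is measured, assuming a stand-in job class. `SampleJob` and `DeliverProfiler` are hypothetical names for illustration; this is not the code from the PR, and it uses `Module#prepend` with `GC.stat` rather than the manageiq-performance middleware.

```ruby
# Hypothetical stand-in for a queue job; not ManageIQ's MiqQueue.
class SampleJob
  attr_reader :name

  def initialize(name)
    @name = name
  end

  # Pretend work: allocate 1,000 strings and return how many were built.
  def deliver
    Array.new(1_000) { |i| "#{name}-#{i}" }.size
  end
end

# Wrapper that records wall time and allocated objects for each call.
module DeliverProfiler
  RESULTS = []

  def deliver
    allocated_before = GC.stat(:total_allocated_objects)
    started_at = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    result = super # run the original method
    RESULTS << {
      job:       name,
      seconds:   Process.clock_gettime(Process::CLOCK_MONOTONIC) - started_at,
      allocated: GC.stat(:total_allocated_objects) - allocated_before
    }
    result
  end
end

# Prepending puts the wrapper ahead of SampleJob#deliver in method lookup.
SampleJob.prepend(DeliverProfiler)

SampleJob.new("metrics").deliver
DeliverProfiler::RESULTS.each { |row| puts row.inspect }
```

Because the wrapper calls `super` and returns its result, callers of `deliver` see no change in behavior, and comparing the recorded rows per job type shows whether any single type stands out.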
In my case, no single job type seemed to be the cause, so (after shutting down the MetricsCollector worker) I started taking sample jobs and running them manually with memory_profiler to see what objects were being generated and what was being retained.
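The difference between "generated" and "retained" can be illustrated with a rough stdlib-only sketch. This is a cruder stand-in for memory_profiler, not its API: it uses only `GC.stat` and `ObjectSpace.count_objects`, gives no per-class or per-line breakdown, and `sample_job`, `RETAINED`, and `profile_strings` are hypothetical names.

```ruby
RETAINED = [] # hypothetical leak: a constant that keeps growing

# Pretend job: half of its strings stay referenced, half become garbage.
def sample_job(count)
  count.times { |i| RETAINED << "kept-#{i}" }
  count.times { |i| "temporary-#{i}" }
  nil
end

# Number of live String slots after a full garbage collection.
def live_string_count
  GC.start
  ObjectSpace.count_objects[:T_STRING]
end

# Reports objects allocated during the block and strings still alive after it.
def profile_strings
  live_before = live_string_count
  allocated_before = GC.stat(:total_allocated_objects)
  yield
  allocated = GC.stat(:total_allocated_objects) - allocated_before
  { allocated: allocated, retained_strings: live_string_count - live_before }
end

report = profile_strings { sample_job(5_000) }
puts report.inspect
```

The allocated figure covers everything the job created, while the retained figure only counts what survived a full GC, which is the part that matters for a leak. The retained number is approximate, since unrelated strings may be created or freed between the two measurements.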
Nick LaMuro
@NickLaMuro
Aug 15 2017 15:18

This ended up leading to 4 PRs that I hope (:fingers_crossed:) will help with the memory leak:

  • ManageIQ/manageiq#15757
  • ManageIQ/manageiq#15791
  • ManageIQ/more_core_extensions#54
  • ManageIQ/more_core_extensions#55

But some long term testing on an appliance with those patches in place is needed to be sure

Felix Dewaleyne
@FDewaleyne
Aug 15 2017 15:18
so far for this issue it looks like it is more of a huge spike in memory consumption that is likely due to how we process the data
Nick LaMuro
@NickLaMuro
Aug 15 2017 15:19
yeah, that seems like a reasonable assumption, so probably the MiqQueue wrapper stuff will be more helpful to you
@FDewaleyne Let me know if you have any troubles with any of this, as I have been pretty poor at adding documentation as I add features
Felix Dewaleyne
@FDewaleyne
Aug 15 2017 15:21
I'll let you know if I end up using that - likely I would have to use a reprod or help a customer deploy it in pre-prod
Nick LaMuro
@NickLaMuro
Aug 15 2017 15:23
for my BZ, we could thankfully replicate with a VMware VC simulator (and I didn't have to set it up ;) )
Felix Dewaleyne
@FDewaleyne
Aug 15 2017 15:24
nice
I'm interested in that vmware vc simulator now that I heard of it :)
Nick LaMuro
@NickLaMuro
Aug 15 2017 15:26
talk to @dmetzger57 about that, he gets the credit for setting that all up (Adam G. has probably done it as well)
that said, this might be customer specific, so I think your original thought about doing this with customer data might be ideal (if possible)
Felix Dewaleyne
@FDewaleyne
Aug 15 2017 15:50
I'm interested for various reasons tied to how I have to work on some reproducer issues.... but nothing right now
I expect probably in the future though
Nick LaMuro
@NickLaMuro
Aug 15 2017 15:58
it definitely is helpful if you know what quirks caused the slow report behavior that was reported in the BZ, because then you should be able to replicate
(with the VC simulator)
Nick LaMuro
@NickLaMuro
Aug 15 2017 16:27

@FDewaleyne something I just noticed that also exists:

https://github.com/ManageIQ/manageiq/blob/master/app/models/miq_worker/runner.rb#L450-L461

@jrafanie might be able to tell you more on how it works, but it wouldn't require anything but changing some configs on an appliance

Joe Rafaniello
@jrafanie
Aug 15 2017 17:33
yeah, @NickLaMuro that runner logging of ruby object usage might be helpful for monitoring general workflow bloat but for reporting, it's probably going to say there's lots of hashes, strings, and arrays
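To see why a per-class count tends to be dominated by core types, here is a minimal sketch that tallies live objects by class. It is not the code at the linked runner.rb lines; `top_object_classes` is a hypothetical helper built on the standard `ObjectSpace.each_object`.

```ruby
# Returns the `limit` classes with the most live instances, as
# [class, count] pairs sorted by descending count.
def top_object_classes(limit = 5)
  counts = Hash.new(0)
  # Walk every live object that descends from Object and tally its class.
  ObjectSpace.each_object(Object) { |obj| counts[obj.class] += 1 }
  counts.sort_by { |_klass, count| -count }.first(limit)
end

top_object_classes.each do |klass, count|
  puts format("%-20s %d", klass, count)
end
```

A listing like this is useful for spotting gradual growth over time, but it does not say which code path created the objects, which is where a per-job profile is more informative.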