These are chat archives for ManageIQ/manageiq/performance

18th Oct 2017
Keenan Brock
@kbrock
Oct 18 2017 14:03
Generational GC divides a heap space into several spaces for several generations (in Ruby's case, we divide heap space into two: one "young" and one "old" space). Newly created objects are located in the young space and labeled as "young object". After surviving several GCs (3 for Ruby 2.2), young objects will be promoted to "old objects" and located in the "old space".
In object oriented programming, we know that most objects die young. Because of this we only need to run GC on the “young space”. If there is not enough space in the young space to create new objects, then we run GC on the “old space”. We call "Minor GC" when GC runs only in the young space. We call "Major GC" for GC that runs in both young and old spaces.
-- https://blog.heroku.com/incremental-gc 2/3/15
Added to the top - the important number is the "3": objects are promoted from young to old after surviving 3 GCs. But it's not about their location in memory; I think they're only talking about the slots used to store the pointers that mark/sweep walks
so promoting from young to old does not fix external memory fragmentation (when you have free memory but can't use it because there isn't a contiguous chunk big enough, so you have to allocate more)
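Roughly, you can watch that promotion happen with GC.stat - a quick sketch (key names per MRI 2.2+, and the exact promotion threshold can vary by Ruby version):

```ruby
# Sketch: observe young -> old promotion via GC.stat counters.
def gc_snapshot
  keys = [:minor_gc_count, :major_gc_count, :old_objects, :heap_live_slots]
  keys.map { |k| [k, GC.stat[k]] }.to_h
end

before = gc_snapshot
cache  = Array.new(100_000) { |i| "payload-#{i}" } # stand-in for long-lived data
4.times { GC.start }                               # survive enough GCs to get promoted
after  = gc_snapshot

puts "old_objects grew by #{after[:old_objects] - before[:old_objects]}"
```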
Joe Rafaniello
@jrafanie
Oct 18 2017 15:11
@kbrock I'm not sure what you're saying regarding the "age" field in the header of objects on the heap
https://github.com/ko1/nakayoshi_fork GCs 3 times before fork in hopes of promoting objects to "old" before forking... as a way to be more copy-on-write friendly
It doesn't really work so well though, since any write to an OS page, including freeing memory, will cause the whole page to be copied, including any shared memory on it
nakayoshi_fork was needed because promoting an object to "old" age directly writes to the object's header, causing a CoW fault and triggering a copy of the OS page containing that object
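The gist of that approach is roughly this (a simplified sketch, not the gem's actual implementation; `run_worker` is a hypothetical stand-in for the child's work):

```ruby
# Simplified sketch of the nakayoshi_fork idea: promote long-lived survivors
# to the old generation *before* forking, so the age writes to their headers
# happen in the parent rather than dirtying shared pages in the child.
def cow_friendly_fork(&block)
  4.times { GC.start }
  fork(&block)
end

pid = cow_friendly_fork { run_worker } # run_worker = hypothetical child workload
Process.wait(pid)
```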
Joe Rafaniello
@jrafanie
Oct 18 2017 15:19
So, the young -> old object "write" doesn't trigger a CoW fault with nakayoshi_fork, but any other writes do. It doesn't actually relocate old and young objects, so there's lots of fragmentation and mixing of object ages; GCs of young objects cause CoW faults that force copying of the old objects next to them too.
This is why I believe we hit the issue where a server process over 1 GB forks workers that exceed their memory thresholds quickly: there's so much shared memory to accidentally touch by GC'ing neighboring objects.
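One way to see how much of a forked worker is still shared with the server is to sum the per-mapping counters in /proc (a Linux-only sketch; field names from /proc/&lt;pid&gt;/smaps):

```ruby
# Sketch: sum shared vs. private resident memory for a pid from /proc/<pid>/smaps.
# A growing Private_Dirty total in a worker is roughly how much CoW sharing was lost.
def smaps_totals(pid = Process.pid)
  totals = Hash.new(0)
  File.foreach("/proc/#{pid}/smaps") do |line|
    totals[$1] += $2.to_i if line =~ /\A(Shared_Clean|Shared_Dirty|Private_Clean|Private_Dirty):\s+(\d+) kB/
  end
  totals # kB per category
end

puts smaps_totals(Process.pid).inspect
```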
Keenan Brock
@kbrock
Oct 18 2017 15:21
@jrafanie we're trying to find the memory leaks in the generic worker, but the growth seems to happen for arbitrary messages going over the wire.
the "3" statement means the message that allocates / populates a cache will not show up as the culprit - instead one a few messages later will
I had assumed that promoting from young to old would group the old objects together - it's now well established that I was wrong. But this lack of compaction will essentially always make CoW a bad idea. [for the reasons you mentioned]
it's frustrating that the memory usage/bloat follows "obvious patterns", but when you look closer, the pattern isn't there.
Joe Rafaniello
@jrafanie
Oct 18 2017 15:23
I'm still not understanding what you mean by the "3 statement"
Keenan Brock
@kbrock
Oct 18 2017 15:23
I allocate memory that will stay around forever
and you instrument the code around me
it won't show that I added old stuff
a few MiqQueue messages later, another message shows up as the culprit - whose instrumentation shows that IT was the one to increase "old" objects
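i.e. per-message instrumentation that diffs old-object counts blames the wrong message - a contrived sketch (hypothetical cache and message loop):

```ruby
# Sketch: the old_objects delta shows up a few messages after the allocation,
# because promotion needs the objects to survive ~3 GCs first.
CACHE = []

def process_message(n)
  CACHE.concat(Array.new(50_000) { "msg-#{n}-#{rand}" }) if n == 1 # only message 1 allocates long-lived data
  GC.start # stand-in for GC pressure from normal work
end

5.times do |n|
  before = GC.stat[:old_objects]
  process_message(n)
  puts "message #{n}: old_objects grew by #{GC.stat[:old_objects] - before}"
end
```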
Joe Rafaniello
@jrafanie
Oct 18 2017 15:24
culprit of what? Leaking references? I'm not sure how the 3 GCs to promote to "old" come into play here
I don't think it's worthwhile looking at "old" vs. "young" objects unless you're 100% positive we're not triggering full GC
even then, if a full GC cleans things up, it's not leaking memory or references
I think looking at the number of live objects is more important
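Something like this for the live-object view (standard GC.stat / ObjectSpace calls; force a full GC first so uncollected garbage doesn't inflate the numbers):

```ruby
# Sketch: report live objects rather than young/old generation counts.
require "objspace" # for memsize_of_all

GC.start
puts "live slots: #{GC.stat[:heap_live_slots]}"
puts "approx heap bytes: #{ObjectSpace.memsize_of_all}"

counts = ObjectSpace.count_objects.reject { |k, _| [:TOTAL, :FREE].include?(k) }
puts counts.sort_by { |_, v| -v }.first(5).to_h # top object types by count
```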
Keenan Brock
@kbrock
Oct 18 2017 15:29
@jrafanie so you think memory growing consistently over time means more live objects - that we are allocating more and more?
Joe Rafaniello
@jrafanie
Oct 18 2017 15:30
@kbrock that's one way
if memory usage is rising and the live object count isn't, then it's something else - maybe thread leaks, or the dreaded MALLOC_ARENA_MAX, or something else much lower level than Ruby
Maybe we should add logging of GC stat information and thread info for all processes
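A minimal version of that logging could look like this (a sketch; assumes a plain Logger, and the RSS read via /proc is Linux-only):

```ruby
# Sketch: periodically log GC stats, thread count, and RSS for the current process.
require "logger"
require "json"

logger = Logger.new($stdout)

Thread.new do
  loop do
    rss_kb = File.read("/proc/#{Process.pid}/status")[/VmRSS:\s+(\d+) kB/, 1].to_i
    gc     = GC.stat.select { |k, _| [:minor_gc_count, :major_gc_count, :old_objects, :heap_live_slots].include?(k) }
    logger.info(JSON.generate(pid: Process.pid, rss_kb: rss_kb, threads: Thread.list.size, gc: gc))
    sleep 60
  end
end
```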