These are chat archives for ManageIQ/manageiq/performance

9th
Dec 2016
Peter McGowan
@pemcg
Dec 09 2016 16:05
Hi all, a quick question if I may
I have an automate debugging tool called object_walker.
I've seen several examples of object_walker taking the Generic/Priority worker process memory over its limit as it runs (CFME 5.6.x), with the resultant worker termination. This obviously hangs and then terminates object_walker.
I've also had a user report the same issue to me on github (object_walker 'hanging').
Now I can understand that as it's traversing the various associations it's loading more objects into the automation engine, but I was wondering if anything had changed with 5.6 regarding memory?
Maybe we just have slightly less headroom now between steady running state and the default memory limit.
Joe Rafaniello
@jrafanie
Dec 09 2016 16:08
@pemcg ruby 2.1+ has a generational GC so lots of allocations without the high level object going "out of scope" leads to large growth without immediate cleanup
Peter McGowan
@pemcg
Dec 09 2016 16:09
thanks @jrafanie
so is it worth increasing the out-of-the-box default memory for these workers?
Joe Rafaniello
@jrafanie
Dec 09 2016 16:09
Some hacks have been found where you reload the base object to clear the "associations"
see: ManageIQ/manageiq#12663
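To illustrate the "reload the base object" idea, here is a minimal plain-Ruby simulation. It assumes nothing about what the referenced issue actually changed: `FakeRecord`, `children` and `cached_associations` are hypothetical names that only mimic an object memoizing its loaded associations, with a `reload` that drops that cache.

```ruby
# Hypothetical stand-in for a record that caches loaded associations.
# This is a simulation, not ActiveRecord and not ManageIQ code.
class FakeRecord
  attr_reader :id

  def initialize(id)
    @id = id
    @association_cache = {}
  end

  # Lazily "loads" the children once and keeps them referenced from the record.
  def children
    @association_cache[:children] ||= Array.new(10_000) { |i| "child-#{id}-#{i}" }
  end

  # Names of the associations currently held in memory.
  def cached_associations
    @association_cache.keys
  end

  # Drops every cached association so the GC is free to reclaim them later.
  def reload
    @association_cache = {}
    self
  end
end

record = FakeRecord.new(1)
record.children.size
puts record.cached_associations.inspect # prints [:children]
record.reload
puts record.cached_associations.inspect # prints []
```

As long as the base object stays referenced, everything in its cache stays alive too; clearing the cache only makes those objects eligible for collection, it does not force a GC run.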
Peter McGowan
@pemcg
Dec 09 2016 16:10
this is a normal user process automate method that's doing the damage
Joe Rafaniello
@jrafanie
Dec 09 2016 16:11
yeah, ruby's gen GC is allocation based so if you have lots of allocations, then things go out of scope, then you do very few allocations, those original allocations may not see a FULL GC for a while
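A rough way to watch this behaviour is to compare the minor and major GC counters from `GC.stat` around a burst of allocations. This is only a sketch: `gc_counts` is a hypothetical helper, the sizes are arbitrary, and the exact counter values will vary by Ruby version and heap state.

```ruby
# Sketch only: watch minor vs. major GC counters around an allocation burst.
def gc_counts
  stat = GC.stat
  { minor: stat[:minor_gc_count], major: stat[:major_gc_count] }
end

before = gc_counts

# Burst of allocations, all kept alive by one high-level object.
holder = Array.new(200_000) { |i| "object-#{i}" }
puts "holding #{holder.size} strings"
after_burst = gc_counts

# Drop the only reference; nothing is reclaimed until a GC actually runs.
holder = nil
after_release = gc_counts

# Force a full (major) mark and sweep, which a quiet process may not
# trigger on its own for a while.
GC.start(full_mark: true, immediate_sweep: true)
after_full = gc_counts

puts "before:        #{before}"
puts "after burst:   #{after_burst}"
puts "after release: #{after_release}"
puts "after full GC: #{after_full}"
```

The counters between "after burst" and "after release" typically do not move, since releasing a reference allocates nothing and so gives the GC no reason to run.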
Peter McGowan
@pemcg
Dec 09 2016 16:12
should the automate engine be doing more cleaning up?
Joe Rafaniello
@jrafanie
Dec 09 2016 16:13
is it possible to run any of the object_walker code in a simple rails process (outside of automate)?
(for recreation purposes)
If so, it would take us very little time to find the problem area
Peter McGowan
@pemcg
Dec 09 2016 16:14
well it runs as a 'user space' automate method, so I thought that ran outside of the engine?
Joe Rafaniello
@jrafanie
Dec 09 2016 16:14
to your point re: memory thresholds... they're currently soft thresholds, where we let the worker gracefully exit after it's done... if a worker is constantly getting restarted after minimal work, that threshold is too low
there's some discussion to make these thresholds more strict and give a worker much less time to exit before we kill it... even if it's doing useful work
Peter McGowan
@pemcg
Dec 09 2016 16:16
it's certainly repeatable, if I call object_walker from a state in the VM provisioning state machine, it'll fail and the VM provision operation won't continue
Joe Rafaniello
@jrafanie
Dec 09 2016 16:16
Either way, if you're able to reliably make a worker exceed its threshold in automate, when it doesn't normally, we should try to review your object_walker code to see what's causing it
changing $print_evm_parent = false to $print_evm_parent = true will exceed the worker threshold during a VM provision
Joe Rafaniello
@jrafanie
Dec 09 2016 16:20
What does that do? If you walk all associations for each object in a hierarchy from a single object, that would certainly do it
Peter McGowan
@pemcg
Dec 09 2016 16:21
hmm
I'm tempted to suggest that we shouldn't really let any automate user write a script that when run takes a worker over its memory limit.
that's like a DoS attack :-)
given that the worker could be processing other automate operations as well
Joe Rafaniello
@jrafanie
Dec 09 2016 16:23
@pemcg you could instrument your code with pretty low overhead GC information to help you narrow it down...
Peter McGowan
@pemcg
Dec 09 2016 16:23
sure
Madhu Kanoor
@mkanoor
Dec 09 2016 16:24
He is running as a DRb Client Method
Joe Rafaniello
@jrafanie
Dec 09 2016 16:24
log GC.stat and ObjectSpace.count_objects in many places would help
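A minimal sketch of that kind of instrumentation, assuming a plain `Logger` writing to stdout as the sink; `gc_snapshot` and `log_gc_snapshot` are hypothetical helper names, and the selection of fields is just one possible choice.

```ruby
require "logger"

# Collects a few cheap counters from GC.stat and ObjectSpace.count_objects.
def gc_snapshot(label)
  stat   = GC.stat
  counts = ObjectSpace.count_objects
  {
    label:       label,
    gc_runs:     stat[:count],
    minor_gc:    stat[:minor_gc_count],
    major_gc:    stat[:major_gc_count],
    total_slots: counts[:TOTAL],
    free_slots:  counts[:FREE],
    strings:     counts.fetch(:T_STRING, 0),
    arrays:      counts.fetch(:T_ARRAY, 0),
    hashes:      counts.fetch(:T_HASH, 0),
    objects:     counts.fetch(:T_OBJECT, 0)
  }
end

# Writes one snapshot as a single log line and returns it.
def log_gc_snapshot(logger, label)
  snapshot = gc_snapshot(label)
  logger.info(snapshot.map { |key, value| "#{key}=#{value}" }.join(" "))
  snapshot
end

logger = Logger.new($stdout)
log_gc_snapshot(logger, "before walk")
walked = Array.new(100_000) { |i| { id: i, name: "vm-#{i}" } }
log_gc_snapshot(logger, "after walking #{walked.size} objects")
```

Dropping a call like this before and after each association traversal should show which step makes the live object counts jump. Inside an automate method the line could be sent to `$evm.log` instead of a `Logger`.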
Jason Frey
@Fryguy
Dec 09 2016 16:24
it's not hard to add a sleep(100000) to any automate method anyway :laughing:
Madhu Kanoor
@mkanoor
Dec 09 2016 16:24
It won't be on the engine side
Joe Rafaniello
@jrafanie
Dec 09 2016 16:24
@mkanoor that's why I want to run it in a runner script
outside of automate
Peter McGowan
@pemcg
Dec 09 2016 16:26
@jrafanie not sure that would be valid, as @mkanoor said it's a DRb client automate method so relies on the $evm handle set up by the engine
Joe Rafaniello
@jrafanie
Dec 09 2016 16:27
yeah, I'm not sure what's preventing us from creating an interface that rails console/runner could implement in the same way the DRb engine does
Peter McGowan
@pemcg
Dec 09 2016 16:28
@Fryguy wouldn't that just hang my thread?
Joe Rafaniello
@jrafanie
Dec 09 2016 16:29
So, yeah, without it, you'd have to convert your walker script to something you can run in ruby/rails outside of automate to figure out what's blowing up the memory
but loading associations into memory while traversing a tree of objects will certainly consume memory... it might just be a handful of associations that you don't really need, but we won't know until we profile it
Madhu Kanoor
@mkanoor
Dec 09 2016 16:30
can we add an ObjectSpace call in the miq_ae_service on the engine side that his script could call
$evm.dump_object_space
Peter McGowan
@pemcg
Dec 09 2016 16:31
@mkanoor that sounds interesting
Joe Rafaniello
@jrafanie
Dec 09 2016 16:35
yes, I don't see why you couldn't do that for lightweight things like GC.stat and ObjectSpace.count_objects. If it's easy, it's worth a shot to see if it helps identify problem areas.
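One way such a call might look, purely as a sketch: `dump_object_space` is only the name proposed above and is not an existing `$evm` method, and `ObjectSpaceProbe` is a hypothetical stand-in class rather than the real service object. It reports what changed since the previous call, so a script can bracket each step of a walk.

```ruby
# Hypothetical stand-in for an engine-side service object.
class ObjectSpaceProbe
  def initialize(io = $stdout)
    @io = io
    @previous = nil
  end

  # Prints the current slot totals and returns the per-type change since the
  # previous call (an empty hash on the first call).
  def dump_object_space(label = "checkpoint")
    current = ObjectSpace.count_objects.merge(gc_runs: GC.stat[:count])
    delta = @previous ? changes(@previous, current) : {}
    @previous = current
    @io.puts("[#{label}] total_slots=#{current[:TOTAL]} free_slots=#{current[:FREE]} delta=#{delta}")
    delta
  end

  private

  # Non-zero differences for every key present in the newer snapshot.
  def changes(before, after)
    after.each_with_object({}) do |(key, value), diff|
      change = value - before.fetch(key, 0)
      diff[key] = change unless change.zero?
    end
  end
end

probe = ObjectSpaceProbe.new
probe.dump_object_space("start")
kept = Array.new(50_000) { |i| "row-#{i}" }
probe.dump_object_space("after building #{kept.size} strings")
```

The numbers describe whichever process executes the method, so where the call lives (engine side or the script's own process) decides whose memory is being measured.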
Keenan Brock
@kbrock
Dec 09 2016 20:46
doesn't automate allow you to call an automate method in process?
in theory we could call that script via a direct automate call (we'd instantiate it from rails console, but it would be calling in memory) - think we'd have more control
but maybe that calling method (direct?) has gone away
Madhu Kanoor
@mkanoor
Dec 09 2016 20:47
the automate method makes calls on the service model objects which wrap the real AR objects