In which function should I implement a temporal-difference-style update? I thought step! might be the place, but it doesn't take a subtype of AbstractPolicy as an argument, and my value function lives in my Policy type. Using the (s, a, r, s') tuples returned from run_episode doesn't work either, because I want to update during the episode. I've been looking through the source files but can't find anywhere that a policy gets updated.
Figured it out: the update goes outside all of those, in my own loop where I call step!, action, and so on, so it can run after each transition.
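
For anyone landing here later, this is roughly the shape of that loop. It is only a language-neutral sketch written in Python, not the library's API: ChainEnv, TDPolicy, update and run_td_episode are hypothetical names I made up, and env.step / policy.action merely stand in for step! and action. The update is plain tabular TD(0), V(s) += alpha * (r + gamma * V(s') - V(s)), applied right after every transition.

```python
import random


class ChainEnv:
    """Hypothetical toy environment: states 0..n-1, reward 1 on reaching the last state."""

    def __init__(self, n=5):
        self.n = n
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, a):
        # Stand-in for step!: apply action a (-1 or +1), clamped to the chain.
        s_next = min(max(self.state + a, 0), self.n - 1)
        done = s_next == self.n - 1
        r = 1.0 if done else 0.0
        self.state = s_next
        return r, s_next, done


class TDPolicy:
    """Hypothetical policy type that owns its value function, as in the question."""

    def __init__(self, n_states, alpha=0.1, gamma=0.9, seed=0):
        self.V = [0.0] * n_states
        self.alpha = alpha
        self.gamma = gamma
        self.rng = random.Random(seed)

    def action(self, s):
        # Stand-in for action: a uniform random walk, just to generate transitions.
        return self.rng.choice([-1, 1])

    def update(self, s, r, s_next, done):
        # TD(0): move V[s] toward the one-step bootstrapped target.
        target = r + (0.0 if done else self.gamma * self.V[s_next])
        self.V[s] += self.alpha * (target - self.V[s])


def run_td_episode(env, policy, max_steps=1000):
    """The hand-written loop: act, step, then update the policy immediately."""
    s = env.reset()
    for _ in range(max_steps):
        a = policy.action(s)
        r, s_next, done = env.step(a)
        policy.update(s, r, s_next, done)  # update during the episode
        s = s_next
        if done:
            break
    return policy.V
```

Calling run_td_episode(ChainEnv(), TDPolicy(5)) repeatedly with the same policy object keeps refining its V table. The point is only that the policy update sits in the loop you write yourself, between the environment step and the next action.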