    Todd Young
    @yngtodd
    Hey MillionIntegrals, thanks for setting this up.
    Million Integrals
    @MillionIntegrals
    Hi, I need to do one thing but I'll be back in like an hour
    Todd Young
    @yngtodd
    No rush! I will be offline temporarily around that time, but will be around in a couple hours. If that is too late, then we can catch up another time.
    Million Integrals
    @MillionIntegrals
    ok, I'm back
    can you send your checkpoint files to my email?
    I'll see if they work for me
    Todd Young
    @yngtodd
    Oh man, I wish I saw this earlier. I have those checkpoints saved on my computer at work, and I can't currently ssh to that machine. I can send them to you first thing in the morning!
    Todd Young
    @yngtodd
    Sent you a copy of the checkpoint files. I also realized that I had pushed them up to GitHub here: https://github.com/yngtodd/vel/tree/master/output/checkpoints/breakout_a2c/0
    Todd Young
    @yngtodd
    Hey MillionIntegrals, I think I'd like to have a crack at the VecEnv for the DQN models. You had mentioned that the sampling would be slightly different from that of the ReplayBuffer used by ACER. What did you mean by that?
    Million Integrals
    @MillionIntegrals
    Hi Todd
    let me give you a bit of an introduction
    if you look a bit in the internals
    the default so-called "environment roller", that is, the class responsible for creating environment rollouts for Q-learning, is currently called DequeReplayRollerEpsGreedy
    Deque, because the replay buffer is just a circular buffer that gets overwritten as new experience is written
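    A minimal sketch of that circular-buffer idea (illustrative only, not vel's actual implementation):
    ```python
    import numpy as np

    class CircularReplayBuffer:
        """Fixed-capacity storage: new transitions overwrite the oldest ones."""

        def __init__(self, capacity, obs_shape):
            self.capacity = capacity
            self.obs = np.zeros((capacity,) + obs_shape, dtype=np.float32)
            self.actions = np.zeros(capacity, dtype=np.int64)
            self.rewards = np.zeros(capacity, dtype=np.float32)
            self.dones = np.zeros(capacity, dtype=bool)
            self.index = 0  # next slot to write to
            self.size = 0   # number of valid entries stored so far

        def store(self, obs, action, reward, done):
            self.obs[self.index] = obs
            self.actions[self.index] = action
            self.rewards[self.index] = reward
            self.dones[self.index] = done
            self.index = (self.index + 1) % self.capacity  # wrap around: "circular"
            self.size = min(self.size + 1, self.capacity)
    ```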
    Million Integrals
    @MillionIntegrals
    EpsGreedy, because e-greedy action sampling happens in the environment roller
    it is still up for debate whether that should be done in the roller or in the model, but yeah, for now that's where it is
    sampling from the buffer is done in the sample method
    the DequeReplayRollerEpsGreedy samples transitions from the replay buffer one by one, uniformly
    using the sample_batch_uniform method of the buffer "backend"
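    As a sketch, the two behaviours just described, e-greedy action selection and uniform transition sampling (the signatures here are assumptions, not vel's API):
    ```python
    import numpy as np

    rng = np.random.default_rng()

    def e_greedy_action(q_values, epsilon):
        """Per step: a random action with probability epsilon (explore),
        otherwise the argmax-Q action (exploit)."""
        if rng.random() < epsilon:
            return int(rng.integers(len(q_values)))
        return int(np.argmax(q_values))

    def sample_batch_uniform(buffer_size, batch_size):
        """Every stored index is equally likely; sampled transitions are
        independent of one another."""
        return rng.integers(0, buffer_size, size=batch_size)
    ```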
    if you look at the 'vectorized' replay buffer for ACER
    Million Integrals
    @MillionIntegrals
    that class is called ReplayQEnvRoller
    in its sample method you can see that it samples full trajectories
    using the sample_batch_rollout method
    which samples continuous trajectories instead of individual transitions
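    For contrast, a sketch of what rollout-style sampling does (the real sample_batch_rollout signature may differ):
    ```python
    import numpy as np

    rng = np.random.default_rng()

    def sample_batch_rollout(buffer_size, rollout_length):
        """Pick one random start, then return rollout_length *consecutive*
        indices, preserving the temporal order ACER's trajectories need."""
        start = int(rng.integers(0, buffer_size - rollout_length))
        return np.arange(start, start + rollout_length)
    ```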
    so what needs to be done
    a new environment roller needs to be created
    for vectorized environments
    that would sample transitions using sample_batch_uniform rather than sample_batch_rollout
    of the underlying buffer backend
    I'm not sure if I'm super clear
    because it goes quite deep in the internals
    but let me know
    the more I read this code, the more I think it needs a somewhat deeper rework, so maybe it's just good to start doing that rework...
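    Putting that together, a rough sketch of the roller being proposed; the class name and the model/env method names are hypothetical, only sample_batch_uniform and sample_batch_rollout come from the discussion above:
    ```python
    class VecTransitionReplayRoller:
        """Hypothetical environment roller for *vectorized* environments:
        steps all sub-environments e-greedily, stores their transitions,
        and samples individual transitions uniformly (DQN-style) rather
        than continuous trajectories (ACER-style)."""

        def __init__(self, vec_env, backend, model, epsilon):
            self.vec_env = vec_env    # N sub-environments stepped in lockstep
            self.backend = backend    # underlying replay buffer backend
            self.model = model
            self.epsilon = epsilon
            self.last_obs = vec_env.reset()

        def rollout(self, steps):
            """Collect experience from all sub-environments at once."""
            for _ in range(steps):
                # one e-greedy action per sub-environment (hypothetical model API)
                actions = self.model.act_e_greedy(self.last_obs, self.epsilon)
                obs, rewards, dones, _ = self.vec_env.step(actions)
                self.backend.store(self.last_obs, actions, rewards, dones)
                self.last_obs = obs

        def sample(self, batch_size):
            """Uniform per-transition sampling, not sample_batch_rollout."""
            return self.backend.sample_batch_uniform(batch_size)
    ```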
    Todd Young
    @yngtodd
    Great! Thanks for the introduction. I am going to get a bit more familiar with everything, and I will let you know if I have any questions!
    Todd Young
    @yngtodd
    Correct me if I'm wrong, but the buffer backend itself would not need to change, right?
    Ideally the vectorized env_roller would be able to work with both the DequeBufferBackend and the PrioritizedReplayBackend. I was thinking of how Rainbow DQN could be added as well.
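    One way to picture that: if both backends expose the same sampling surface, the roller never has to know which one it is holding. The interface below is an assumption for illustration, not vel's actual code:
    ```python
    from typing import Any, Protocol

    class BufferBackend(Protocol):
        """Assumed common surface for DequeBufferBackend and
        PrioritizedReplayBackend; the roller would only call these.
        A prioritized backend could weight its sampling internally
        while keeping the same call shape."""

        def store(self, transition: Any) -> None: ...
        def sample_batch_uniform(self, batch_size: int) -> Any: ...
    ```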