    Richard Li
    @szrlee
    I am willing to contribute our algorithm "DAPO" to the Tianshou framework: https://github.com/lns/dapo
    where we need to construct a specialized replay buffer.
    Is there any documentation about the replay buffer?
    Richard Li
    @szrlee
    In short, we need to store (s_t, a_t, \pi_{behavior}(a_t|s_t), s_{t+1}) and may need multi-step bootstrapping.
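    A minimal sketch of such a buffer in plain Python (class and method names are hypothetical, not Tianshou API), storing (s_t, a_t, \pi_{behavior}(a_t|s_t), r_t, s_{t+1}) tuples and computing an n-step bootstrapped return:

    ```python
    from collections import deque


    class OffPolicyBuffer:
        """Hypothetical buffer for (s_t, a_t, pi_b(a_t|s_t), r_t, s_{t+1}) tuples."""

        def __init__(self, maxlen=10000):
            self.data = deque(maxlen=maxlen)

        def add(self, s, a, behavior_prob, r, s_next):
            # behavior_prob is the behavior policy's probability of a given s
            self.data.append((s, a, behavior_prob, r, s_next))

        def n_step_return(self, start, n, gamma=0.99, bootstrap_value=0.0):
            """Discounted sum of n rewards plus a bootstrapped tail value."""
            ret = 0.0
            for k in range(n):
                ret += (gamma ** k) * self.data[start + k][3]  # index 3 = r_t
            return ret + (gamma ** n) * bootstrap_value
    ```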
    n+e
    @Trinkle23897
    Thank you! I have not added the documentation yet, but it should be finished this week.
    I think the easiest way currently is to add a frame-stack env wrapper to store multi-step observations. There is a wrapper in tianshou.env.FrameStackWrapper (maybe this name, you can check it out).
    All of the frames stored in the replay buffer are in time order. You can also look at the implementation of n-step DQN.
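    A frame-stack wrapper along those lines can be sketched as follows (a simplified standalone version against the old gym-style step API, not the actual Tianshou wrapper): it keeps the last k observations in a deque and returns them stacked, so multi-step history is available at every step.

    ```python
    from collections import deque

    import numpy as np


    class FrameStack:
        """Sketch of a frame-stacking env wrapper (hypothetical, not Tianshou's)."""

        def __init__(self, env, k=4):
            self.env, self.k = env, k
            self.frames = deque(maxlen=k)

        def reset(self):
            obs = self.env.reset()
            for _ in range(self.k):  # pad the history with the first frame
                self.frames.append(obs)
            return np.stack(self.frames)

        def step(self, action):
            obs, rew, done, info = self.env.step(action)
            self.frames.append(obs)  # oldest frame drops out automatically
            return np.stack(self.frames), rew, done, info


    class _CountEnv:
        """Toy env whose observation is just the timestep, for illustration."""

        def __init__(self):
            self.t = 0

        def reset(self):
            self.t = 0
            return np.array(self.t)

        def step(self, action):
            self.t += 1
            return np.array(self.t), 0.0, False, {}
    ```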
    n+e
    @Trinkle23897
    For a_t, you can compute it when calling the learn() function.
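    For example, with a discrete softmax policy, the probabilities of the stored actions can be recomputed inside learn() from the current network's logits rather than stored at collection time (a numpy sketch; the function name is hypothetical):

    ```python
    import numpy as np


    def action_probs(logits, actions):
        """Recompute pi(a_t|s_t) for a batch of stored actions from policy logits."""
        z = logits - logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
        # pick the probability of each stored action per row
        return probs[np.arange(len(actions)), actions]
    ```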
    Richard Li
    @szrlee
    Thanks. We will try.
    Yizheng Zhang
    @Privilger
    Hi, can anyone tell me how to make the PPO output action deterministic? Thanks in advance.
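    One common approach (a sketch, not Tianshou's API): instead of sampling from the policy's action distribution, take its mode, i.e. the argmax of the categorical probabilities for discrete actions, or the Gaussian mean for continuous ones.

    ```python
    import numpy as np


    def deterministic_action(dist_params, discrete):
        """Pick the mode of the policy's action distribution instead of sampling.

        dist_params: action probabilities/logits if discrete,
                     else a (mean, std) pair for a Gaussian policy.
        """
        if discrete:
            return int(np.argmax(dist_params))  # most likely discrete action
        mu, _sigma = dist_params
        return mu  # Gaussian mode is its mean; std is ignored
    ```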