    222464
    @222464
    rewards should still be provided if you can, but are not strictly necessary
    Blimpyway
    @Blimpyway
    ok thanks. I'll try a new python env in the following days for the gigalight branch
    222464
    @222464
    in the quadruped demo, there was no reward provided (set to 0)
    alright cool, let me know if you have any problems!
    Blimpyway
    @Blimpyway
    One more curiosity - do the layers in a hierarchy need to be the same shape? Can the first be 3x3x16 and the second 4x4x20? Assuming the radii allow each layer to see all columns.
    Or rather, whether it makes sense, not just whether the implementation doesn't complain.
    222464
    @222464
    it is definitely possible and makes sense in some cases!
    for instance, for character prediction (text generation), it makes sense to increase the size of the layers as you go up the hierarchy, so that higher layers have more capacity (since they cover a larger temporal receptive field)
    for lower layers, there isn't much to encode if your input is a single character at a time, so they can be small
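    A hypothetical sketch of building a hierarchy whose layers grow in size going up. The exact layer-descriptor API differs between PyAOgmaNeo versions and branches, so LayerDesc and its hiddenSize field are assumptions here, not confirmed names:

    import pyaogmaneo as pyaon  # assumed import name, matching the pyaon alias used below

    lds = []
    for size in [(3, 3, 16), (4, 4, 20)]:  # lower layer smaller, upper layer larger
        ld = pyaon.LayerDesc()   # assumed constructor; check your installed version
        ld.hiddenSize = size     # assumed field name; varies by version/branch
        lds.append(ld)

    # lds would then be passed to Hierarchy.initRandom along with the IO descriptors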
    Blimpyway
    @Blimpyway
    Ok, thanks, that makes sense. It depends on the problem.
    Blimpyway
    @Blimpyway
    Now, when installing the gigalight branch, PyAOgmaNeo/setup.py reports pyaogmaneo version 1.1.5, while normal pip installs report 1.3.1 - is this right?
    222464
    @222464
    @Blimpyway yes, the versions are meaningless outside the master branch
    Blimpyway
    @Blimpyway
    ok thanks. I have a couple more thoughts. What happens if the time slice becomes so short that, most of the time, an action doesn't appear to change anything in the state of the environment? Given the stepped nature of the encoder, it cannot perceive variations below a certain resolution. I guess the upper layer will eventually react on a coarser time step, but the questions are: would the lower layer still have significant value? E.g. would the hierarchy figure out on its own how to "width modulate" its output? I mean, the cartpole model's actions can only be full power left or full power right. 10x finer time granularity would allow some upper layer to command (the equivalent of) "20% left" or even "do not move", with the lowest layer switching quickly between full left and full right accordingly.
    222464
    @222464
    @Blimpyway that seems more like a general reinforcement learning question, RL agents can indeed figure out to width modulate their actions to approximate something continuous in my experience, especially with good exploration (such as Boltzmann exploration). Epsilon greedy might have trouble finding that solution, as it sticks with one action by default. AOgmaNeo uses Boltzmann exploration, and I have observed it performing such "width modulation" before, such as in the Donkey Car Gym environment (where I reduced the number of actions and instead made the steering interpolate slowly).
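    A toy illustration of the "width modulation" idea, not AOgmaNeo-specific: rapidly alternating between two discrete actions so that their running average tracks an intermediate continuous value.

    def width_modulate(target, steps=10):
        # emit +1/-1 actions whose running average tracks a target in [-1, 1]
        total = 0.0
        actions = []
        for t in range(1, steps + 1):
            # pick the discrete action that pulls the running sum toward target * t
            a = 1.0 if total < target * t else -1.0
            total += a
            actions.append(a)
        return actions, total / steps

    actions, mean = width_modulate(0.2)
    print(actions)  # a mix of +1.0 and -1.0
    print(mean)     # 0.2 - the discrete switching approximates "20% left"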
    Blimpyway
    @Blimpyway
    Thanks again. Regarding timing: in SPH_Presentation.pdf, on the Working Memory page, it states that each layer has no memory past 1 time step and that longer-time processing is done by the upper layers. Yet in the source, the initRandom() function gets an argument called "historyCapacity = 64" - is the pdf outdated, and does it track more steps instead? There's also a setAHistoryIters(1, 16), which also suggests that.
    Blimpyway
    @Blimpyway
    I'm asking this because the model (Hierarchy) does not receive a signal that a round has ended and a new round has begun. If it accounts for the previous N timesteps to adjust its decision, that would be a problem, because there is no actual correlation between the last few steps of a finishing round and the first few steps of the starting round. There are no game begin/over signals to tell the model it was "teleported" into a new environment.
    Or if somehow its inner state is influenced by more time steps than the current one. In learning mode especially, this may create confusion.
    222464
    @222464
    @Blimpyway historyCapacity is purely for credit assignment for RL, which only happens at the bottom-most layer
    historyCapacity does not affect the working memory of the hierarchy, it is purely to speed up RL with credit assignment. Some branches use eligibility traces instead of a short buffer
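    A generic sketch of buffer-based credit assignment, purely illustrative and not AOgmaNeo internals: a short buffer of the last history_capacity state/action pairs lets a reward update the recent decisions that led to it, with older entries discounted more.

    from collections import deque

    history_capacity = 64
    discount = 0.99
    history = deque(maxlen=history_capacity)

    def step(state, action, reward, values, lr=0.01):
        history.append((state, action))
        # spread the reward back over the buffered history, discounted by age
        for age, (s, a) in enumerate(reversed(history)):
            values[(s, a)] = values.get((s, a), 0.0) + lr * (discount ** age) * reward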
    Blimpyway
    @Blimpyway
    Thanks a lot.
    Adam
    @alienatorZ
    Can actions be continuous in pyaogmaneo?
    222464
    @222464
    @alienatorZ not directly, actions come in the form of a CSDR and can be decoded into continuous actions
    Adam
    @alienatorZ
    Right, but you don't see a problem with using a set of CSDRs as the action and then decoding to scalars?
    222464
    @222464
    @alienatorZ no, you can approximate with discrete values quite easily
    depending on how you do it, the precision goes up exponentially w.r.t. the number of columns
    you can encode and decode float32 values if you want directly from the bit representation
    for example, every 4 bits is a column index in the range [0, 15]
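    A minimal sketch of that bit-packing trick in plain Python (illustrative only, not a PyAOgmaNeo API): a float32 is reinterpreted as 32 bits, and each 4-bit nibble becomes one column index in [0, 15], giving 8 columns of 16 cells.

    import struct

    def float_to_csdr(x):
        bits = struct.unpack("<I", struct.pack("<f", x))[0]   # reinterpret float32 as uint32
        return [(bits >> (4 * i)) & 0xF for i in range(8)]    # 8 nibbles -> 8 column indices

    def csdr_to_float(columns):
        bits = 0
        for i, c in enumerate(columns):
            bits |= (c & 0xF) << (4 * i)
        return struct.unpack("<f", struct.pack("<I", bits))[0]

    csdr = float_to_csdr(3.14159)
    print(csdr)                 # 8 integers, each in [0, 15]
    print(csdr_to_float(csdr))  # round-trips at float32 precision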
    Adam
    @alienatorZ
    That was my second question. Does OgmaNeo learn if the CSDR represents binary, or does it need a lower delta between states?
    222464
    @222464
    Local sensitivity is needed, but if you take 4 consecutive bits at a time from float32 or int32, it will lead to a locally sensitive representation naturally
    you can also train a vector quantizer of course, but the float32->CSDR trick is still quite handy
    as it is simple
    Adam
    @alienatorZ
    Why would pyaogmaneo settle to giving the same prediction after 1000s of steps even though there is a large negative reward?
    222464
    @222464
    @alienatorZ in what environment? Also, is there variation in the reward?
    Adam
    @alienatorZ
    I think I figured out the problem was not enough exploration
    Adam
    @alienatorZ
    Is the way to provide multiple inputs to pack them into one big CSDR, or to use multiple CSDRs? I tried to put in multiple prediction descriptors and one action descriptor, and I received a segfault
    222464
    @222464
    @alienatorZ make sure you are accessing the correct input index when getting the actions
    if that's not the problem, can you share the code somewhere?
    might also be that the input CSDRs are the wrong size
    there will be an update soon that includes better runtime checking in the Python interface
    Adam
    @alienatorZ
    h.initRandom([
        pyaon.IODesc((1, 8, 16), pyaon.prediction, eRadius=1, dRadius=1),
        pyaon.IODesc((1, 8, 16), pyaon.prediction, eRadius=1, dRadius=1),
        pyaon.IODesc((1, 1, 3), pyaon.action, eRadius=0, dRadius=1, historyCapacity=64)
    ], lds)
    #h.initFromFile("solstrat" + "-save5")
    # Set some parameters for the actor IO layer (index 1)
    h.setAVLR(1, 0.005)
    h.setAALR(1, 0.005)
    h.setADiscount(1, 0.99)
    h.setAHistoryIters(1, 64)
    This is the code that is failing
    the segfault happens at h.setAVLR(1, 0.005)
    222464
    @222464
    the index should be 2, not 1 I think
    0 and 1 are both predictions, 2 is actions
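    With that indexing, the actor parameter calls from the failing snippet would target index 2 instead of 1 (same values, just the corrected IO index):

    # actor IO layer is index 2; indices 0 and 1 are the prediction IO layers
    h.setAVLR(2, 0.005)
    h.setAALR(2, 0.005)
    h.setADiscount(2, 0.99)
    h.setAHistoryIters(2, 64)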
    Adam
    @alienatorZ
    Ahhh Yes! That’s what the 1 is for🤦‍♂️ Thanks!
    222464
    @222464
    no problem!
    Gershom
    @llucid-97
    Hey there, quick question: is it possible to make the agent take actions without the Boltzmann exploration policy (say, for a kind of "evaluation mode")?
    So greedy actions I mean
    222464
    @222464
    @llucid-97 not at the moment, we can add it though!
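    For reference, the distinction being asked about, as a generic sketch (not PyAOgmaNeo code): Boltzmann exploration samples actions with probability proportional to exp(value / temperature), while a greedy "evaluation mode" simply takes the argmax.

    import math, random

    def boltzmann_action(values, temperature=1.0):
        exps = [math.exp(v / temperature) for v in values]
        r = random.random() * sum(exps)
        for i, e in enumerate(exps):
            r -= e
            if r <= 0.0:
                return i
        return len(values) - 1

    def greedy_action(values):
        return max(range(len(values)), key=lambda i: values[i])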
    Gershom
    @llucid-97
    cool :) I think that'd be really useful