    gonond
    @gonond
    my program just finished execution. The result goes in the right direction; it just doesn't bounce as high as before anymore, but it doesn't reach the right end. I probably have to decrease the penalty
    Bharat Tak
    @devbharat
    yes
    that might work
    gonond
    @gonond
    I mean your idea is worth a try as well
    just started it again with penalty r=-5
    Bharat Tak
    @devbharat
    it's running
    gonond
    @gonond
    ok
    still too large a penalty, I'll try r=-2
    Bharat Tak
    @devbharat
    ohk
    gonond
    @gonond
    yeah, it worked!
    Bharat Tak
    @devbharat
    great! mine is hardly moving...
    gonond
    @gonond
    ok, I guess we are done with the coding part
    Bharat Tak
    @devbharat
    yaa
    gonond
    @gonond
    well, the provided solution is still better; it is faster
    but I think we can leave it like this
    Bharat Tak
    @devbharat
    can you mail me the code as a zip? maybe my other files are not correct... I still don't see why it is only oscillating slightly with my implementation
    gonond
    @gonond
    ok
    Bharat Tak
    @devbharat
    thanks!
    gonond
    @gonond
    so just the 2 design files, right?
    Bharat Tak
    @devbharat
    yes
    gonond
    @gonond
    there you go
    Bharat Tak
    @devbharat
    something seems off, it is oscillating very slowly, are you sure you mailed the correct file?
    gonond
    @gonond
    Maybe the parameters in your main are not the same as mine... I used 20x20 state bins, 5 action bins and 20 modeling iterations
    the files should be the correct ones
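
    (For reference, a hypothetical sketch of what those settings might look like in a main script; the variable names are invented for illustration, not the actual assignment files:)

```python
# Hypothetical parameter block matching the settings mentioned above;
# names are illustrative, not the assignment's actual code.
N_POSITION_BINS = 20   # state discretization, first dimension
N_VELOCITY_BINS = 20   # state discretization, second dimension
N_ACTION_BINS   = 5    # number of discrete control inputs
N_MODEL_ITERS   = 20   # modeling iterations
```
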
    Bharat Tak
    @devbharat
    ah yes
    it gets very close,
    almost
    gonond
    @gonond
    doesn't it reach the goal?
    Bharat Tak
    @devbharat
    it did not in that try, but it was very close,
    there is some randomness, so maybe we are at an edge case
    gonond
    @gonond
    ok, funny... probably some fine-tuning is necessary
    gonond
    @gonond
    hey, I would like to submit soon. I will post my answers so we can discuss them, ok?

    Q1: The stochastic element is the difference between the actual state and the nearest bin center, i.e. the discretization error. Since in the transition probability matrix all states within the same bin are represented by the same discrete state, one has to assign certain probabilities to different state transitions from a given discrete state and action, even though the dynamics themselves are deterministic.
    This becomes even more important when the number of bins is decreased.
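
    A minimal sketch of this idea for a 1-D state (assuming a generic deterministic step function `dynamics(x, u)` and uniform bins; all names are hypothetical, not the assignment's actual code): sample many true states inside each bin, step them forward, and count which bins they land in.

```python
import numpy as np

def build_transition_matrix(dynamics, bin_centers, bin_edges, actions,
                            samples_per_bin=100, rng=None):
    """Estimate P[s_next, s, a] for a deterministic system.

    The discretization error (true state vs. bin center) is what makes
    the estimated transitions stochastic."""
    rng = np.random.default_rng() if rng is None else rng
    bin_centers = np.asarray(bin_centers)
    n_s, n_a = len(bin_centers), len(actions)
    P = np.zeros((n_s, n_s, n_a))
    for s in range(n_s):
        lo, hi = bin_edges[s], bin_edges[s + 1]
        for a, u in enumerate(actions):
            for _ in range(samples_per_bin):
                x = rng.uniform(lo, hi)   # a true state hidden inside bin s
                x_next = dynamics(x, u)   # deterministic step
                s_next = int(np.argmin(np.abs(bin_centers - x_next)))
                P[s_next, s, a] += 1.0
            P[:, s, a] /= samples_per_bin  # normalize counts to probabilities
    return P
```

    With fewer bins each bin covers more true states, so the spread of this estimated distribution grows, matching the remark above.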

    Q2: The optimal solution is to apply a periodic control input whose frequency equals the resonance frequency of the system, which can be viewed as a driven oscillator. The algorithm is indeed able to find this solution.
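
    The resonance idea corresponds to the classic energy-pumping heuristic: always push in the direction of the current velocity, so the input stays in phase with the oscillation. A hedged sketch (assuming a scalar velocity and a symmetric input limit `u_max`; not the assignment's code):

```python
def resonant_policy(velocity, u_max=1.0):
    """Push in phase with the motion: the input then oscillates at the
    system's own frequency and pumps energy into it every cycle."""
    return u_max if velocity >= 0 else -u_max
```
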

    Q3: When the deterministic discretized model is used, the probability matrix does not reflect reality, in the sense that it treats certain state transitions as impossible even though they are actually possible. One way to overcome this would be a very fine discretization, but then the computational cost becomes very high; with fewer bins, however, the algorithm is unable to find a solution.

    Q4: Epsilon trades off exploration against exploitation of the current policy. Increasing epsilon leads to much larger and more frequent variations of the accumulated reward obtained in subsequent episodes. Furthermore, a too-large epsilon (say 0.45) makes it impossible to reach the goal.
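
    For illustration, a minimal epsilon-greedy action selection in tabular form (Q as a 2-D array of state-action values; names are illustrative):

```python
import numpy as np

def epsilon_greedy(Q, state, epsilon, rng=None):
    """With probability epsilon pick a random action (exploration);
    otherwise pick the currently best-valued action (exploitation)."""
    rng = np.random.default_rng() if rng is None else rng
    n_actions = Q.shape[1]
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[state]))
```

    A large epsilon keeps injecting random actions even when the policy is already good, which explains both the noisier episode rewards and the failure to reach the goal.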

    Q5: The final solution does not change.

    Q6: The Q-learning algorithm is able to find the shortest and therefore optimal path, and it even takes less time to do so than the Monte Carlo algorithm. The difference in speed comes from the fact that Q-learning uses only local information to update Q, whereas in Monte Carlo the rewards of a full episode have to be “back-propagated”. Monte Carlo also cannot find the optimal solution because it is an on-policy method, and walking close to the cliff is risky, which makes that path difficult to learn.
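
    A hypothetical side-by-side sketch of the two update rules for a tabular Q (step size alpha and discount gamma are illustrative; this is the generic textbook form, not the assignment's code):

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=1.0):
    """One-step TD update: only the local transition (s, a, r, s') is
    needed, so Q can be improved during the episode."""
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])

def monte_carlo_update(Q, episode, alpha=0.1, gamma=1.0):
    """Every-visit MC: the episode must finish before the return G can
    be propagated back through all visited state-action pairs."""
    G = 0.0
    for s, a, r in reversed(episode):
        G = r + gamma * G
        Q[s, a] += alpha * (G - Q[s, a])
```
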

    Bharat Tak
    @devbharat
    Yes
    I am still running modified versions of your code :P
    Q1: which parameters have the most effect?
    gonond
    @gonond
    number of bins used
    Bharat Tak
    @devbharat
    I think the answers are good to go
    gonond
    @gonond
    ok cool
    so is it ok with you if I submit all the stuff?
    Bharat Tak
    @devbharat
    Yes! If some other code hack works better, maybe we can talk about it in the viva,
    But it takes a lot of time to run now... so uploading is ok
    gonond
    @gonond
    ok cool
    gonond
    @gonond
    ok, upload successful
    so see you on Friday!
    Bharat Tak
    @devbharat
    CooL! Thanks!
    gonond
    @gonond
    short reminder: the interview is in LEE K223 at 4:10 pm
    Bharat Tak
    @devbharat
    Oh! Thanks!
    gonond
    @gonond
    no problem
    Bharat Tak
    @devbharat
    Can't come, all the best!