gonond
@gonond
the penalty is still too large, I'll try r=-2
Bharat Tak
@devbharat
ohk
gonond
@gonond
yeah, it worked!
Bharat Tak
@devbharat
great! mine is hardly moving...
gonond
@gonond
ok, I guess we are done with the coding part
Bharat Tak
@devbharat
yaa
gonond
@gonond
well the provided solution is still better, it is faster
but I think we can leave it like this
Bharat Tak
@devbharat
can you mail me the zipped code? maybe my other files are not correct... I still don't see why it is only oscillating slightly with my implementation
gonond
@gonond
ok
Bharat Tak
@devbharat
thanks!
gonond
@gonond
so just the 2 design files, right?
Bharat Tak
@devbharat
yes
gonond
@gonond
there you go
Bharat Tak
@devbharat
something seems off, it is oscillating very slowly, are you sure you mailed the correct file?
gonond
@gonond
Maybe the parameters in your main are not the same as mine... I used 20,20 state bins, 5 action bins and 20 modeling iterations
the files should be the correct ones
Bharat Tak
@devbharat
ah yes
it gets very close,
almost
gonond
@gonond
doesn't it reach the goal?
Bharat Tak
@devbharat
it did not in that try, but it was very close,
there is some randomness, so maybe we are at an edge case
gonond
@gonond
ok, funny... probably some fine-tuning is necessary
gonond
@gonond
hey, I would like to submit soon, I will post my answers, so we can discuss them ok?

Q1: The stochastic element is the difference between the actual state and the nearest bin center, i.e. the discretization error. Since all states within the same bin are represented by a single discrete state in the transition probability matrix, one has to assign probabilities to several possible state transitions from a given discrete state and action, even though the underlying dynamics are deterministic.
This becomes even more important as the number of bins is decreased.
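The effect described in Q1 can be shown with a toy example (the 1-D dynamics, bin edges, and sample count below are invented for illustration, not the assignment's actual system):

```python
import numpy as np

# Toy deterministic 1-D dynamics (illustrative only).
def step(x):
    return x + 0.3

edges = np.linspace(0.0, 1.0, 5)                 # 4 bins over [0, 1]

def to_bin(x):
    return int(np.clip(np.digitize(x, edges) - 1, 0, 3))

# Sample many continuous states that all lie in bin 0 and record which
# bin their deterministic successor lands in.
rng = np.random.default_rng(0)
counts = np.zeros(4)
for x in rng.uniform(edges[0], edges[1], 10_000):
    counts[to_bin(step(x))] += 1
p = counts / counts.sum()
# Even though step() is deterministic, bin 0 has TWO possible successor
# bins, so the estimated transition matrix is genuinely stochastic.
```

With coarser bins, more of each bin's successor mass spreads over multiple bins, which matches the observation that the effect grows as the bin count shrinks.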

Q2: The optimal solution is to apply a periodic control input with its frequency equal to the resonance frequency of the system, which can be seen as an excited oscillator. The algorithm is indeed able to find this solution.
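A minimal sketch of why resonant forcing wins, using a toy linear oscillator (the dynamics, input bound, and step count are placeholders, not the assignment's system): pushing in phase with the velocity injects energy every cycle, so a weak bounded input eventually reaches amplitudes that a constant push never could.

```python
import numpy as np

# x'' = -x + u with |u| <= 0.05: too weak to escape the origin's
# neighbourhood with a constant push, but effective when applied in
# phase with the velocity, i.e. at the oscillator's resonance frequency.
dt = 0.01
x, v = 0.1, 0.0
peak = 0.0
for _ in range(20_000):
    u = 0.05 if v >= 0 else -0.05        # bang-bang drive, in phase with v
    v += (-x + u) * dt                   # semi-implicit Euler step
    x += v * dt
    peak = max(peak, abs(x))
# A constant u = 0.05 shifts the equilibrium to x = 0.05 and keeps
# |x| < 0.2 from this start; the resonant drive grows the amplitude
# far beyond that bound.
```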

Q3: When using the deterministic discretized model, the probability matrix does not reflect reality: it treats certain state transitions as impossible when they actually are possible. One way to overcome this would be a very fine discretization, but then the computational cost would become very high. With fewer bins, however, the algorithm is unable to find a solution.

Q4: Epsilon trades off exploration against exploitation of the current policy. Increasing epsilon leads to much larger and more frequent variations in the accumulated reward obtained in subsequent episodes. Furthermore, too large an epsilon (say 0.45) makes it impossible to reach the goal.
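For reference, the epsilon-greedy rule in question is just the following (a sketch; the Q-table shape, seed, and counts are arbitrary illustration values):

```python
import numpy as np

def epsilon_greedy(Q, state, epsilon, rng):
    """With probability epsilon pick a uniformly random action, else the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))
    return int(np.argmax(Q[state]))

# Why epsilon = 0.45 hurts: nearly half of all steps ignore the policy.
rng = np.random.default_rng(1)
Q = np.zeros((1, 4))
Q[0, 2] = 1.0                                 # greedy action is 2
picks = [epsilon_greedy(Q, 0, 0.45, rng) for _ in range(10_000)]
frac_greedy = picks.count(2) / len(picks)     # expected ~0.55 + 0.45/4 ~ 0.66
```

At epsilon = 0.45 only about two thirds of the steps follow the greedy action, so long action sequences toward the goal are constantly interrupted.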

Q5: The final solution does not change.

Q6: The Q-Learning algorithm is able to find the shortest, and therefore optimal, path, and it even takes less time to do so than the Monte Carlo algorithm. The difference in speed comes from the fact that Q-Learning uses only local information to update Q, whereas Monte Carlo has to "back-propagate" the rewards of a full episode. Monte Carlo also cannot find the optimal solution because it is an on-policy method: walking close to the cliff is risky under an exploratory policy, which makes that path difficult to learn.
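The locality argument in Q6 is visible directly in the two update rules, sketched here in their generic textbook form (alpha and gamma defaults are placeholders, not the assignment's settings):

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=1.0):
    # One-step bootstrapped update: only the local transition
    # (s, a, r, s') is needed, applied immediately after each step.
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])

def mc_update(Q, episode, alpha=0.1, gamma=1.0):
    # The return of the WHOLE episode is propagated back step by step,
    # so no update can happen before the episode terminates.
    G = 0.0
    for s, a, r in reversed(episode):        # episode = [(s, a, r), ...]
        G = r + gamma * G
        Q[s, a] += alpha * (G - Q[s, a])
```

Q-Learning's target uses the max over next actions, so it evaluates the greedy policy even while behaving epsilon-greedily (off-policy); the Monte Carlo update averages the returns of the behaviour policy itself (on-policy), which is why the risky cliff-edge path stays unattractive there.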

Bharat Tak
@devbharat
Yes
I am still running modified versions of your code :P
Q1: which parameters have the most effect?
gonond
@gonond
number of bins used
Bharat Tak
@devbharat
I think the answers are good to go
gonond
@gonond
ok cool
so is it ok with you if I submit all the stuff?
Bharat Tak
@devbharat
Yes! If some other code hack works better, maybe we can talk about it in the viva,
But it takes a lot of time to run now... so uploading is ok
gonond
@gonond
ok cool
gonond
@gonond
so see you on Friday!
Bharat Tak
@devbharat
CooL! Thanks!
gonond
@gonond
short reminder: the interview is in LEE K223 at 4.10 pm
Bharat Tak
@devbharat
Oh! Thanks!
gonond
@gonond
no problem
Bharat Tak
@devbharat
Can't come, all the best!