Bharat Tak
@devbharat
doesn't that seem reasonable?
gonond
@gonond
like this you are kind of drifting away from the state which you actually want to update in the probability matrix
I am just running the code with a penalty included for reaching the left end:
if (p1 <= -1.2)   % left end of the track; <= is safer than == for floating-point values
    r = -10;
end
Bharat Tak
@devbharat
that is true
lets see
gonond
@gonond
but it takes probably about 7 minutes^^
Bharat Tak
@devbharat
here too
gonond
@gonond
damn
Bharat Tak
@devbharat
?
gonond
@gonond
ah nothing, just annoying exercise
Bharat Tak
@devbharat
oh! I thought you had a eureka moment
gonond
@gonond
:) no...
Bharat Tak
@devbharat
regarding what you said about drifting
% Convert to index of successor state (p1, v1)
s = state_c2d([p0; v0]);
sp = state_c2d([p1; v1]);
% Update the model with the iteration's simulation results
% Count how many times sp is reached from s

Task.R_s_a(s,a) = Task.R_s_a(s,a) + r/i;
now I update s and sp!
this should be ok, right?
%% Step 2: Generate the discrete state/action space MDP model
for a = Task.A   % loop over the actions
    fprintf('Discrete system model for action a = %6.4f \n', U(a));

    for s = Task.S  % loop over states
        p1 = X(1,s);
        v1 = X(2,s);
        for i = 1:Parameters.modeling_iter % loop over modeling iterations
            p0 = p1 + ((rand - 0.5))*delta_x;%(i-1)*delta_x/(Parameters.modeling_iter) - 0.5*delta_x;   % position
            v0 = v1 + ((rand - 0.5))*delta_v;%(i-1)*delta_v/(Parameters.modeling_iter) - 0.5*delta_v;   % velocity
            action = U(:,a); % inputs

            %Simulate for one time step. This function inputs and returns
            %states expressed by their physical continuous values. You may
            %want to use the included state_*2* functions provided to do
            %this conversion.
            [p1,v1,r,isTerminalState] = Mountain_Car_Single_Step(p0,v0,action); % Note: isTerminalState is nowhere needed in this scope

            % Convert to index of successor state (p1, v1)
            si = state_c2d([p0; v0]);
            sp = state_c2d([p1; v1]);

            % Update the model with the iteration's simulation results
            % Count how many times sp is reached from s

        end % modeling_iter
    end % states
end % actions
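For readers following the thread, the empty "Update the model" step in the loop above could be filled in roughly as follows. This is only a minimal sketch, not either participant's actual code: `P_s_sp_a` is a hypothetical name for the transition tensor (only `Task.R_s_a` appears in the chat), and the toy sizes and the random successor are stand-ins for the simulator and `state_c2d`. It counts how often `sp` is reached from `s`, keeps a running mean of the reward, and normalizes the counts into probabilities at the end.

```matlab
% Toy sizes (assumptions for illustration only)
nS = 4;  nA = 2;  modeling_iter = 20;

% Hypothetical containers: P_s_sp_a is not named in the chat
P_s_sp_a = zeros(nS, nS, nA);   % transition counts, later probabilities
R_s_a    = zeros(nS, nA);       % expected immediate reward

for a = 1:nA
    for s = 1:nS
        for i = 1:modeling_iter
            % Stand-in for the simulator + state_c2d: a random successor
            sp = randi(nS);
            r  = -1;

            % Count how many times sp is reached from s under action a
            P_s_sp_a(s, sp, a) = P_s_sp_a(s, sp, a) + 1;

            % Incremental mean of the rewards sampled so far
            R_s_a(s, a) = R_s_a(s, a) + (r - R_s_a(s, a)) / i;
        end
    end
end

% Turn counts into probabilities: each slice (s, :, a) then sums to 1
P_s_sp_a = P_s_sp_a / modeling_iter;
```

Note that adding `r/i` on every pass, as in the snippet earlier in the thread, accumulates a weighted sum rather than the mean of the sampled rewards; whether that was intended is not clear from the chat.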
gonond
@gonond
I meant you drift away because you don't consider s as the current state anymore when you use the output of the simulation, sp, as the next s
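The drift described here can be illustrated with a toy 1-D example. This is a sketch under stated assumptions: the `step` dynamics, the cell width, and all variable names are hypothetical and not taken from the exercise. Variant A always perturbs the fixed cell center; variant B perturbs the previous simulation output, as the loop above does when `p1` and `v1` are overwritten by the simulator.

```matlab
% Toy 1-D example; the dynamics and sizes are assumptions for illustration
center  = 0.0;            % center of the grid cell being modeled
delta_x = 0.1;            % cell width
n       = 20;             % modeling iterations
step    = @(x) x + 0.1;   % hypothetical stand-in for the simulator

x0_fixed   = zeros(1, n); % start states sampled around the fixed center
x0_chained = zeros(1, n); % start states sampled around the last successor

x1 = center;
for i = 1:n
    % Variant A: always perturb the cell center
    x0_fixed(i) = center + (rand - 0.5) * delta_x;

    % Variant B: perturb the previous simulation output
    x0_chained(i) = x1 + (rand - 0.5) * delta_x;
    x1 = step(x0_chained(i));
end

% Largest distance of any start state from the cell center
max_dev_fixed   = max(abs(x0_fixed   - center));
max_dev_chained = max(abs(x0_chained - center));
fprintf('fixed: %.3f, chained: %.3f\n', max_dev_fixed, max_dev_chained);
```

In variant A every sample stays within half a cell width of the center, while in variant B the start states leave the cell after a few iterations. Recomputing the start index from the actual start state keeps the bookkeeping consistent, but the samples are then spread over other cells instead of the one being modeled.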
Bharat Tak
@devbharat
yes, but i update the matrix correctly
gonond
@gonond
ah i see
Bharat Tak
@devbharat
you think it makes sense?
gonond
@gonond
my program just finished execution. The result goes in the right direction: it doesn't bounce as high as before anymore, but it doesn't reach the right end. I probably have to decrease the penalty
Bharat Tak
@devbharat
yes
that might work
gonond
@gonond
I mean your idea is worth a try as well
just started it again with penalty r=-5
Bharat Tak
@devbharat
it's running
gonond
@gonond
ok
the penalty is still too large, I'll try r=-2
Bharat Tak
@devbharat
ohk
gonond
@gonond
yeah, it worked!
Bharat Tak
@devbharat
great! mine is hardly moving...
gonond
@gonond
ok, I guess we are done with the coding part
Bharat Tak
@devbharat
yaa
gonond
@gonond
well the provided solution is still better, it is faster
but I think we can leave it like this
Bharat Tak
@devbharat
can you mail me the zipped code? maybe my other files are not correct... I still don't see why it is only slightly oscillating with my implementation
gonond
@gonond
ok
Bharat Tak
@devbharat
thanks!
gonond
@gonond
so just the 2 design files, right?
Bharat Tak
@devbharat
yes
gonond
@gonond
there you go
Bharat Tak
@devbharat
something seems off, it is oscillating very slowly. are you sure you mailed the correct file?
gonond
@gonond
Maybe the parameters in your main are not the same as mine... I used 20, 20 state bins, 5 action bins and 20 modeling iterations
the files should be the correct ones
Bharat Tak
@devbharat
ah yes
it gets very close,
almost