    gonond
    @gonond
    I am just running the code with a penalty added for reaching the left end:
    if (p1==-1.2)
    r=-10;
    end
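    A minimal sketch of the same penalty check, written with an inequality and a small tolerance instead of an exact == on a floating-point position. Whether the simulator clamps p1 to exactly -1.2 is not shown in this chat, so the tolerance tol, the default reward of -1, and the example value of p1 are assumptions for illustration only:

    ```matlab
    p_left = -1.2;      % left end of the track, as in the check above
    tol    = 1e-9;      % assumed tolerance (hypothetical)
    r      = -1;        % assumed default step reward (hypothetical)
    p1     = -1.2;      % example position returned by one simulation step

    if p1 <= p_left + tol   % also true if p1 misses -1.2 by rounding error
        r = -10;            % penalty for reaching the left end
    end
    ```

    If the simulator does clamp exactly, the == version behaves the same; the inequality only guards against the case where it does not.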
    Bharat Tak
    @devbharat
    that is true
    let's see
    gonond
    @gonond
    but it probably takes about 7 minutes ^^
    Bharat Tak
    @devbharat
    here too
    gonond
    @gonond
    damn
    Bharat Tak
    @devbharat
    ?
    gonond
    @gonond
    ah nothing, just annoying exercise
    Bharat Tak
    @devbharat
    oh! I thought you had a eureka moment
    gonond
    @gonond
    :) no...
    Bharat Tak
    @devbharat
    regarding what you said about drifting
    ```
    % Convert to index of successor state (p1, v1)
    s = state_c2d([p0; v0]);
    sp = state_c2d([p1; v1]);

    % Update the model with the iteration's simulation results
    % Count how many times sp is reached from s
    Task.P_s_sp_a(s,sp,a) = Task.P_s_sp_a(s,sp,a) + 1/i;

    Task.R_s_a(s,a) = Task.R_s_a(s,a) + r/i;
    ```
    now I update s and sp!
    this should be ok, right?
    %% Step 2: Generate the discrete state/action space MDP model 
    for a = Task.A   % loop over the actions
        fprintf('Discrete system model for action a = %6.4f \n', U(a));
    
        for s = Task.S  % loop over states
            p1 = X(1,s);
            v1 = X(2,s);
            for i = 1:Parameters.modeling_iter % loop over modeling iterations
                p0 = p1 + ((rand - 0.5))*delta_x;%(i-1)*delta_x/(Parameters.modeling_iter) - 0.5*delta_x;   % position
                v0 = v1 + ((rand - 0.5))*delta_v;%(i-1)*delta_v/(Parameters.modeling_iter) - 0.5*delta_v;   % velocity
                action = U(:,a); % inputs
    
                %Simulate for one time step. This function inputs and returns
                %states expressed by their physical continuous values. You may
                %want to use the included state_*2* functions provided to do
                %this conversion.
                [p1,v1,r,isTerminalState] = Mountain_Car_Single_Step(p0,v0,action); % Note: isTerminalState is nowhere needed in this scope
    
                % Convert to index of successor state (p1, v1)
                si = state_c2d([p0; v0]);
                sp = state_c2d([p1; v1]);
    
                % Update the model with the iteration's simulation results
                % Count how many times sp is reached from s
                Task.P_s_sp_a(si,sp,a)  = Task.P_s_sp_a(si,sp,a) + 1/i;
    
                Task.R_s_a(si,a) = Task.R_s_a(si,a) + r/i;
            end % modeling_iter
        end        
    end
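    For comparison, a minimal sketch of a different way to build the model: count visits, then normalize. It is not the code above and not the provided solution. In the loop above the increments are 1/i, so the entries accumulated for a given (si, a) do not generally sum to 1 over sp. The sketch also resamples inside bin s on every iteration instead of continuing from the previous p1, v1. It uses a toy 1-D system so that it runs on its own; step_fn and c2d are hypothetical stand-ins for Mountain_Car_Single_Step and state_c2d, and the constant reward of -1 is an assumption:

    ```matlab
    nS = 5; nA = 2; nIter = 20;
    edges   = linspace(0, 1, nS + 1);                        % bin edges on [0, 1]
    centers = (edges(1:nS) + edges(2:end)) / 2;              % bin centers
    delta_x = edges(2) - edges(1);                           % bin width
    c2d     = @(x) min(nS, max(1, sum(x >= edges(1:nS))));   % continuous -> bin index
    step_fn = @(x, a) min(1, max(0, x + 0.1 * (2*a - 3)));   % a=1 moves left, a=2 right

    P = zeros(nS, nS, nA);   % visit counts first, probabilities after normalizing
    R = zeros(nS, nA);       % summed rewards first, mean rewards after normalizing
    N = zeros(nS, nA);       % number of samples that started in (si, a)

    for a = 1:nA
        for s = 1:nS
            for i = 1:nIter
                x0 = centers(s) + (rand - 0.5) * delta_x;    % resample inside bin s
                x1 = step_fn(x0, a);
                r  = -1;                                     % assumed step reward
                si = c2d(x0);
                sp = c2d(x1);
                P(si, sp, a) = P(si, sp, a) + 1;
                R(si, a)     = R(si, a) + r;
                N(si, a)     = N(si, a) + 1;
            end
        end
    end

    % Turn counts into probabilities and reward sums into means
    for a = 1:nA
        for s = 1:nS
            if N(s, a) > 0
                P(s, :, a) = P(s, :, a) / N(s, a);
                R(s, a)    = R(s, a) / N(s, a);
            end
        end
    end
    ```

    After the second loop, every (s, a) pair that received at least one sample has a row P(s, :, a) that sums to 1, which is what a transition model needs.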
    gonond
    @gonond
    I meant that you drift away, because you don't treat s as the current state anymore when you use the simulation output sp as the next s
    Bharat Tak
    @devbharat
    yes, but i update the matrix correctly
    gonond
    @gonond
    ah i see
    Bharat Tak
    @devbharat
    do you think it makes sense?
    gonond
    @gonond
    my program just finished executing. The result goes in the right direction: it doesn't bounce as high as before anymore, but it doesn't reach the right end. I probably have to decrease the penalty
    Bharat Tak
    @devbharat
    yes
    that might work
    gonond
    @gonond
    I mean, your idea is worth a try as well
    just started it again with penalty r=-5
    Bharat Tak
    @devbharat
    it's running
    gonond
    @gonond
    ok
    the penalty is still too large, I'll try r=-2
    Bharat Tak
    @devbharat
    ohk
    gonond
    @gonond
    yeah, it worked!
    Bharat Tak
    @devbharat
    great! mine is hardly moving...
    gonond
    @gonond
    ok, I guess we are done with the coding part
    Bharat Tak
    @devbharat
    yaa
    gonond
    @gonond
    well the provided solution is still better, it is faster
    but I think we can leave it like this
    Bharat Tak
    @devbharat
    can you mail me the zipped code? maybe my other files are not correct... I still don't see why it is only slightly oscillating with my implementation
    gonond
    @gonond
    ok
    Bharat Tak
    @devbharat
    thanks!
    gonond
    @gonond
    so just the 2 design files, right?
    Bharat Tak
    @devbharat
    yes
    gonond
    @gonond
    there you go
    Bharat Tak
    @devbharat
    something seems off, it is oscillating very slowly. Are you sure you mailed the correct file?
    gonond
    @gonond
    Maybe the parameters in your main are not the same as mine... I used 20, 20 state bins, 5 action bins, and 20 modeling iterations
    the files should be the correct ones
    Bharat Tak
    @devbharat
    ah yes
    it gets very close,
    almost
    gonond
    @gonond
    doesn't it reach the goal?
    Bharat Tak
    @devbharat
    it did not in that try, but it was very close,
    there is some randomness, so maybe we are at an edge case
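    Since the model is built from rand samples, one way to tell an edge case from a bug is to fix the random seed so that repeated runs draw the same numbers. A minimal sketch, assuming nothing else in the exercise code reseeds the generator (the seed value 1 is arbitrary):

    ```matlab
    rng(1);              % fix the seed of the global random number generator
    a = rand(1, 3);      % first set of draws
    rng(1);              % reset to the same seed
    b = rand(1, 3);      % second set of draws repeats the first
    same = isequal(a, b);
    ```

    Calling rng once at the top of the main script would make two runs with the same parameters directly comparable.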