    Bharat Tak
    @devbharat
    yes,
    but look at the loop above it
    p0 takes p1 as mean value
    in each iteration it takes previous iteration output as its current mean value
    doesn't that seem reasonable?
    gonond
    @gonond
    like this you are kind of drifting away from the state which you actually want to update in the probability matrix
    I am just running the code with an added penalty for reaching the left end:
    if (p1==-1.2)
    r=-10;
    end
    Bharat Tak
    @devbharat
    that is true
    let's see
    gonond
    @gonond
    but it takes probably about 7 minutes^^
    Bharat Tak
    @devbharat
    here too
    gonond
    @gonond
    damn
    Bharat Tak
    @devbharat
    ?
    gonond
    @gonond
    ah nothing, just annoying exercise
    Bharat Tak
    @devbharat
    oh! I thought you had a eureka moment
    gonond
    @gonond
    :) no...
    Bharat Tak
    @devbharat
    regarding what you said about drifting
    ```
    % Convert to index of successor state (p1, v1)
    s = state_c2d([p0; v0]);
    sp = state_c2d([p1; v1]);

    % Update the model with the iteration's simulation results
    % Count how many times sp is reached from s
    Task.P_s_sp_a(s,sp,a) = Task.P_s_sp_a(s,sp,a) + 1/i;

    Task.R_s_a(s,a) = Task.R_s_a(s,a) + r/i;
    ```
    now I update s and sp!
    this should be ok, right?
    ```
    %% Step 2: Generate the discrete state/action space MDP model
    for a = Task.A   % loop over the actions
        fprintf('Discrete system model for action a = %6.4f \n', U(a));

        for s = Task.S  % loop over states
            p1 = X(1,s);
            v1 = X(2,s);
            for i = 1:Parameters.modeling_iter % loop over modeling iterations
                p0 = p1 + (rand - 0.5)*delta_x; %(i-1)*delta_x/(Parameters.modeling_iter) - 0.5*delta_x;   % position
                v0 = v1 + (rand - 0.5)*delta_v; %(i-1)*delta_v/(Parameters.modeling_iter) - 0.5*delta_v;   % velocity
                action = U(:,a); % inputs

                % Simulate for one time step. This function inputs and returns
                % states expressed by their physical continuous values. You may
                % want to use the included state_*2* functions provided to do
                % this conversion.
                [p1,v1,r,isTerminalState] = Mountain_Car_Single_Step(p0,v0,action); % Note: isTerminalState is not needed in this scope

                % Convert to indices of the sampled state (p0, v0) and the successor state (p1, v1)
                si = state_c2d([p0; v0]);
                sp = state_c2d([p1; v1]);

                % Update the model with the iteration's simulation results
                % Count how many times sp is reached from si
                Task.P_s_sp_a(si,sp,a) = Task.P_s_sp_a(si,sp,a) + 1/i;

                Task.R_s_a(si,a) = Task.R_s_a(si,a) + r/i;
            end % modeling_iter
        end
    end
    ```
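A side note on the `1/i` increments in the loop above: over N iterations they add up to 1 + 1/2 + ... + 1/N rather than 1, so unless the model is normalized somewhere else, the rows of `Task.P_s_sp_a` will not sum to 1. One possible alternative, shown here only as a minimal sketch on made-up data, is to count transitions first and normalize each visited row afterwards. All names below (`counts`, `r_sum`, `samples`) are hypothetical and not part of the exercise code.

```matlab
% Toy example: 4 discrete states, one action. Data is invented for illustration.
nS = 4;
counts = zeros(nS, nS);   % counts(s, sp): how often sp was reached from s
r_sum  = zeros(nS, 1);    % accumulated reward per start state

% Each row is one simulated transition: [s, sp, r]
samples = [1 2 -1; 1 2 -1; 1 3 -1; 2 2 -1; 2 4 0];

for k = 1:size(samples, 1)
    s  = samples(k, 1);
    sp = samples(k, 2);
    r  = samples(k, 3);
    counts(s, sp) = counts(s, sp) + 1;
    r_sum(s)      = r_sum(s) + r;
end

% Normalize only the rows that were actually visited
n_visits = sum(counts, 2);
visited  = n_visits > 0;

P = zeros(nS, nS);
P(visited, :) = bsxfun(@rdivide, counts(visited, :), n_visits(visited));

R = zeros(nS, 1);
R(visited) = r_sum(visited) ./ n_visits(visited);
```

With this bookkeeping every visited row of `P` is a proper distribution and `R` is the sample mean reward, regardless of how many samples landed in each start cell.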
    gonond
    @gonond
    I meant you drift away because you don't consider s as the current state anymore when you use the output of the simulation, sp, as the next s
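The drifting described here can be illustrated with a small 1-D toy, given as a rough sketch only. The grid, the `c2d` helper and the `step` dynamics are hypothetical stand-ins (they are not `state_c2d` or `Mountain_Car_Single_Step`). Variant A samples every start state around the fixed cell center, while variant B uses the previous successor as the next mean, as in the loop under discussion.

```matlab
rng(0);                               % reproducible samples
centers = linspace(-1.2, 0.5, 20);    % assumed cell centers of a position grid
delta_x = centers(2) - centers(1);    % cell width

% Hypothetical helpers: nearest-cell index and a simple stand-in dynamics
c2d  = @(p) max(1, min(numel(centers), round((p - centers(1))/delta_x) + 1));
step = @(p) min(0.5, max(-1.2, p + 0.03));

s = 10;        % cell whose transitions we want to model
n_iter = 20;   % modeling iterations

% Variant A: every sample is drawn inside cell s
idx_fixed = zeros(1, n_iter);
for i = 1:n_iter
    p0 = centers(s) + (rand - 0.5)*delta_x;
    idx_fixed(i) = c2d(p0);
end

% Variant B: the previous successor becomes the next mean
idx_chain = zeros(1, n_iter);
p1 = centers(s);
for i = 1:n_iter
    p0 = p1 + (rand - 0.5)*delta_x;
    idx_chain(i) = c2d(p0);
    p1 = step(p0);
end

disp(idx_fixed);   % start cells visited by variant A
disp(idx_chain);   % start cells visited by variant B
```

In variant A all start samples stay in cell `s`. In variant B the start cell wanders along the trajectory, so indexing the update by the recomputed start cell keeps the bookkeeping consistent, but the number of samples per cell is no longer the same for every state.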
    Bharat Tak
    @devbharat
    yes, but i update the matrix correctly
    gonond
    @gonond
    ah i see
    Bharat Tak
    @devbharat
    you think it makes sense
    ?
    gonond
    @gonond
    my program just finished execution. The result goes in the right direction: it doesn't bounce as high as before anymore, but it still doesn't reach the right end. I probably have to decrease the penalty
    Bharat Tak
    @devbharat
    yes
    that might work
    gonond
    @gonond
    I mean your idea is worth a try as well
    just started it again with penalty r=-5
    Bharat Tak
    @devbharat
    it's running
    gonond
    @gonond
    ok
    the penalty is still too large, I'll try r=-2
    Bharat Tak
    @devbharat
    ohk
    gonond
    @gonond
    yeah, it worked!
    Bharat Tak
    @devbharat
    great! mine is hardly moving...
    gonond
    @gonond
    ok, I guess we are done with the coding part
    Bharat Tak
    @devbharat
    yaa
    gonond
    @gonond
    well the provided solution is still better, it is faster
    but I think we can leave it like this
    Bharat Tak
    @devbharat
    can you mail me the zipped code? maybe my other files are not correct... I still don't see why it is only slightly oscillating with my implementation
    gonond
    @gonond
    ok
    Bharat Tak
    @devbharat
    thanks!
    gonond
    @gonond
    so just the 2 design files, right?
    Bharat Tak
    @devbharat
    yes
    gonond
    @gonond
    there you go
    Bharat Tak
    @devbharat
    something seems off, it is oscillating very slowly. are you sure you mailed the correct file?
    gonond
    @gonond
    Maybe the parameters in your main are not the same as mine... I used 20,20 state bins, 5 action bins, and 20 modeling iterations