Alexander Kuhnle
@AlexKuhnle
...ed to address a problem when evaluating agents with internal states (using an RNN).
When evaluating, one only calls agent.act with the independent/evaluation flag set. In this case, one is now additionally required to provide an internals argument (and in return gets an additional result, the next internal states). On the first call in an episode, this is supposed to be the result of initial_internals, afterwards just the return of the previous act call. See for instance the evaluation example in the docs under getting started. I will add a bit more documentation as well.
(this can be ignored if no internal states are used)
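(For illustration, a rough sketch of that evaluation pattern; environment and agent here are placeholder names, not code taken from the docs:)

    # Hedged sketch of the evaluation loop described above: independent act calls
    # with explicit internals. Assumes `agent` uses an RNN/internal states and
    # `environment` is any Tensorforce Environment instance.
    states = environment.reset()
    internals = agent.initial_internals()   # first call of the episode uses this
    terminal = False
    while not terminal:
        # pass the current internals in, get the next internals back
        actions, internals = agent.act(states=states, internals=internals, independent=True)
        states, terminal, reward = environment.execute(actions=actions)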
Alexander Kuhnle
@AlexKuhnle
does this clarify the question?
Qiao.Zhang
@qZhang88
great! thanks a lot
IbraheemNofal
@IbraheemNofal

Hello,
So I'm currently attempting to get a DQN agent to work for my solution, and I'm finding a few things not entirely clear, so I have a couple of questions plus an error that I'm getting.

The questions:

1) Does the DQN agent automatically update the weights at the end of each episode, or do I have to manually call the update() method?

2) Does the agent automatically store the state, action and reward it's given so it can use that to train afterwards, or do I have to manually do it by storing them in a memory module and then use that for training?

The error I'm getting:

As for the error, it's the following:

InvalidArgumentError (see above for traceback): assertion failed: [] [Condition x == y did not hold element-wise:] [x (agent.observe/strided_slice:0) = ] [407] [y (agent.observe/strided_slice_1:0) = ] [0]
[[node agent.observe/assert_equal_1/Assert/AssertGuard/Assert (defined at F:\ProgramFiles\Anaconda3\envs\Tensorforce\lib\site-packages\tensorforce\core\models\model.py:1094) ]]
[[{{node GroupCrossDeviceControlEdges_0/agent.observe/agent.core_observe/agent.core_experience/estimator.enqueue/assert_equal/Assert/AssertGuard/Assert/data_4}}]]

Opening the model.py file, the error seems to occur at the following stage:

        # size of terminal equals buffer index
        tf.debugging.assert_equal(
            x=tf.shape(input=terminal, out_type=tf.int64)[0],
            y=tf.dtypes.cast(x=self.buffer_index[parallel], dtype=tf.int64)
        ),
Please note, the issue occurs when I pass a terminal state (terminal = True) to the agent.
Qiao.Zhang
@qZhang88
@IbraheemNofal Could you paste your code?
Alexander Kuhnle
@AlexKuhnle
Hi @IbraheemNofal :
1) DQN, like every other agent, updates automatically; the update(...) function doesn't usually need to be called. You can specify how frequently the update should happen via the update_frequency argument, or implicitly via batch_size (if update_frequency is None, then update_frequency = batch_size). These numbers are timestep-based, so independent of episodes (since DQN is largely agnostic to episodes).
2) The agent automatically stores its experiences, as long as act(...) and observe(...) are called iteratively (or Runner is used, which takes care of it). No need to take care of anything here.
Regarding your exception, is it possible that you call observe(...) only when you encounter a terminal state? As @qZhang88 mentioned, it would be good to see the code and how you call act() and observe().
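(For illustration, a minimal sketch of the act/observe pattern described in 1) and 2); the environment and hyperparameter values are placeholder assumptions, not part of the original discussion:)

    # Hedged sketch: the agent stores experiences and schedules updates itself,
    # as long as act(...) and observe(...) are called in alternation.
    from tensorforce import Agent, Environment

    environment = Environment.create(environment='gym', level='CartPole-v1')
    agent = Agent.create(
        agent='dqn', environment=environment,
        memory=10000, batch_size=32, update_frequency=4  # timestep-based schedule
    )

    for _ in range(100):
        states = environment.reset()
        terminal = False
        while not terminal:
            actions = agent.act(states=states)                    # state/action stored
            states, terminal, reward = environment.execute(actions=actions)
            agent.observe(terminal=terminal, reward=reward)       # reward stored, update may trigger

    agent.close()
    environment.close()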
IbraheemNofal
@IbraheemNofal
So I've spent some time tinkering around with it, and it turns out that due to an indentation fault of mine, I've been calling dqn_agent.act() once more per episode than dqn_agent.observe(). Fixed that so they're called an equal number of times, with act always coming before observe, but the error still persists. Below is the code. Please note that I'm not using an environment or a runner, as I'm handling everything manually since that works better for my solution.

\\
def run_And_Update_States(self):

    #this method is responsible for running a one step iteration and updating the states
    #a one step iteration can be every X int amount of simulation steps, depending on how often this method is called

    #note: the way this works is by taking one action per dqn_agent per timestep, which is necessary
    #as I'm running multiple agents within the same environment, then executing the action and then updating
    #the reward through observe the next time step. To do so, it's important to distinguish between the first step and every other step. It isn't possible to return the
    #reward immediately from the environment for the current action before at least executing one simulation step, this is because we have to wait for the other agents to take
    #their actions as well

    reward = 0

    #update queues and variables
    self.update_TLS_Queues()


    if  self.previous_State is None and self.current_State is None:
        #first call of this method --> first step
        print("***First step***")
        self.current_State = self.Get_State()
        self.current_Action = self.choose_action(self.current_State)
        self.action_changed = True
        self.action_counter += 1
    else:
        #not the first time this method is called, i.e. we've already taken at least 1 action --> we can update memory + accumulate reward for previously taken action

        #update previous state and current state
        self.previous_State = self.current_State
        self.current_State = self.Get_State()
        self.previous_Action = self.current_Action #the previously taken action is now stored in its own variable, so we can correlate state, action, next state and reward

        #retrieve and save info about terminal state
        terminal = False
        if (traci.simulation.getMinExpectedNumber() == 0):
            terminal = True
            print("***Terminal state reached, ending episode for "+ self.TLS_ID)

        if(self.ack_Count != 0):
            # acknowledgements since last timestep

            avg_Travel_Time = float (avg_Travel_Time) / float(self.ack_Count)
            print("Avg travel time for %s is %d" %(self.TLS_ID,avg_Travel_Time ))
            reward = self.Evaluate_Reward(avg_Travel_Time, self.ack_Count)

        else:
            #no acknowledgements since last timestep

            reward = self.Evaluate_Reward(1,0)
            self.Total_reward += reward


        print("Action_Counter = %d & Observe_Counter = %d" %(self.action_counter, self.observe_counter))

        #pass info about terminal state to agent, 0 reward + true on terminal state
        update_bool = self._model.DQN_Agent.observe(reward = reward, terminal = terminal)
        self.observe_counter += 1

        if(update_bool):
            #print when an update occurs
            print("Model with TLS ID # "+ self.TLS_ID + "was updated at timestep = %d" + self.step )
        self.Total_reward += reward

        #take action
        if not(traci.simulation.getMinExpectedNumber() == 0):
            self.current_Action = self.choose_action(self.current_State) #action to take in this timestep
            self.action_counter += 1

        #the change in phase is set from inside the run() method so we can keep track of the number of steps spent in the yellow phase before switching
        if(self.current_Action == self.previous_Action):
            self.action_changed = False
        else:
            self.action_changed = True

    self._steps += 1
    print(self.TLS_ID)
    print("**Previous action:")
    print(self.previous_Action)
    print("**Current action:")
    print(self.current_Action)
    print("**Action changed bool:")
    print(self.action_changed)

\\

Alexander Kuhnle
@AlexKuhnle
@bob7827 , since you were asking a related question a few days ago: there is now the possibility to save the model variables in other formats (numpy, hdf5) or get/assign individual variables. If you're interested in that, have a look at the most recent agent doc and let me know if you have questions.
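(A hedged sketch of what that might look like; the method and argument names below are assumptions based on the agent doc mentioned above, so verify them against the current documentation:)

    # Assumed usage, to be checked against the agent documentation.
    agent.save(directory='model-dir', format='numpy')                 # also: format='hdf5'
    value = agent.get_variable(variable='<variable-name>')           # read an individual variable
    agent.assign_variable(variable='<variable-name>', value=value)   # assign it back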
Alexander Kuhnle
@AlexKuhnle
@IbraheemNofal I can't spot an obvious problem, however, you mention in the comments that you're controlling multiple agents or something, which may mess things up. What does the choose_action(...) method look like, and how do these multiple agents exactly work? Also, it sounds like there may be a better way of doing this. Feel free to write me a private message and we can discuss it in more detail.
B Walker
@bob7827
@AlexKuhnle , thanks for the update. That was fast!
Alexander Kuhnle
@AlexKuhnle
As I said, it was next on the roadmap anyway ;-)
Jeff Willert
@jeffreywillert
@AlexKuhnle - Is there a complete example for using the new parallel execution capability? I've made it work, but it took some tinkering. A straightforward "parallel quickstart" guide would be greatly beneficial.
Alexander Kuhnle
@AlexKuhnle
@jeffreywillert , sorry to hear you struggled using it. Have you used the Runner, and if so, was it not clear how to use it for parallel execution? Or have you tried to use the slightly more low-level interface via the parallel argument of agent.act/observe? I can certainly add more information, but it would also be very welcome if you would consider contributing a short guide... :-) Also, I'm happy to help if there are still questions...
Jeff Willert
@jeffreywillert
@AlexKuhnle - I did use the Runner (I modified the Tensorforce PPO CartPole example). I'm happy to share what I did if you wouldn't mind taking a quick look to see if it matches your envisioned usage. What would be the best way for me to share my script with you?
Alexander Kuhnle
@AlexKuhnle
Yes, more than happy to... If it's a small script, you can just copy-paste it in a private message in Gitter?
Well, you can even do that if it's not that small :-D
Jeff Willert
@jeffreywillert
@AlexKuhnle attaching the script here...
Alexander Kuhnle
@AlexKuhnle
@jeffreywillert That looks right. You can also pass the agent config as dict to the runner, in which case you don't have to specify the parallel_interactions argument, but it should be automatically set internally. Note that currently you're running 16 environments, which run locally and hence will be executed iteratively, and the agent call will be batched, i.e. "in parallel". For computationally more expensive environments, it makes sense to use the remote argument (see here) to execute remotely and hence "fully in parallel".
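(A rough sketch of the two setups described above; the agent spec and numbers are illustrative assumptions, not Jeff's actual script:)

    # Illustrative only; the agent is passed as a spec dict, so
    # parallel_interactions is set internally by the Runner.
    from tensorforce import Runner

    agent = dict(agent='ppo', batch_size=10, learning_rate=1e-3)

    # Cheap local environments: executed iteratively, agent calls batched "in parallel".
    runner = Runner(agent=agent, environment='CartPole-v1',
                    max_episode_timesteps=500, num_parallel=16)
    runner.run(num_episodes=300)
    runner.close()

    # Computationally expensive environments: run them in separate processes.
    runner = Runner(agent=agent, environment='CartPole-v1',
                    max_episode_timesteps=500, num_parallel=4,
                    remote='multiprocessing')
    runner.run(num_episodes=300)
    runner.close()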
Marius
@MariusHolm

@AlexKuhnle I was working on updating my code from Tensorforce 0.5.0 to 0.5.3, but figured that, as parallel environments have now been added, I would try to update the code all the way to the latest GitHub version.

I have a custom environment I want to run on multiple CPUs (locally), as my environments involve a bunch of fluid mechanics simulations which are computationally very heavy. I borrowed the script Jeff linked above, tried to use remote="multiprocessing", and changed the environment to use my custom class, also adding remote="multiprocessing" to the Env.create() call. This seems to work ok, and the environment type is recognized as a MultiprocessingEnvironment.

However, when the code reaches the Runner call, I get an assertion error:

File "/home/fenics/local/tensorforce/tensorforce/execution/runner.py", line 99, in __init__
    assert not isinstance(environment, Environment)
AssertionError

Is something going wrong with how I'm creating my environment, or is this something in the Runner class not taking into account that my environment now is of a different type?

Marius
@MariusHolm

Seems like my editing timed out, so I continue here.

I tried editing the script Jeff linked by only adding remote="multiprocessing" to Env.create() and Runner(), which seems to work, except that it slows down over time, and when it reaches the final episode nothing seems to happen and the run won't finish.

I suspect I might have misunderstood "multiprocessing" vs "socket-client", and that what I actually need to use is "socket-client". (I have used the code contributed by Jerab29 with Tensorforce 0.5.0, which used the same naming convention with Client, Server, Socket etc., which is what raised my suspicion.)

Alexander Kuhnle
@AlexKuhnle
The environment isinstance check should pass, as MultiprocessingEnvironment is of type Environment, so I'm surprised this fails. Could you hack in a print statement to check what the type is at that point? (If the exception still comes up...)
Multiprocessing is correct if you want to run your environment instances all locally, whereas sockets you would use if they run on different machines.
Regarding the slowdown, is it possible that this is just due to the agent getting better and hence episodes taking longer? However, the observation that the script doesn't terminate is worrying, and may be a case which I haven't come across when testing the runner code (running the environments a/sync in parallel and at the same time terminating once the episode number is reached may lead to an unexpected infinite loop...). Maybe best to post your version of the script, so I can re-run it?
Marius
@MariusHolm
Note: I changed Runner(environment='CartPole-v1') to Runner(environment=environment), which causes the same AssertionError as for a custom env.
Printing the type of environment on the line above the assertion gives <class 'tensorforce.environments.multiprocessing_environment.MultiprocessingEnvironment'>, which seems right.
Alexander Kuhnle
@AlexKuhnle
I will only be able to run it later, but one thing: you should either pass a specification dict to the runner (preferred), so that environments can be created according to the remote arguments, or you have to pass the appropriate number of Environment objects to environments=, in which case you don't need to specify the remote arguments. (Similarly, if you pass the agent spec dict, you don't need to set parallel_interactions, as it will be automatically set based on the runner arguments.)
Maybe this should be clearer in the docs
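(A hedged sketch of the two options; CustomEnvironment is a placeholder for your own Environment subclass, and the agent spec is just an example:)

    from tensorforce import Environment, Runner

    # Option 1 (preferred): pass specs, so the Runner creates the environments
    # itself according to the remote argument.
    runner = Runner(
        agent=dict(agent='ppo', batch_size=10),
        environment=CustomEnvironment, max_episode_timesteps=500,
        num_parallel=4, remote='multiprocessing'
    )

    # Option 2: pass already-created Environment instances via environments=,
    # in which case the remote argument is not specified here.
    envs = [
        Environment.create(environment=CustomEnvironment, max_episode_timesteps=500)
        for _ in range(4)
    ]
    runner = Runner(agent=dict(agent='ppo', batch_size=10), environments=envs)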
Marius
@MariusHolm

Thanks, will give that a try. I just ran a quick test of the code presented in the docs:

runner = Runner(
    agent='benchmarks/configs/ppo1.json', environment='CartPole-v1',
    num_parallel=5, remote='multiprocessing'
)
runner.run(num_episodes=100)

which ran fine.

Marius
@MariusHolm

I was able to figure out what I was doing wrong with my custom Environment, and it's probably a similar reason for the AssertionError above.

I was doing env = Environment.create() and then passing env to the Agent and the Runner. However, "multiprocessing" requires that the environment we send to the Runner NOT be of type Environment, MultiprocessingEnvironment or similar.

When I pass the custom Environment class directly to the runner (and the agent), i.e. Agent.create(environment='CustomClass') and Runner(environment='CustomClass/CartPole-v1'), Tensorforce calls Environment.create() on its own.

I was calling Environment.create() on an instance which had already gone through Environment.create(), sort of a double stack of Environment.create() calls.

I'm running into a few other errors, but they are much more likely to be issues with how I'm defining my environment rather than caused by Tensorforce, so I'll take a closer look at those myself before bothering you again.

Alexander Kuhnle
@AlexKuhnle
I see, probably the documentation could be improved as well (if you have some tips, let me know :-). Generally, the preferred way is to pass everything as spec dict, so that Tensorforce can internally initialize and auto-check/set additional arguments correctly. This is a recent shift (so older code tends to initialize objects separately), as it was often not straightforward for a user to see how for different objects certain attributes had to match. Hope that in the longer run this makes it easier... :-)
Le o
@ChipChap1_gitlab
Hi everyone
Would somebody know if it is possible to train a Deep Q-Network incrementally in an online setting, where new observations come in a stream with a time delay? Concretely, I am looking for something like the "train_on_batch" function in Keras (in the online case with batch = 1).
Alexander Kuhnle
@AlexKuhnle
This is possible, although untypical, at least in a deep RL context. I assume you mean that at every timestep an online update is performed? (observations always come as an incremental stream in standard RL :-)
Le o
@ChipChap1_gitlab
@AlexKuhnle : Thank you very much for your response. Yes, the goal would be to have an online update. How can this be realised in tensorforce?
Qiao.Zhang
@qZhang88
@AlexKuhnle Hi, I found this parameter while creating most kinds of agents; what is update_frequency used for? I have checked frequency in the code, but didn't find where it is used. Thanks for answering my question!
update_frequency ("never" | parameter, long > 0) – Frequency of updates (default: batch_size).
Qiao.Zhang
@qZhang88
@AlexKuhnle I saw your answer to another question above and I am curious about this: could update_frequency not be equal to batch_size? I think an update is called once enough data is collected. I am using PPO in Tensorforce; update_frequency should not be timestep-based, right?
1) DQN as every other agent updates automatically, the update(...) function doesn't usually need to be called. You can specify how frequently the update should happen via the update_frequency argument, or implicitly via batch_size (if update_frequency is None, then update_frequency = batch_size). These numbers are timestep-based, so independent of episodes (since DQN is generally largely agnostic to episodes).
Alexander Kuhnle
@AlexKuhnle
Hi @qZhang88, update_frequency always has the same unit as batch_size, both specified as part of update (in TensorforceAgent). So in the case of PPO it can't be timestep-based. As you've probably read, update_frequency specifies how frequently an update is scheduled: update_frequency > batch_size doesn't make sense, since otherwise some experience would just be ignored; update_frequency = batch_size is the default; but it can make sense to experiment with "increasing" the periodicity / "decreasing" the frequency, i.e. update_frequency < batch_size.
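(For illustration, a hedged PPO configuration showing these arguments; the values are arbitrary and environment is assumed to exist already:)

    # Illustrative values only: batch_size and update_frequency are both in
    # episodes for PPO, and update_frequency < batch_size schedules updates
    # more often than once per full batch.
    from tensorforce import Agent

    agent = Agent.create(
        agent='ppo', environment=environment,
        batch_size=10,              # episodes per update batch
        update_frequency=5,         # schedule an update every 5 episodes
        subsampling_fraction=0.2,   # fraction of the batch per minibatch
        optimization_steps=10,      # minibatch updates per scheduled update
        learning_rate=1e-3
    )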
@ChipChap1_gitlab, a simple way is just to choose a very small memory size, but a better option is to use the more general TensorforceAgent, replicate DQN's internal configuration, and change the bits that you'd like to adapt. In this case I think the only change necessary is memory = dict(type='recent') instead of DQN's replay memory with custom capacity.
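(A rough sketch of the simpler option, a small memory with an update every timestep; the values are illustrative assumptions, not a recommended configuration:)

    # Online-style DQN sketch: small memory, single-timestep batches, update on
    # every observe. Note the capacity must still hold at least one full episode.
    agent = Agent.create(
        agent='dqn', environment=environment,
        memory=1000,          # relatively small replay capacity
        batch_size=1,         # one timestep per update batch
        update_frequency=1    # update after every observe
    )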
Qiao.Zhang
@qZhang88
@AlexKuhnle Thanks, one more question: we are doing a plane flight game, and the env can take multiple actions at the same time, like changes of pitch, yaw, roll and boost. Pitch, yaw and roll each have 3 choices (1, -1 and 0), but boost has only 1 or 0. So does Tensorforce support mixed actions? How should I create a PPO agent for that? Should the action param be dict(type=int, shape=[3,3,3,2], num_values=??)
Alexander Kuhnle
@AlexKuhnle
Instead of using a single-action dict a la dict(type=..., shape=...), in general you can specify a nested action dict like dict(action1=dict(type=..., shape=...), action2=dict(type=..., shape=...), ...). Your environment (if you implement the Environment class) can just return this for actions(), and/or your agent can receive this as actions argument.
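(For the pitch/yaw/roll/boost example above, that could look roughly like this; it is a sketch based on the question, with states_spec as a placeholder:)

    # Nested action spec: three 3-valued discrete actions plus one binary action.
    actions = dict(
        pitch=dict(type='int', num_values=3),   # e.g. mapped to -1, 0, 1
        yaw=dict(type='int', num_values=3),
        roll=dict(type='int', num_values=3),
        boost=dict(type='bool')                 # 0 or 1
    )

    # Either return this dict from your Environment's actions() method, or pass
    # it directly when creating the agent (states_spec is a placeholder here).
    agent = Agent.create(agent='ppo', states=states_spec, actions=actions,
                         max_episode_timesteps=1000, batch_size=10)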
Qiao.Zhang
@qZhang88

Hi @qZhang88, hope the following explanation clarifies your question: PPO, as many other standard policy gradient algorithms, uses complete rollouts (episodes) for reward estimation. In Tensorforce this means that batch_size defines the number of episodes (each consisting of many timesteps) per update batch. Moreover, the way the PPO update works according to the paper is that it actually performs multiple updates based on randomly subsampled timestep-minibatches (the entire batch of n episodes is quite big). So the subsampling_fraction specifies what fraction of the full batch is subsampled for each minibatch, and optimization_steps specifies how often these mini-updates should happen.

I still have some questions here. Let's say batch_size is 10, max timesteps is 1000, and subsampling_fraction is 0.2; so each update batch size is still 10 and the timesteps would be less than 200, right? Could the optimization steps be increased, to take full advantage of the whole episodes?

Would the update be parallel, or will the sampling process wait until the update has finished?
Alexander Kuhnle
@AlexKuhnle
Batch-size 10 and max-timesteps 1k means overall up to 10k timesteps per update. PPO subsampling-fraction 0.2 means it subsamples 2k from this batch per "mini-update", and repeats this optimization-steps times. In particular, it doesn't make sense to choose subsampling-fraction and optimization-steps such that their product is < 1, as then you wouldn't make use of the full batch, so you may as well decrease batch-size and increase the other parameters accordingly (batch-size 10, fraction 0.2, steps 3 is roughly equivalent to batch-size 6, fraction 0.33, steps 3, with the latter being less memory-consuming).
The update is not performed in parallel and is generally not optimized in a special way. It would be possible to do a few things there, and I think this was part of the motivation for this design in PPO (not sure), but Tensorforce doesn't do that currently (and it's not really a primary focus in general, although I would be curious to investigate what could be done, if I had time :-).