agent.save(...) and Agent.load(...) (see docs). Is that what you're looking for, or what more specifically do you mean by "building the model using tf or tf-lite after training"?
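A minimal sketch of the save/load round trip mentioned above, assuming the Tensorforce Agent API; the agent type, directory name, and format are illustrative placeholders:

```python
from tensorforce import Agent, Environment

environment = Environment.create(environment='gym', level='CartPole-v1')
agent = Agent.create(agent='ppo', environment=environment, batch_size=10)

# ... training via act()/observe() or Runner ...

# write the trained model to a directory
agent.save(directory='saved-model', format='checkpoint')
agent.close()

# later: restore the agent from that directory
agent = Agent.load(directory='saved-model', environment=environment)
```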
observe, no matter if I run on GPU or CPU. I am positive it is really running on the GPU, as can be seen in nvidia-smi. The GPUs are 2x Tesla K80 on a cloud machine. Any hints as to what the problem might be?
Hello,
I wonder, is this also the right place to ask questions about tf-agents? If not, please ignore the following:
I am trying to run a simple DQN agent over an environment that has a collection of different specs as its observation_spec. This means that my QNetwork wants a preprocessing_combiner, so I followed its documentation and gave it tf.keras.layers.Concatenate(). Concatenate does not like to concatenate tuples, so the observation_spec has to be a list. Next, I want to implement a replay buffer, specifically a TFUniformReplayBuffer. Converting it to a dataset (as done in https://github.com/tensorflow/agents/blob/master/docs/tutorials/1_dqn_tutorial.ipynb), I get an error and the explicit instruction to change my specs from lists to tuples. However, then I run into the problem that Concatenate doesn't like its input. Am I doing something conceptually wrong here?
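A simplified sketch of the kind of setup described above (the specs, shapes, and layer sizes are illustrative placeholders, not taken from the question):

```python
import tensorflow as tf
from tf_agents.networks import q_network
from tf_agents.specs import tensor_spec

# observation made up of two components, passed as a list so Concatenate accepts it
observation_spec = [
    tf.TensorSpec(shape=(4,), dtype=tf.float32, name='sensor_a'),
    tf.TensorSpec(shape=(2,), dtype=tf.float32, name='sensor_b'),
]
action_spec = tensor_spec.BoundedTensorSpec(shape=(), dtype=tf.int32, minimum=0, maximum=3)

# QNetwork combines the observation components with the given preprocessing_combiner
# before feeding them through its dense layers
q_net = q_network.QNetwork(
    input_tensor_spec=observation_spec,
    action_spec=action_spec,
    preprocessing_combiner=tf.keras.layers.Concatenate(axis=-1),
    fc_layer_params=(64, 64),
)
```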
There is the remote argument of Environment.create (either based on multiprocessing or socket), and the extended features of Runner / run.py, which was basically merged with the functionality of ParallelRunner (which itself was removed).
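For illustration, a hedged sketch of such a parallelized setup (assuming a recent Tensorforce version where Runner accepts num_parallel and remote; the agent/environment specs and numbers are placeholders):

```python
from tensorforce import Runner

runner = Runner(
    agent=dict(agent='ppo', batch_size=10),
    environment=dict(environment='gym', level='CartPole-v1'),
    max_episode_timesteps=500,
    num_parallel=4,            # four environment copies
    remote='multiprocessing',  # each copy runs in its own process
)
runner.run(num_episodes=200)
runner.close()
```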
batch_size defines the number of episodes (each consisting of many timesteps) per update batch. Moreover, the way the PPO update works according to the paper is that it actually performs multiple updates based on randomly subsampled timestep minibatches (the entire batch of n episodes is quite big). So subsampling_fraction specifies what fraction of the full batch is subsampled for each minibatch, and optimization_steps specifies how often these mini-updates should happen.
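For concreteness, a hedged sketch of where these arguments go in a PPO agent configuration (assuming a Tensorforce 0.5-style PPO agent where subsampling_fraction and optimization_steps are top-level arguments; all values are arbitrary placeholders):

```python
from tensorforce import Agent, Environment

environment = Environment.create(environment='gym', level='CartPole-v1', max_episode_timesteps=500)

agent = Agent.create(
    agent='ppo', environment=environment,
    batch_size=10,             # 10 complete episodes per update batch
    subsampling_fraction=0.2,  # each minibatch subsamples 20% of the batch's timesteps
    optimization_steps=25,     # 25 such minibatch updates per batch
    learning_rate=1e-3,
)
```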
Hello,
so I'm currently attempting to get a DQN agent to work for my current solution, and I'm finding a few things not entirely clear, so I have a couple of questions plus an error that I'm getting.
The questions:
1) Does the DQN agent automatically update the weights at the end of each episode, or do I have to manually call the update() method?
2) Does the agent automatically store the state, action, and reward it's given so it can use that to train afterwards, or do I have to manually do it by storing them in a memory module and then using that for training?
The error I'm getting is the following:
InvalidArgumentError (see above for traceback): assertion failed: [] [Condition x == y did not hold element-wise:] [x (agent.observe/strided_slice:0) = ] [407] [y (agent.observe/strided_slice_1:0) = ] [0]
[[node agent.observe/assert_equal_1/Assert/AssertGuard/Assert (defined at F:\ProgramFiles\Anaconda3\envs\Tensorforce\lib\site-packages\tensorforce\core\models\model.py:1094) ]]
[[{{node GroupCrossDeviceControlEdges_0/agent.observe/agent.core_observe/agent.core_experience/estimator.enqueue/assert_equal/Assert/AssertGuard/Assert/data_4}}]]
tf.debugging.assert_equal(
    x=tf.shape(input=terminal, out_type=tf.int64)[0],
    y=tf.dtypes.cast(x=self.buffer_index[parallel], dtype=tf.int64)
),
The update(...) function doesn't usually need to be called. You can specify how frequently the update should happen via the update_frequency argument, or implicitly via batch_size (if update_frequency is None, then update_frequency = batch_size). These numbers are timestep-based, so independent of episodes (since DQN is generally largely agnostic to episodes).
act(...) and observe(...) are called iteratively (or Runner is used, which takes care of it). No need to take care of anything here.
Do you call observe(...) only when you encounter a terminal state? As @qZhang88 mentioned, it would be good to see the code and how you call act() and observe().
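For reference, a minimal sketch of the usual Tensorforce act/observe loop (the agent and environment configuration are placeholders): observe(...) is called once after every act(...), and the agent updates itself automatically whenever its batch/update-frequency condition is met.

```python
from tensorforce import Agent, Environment

environment = Environment.create(environment='gym', level='CartPole-v1', max_episode_timesteps=500)
agent = Agent.create(
    agent='dqn', environment=environment,
    memory=10000, batch_size=32,  # update_frequency defaults to batch_size
)

for episode in range(100):
    states = environment.reset()
    terminal = False
    while not terminal:
        actions = agent.act(states=states)
        states, terminal, reward = environment.execute(actions=actions)
        # one observe() per act(), not only on terminal states
        agent.observe(terminal=terminal, reward=reward)

agent.close()
environment.close()
```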
```python
def run_And_Update_States(self):
    # This method is responsible for running a one-step iteration and updating the states.
    # A one-step iteration can be every X simulation steps, depending on how often this method is called.
    # Note: the way this works is by taking one action per dqn_agent per timestep, which is necessary
    # as I'm running multiple agents within the same environment, then executing the action and then
    # updating the reward through observe at the next timestep. To do so, it's important to distinguish
    # between the first step and every other step. It isn't possible to return the reward immediately
    # from the environment for the current action before executing at least one simulation step,
    # because we have to wait for the other agents to take their actions as well.
    reward = 0
    # update queues and variables
    self.update_TLS_Queues()
    if self.previous_State is None and self.current_State is None:
        # first call of this method --> first step
        print("***First step***")
        self.current_State = self.Get_State()
        self.current_Action = self.choose_action(self.current_State)
        self.action_changed = True
        self.action_counter += 1
    else:
        # not the first call, i.e. we've already taken at least 1 action
        # --> we can update memory + accumulate reward for the previously taken action
        # update previous state and current state
        self.previous_State = self.current_State
        self.current_State = self.Get_State()
        # the previously taken action is stored in its own variable,
        # so we can correlate state, action, next state and reward
        self.previous_Action = self.current_Action
        # retrieve and save info about terminal state
        terminal = False
        if traci.simulation.getMinExpectedNumber() == 0:
            terminal = True
            print("***Terminal state reached, ending episode for " + self.TLS_ID)
        if self.ack_Count != 0:
            # acknowledgements since last timestep
            avg_Travel_Time = float(avg_Travel_Time) / float(self.ack_Count)
            print("Avg travel time for %s is %d" % (self.TLS_ID, avg_Travel_Time))
            reward = self.Evaluate_Reward(avg_Travel_Time, self.ack_Count)
        else:
            # no acknowledgements since last timestep
            reward = self.Evaluate_Reward(1, 0)
        self.Total_reward += reward
        print("Action_Counter = %d & Observe_Counter = %d" % (self.action_counter, self.observe_counter))
        # pass info about terminal state to agent, 0 reward + True on terminal state
        update_bool = self._model.DQN_Agent.observe(reward=reward, terminal=terminal)
        self.observe_counter += 1
        if update_bool:
            # print when an update occurs
            print("Model with TLS ID # " + self.TLS_ID + " was updated at timestep = %d" % self.step)
        self.Total_reward += reward
        # take action
        if not traci.simulation.getMinExpectedNumber() == 0:
            self.current_Action = self.choose_action(self.current_State)  # action to take in this timestep
            self.action_counter += 1
            # the change in phase is set from inside the run() method so we can keep track of
            # the number of steps spent in the yellow phase before switching
            if self.current_Action == self.previous_Action:
                self.action_changed = False
            else:
                self.action_changed = True
    self._steps += 1
    print(self.TLS_ID)
    print("**Previous action:")
    print(self.previous_Action)
    print("**Current action:")
    print(self.current_Action)
    print("**Action changed bool:")
    print(self.action_changed)
```
What does your choose_action(...) method look like, and how exactly do these multiple agents work? Also, it sounds like there may be a better way of doing this. Feel free to write me a private message and we can discuss it in more detail.