Hello
I have developed an agent to control a heating system. However, every time I run my code, exactly the same code with the same parameters, I observe a different performance (sometimes very different). I was thinking it might be because of exploration, where the agent takes random actions, so the performance differs in each run. So I tried decaying the exploration with the "st_exp" function that I found in this channel. However, it still performs differently in each run with the same parameters.
Basically, most of the time I do not get satisfactory performance metrics, so I need to tune my reward function, but as long as I get very different performance under the same reward function I cannot tune it. For example, the same parameters were giving me quite good performance last night, but now the performance is terrible.
(1) Any suggestions on what the reason can be?
(2) Also, it is not clear to me what "st_exp" does exactly. What is the function exactly?
(3) For "st_exp", what values do you suggest for "decay_steps", "final_value" and "num_steps=Train_hours_number"? As the name implies, "Train_hours_number" is the number of my training hours.
(4) How can we see the default parameters of an agent? For example, the default exploration or the default network architecture.
st_exp = dict(
    type='decaying', unit='timesteps', decay='polynomial', decay_steps=100000,
    initial_value=1.0, final_value=0.0, power=1.0, num_steps=Train_hours_number
)
agent = Agent.create(
    agent='dqn',
    max_episode_timesteps=Train_hours_number,
    environment=environment,
    network=[
        dict(type='dense', size=50, activation='tanh'),
        dict(type='flatten'),
        dict(type='dense', size=50, activation='tanh'),
        dict(type='flatten', name="out"),
    ],
    exploration=st_exp,
    learning_rate=1e-3,
    batch_size=72,
    memory=Train_hours_number
)
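For what it's worth regarding question (2): a polynomial decay with power=1.0 is just linear interpolation from initial_value down to final_value over the decay horizon. A minimal plain-Python sketch of that schedule (not the Tensorforce internals; whether decay_steps or num_steps is the horizon that is actually used depends on the version, so that part is an assumption):

# Sketch only: illustrates the shape of a polynomial decay schedule.
def decayed_value(timestep, initial_value=1.0, final_value=0.0,
                  decay_steps=100000, power=1.0):
    # Fraction of the decay that has elapsed, capped at 1.0 after decay_steps.
    progress = min(timestep / decay_steps, 1.0)
    # With power=1.0 this is a straight line from initial_value to final_value.
    return (initial_value - final_value) * (1.0 - progress) ** power + final_value

# Example: after 25% of decay_steps the exploration rate is 0.75 with these defaults.
print(decayed_value(25000))  # 0.75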
Hi @AlexKuhnle , thanks for the documents. I went through them and the concepts are clearer to me now. However, I still have two problems: (1) I get "very" different results on different runs, (2) the agent's performance is very poor most of the time. I tried to see what affects this unstable and poor behavior the most. It seems to me that the rest of the code (definition of environment, states, reward, etc.) is fine and that the agent definition is the problem. For example, when I change the batch_size, the results change significantly. After reading about batch_size, update_frequency and other parameters, I tried the following:
(1) Changing the agent to ppo, double_dqn, tensorforce
(2) Comparing episodes of 1 week versus 1 day
(3) Comparing different weights on the reward function components
(4) Adding more states
(5) Increasing the learning rate
(6) Changing batch_size to 8, 12, 24 (without specifying update_frequency)
(7) Including and excluding exploration
Finally I have designed my agent as follows, but the issues are not solved: still poor and varying performance. Do you have any suggestions for other modifications I could try with the agent?
My agent interacts with the environment for 13 weeks, each episode is one day (24 timesteps). Then I test the trained model over 2.5 weeks. The problem is not that hard, just turning a heater on and off.
linear_decay = dict(
    type='decaying', unit='timesteps', decay="linear", num_steps=168,
    initial_value=0.99, final_value=0.01
)
agent = Agent.create(
    agent='dqn',
    learning_rate=1e-3,
    environment=environment,
    batch_size=24,
    update_frequency=8,
    exploration=linear_decay,
    network="auto",
    memory=Train_hours_number
)
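Besides exploration, a large part of the run-to-run variance comes from random weight initialization and sampling, which differ on every run. A minimal sketch of seeding the usual random sources before creating the agent (whether the agent itself also exposes a seed option depends on the Tensorforce version, so check the docs for that part):

import random

import numpy as np
import tensorflow as tf

SEED = 0

# Seed the generators that network initialization and sampling draw from.
random.seed(SEED)         # Python-level randomness
np.random.seed(SEED)      # NumPy-based sampling
tf.random.set_seed(SEED)  # TensorFlow weight initialization and ops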
Hi @GANdalf2357 , I think this issue was recently resolved, so updating to the latest GitHub master should help (or if you're using the pip version, changing to the GitHub version -- of course, sooner or later there will be a new pip version, maybe I should do that soon).
Hi @AlexKuhnle I have now switched to master to overcome this issue, but now the reset() call fails. Do I have to change something in my state with the new master? I now get this error in reset(): tensorforce.exception.TensorforceError: Environment.reset: invalid type <class 'tuple'> != float for state.
update_frequency is not about the reward, but about the optimization step. So for each update you have a certain batch-size, say 10 episodes, or 64 timesteps, and unless specified differently, that will also be the update_frequency. However, you may want to update more frequently than that, so you may set update_frequency to something between 1 and batch_size (> batch_size doesn't really make sense).
>> batch_size, at least 100x or much more. Moreover, it's typical particularly here to set the update_frequency lower than batch_size, say 8 vs 64. But it's a parameter that can be tuned, at least to get the rough magnitude right.
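A minimal sketch of a DQN config following those magnitudes, assuming the ">> batch_size" advice above refers to the replay memory (the concrete numbers are placeholders, not tuned values):

from tensorforce import Agent

# Sketch only: illustrative magnitudes, not tuned values.
agent = Agent.create(
    agent='dqn',
    environment=environment,   # assumes an already-created environment
    memory=10000,              # replay memory >> batch_size (here ~150x)
    batch_size=64,             # timesteps sampled per update
    update_frequency=8,        # update more often than once per batch_size
    learning_rate=1e-3,
    network='auto'
)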
For int types, min_value, max_value, and num_values should be specified. However, if I set all three I got an error:
File "/home/hsinyu/hyliu_Python/lib/python3.6/site-packages/tensorforce/core/utils/tensor_spec.py", line 45, in __init__
    name='TensorSpec', argument='min/max_value', condition='num_values specified'
tensorforce.exception.TensorforceError: Invalid TensorSpec argument min/max_value given num_values specified.
As I trace the source code, it seems like these three cannot all be set at the same time?
Can I just set num_values for x1 as 80−58+1=23 and x2 as 1500−0=1500? And if I set min_value and max_value, why not let the code calculate num_values itself?
min_/max_value are for float types as lower and upper bound, whereas num_values is for int types. The fact that it also specifies min_/max_value implicitly shouldn't matter, so you shouldn't need to specify all three, just the two or one, depending on the type. (The idea to still add it is, I think, they internally share some asserts via min/max-value, plus there could potentially be layer types which want to know min/max-bounds... but it really doesn't matter right now.)
So: num_values in case of int.
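A minimal sketch of what that means for the x1/x2 example from the question (the scalar shapes are assumptions; only type, num_values and min_/max_value matter here):

# Sketch of the spec format: num_values for int states, min_/max_value for float states.
states = dict(
    x1=dict(type='int', shape=(), num_values=23),     # 58..80 -> 23 discrete values
    x2=dict(type='int', shape=(), num_values=1501),   # 0..1500 inclusive -> 1501 values
)

# If x2 is better treated as continuous, bound it instead of discretising it:
# x2=dict(type='float', shape=(), min_value=0.0, max_value=1500.0)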
I set num_values without setting max_value and min_value, and it worked, but then I got this traceback:
Traceback (most recent call last):
File "reinforcement_learning_continuous.py", line 221, in <module>
save_best_agent=True
File "/home/hsinyu/hyliu_Python/lib/python3.6/site-packages/tensorforce/execution/runner.py", line 548, in run
self.handle_observe(parallel=n)
File "/home/hsinyu/hyliu_Python/lib/python3.6/site-packages/tensorforce/execution/runner.py", line 659, in handle_observe
terminal=self.terminals[parallel], reward=self.rewards[parallel], parallel=parallel
File "/home/hsinyu/hyliu_Python/lib/python3.6/site-packages/tensorforce/agents/agent.py", line 511, in observe
terminal=terminal_tensor, reward=reward_tensor, parallel=parallel_tensor
File "/home/hsinyu/hyliu_Python/lib/python3.6/site-packages/tensorforce/core/module.py", line 128, in decorated
output_args = function_graphs[str(graph_params)](*graph_args)
File "/home/hsinyu/hyliu_Python/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 784, in __call__
result = self._call(*args, **kwds)
File "/home/hsinyu/hyliu_Python/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 818, in _call
results = self._stateful_fn(*args, **kwds)
File "/home/hsinyu/hyliu_Python/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 2972, in __call__
filtered_flat_args, captured_inputs=graph_function.captured_inputs) # pylint: disable=protected-access
File "/home/hsinyu/hyliu_Python/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 1948, in _call_flat
ctx, args, cancellation_manager=cancellation_manager))
File "/home/hsinyu/hyliu_Python/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 561, in call
ctx=ctx)
File "/home/hsinyu/hyliu_Python/lib/python3.6/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
inputs, attrs, num_outputs)
**tensorflow.python.framework.errors_impl.InvalidArgumentError: Invalid gradient: contains inf or nan. : Tensor had NaN values**
[[{{node agent/StatefulPartitionedCall/agent/cond_1/then/_311/agent/cond_1/StatefulPartitionedCall/agent/StatefulPartitionedCall_7/policy_optimizer/StatefulPartitionedCall/policy_optimizer/VerifyFinite/CheckNumerics}}]] [Op:__inference_observe_5452]
Function call stack:
observe
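One common way to narrow down a NaN/inf gradient like the one above is to check that the values fed into the agent are finite and reasonably scaled. A minimal sketch of such a guard around the act/observe loop, assuming states are a flat numeric array (environment and agent are the ones created elsewhere in this discussion):

import numpy as np

states = environment.reset()
terminal = False
while not terminal:
    actions = agent.act(states=states)
    states, terminal, reward = environment.execute(actions=actions)

    # Guard: NaN/inf states or rewards eventually surface as NaN gradients.
    assert np.all(np.isfinite(np.asarray(states, dtype=float))), states
    assert np.isfinite(reward), reward

    agent.observe(terminal=terminal, reward=reward)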
@AlexKuhnle I got this TypeError: __init__() got multiple values for keyword argument 'optimizer'
while setting up A2C like this:
elif args.agent=="a2c":
agent = Agent.create(
agent='a2c',
environment=environment,
max_episode_timesteps=5,
batch_size=32,
network=[
dict(type='dense', size=128, activation='tanh'),
dict(type='dense', size=64, activation='tanh')
],
optimizer=dict(
optimizer='adam', learning_rate=1e-3, clipping_threshold=1e-2,
multi_step=10, subsampling_fraction=64, linesearch_iterations=5,
doublecheck_update=True
),
critic=[
dict(type='dense', size=64, activation='tanh'),
dict(type='dense', size=64, activation='tanh')
],
)
However, this is how it is described: we are supposed to set the optimizer in the 'optimizer' section.
The a2c agent only exposes the learning_rate, whereas the rest of the optimizer is implicitly specified as Adam. You would need to move to the tensorforce agent to have all arguments available. The idea behind the various agent sub-classes is to provide a "standard" interface with only the typical arguments (and things like subsampling_fraction, linesearch are not typical for A2C), however, maybe I'll change this at some point.
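A minimal sketch of the same setup restricted to the arguments the a2c sub-class exposes (the layer sizes are just the ones from the question; whether further critic options are needed depends on the version):

from tensorforce import Agent

# Sketch: pass learning_rate directly instead of a full optimizer dict.
agent = Agent.create(
    agent='a2c',
    environment=environment,   # assumes an already-created environment
    max_episode_timesteps=5,
    batch_size=32,
    learning_rate=1e-3,        # the rest of the optimizer is implied (Adam)
    network=[
        dict(type='dense', size=128, activation='tanh'),
        dict(type='dense', size=64, activation='tanh')
    ],
    critic=[
        dict(type='dense', size=64, activation='tanh'),
        dict(type='dense', size=64, activation='tanh')
    ],
)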
Hi @AlexKuhnle , to your previous questions regarding the DQN gradient assertion, here are my responses:
1. Agent config:
agent = Agent.create(
    agent="dqn",
    environment=environment,
    memory=300,
    batch_size=32,
    network="auto",
    update_frequency=1,
    learning_rate=1e-5,
    discount=0.9
)
2.&3. Very quick, running on the first episode.
4.&5. There is only one action since it happened on the first episode.
Hi
I need to train my agent for many iterations. My agent interacts with TRNSYS as the environment and calls it at each action to calculate the next state. Therefore, it takes a long time to train (8 hours). I need to reduce this time.
1- Do you think that if I perform parallel computing I will get a fast enough operation?
2- I tried the following to test whether I can do parallel computing, but I get the error below:
runner = Runner(
    agent=r'C:\Python_TRNSYS_integration\Parallelization test\agent.json',
    environment=HotWaterEnvironment,
    num_parallel=4
)
runner.run(num_episodes=100, batch_agent_calls=True)
TypeError: init() got an unexpected keyword argument 'internals'
2- And if the above code works and is fast enough, I need to parallelize the following code, which is the same training as above except that I store some values in each iteration. Can you show me how I can parallelize it?
States_train_target = []
Actions_train_target = []
Rewards_train_target = []
Energy_train_target = []
Reward_total_train_allepisodes_target = []
Reward_energy_train_allepisodes_target = []
Reward_comfort_train_allepisodes_target = []
Reward_hygiene_train_allepisodes_target = []
Energy_train_allepisodes_target = []
for episode in range(int(Train_weeks)):
    print(episode)
    Reward_total_train_eachepisode_target = []
    Reward_energy_train_eachepisode_target = []
    Reward_comfort_train_eachepisode_target = []
    Reward_hygiene_train_eachepisode_target = []
    Energy_train_eachepisode_target = []
    states = environment.reset()
    environment.timestep = episode * episode_times
    terminal = False
    while not terminal:
        actions = agent.act(states=states)
        states, terminal, reward = environment.execute(actions=actions)
        states = tuple(states)
        agent.observe(terminal=terminal, reward=reward)
        Reward_total_train_eachepisode_target.append(reward)
        # Reward_energy_train_eachepisode_target.append(reward[1])
        # Reward_comfort_train_eachepisode_target.append(reward[2])
        # Reward_hygiene_train_eachepisode_target.append(reward[3])
        # Energy_train_eachepisode_target.append(energy)
        States_train_target.append(states)
        Actions_train_target.append(actions)
        Rewards_train_target.append(reward)
        # Energy_train_target.append(energy)
    Reward_total_train_allepisodes_target.append(Reward_total_train_eachepisode_target)
    # Reward_energy_train_allepisodes_target.append(Reward_energy_train_eachepisode_target)
    # Reward_comfort_train_allepisodes_target.append(Reward_comfort_train_eachepisode_target)
    # Reward_hygiene_train_allepisodes_target.append(Reward_hygiene_train_eachepisode_target)
    # Energy_train_allepisodes_target.append(Energy_train_eachepisode_target)
I do not have access to an NVIDIA GPU, so I guess the only option would be to parallelize the above code.
Thanks
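Regarding 1-: since each TRNSYS call dominates the runtime, running several environment instances in parallel is the usual way to speed this up. Below is a minimal sketch, assuming this Tensorforce version's Runner supports remote='multiprocessing' together with num_parallel (an assumption, and it may not by itself resolve the 'internals' error above). Note that the per-timestep bookkeeping from the loop would then have to move into the environment itself (e.g. appended inside execute()), because the Runner drives act/observe for you.

from tensorforce import Runner

# Sketch only: assumes HotWaterEnvironment is a tensorforce.Environment subclass
# and that this Tensorforce version supports remote='multiprocessing'.
runner = Runner(
    agent=r'C:\Python_TRNSYS_integration\Parallelization test\agent.json',
    environment=HotWaterEnvironment,
    num_parallel=4,
    remote='multiprocessing'   # each environment runs in its own process
)
runner.run(num_episodes=100, batch_agent_calls=True)
runner.close()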