I'm using PPO and I looked at the code, but I need confirmation about what I understood. The agent has a memory size (at least max_timestep * batch_size) which holds all the episode trajectories (state, action, reward). After update_frequency episodes the agent's weights are updated; by default update_frequency is the same as batch_size. Each update performs multi_steps optimization steps, where each step is clipped by likelihood_ratio_clipping and the trajectory is drawn from the memory based on the subsampling_fraction. So for batch_size = 16, subsampling_fraction = 0.3 and multi_steps = 10: each update will be 10 steps, each using 16 * 0.3 = 4.8 episodes from the memory. Is that right?
I'm asking this because I want to improve the training speed, and the only knobs I can think of are learning_rate, batch_size and multi_steps.

Another option is to reduce update_frequency. While this technically means that the data will be slightly off-policy (e.g. update after every episode, but still use a batch_size of 16), in practice this is often no problem up to a point (also, PPO's objective accounts for importance sampling). Moreover, I think subsampling_fraction=1.0 may do the job in most cases, but it may be interesting to play around with it (also, you can now specify fixed-size subsampled batch sizes by using integers (256) instead of floats (0.5), if desired). I would recommend starting with update_frequency=1 and subsampling_fraction=1.0, and then playing around with batch_size (starting with ~10) and multi_steps (starting with 5-10). And of course learning_rate is also important.
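As a concrete starting point, here is a rough sketch of such a configuration. Parameter names are taken from the PPO agent docs from memory (e.g. multi_step rather than multi_steps), so double-check them against the Tensorforce version you are using:

    from tensorforce import Agent, Environment

    environment = Environment.create(environment='gym', level='CartPole-v1')

    # Suggested starting point from the discussion above: update after every
    # episode, use the full sampled batch per optimization step, then tune
    # batch_size / multi_step / learning_rate from there.
    agent = Agent.create(
        agent='ppo', environment=environment,
        batch_size=10,              # number of episodes per update batch
        update_frequency=1,         # update after every episode (slightly off-policy)
        subsampling_fraction=1.0,   # use the whole batch in each optimization step
        multi_step=10,              # optimization steps per update
        learning_rate=1e-3,
    )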
Suppose update_frequency = 1 and I have 2 parallel environments. When one of these envs terminates its episode, for example the pole dropped in CartPole, will the other env's actions be based on the newly updated network? Or will it wait for both envs to finish and then update?
There is a sync_episodes option for the Runner, and if desired, this behavior could be changed such that, when using sync_episodes, episodes are terminated at the same time. But I would be surprised if this is very impactful.
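For reference, a rough sketch of running two parallel environments with the Runner; the constructor and run() arguments below are from memory of the 0.6.x API and may need checking against the Runner docs:

    from tensorforce import Runner

    # Agent and environment passed as specs so the Runner can set up the
    # parallel copies itself (exact constructor details may differ by version).
    runner = Runner(
        agent=dict(agent='ppo', batch_size=10, update_frequency=1),
        environment=dict(environment='gym', level='CartPole-v1'),
        num_parallel=2
    )

    # sync_episodes is the option mentioned above: with it, episode boundaries
    # are synchronized across the two environments.
    runner.run(num_episodes=300, sync_episodes=True)
    runner.close()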
After each update_frequency, is the memory cleared? For example, with batch_size=8, update_frequency=1, subsampling=1.0: the first episode will be updated without problems, but will the second update use the trajectory from the first episode, or is the memory cleared after each update_frequency? If it is not cleared, it means that after the seventh episode the first episode will still be used, even though it comes from a policy far away from the current policy. Is this correct?
Take a look at this doc page about the "tensorforce" agent: https://tensorforce.readthedocs.io/en/latest/agents/tensorforce.html. You can add entropy_regularization, l2_regularization, exploration and other hyperparameters to try to improve your agent. However, it depends on your action space.
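For example, a hedged sketch (the values below are arbitrary illustrations, not recommendations):

    # Adding regularization and exploration to the agent configuration.
    agent = Agent.create(
        agent='ppo', environment=environment, batch_size=10,
        exploration=0.01,              # random-action exploration
        entropy_regularization=0.01,   # discourages an overly deterministic policy
        l2_regularization=0.001,       # weight decay on the network parameters
    )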
You can set evaluation=True when using the Runner, or act with independent=True and deterministic=True (see here).
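A small sketch of both options, assuming the usual Runner.run and Agent.act signatures:

    # Option 1: evaluation episodes via the Runner; the agent acts
    # deterministically and the episodes are not used for updates.
    runner.run(num_episodes=100, evaluation=True)

    # Option 2: query the policy directly; independent acts do not expect a
    # corresponding observe() call, and deterministic disables sampling noise.
    actions = agent.act(states=states, independent=True, deterministic=True)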
ValueError: 'MID_1_bid_0/qty_preprocessing' is not a valid module name. Module names must be valid Python identifiers (e.g. a valid class name).
/site-packages/tensorflow/python/framework/indexed_slices.py:433: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
I see that the agent pops name + '_mask' from the state spec, where name is any name found in the action_spec. It does this recursively using fmap(), calling this function:

    # Separate auxiliaries
    def function(name, spec):
        auxiliary = ArrayDict()
        if self.config.enable_int_action_masking and spec.type == 'int' and \
                spec.num_values is not None:
            if name is None:
                name = 'action'
            # Mask, either part of states or default all true
            auxiliary['mask'] = states.pop(name + '_mask', np.ones(
                shape=(num_parallel,) + spec.shape + (spec.num_values,), dtype=spec.np_type()
            ))
        return auxiliary
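In other words, for every int action with num_values set (and masking enabled), the agent looks for an optional extra state entry named <action-name> + '_mask'. A sketch of what that looks like from the environment side; the action name and values here are made up for illustration:

    import numpy as np

    # Illustration only: suppose an int action named 'promise_date' with
    # num_values=5 and a single (non-parallel) environment. Alongside the
    # regular state entries, the environment can return a boolean mask that
    # marks which of the 5 values are currently allowed:
    states = dict(
        price=np.array([1.15]),
        promise_date_mask=np.array([True, True, False, False, True]),
    )
    # If no '<name>_mask' entry is supplied, the function above falls back to
    # an all-True mask of shape (num_parallel,) + spec.shape + (num_values,).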
However, when I run it I get a KeyError exception, where the key is a root name from the action space. I instrumented the line in nested_dict.pop() that threw the error, like so:

    elif '/' in key:
        key, subkey = key.split('/', 1)
        if not key in self:
            print(f"pop {key} {subkey}")
            import pprint
            pprint.pprint(self)
        value = super().__getitem__(key)
        assert isinstance(value, self.__class__)
        return value.pop(subkey, default)
This is what it printed:

    pop MID_1_counter_0 promise_date_mask
    {'MID_1_bid_0/price': array([1.15]),
     'MID_1_bid_0/promise_date': array([4]),
     'MID_1_bid_0/qty': array([44135.2]),
     'MID_1_bid_0/supplier_tier': array([0]),
     ...

The KeyError is then raised at value = super().__getitem__(key).
The warning "Converting sparse IndexedSlices to a dense Tensor of unknown shape." comes up if you use embeddings (used by the "auto" network if a state is int), and maybe in other situations as well. I read a bit about it a while ago, and it doesn't seem to be critical if, e.g., the number of embeddings (num_values of the int state) is reasonable. Model initialization may take a while if the network is bigger -- is this the case for you?
I would like to set self.config.enable_int_action_masking to False, but I don't see a way to do that... The config object explicitly overrides __setattr__, and I wasn't able to pass it as a constructor arg to the agent. So what's the right way to do that?
Setting enable_int_action_masking can be done via the config argument of any agent (docs here). That should hopefully work.
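For instance, a sketch (enable_int_action_masking is among the documented config options):

    # Disabling int action masking via the agent's config argument.
    agent = Agent.create(
        agent='ppo', environment=environment, batch_size=10,
        config=dict(enable_int_action_masking=False),
    )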
As for the NestedDict error -- would you mind posting the shape exception, since I don't know why that would come up?
File "train_rl_agent.py", line 77, in run_agent
runner.run(num_episodes=sim_config["train_episodes"])
File "/home/hinrichs/build/tensorforce/tensorforce/execution/runner.py", line 545, in run
self.handle_act(parallel=n)
File "/home/hinrichs/build/tensorforce/tensorforce/execution/runner.py", line 579, in handle_act
actions = self.agent.act(states=self.states[parallel], parallel=parallel)
File "/home/hinrichs/build/tensorforce/tensorforce/agents/agent.py", line 388, in act
deterministic=deterministic
File "/home/hinrichs/build/tensorforce/tensorforce/agents/recorder.py", line 267, in act
num_parallel=num_parallel
File "/home/hinrichs/build/tensorforce/tensorforce/agents/agent.py", line 415, in fn_act
states = self.states_spec.to_tensor(value=states, batched=True, name='Agent.act states')
File "/home/hinrichs/build/tensorforce/tensorforce/core/utils/tensors_spec.py", line 57, in to_tensor
value=value[name], batched=batched, recover_empty=recover_empty
File "/home/hinrichs/build/tensorforce/tensorforce/core/utils/tensors_spec.py", line 57, in to_tensor
value=value[name], batched=batched, recover_empty=recover_empty
File "/home/hinrichs/build/tensorforce/tensorforce/core/utils/tensor_spec.py", line 149, in to_tensor
raise TensorforceError.value(name=name, argument='value', value=value, hint='shape')
tensorforce.exception.TensorforceError: Invalid value for TensorSpec.to_tensor argument value: 0 shape.