Hi folks. I'm new to Tensorforce and ML in general, and am running into an error using Tensorforce that I'm not sure how to debug:
File "/Users/marco0009/.virtualenvs/puzzle_solver-j2SM-PkM/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[0] = [0, 3000] does not index into shape [1,3000,6]
[[{{node agent/StatefulPartitionedCall/agent/TensorScatterUpdate_1}}]] [Op:__inference_act_1218]
I recognize 3000 as my custom Environment's max_step_per_episode, and I suspect the 6 is related to my env's actions:
def actions(self):
    return {
        "make_move": dict(type="int", num_values=6),
    }
but I'm unsure what the cause of this exception actually is. Is there anywhere I should be looking in my Environment's configuration for issues?
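For reference, one common cause of this kind of out-of-range index (index 3000 into a dimension of size 3000, whose valid indices are 0-2999) is the environment being stepped once more than the declared maximum without signalling terminal. Below is a minimal sketch of the custom Environment interface with that guard; everything except the actions spec and the 3000 limit is a made-up placeholder, not the asker's actual code:

from tensorforce import Environment

class PuzzleEnv(Environment):
    # Hypothetical environment; state shape and reward logic are placeholders.
    def __init__(self):
        super().__init__()
        self.timestep = 0

    def states(self):
        return dict(type="float", shape=(8,))

    def actions(self):
        return {"make_move": dict(type="int", num_values=6)}

    def max_episode_timesteps(self):
        # Must agree with whatever limit the agent was created with.
        return 3000

    def reset(self):
        self.timestep = 0
        return [0.0] * 8

    def execute(self, actions):
        self.timestep += 1
        next_state = [0.0] * 8
        reward = 0.0
        # Signal terminal no later than the declared maximum, so the agent's
        # internal per-episode buffers (sized 3000 here) are never indexed
        # past their end.
        terminal = self.timestep >= self.max_episode_timesteps()
        return next_state, terminal, reward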
Hi! Quick question on loading a model. I saved it as a numpy file. When loading it as written in the docs: agent = Agent.load(directory=checkpointFolder, filename="agent_episode600", format="numpy", environment=environment)
I get an error because it needs additional arguments that I'm not sure about:
`TypeError: __init__() missing 3 required positional arguments: 'update', 'objective', and 'reward_estimation'`
Are there any additional instructions I need to follow? I tried setting the first two to True, but I'm not sure about the reward_estimation argument.
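Not certain, but since format="numpy" only stores the weights, my understanding is that Agent.load forwards extra keyword arguments to the agent constructor (as with Agent.create), so one thing to try is supplying the original agent specification again alongside the load call. A sketch, assuming the agent was originally a PPO agent; the hyperparameters are placeholders and have to match whatever was used at creation time:

from tensorforce import Agent

agent = Agent.load(
    directory=checkpointFolder,
    filename="agent_episode600",
    format="numpy",
    environment=environment,
    # Assumption: extra kwargs are passed through to the agent constructor,
    # so the original specification is repeated here.
    agent="ppo",
    batch_size=10,
    learning_rate=1e-3,
)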
Semi-related: when going through create, it doesn't seem that max_episode_timesteps is set for a class that inherits from Environment. Is this expected? For example:
class SomeEnv(Environment):
    def __init__(self):
        super().__init__()

    def states(self):
        return dict(
            min_value=self.invalid_penalty * self.max_episode_timesteps(),
            ....
        )
    ....

environment = Environment.create(
    environment=SomeEnv,
    max_episode_timesteps=500,
)
In the above, the call to self.max_episode_timesteps() returns None.
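One possible workaround, as a sketch only (it assumes Environment.create forwards extra keyword arguments to the environment constructor; episode_length and invalid_penalty are illustrative names): pass the limit into the environment explicitly instead of reading it back via self.max_episode_timesteps(), which is only set on the wrapper returned by Environment.create.

from tensorforce import Environment

class SomeEnv(Environment):
    def __init__(self, episode_length=500):
        super().__init__()
        self.episode_length = episode_length
        self.invalid_penalty = -1.0  # placeholder value

    def states(self):
        return dict(
            type="float", shape=(1,),
            min_value=self.invalid_penalty * self.episode_length,
            max_value=0.0,
        )

    def actions(self):
        return dict(type="int", num_values=2)

    def reset(self):
        return [0.0]

    def execute(self, actions):
        return [0.0], False, 0.0

environment = Environment.create(
    environment=SomeEnv,
    max_episode_timesteps=500,
    episode_length=500,  # forwarded to SomeEnv.__init__ (assumption)
)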
This is the configuration:
agent = Agent.create(
    agent='ppo',
    environment=environment,
    network=[
        dict(type='conv2d', window=5, stride=3, size=8, activation='elu'),
        dict(type='flatten'),
        dict(type='dense', size=16),
        dict(type='flatten', name="out"),
    ],
    batch_size=10,
    learning_rate=1e-3,
    summarizer=dict(
        directory='results/summaries',
        labels=['entropy', 'kl-divergence', 'loss', 'reward', 'update-norm'],
    ),
)
and the JSON:
{
  "agent": "ppo",
  "states": {"type": "float", "shape": [31, 31, 3]},
  "actions": {
    "move": {"type": "int", "num_values": 4},
    "draw": {"type": "int", "num_values": 2}
  },
  "max_episode_timesteps": 500,
  "batch_size": 10,
  "network": [
    {"type": "conv2d", "window": 5, "stride": 3, "size": 8, "activation": "elu"},
    {"type": "flatten"},
    {"type": "dense", "size": 16},
    {"type": "flatten", "name": "out"}
  ],
  "use_beta_distribution": false,
  "memory": "minimum",
  "update_frequency": "batch_size",
  "learning_rate": 0.001,
  "subsampling_fraction": 0.33,
  "optimization_steps": null,
  "likelihood_ratio_clipping": 0.25,
  "discount": 0.99,
  "predict_terminal_values": false,
  "baseline": null,
  "baseline_optimizer": null,
  "state_preprocessing": "linear_normalization",
  "reward_preprocessing": null,
  "exploration": 0.0,
  "variable_noise": 0.0,
  "l2_regularization": 0.0,
  "entropy_regularization": 0.0,
  "parallel_interactions": 1,
  "config": null,
  "saver": null,
  "summarizer": {
    "directory": "results/summaries",
    "labels": ["entropy", "kl-divergence", "loss", "reward", "update-norm"]
  },
  "recorder": null,
  "internals": {},
  "initial_internals": {"policy": {}}
}
showing this error:
tensorforce.exception.TensorforceError: Invalid value for TensorSpec.to_tensor argument value: [[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]
...
[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]] > max_value.
with this state space:
'enemy_radar': spaces.Box(low=0, high=100, shape=(100,100)),
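Since the error says a state value exceeds max_value, the observation actually returned by the environment presumably falls outside the declared 0-100 bounds of that Box. A small defensive sketch (enemy_radar and the bounds come from the message above; clipping is just one possible fix, and not the right one if the true value range is simply larger than declared):

import numpy as np

def clip_enemy_radar(raw_radar):
    # Keep the observation inside the declared Box(low=0, high=100) bounds
    # before handing it to the agent.
    return np.clip(np.asarray(raw_radar, dtype=np.float32), 0.0, 100.0)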
I'm using PPO and I looked at the code, but I need confirmation of what I understood. The agent has a memory size (at least max_episode_timesteps * batch_size) which holds all the episode trajectories (state, action, reward). After update_frequency episodes, the agent's weights are updated; by default update_frequency is the same as batch_size. Each update performs multi_steps optimization steps, where each step is clipped by likelihood_ratio_clipping and the trajectory is sampled from memory according to subsampling_fraction. So for batch_size = 16, subsampling_fraction = 0.3 and multi_steps = 10: each update will be 10 steps, each using 16*0.3 = 4.8 episodes from memory. Is that right? I'm asking because I want to improve training speed, and the only knobs I can think of are learning_rate, batch_size and multi_steps.
You can also lower update_frequency. While this technically means that data will be slightly off-policy (e.g. update after every episode, but still use a batch_size of 16), in practice this is often no problem up to a point (also, PPO's objective accounts for importance sampling). Moreover, I think subsampling_fraction=1.0 may do the job in most cases, but it may be interesting to play around with it (also, you can now specify fixed-size subsampled batches by using integers (256) instead of floats (0.5), if that's desired). I would recommend starting with update_frequency=1 and subsampling_fraction=1.0, and then playing around with batch_size (starting with ~10) and multi_steps (starting with 5-10). And of course learning_rate is also important.
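Putting those suggested starting values into a config, as a sketch only (the numbers are the tuning starting points from above, not definitive; I'm also assuming the optimization-step count is spelled multi_step in your version's PPO constructor, whereas the discussion above calls it multi_steps, so check the docs):

from tensorforce import Agent

agent = Agent.create(
    agent="ppo",
    environment=environment,
    batch_size=10,             # episodes per update batch, start around 10
    update_frequency=1,        # update after every episode (slightly off-policy)
    subsampling_fraction=1.0,  # use the full batch; an int gives a fixed-size subsample
    multi_step=10,             # optimization steps per update, start with 5-10
    learning_rate=1e-3,        # usually the most important knob
)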
With update_frequency = 1 and 2 parallel environments: when one of these envs terminates its episode (for example, the pole dropped in CartPole), will the other env get its actions from the newly updated network? Or will the agent wait for both envs to finish and then update?
There is a sync_episodes option for the Runner, and if it were desired, this behavior could be changed such that, when using sync_episodes, episodes are terminated at the same time. But I would be surprised if this is very impactful.
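For reference, a sketch of running two parallel environments with the Runner (CartPole and the episode count are placeholders; I'm assuming the agent was created with parallel_interactions=2 so it can act for both environments at once):

from tensorforce.execution import Runner

runner = Runner(
    agent=agent,
    environment=dict(environment="gym", level="CartPole-v1"),
    max_episode_timesteps=500,
    num_parallel=2,
)
# sync_episodes synchronizes the parallel environments at the episode level,
# as discussed above; without it each environment just keeps running.
runner.run(num_episodes=300, sync_episodes=True)
runner.close()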
After each update_frequency, is the memory cleared? For example, with batch_size=8, update_frequency=1 and subsampling=1.0: the first update will go through without problems, but will the second update use the trajectory from the first episode in its update, or is the memory cleared after each update_frequency? If it is not cleared, it means that after the seventh episode the first episode will still be used, even though it comes from a policy far away from the current one. Is this correct?
Take a look at this doc page about the "tensorforce" agent: https://tensorforce.readthedocs.io/en/latest/agents/tensorforce.html
You can add entropy_regularization, l2_regularization, exploration and other hyperparameters to try to improve your agent. However, it depends on your action space.
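As a sketch of what adding those hyperparameters looks like (the values are arbitrary starting points, not recommendations, and the rest of the agent spec is whatever you already use):

from tensorforce import Agent

agent = Agent.create(
    agent="ppo",
    environment=environment,
    batch_size=10,
    exploration=0.01,             # chance of a uniformly random action for int/bool actions
    entropy_regularization=0.01,  # discourages an overly peaked policy
    l2_regularization=0.001,      # weight decay on the networks
)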
Use evaluation=True when using the Runner, or act with independent=True and deterministic=True (see here).
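A sketch of the second option, an explicit evaluation loop with independent, deterministic acting (assumes agent and environment were created as in the earlier snippets; independent acts are not recorded for training):

states = environment.reset()
terminal = False
sum_reward = 0.0
while not terminal:
    actions = agent.act(states=states, independent=True, deterministic=True)
    states, terminal, reward = environment.execute(actions=actions)
    sum_reward += reward
print("evaluation episode reward:", sum_reward)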
ValueError: 'MID_1_bid_0/qty_preprocessing' is not a valid module name. Module names must be valid Python identifiers (e.g. a valid class name).