But I faced another problem when loading saved-model format; I can't load the saved model:
Agent.save('model_output', format='saved-model')
Agent.load('model_output', format='saved-model')
636 format = 'checkpoint'
637 else:
--> 638 assert format == 'checkpoint'
639 if filename is None or \
640 not os.path.isfile(os.path.join(directory, filename + '.index')):
AssertionError:
The saved-model part is one of the new features in version 0.6, and unfortunately it's not quite figured out yet. Right now, there is no support for loading SavedModels from within Tensorforce. However, it is possible to load and use them, as illustrated e.g. here. Generally, I would use the other saving formats while you work in Python/Tensorforce, since there should be little if any benefit to using the SavedModel version; the SavedModel is for when you want to use the trained model somewhere else for deployment. I haven't considered your point, though, where you're happy to use Python/Tensorforce but want to reduce memory requirements. It should be possible to provide an act-only agent again, as there was in previous versions.
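To make that concrete, here is a minimal sketch of the suggested workflow, staying with the default 'checkpoint' format inside Python/Tensorforce (the environment, agent settings and directory name are placeholders, not taken from the thread):

from tensorforce import Agent, Environment

environment = Environment.create(
    environment='gym', level='CartPole', max_episode_timesteps=500
)
agent = Agent.create(agent='ppo', environment=environment, batch_size=10)

# ... training loop ...

# save in the TensorFlow checkpoint format while staying within Tensorforce
agent.save(directory='model_output', format='checkpoint')
agent.close()

# later: restore inside Tensorforce; passing the environment supplies the state/action specs
agent = Agent.load(directory='model_output', format='checkpoint', environment=environment)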
Hi, I just upgraded my Tensorforce version to 0.6 and tried to run the CartPole environment in Colab with a GPU, but couldn't. Is this the right place to post this, or should I create an issue on GitHub?
pip install Tensorforce==0.6

from tensorforce.execution import Runner

runner = Runner(
    agent=dict(
        type="ppo",
        batch_size=2
    ),
    environment=dict(environment='gym', level='CartPole'),
    max_episode_timesteps=500
)
runner.run(num_episodes=20)
runner.run(num_episodes=10, evaluation=True)
runner.close()
I got the following message:
InvalidArgumentError: Cannot assign a device for operation agent/StatefulPartitionedCall/agent/Gather: Could not satisfy explicit device specification '' because the node {{colocation_node agent/StatefulPartitionedCall/agent/Gather}} was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'. All available devices [/job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0, /job:localhost/replica:0/task:0/device:XLA_GPU:0, /job:localhost/replica:0/task:0/device:GPU:0].
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=1 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
Identity: GPU CPU XLA_CPU XLA_GPU
ResourceScatterAdd: CPU XLA_CPU XLA_GPU
_Arg: GPU CPU XLA_CPU XLA_GPU
ResourceGather: GPU CPU XLA_CPU XLA_GPU
Colocation members, user-requested devices, and framework assigned devices, if any:
agent_929 (_Arg) framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
agent/StatefulPartitionedCall/agent/Gather (ResourceGather)
agent/StatefulPartitionedCall/agent/ResourceScatterAdd (ResourceScatterAdd)
Func/agent/StatefulPartitionedCall/input/_90 (Identity) /job:localhost/replica:0/task:0/device:GPU:0
[[{{node agent/Gather}}]] [Op:__inference_act_1063]
Hi all, I got a question. I'm getting this error:
File "/Users/schmark/anaconda/envs/tensorforce/lib/python3.6/site-packages/tensorforce/core/layers/embedding.py", line 94, in initialize
condition='input num_values is None'
tensorforce.exception.TensorforceError: Required Embedding argument num_embeddings given input num_values is None.
when initializing my agent like so:
agent = Agent.create(
    agent='ppo', environment=environment, batch_size=10, learning_rate=1e-3
)
The environment I'm using is a custom environment, so I'm guessing the cause lies in my implementation? I followed the docs. Is this a common problem? Any pointers on where I should start looking for the source of the error?
If it helps, here is how I define my environment. Thanks in advance.
The state is of type int, but is missing num_values here. Why: the default network embeds discrete inputs and hence needs to know how many embeddings are required, and there is currently no default support for arbitrary integers (which is a non-trivial problem). By the looks of it, the state integers encode the type of each grid tile, so I assume there is a fixed set of types, which would be the value to choose for num_values.
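For illustration, a states() spec along these lines gives the default embedding what it needs; the grid shape and the number of tile types below are made-up placeholders, not taken from the actual environment:

import numpy as np
from tensorforce import Environment

class GridEnv(Environment):
    # hypothetical grid environment; the point here is the states() spec

    def states(self):
        # a 10x10 grid of tile types with 5 distinct types (placeholder numbers);
        # num_values is what the default embedding layer needs to know
        return dict(type='int', shape=(10, 10), num_values=5)

    def actions(self):
        return dict(type='int', num_values=4)

    def reset(self):
        return np.zeros((10, 10), dtype=int)

    def execute(self, actions):
        next_state = np.zeros((10, 10), dtype=int)
        terminal, reward = False, 0.0
        return next_state, terminal, reward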
Hi! I'm running into this error
File "/Users/schmark/anaconda/envs/tensorforce/lib/python3.6/site-packages/tensorforce/core/layers/dense.py", line 87, in initialize
is_trainable=self.vars_trainable, is_saved=True
File "/Users/schmark/anaconda/envs/tensorforce/lib/python3.6/site-packages/tensorforce/core/module.py", line 511, in variable
name='variable', argument='spec', value=spec, hint='underspecified'
tensorforce.exception.TensorforceError: Invalid value for variable argument spec: TensorSpec(type=float, shape=(0, 32)) underspecified.
Since I added the network argument and my own custom layers:
agent = Agent.create(
    agent='ppo',
    environment=environment,
    network=[
        dict(type='conv2d', window=5, stride=3, size=8, activation='elu'),
        dict(type='flatten'),
        dict(type='dense', size=32),
        dict(type='flatten', name="out"),
    ],  # etc (extra flatten is probably not necessary)
What does underspecified mean in this case, and what could be causing it?
Hi folks. I'm new to Tensorforce and ML in general, and am running into an error using Tensorforce that I'm not sure how to debug:
File "/Users/marco0009/.virtualenvs/puzzle_solver-j2SM-PkM/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[0] = [0, 3000] does not index into shape [1,3000,6]
[[{{node agent/StatefulPartitionedCall/agent/TensorScatterUpdate_1}}]] [Op:__inference_act_1218]
The 3000 I recognize as my custom Environment's max_step_per_episode, and the 6 I suspect is related to my env's actions:
def actions(self):
    return {
        "make_move": dict(type="int", num_values=6),
    }
but I'm unsure as to what the cause of this exception actually is. Is there anywhere I should be looking in my Environment's configuration for issues?
Hi! Quick question on loading a model. I saved it as a numpy file. When loading it as written in the docs: agent = Agent.load(directory=checkpointFolder, filename="agent_episode600", format="numpy", environment=environment)
I get an error because it needs additional arguments that I'm not sure about.
`TypeError: __init__() missing 3 required positional arguments: 'update', 'objective', and 'reward_estimation'`
Are there any additional instructions that I need? I tried setting the first two to True, but I'm not sure about the reward_estimation argument.
Semi-related: when going through Environment.create, it doesn't seem that max_episode_timesteps is set for a class where I inherit from Environment. Is this expected?
For example:
class SomeEnv(Environment):
    def __init__(self):
        super().__init__()

    def states(self):
        return dict(
            min_value=self.invalid_penalty * self.max_episode_timesteps(),
            ....
        )

    ....

environment = Environment.create(
    environment=SomeEnv,
    max_episode_timesteps=500,
)
In the above, the call to self.max_episode_timesteps() returns None.
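One possible workaround (my suggestion, not confirmed in this thread) is to give the subclass its own fixed limit by overriding max_episode_timesteps(), so the value is already available when states() is evaluated, rather than relying on the value injected by Environment.create:

from tensorforce import Environment

class SomeEnv(Environment):
    def __init__(self):
        super().__init__()
        self.invalid_penalty = -1.0  # placeholder value

    def max_episode_timesteps(self):
        # fixed limit known to the class itself
        return 500

    def states(self):
        return dict(
            type='float', shape=(1,),  # placeholder spec
            min_value=self.invalid_penalty * self.max_episode_timesteps(),
            max_value=0.0,
        )

    # actions(), reset(), execute() omitted for brevity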
This is the configuration:
agent = Agent.create(
    agent='ppo',
    environment=environment,
    network=[
        dict(type='conv2d', window=5, stride=3, size=8, activation='elu'),
        dict(type='flatten'),
        dict(type='dense', size=16),
        dict(type='flatten', name="out"),
    ],
    batch_size=10,
    learning_rate=1e-3,
    summarizer=dict(
        directory='results/summaries',
        labels=['entropy', 'kl-divergence', 'loss', 'reward', 'update-norm']
    )
)
and the JSON:
{"agent": "ppo", "states": {"type": "float", "shape": [31, 31, 3]}, "actions": {"move": {"type": "int", "num_values": 4}, "draw": {"type": "int", "num_values": 2}}, "max_episode_timesteps": 500, "batch_size": 10, "network": [{"type": "conv2d", "window": 5, "stride": 3, "size": 8, "activation": "elu"}, {"type": "flatten"}, {"type": "dense", "size": 16}, {"type": "flatten", "name": "out"}], "use_beta_distribution": false, "memory": "minimum", "update_frequency": "batch_size", "learning_rate": 0.001, "subsampling_fraction": 0.33, "optimization_steps": null, "likelihood_ratio_clipping": 0.25, "discount": 0.99, "predict_terminal_values": false, "baseline": null, "baseline_optimizer": null, "state_preprocessing": "linear_normalization", "reward_preprocessing": null, "exploration": 0.0, "variable_noise": 0.0, "l2_regularization": 0.0, "entropy_regularization": 0.0, "parallel_interactions": 1, "config": null, "saver": null, "summarizer": {"directory": "results/summaries", "labels": ["entropy", "kl-divergence", "loss", "reward", "update-norm"]}, "recorder": null, "internals": {}, "initial_internals": {"policy": {}}}
showing this error:
tensorforce.exception.TensorforceError: Invalid value for TensorSpec.to_tensor argument value: [[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]
...
[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]] > max_value.
with this shape:
'enemy_radar': spaces.Box(low=0, high=100, shape=(100,100)),
I'm using PPO and I looked at the code; however, I need confirmation of what I understood about it.
The agent has a memory size (at least max_episode_timesteps * batch_size) which will hold all the episode trajectories (state, action, reward). After update_frequency episodes, the agent's weights are updated. By default, update_frequency is the same as batch_size. Each update will do multi_steps, where each step is clipped by likelihood_ratio_clipping and the trajectory is drawn from the memory based on the subsampling_fraction. So for batch_size = 16, subsampling_fraction = 0.3 and multi_steps = 10: each update will be 10 steps, each using 16 * 0.3 = 4.8 episodes from the memory. Is that right?
I'm asking this because I want to improve the training speed, and the only places I can think of are learning_rate, batch_size and multi_steps.
You could also decrease update_frequency. While this technically means that data will be slightly off-policy (e.g. update after every episode, but still use a batch_size of 16), in practice this is often no problem up to a point (also, PPO's objective accounts for importance sampling). Moreover, I think subsampling_fraction=1.0 may do the job in most cases, but it may be interesting to play around with it (also, recently you can specify fixed-size subsampled batch sizes by using integers (256) instead of floats (0.5), if that's desired). I would recommend starting with update_frequency=1 and subsampling_fraction=1.0, and then playing around with batch_size (starting with ~10) and multi_steps (starting with 5-10). And of course learning_rate is also important.
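As a rough starting point, those suggestions could translate into a configuration like the following; the environment is a placeholder and the name of the multi-step argument differs between Tensorforce versions, so check the PPO docs for yours:

from tensorforce import Agent, Environment

environment = Environment.create(
    environment='gym', level='CartPole', max_episode_timesteps=500
)

agent = Agent.create(
    agent='ppo',
    environment=environment,
    batch_size=10,             # episodes per update batch (start around 10)
    update_frequency=1,        # update after every episode (slightly off-policy)
    subsampling_fraction=1.0,  # use the full batch in each optimization step
    multi_step=5,              # 5-10 optimization steps per update; argument name may vary by version
    learning_rate=1e-3,
)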
Say update_frequency = 1 and I have 2 parallel environments. When one of these envs terminates its episode, for example the pole dropped in CartPole, will the other env have its actions based on the newly updated network? Or will it wait until both envs finish and then update?
There is a sync_episodes option for the Runner, and if it were desired, this behavior could be changed such that, when using sync_episodes, episodes are terminated at the same time. But I would be surprised if this is very impactful.
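For reference, a sketch of running two parallel environments with synchronized episodes; num_parallel, batch_agent_calls and sync_episodes are assumed to match the Runner API of recent Tensorforce versions, so double-check against the documentation:

from tensorforce.execution import Runner

runner = Runner(
    agent=dict(type='ppo', batch_size=10, update_frequency=1),
    environment=dict(environment='gym', level='CartPole'),
    max_episode_timesteps=500,
    num_parallel=2,  # two environment copies
)
# sync_episodes keeps the parallel episodes aligned; batch_agent_calls lets the
# agent act for both environments in a single call
runner.run(num_episodes=100, batch_agent_calls=True, sync_episodes=True)
runner.close()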
After each update_frequency, is the memory cleared? For example, with batch_size=8, update_frequency=1, subsampling=1.0: the first episode will be updated without problems, but will the second update use the trajectory from the first in its update? Or is the memory cleared after each update_frequency?
If it is not cleared, it means that after the seventh episode the first episode will still be used, even though it is from a policy far away from the current policy. Is this correct?
Take a look at this doc link about the "tensorforce" agent: https://tensorforce.readthedocs.io/en/latest/agents/tensorforce.html
You can add entropy_regularization, l2_regularization, exploration and other hyperparameters to try to improve your agent. However, it depends on your action space.
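For example, with a PPO agent those hyperparameters could be passed like this (the values are arbitrary starting points, and environment is assumed to be created elsewhere):

from tensorforce import Agent

agent = Agent.create(
    agent='ppo',
    environment=environment,      # assumed to exist
    batch_size=10,
    learning_rate=1e-3,
    exploration=0.01,             # epsilon-style random action exploration
    entropy_regularization=0.01,  # discourages a prematurely deterministic policy
    l2_regularization=0.001,      # weight decay on trainable variables
)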