A few notes on recent changes:

- The `size` of `np.random.uniform` includes the 1 axis.
- `memory` has to be explicitly set now (new change, since the old default was a bit random).
- Max episode length is now set at `Environment.create` (right now necessary, but I'm planning to look into that soon).
- `conv1d_transpose` doesn't work properly in the new TensorFlow (some weird gradient exception), so I removed it.
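In practice that looks roughly like this (a sketch, assuming the Tensorforce API; the environment, agent type, and all values are illustrative):

```python
from tensorforce import Agent, Environment

# Max episode length is set at environment creation, and the agent
# memory is set explicitly (illustrative values, not recommendations).
environment = Environment.create(
    environment='gym', level='CartPole-v1', max_episode_timesteps=500
)
agent = Agent.create(
    agent='ppo', environment=environment,
    memory=10000, batch_size=10
)
```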
There is also the `'auto'` network, which provides a "default" way of combining multiple inputs. Generally, such networks can be specified via the `Register` and `Retrieve` layers: the network is given not as a list of layers, but as a list of lists, where each inner list typically starts with a `retrieve` layer and ends with a `register` layer, so it constitutes a sequential "segment" of the overall network.
There are `agent.save(...)` and `Agent.load(...)` (see docs). Is that what you're looking for, or what more specifically do you mean by "building the model using tf or tf-lite after training"?
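Roughly (a sketch, assuming the Tensorforce checkpointing API; directory and format are illustrative):

```python
from tensorforce import Agent

# Save the trained agent to a checkpoint directory, then restore it.
agent.save(directory='model-checkpoint', format='checkpoint')
agent.close()
agent = Agent.load(directory='model-checkpoint')
```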
…`observe`, no matter whether I run on GPU or CPU. I am positive it really runs on the GPU, as can be seen in `nvidia-smi`. The GPUs are 2x Tesla K80 on a cloud machine. Any hints what the problem might be?
Hello,
I wonder, is this also the right place to ask questions about tf-agents? If not, please ignore the following:
I am trying to run a simple DQN agent on an environment that has a collection of different specs as its `observation_spec`. This means that my `QNetwork` wants a `preprocessing_combiner`, so I followed its documentation and gave it `tf.keras.layers.Concatenate()`. `Concatenate` does not like to concatenate tuples, so the `observation_spec` has to be a list. Next, I want to implement a replay buffer, specifically a `TFUniformReplayBuffer`. Converting it to a dataset (as done in https://github.com/tensorflow/agents/blob/master/docs/tutorials/1_dqn_tutorial.ipynb), I get an error and the explicit instruction to change my specs from lists to tuples. However, then I run into the problem that `Concatenate` doesn't like its input. Am I doing something conceptually wrong here?
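For reference, a hypothetical reconstruction of the setup described above (tf-agents; spec shapes and names are illustrative):

```python
import tensorflow as tf
from tf_agents.networks.q_network import QNetwork
from tf_agents.specs import tensor_spec

# A list of observation specs, combined by Concatenate inside QNetwork.
observation_spec = [
    tensor_spec.TensorSpec((4,), tf.float32, name='sensor'),
    tensor_spec.TensorSpec((3,), tf.float32, name='state'),
]
action_spec = tensor_spec.BoundedTensorSpec((), tf.int32, minimum=0, maximum=1)

q_net = QNetwork(
    observation_spec,
    action_spec,
    preprocessing_combiner=tf.keras.layers.Concatenate(axis=-1),
    fc_layer_params=(64,),
)
```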
…the `remote` argument of `Environment.create` (based on either `multiprocessing` or `socket`), and the extended features of `Runner` / `run.py`, which was basically merged with the functionality of `ParallelRunner` (which itself was removed).
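A minimal sketch of the parallel setup (assuming the Runner API mentioned above; all values are illustrative):

```python
from tensorforce import Runner

# Four environment copies run in separate processes; episodes are
# collected in parallel and fed to a single agent.
runner = Runner(
    agent=dict(agent='ppo', batch_size=10),
    environment=dict(environment='gym', level='CartPole-v1'),
    max_episode_timesteps=500,
    num_parallel=4,
    remote='multiprocessing',
)
runner.run(num_episodes=200)
runner.close()
```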
`batch_size` defines the number of episodes (each consisting of many timesteps) per update batch. Moreover, the way the PPO update works according to the paper is that it actually performs multiple updates based on randomly subsampled timestep minibatches (the entire batch of n episodes is quite big). So `subsampling_fraction` specifies what fraction of the full batch is subsampled for each minibatch, and `optimization_steps` specifies how often these mini-updates happen.
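Put together, a sketch of a PPO configuration with these parameters (values are illustrative, not recommendations; newer versions may rename some of them):

```python
from tensorforce import Agent

# batch_size=10: each update batch consists of 10 episodes.
# subsampling_fraction=0.33: each minibatch subsamples a third of the batch.
# optimization_steps=10: ten such mini-updates per update batch.
agent = Agent.create(
    agent='ppo',
    environment=environment,  # assumes an Environment created earlier
    batch_size=10,
    subsampling_fraction=0.33,
    optimization_steps=10,
)
```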