What does your choose_action(...) method look like, and how exactly do these multiple agents work? Also, it sounds like there may be a better way of doing this. Feel free to write me a private message and we can discuss it in more detail.
Have you looked at the Runner, and if so, was it not clear how to use it for parallel execution? Or have you tried to use the slightly more low-level interface via the parallel argument of agent.act/observe? I can certainly add more information, but it would also be very welcome if you would consider contributing a short guide... :-) Also, I'm happy to help if there are still questions...
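A minimal sketch of what that lower-level loop could look like (assuming Agent.act/observe take a parallel index and the agent is created with parallel_interactions > 1; exact signatures may differ between versions):

from tensorforce import Agent, Environment

num_parallel = 4
environments = [
    Environment.create(environment='gym', level='CartPole-v1')
    for _ in range(num_parallel)
]
agent = Agent.create(
    agent='ppo', environment=environments[0],
    batch_size=10, parallel_interactions=num_parallel
)

states = [env.reset() for env in environments]
for _ in range(1000):  # interaction steps per environment
    for i, env in enumerate(environments):
        # each environment copy is addressed via its parallel index
        actions = agent.act(states=states[i], parallel=i)
        states[i], terminal, reward = env.execute(actions=actions)
        agent.observe(terminal=terminal, reward=reward, parallel=i)
        if terminal:
            states[i] = env.reset()

agent.close()
for env in environments:
    env.close()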
Regarding the parallel_interactions argument: it should be automatically set internally. Note that currently you're running 16 environments, which run locally and hence will be executed iteratively, while the agent call will be batched, i.e. "in parallel". For computationally more expensive environments, it makes sense to use the remote argument (see here) to execute remotely and hence "fully in parallel".
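As a rough sketch of the two modes (illustrative only; see the parallelization docs for the exact arguments in your version):

from tensorforce import Runner

# Local, batched "in parallel": environments are stepped iteratively in one
# process, but the agent call is batched across them
runner = Runner(agent='ppo', environment='CartPole-v1', num_parallel=4)
runner.run(num_episodes=300)
runner.close()

# "Fully in parallel": each environment runs in its own process via the remote argument
runner = Runner(agent='ppo', environment='CartPole-v1',
                num_parallel=4, remote='multiprocessing')
runner.run(num_episodes=300)
runner.close()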
@AlexKuhnle I was working on updating my code from Tensorforce 0.5.0 to 0.5.3, but figured that, as parallel environments have now been added, I would try to update the code all the way to the latest GitHub version.
I have a custom environment I want to run on multiple CPUs (locally), as my environments include a bunch of fluid mechanics simulations which are computationally very heavy. I borrowed the script Jeff linked above, tried to use remote="multiprocessing", and changed the environment to my custom class, also adding remote="multiprocessing" to the Env.create() call. This seems to work fine, and the environment type is recognized as a MultiprocessingEnvironment.
However, when the code reaches the Runner call, I get an assertion error:
File "/home/fenics/local/tensorforce/tensorforce/execution/runner.py", line 99, in __init__
assert not isinstance(environment, Environment)
AssertionError
Is something going wrong with how I'm creating my environment, or is the Runner class not taking into account that my environment is now of a different type?
Seems like my editing timed out, so I continue here.
I tried editing the script Jeff linked by only adding remote="multiprocessing" to Env.create() and Runner(), which seems to work, except that it slows down over time, and when it reaches the final episode nothing seems to happen and the run won't finish.
I suspect I might have misunderstood "multiprocessing" vs "socket-client", and that what I actually need to use is "socket-client". (I have used the code contributed by Jerab29 with Tensorforce 0.5.0, which used the same naming convention with Client, Server, Socket etc., which is making me suspicious.)
I also changed Runner(environment='CartPole-v1') to Runner(environment=environment), which causes the same AssertionError as for a custom env. The type is <class 'tensorforce.environments.multiprocessing_environment.MultiprocessingEnvironment'>, which seems right.
You can also pass Environment objects to environments=, and in that case you don't need to specify the remote arguments. (Similarly, if you pass the agent spec dict, you don't need to set parallel_interactions, as it will be set automatically based on the runner arguments.)
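For instance, roughly (values are placeholders; I believe the number of parallel interactions is inferred from the list, but check the Runner docs for your version):

from tensorforce import Environment, Runner

# Already-created Environment objects go to environments= (no remote argument needed);
# parallel_interactions is filled in automatically from the agent spec dict
envs = [Environment.create(environment='CartPole-v1') for _ in range(4)]
runner = Runner(
    agent=dict(agent='ppo', batch_size=10),  # no parallel_interactions needed here
    environments=envs
)
runner.run(num_episodes=300)
runner.close()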
I was able to figure out what I was doing wrong with my custom Environment, which is probably also the reason for the assertion error above.
I was doing env = Environment.create() and then passing env to the Agent and the Runner. However, "multiprocessing" requires that the environment we send to the Runner NOT be of type Environment, MultiprocessingEnvironment or similar.
When I pass the custom Environment class directly to the Runner (and the Agent), e.g. Agent.create(environment='CustomClass') and Runner(environment='CustomClass/CartPole-v1'), Tensorforce calls Environment.create() on its own. I was calling Environment.create() on an instance which had already gone through Environment.create(), i.e. a double stack of Environment.create.
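For completeness, a rough sketch of the difference (CustomEnv is a hypothetical placeholder for your own Environment subclass; treat the arguments as illustrative):

from tensorforce import Runner
from my_project import CustomEnv  # hypothetical custom Environment subclass

# Double-create (what went wrong): the instance returned by Environment.create()
# is handed to a Runner that is itself asked to create remote environments
#   env = Environment.create(environment=CustomEnv, remote='multiprocessing')
#   runner = Runner(agent=..., environment=env, remote='multiprocessing')  # assertion error

# Single create: pass the class (or a spec) and let Tensorforce call
# Environment.create() internally
runner = Runner(
    agent=dict(agent='ppo', batch_size=10),
    environment=CustomEnv,
    num_parallel=4,
    remote='multiprocessing'
)
runner.run(num_episodes=500)
runner.close()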
I'm running into a few other errors, but they are much more likely to be issues with how I'm defining my environment rather than caused by Tensorforce, so I'll take a closer look myself before bothering you again.
1) DQN, like every other agent, updates automatically; the update(...) function doesn't usually need to be called. You can specify how frequently the update should happen via the update_frequency argument, or implicitly via batch_size (if update_frequency is None, then update_frequency = batch_size). These numbers are timestep-based, so independent of episodes (since DQN is largely agnostic to episodes).
update_frequency always has the same unit as batch_size; both are specified as part of update (in TensorforceAgent). So in the case of PPO it can't be timestep-based. As you've probably read, update_frequency specifies how frequently an update is scheduled: update_frequency > batch_size doesn't make sense, otherwise some experience would just be ignored; update_frequency = batch_size is the default; but it makes sense to experiment with "increasing" the update periodicity / "decreasing" the update_frequency value, i.e. update_frequency < batch_size.
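To tie this back to the DQN case above, a minimal sketch (values are placeholders, not tuned; defaults may differ between versions):

from tensorforce import Agent, Environment

environment = Environment.create(environment='gym', level='CartPole-v1')
agent = Agent.create(
    agent='dqn',
    environment=environment,
    memory=10000,         # replay capacity in timesteps
    batch_size=32,        # timesteps sampled per update
    update_frequency=32,  # = batch_size is the default; < batch_size updates more often
)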
Use memory = dict(type='recent') instead of DQN's replay and a custom capacity.
Actions are specified as dict(type=..., shape=...); in general you can specify a nested action dict like dict(action1=dict(type=..., shape=...), action2=dict(type=..., shape=...), ...). Your environment (if you implement the Environment class) can simply return this from actions(), and/or your agent can receive it as the actions argument.
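For instance, a minimal sketch of a custom Environment with a nested action spec (the action names 'move' and 'strength' and all values are just illustrative):

from tensorforce import Environment

class MultiActionEnv(Environment):
    """Illustrative environment with two named actions."""

    def states(self):
        return dict(type='float', shape=(4,))

    def actions(self):
        # nested action specification: dict(action1=dict(...), action2=dict(...))
        return dict(
            move=dict(type='int', shape=(), num_values=4),
            strength=dict(type='float', shape=(), min_value=0.0, max_value=1.0),
        )

    def reset(self):
        return [0.0, 0.0, 0.0, 0.0]

    def execute(self, actions):
        # actions arrives as dict(move=..., strength=...)
        next_state = [0.0, 0.0, 0.0, 0.0]
        terminal = False
        reward = float(actions['strength'])  # dummy reward for illustration
        return next_state, terminal, reward

# usage (hypothetical): let Tensorforce wrap it
# environment = Environment.create(environment=MultiActionEnv, max_episode_timesteps=100)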
Hi @qZhang88, hope the following explanation clarifies your question: PPO, like many other standard policy gradient algorithms, uses complete rollouts (episodes) for reward estimation. In Tensorforce this means that batch_size defines the number of episodes (each consisting of many timesteps) per update batch. Moreover, the way the PPO update works according to the paper is that it actually performs multiple updates based on randomly subsampled timestep-minibatches (the entire batch of n episodes is quite big). So the subsampling_fraction specifies what fraction of the full batch is subsampled for each minibatch, and optimization_steps specifies how often these mini-updates should happen.
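Concretely, a sketch (argument names as used in this discussion; newer versions may rename some of them, and the values are just placeholders):

from tensorforce import Agent, Environment

environment = Environment.create(environment='gym', level='CartPole-v1')
agent = Agent.create(
    agent='ppo',
    environment=environment,
    batch_size=10,             # 10 complete episodes per update batch
    subsampling_fraction=0.2,  # each minibatch uses 20% of the batch's timesteps
    optimization_steps=10,     # number of minibatch updates per batch
)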
I still have some questions here. Let's say batch_size is 10, the max timestep is 1000, and subsampling_fraction is 0.2; so each update batch is still 10 episodes, and the timesteps would be less than 200, right? Could optimization_steps then be increased, to take full advantage of the whole episodes?
I wonder something: while constructing a custom environment, can we return different action values? For example, [1,2,3] in one state and [2,3,4] in another, or do I have to handle the available actions in execute()?
Thanks in advance for the answer.
Hi, I have been tinkering with the DQN agent on the BreakoutDeterministic-v4 environment, but I am running into the problem that the agent receives low rewards and plateaus at an episode reward of around 2-6 after running 10k-40k episodes.
The network config I am currently using is:
keras_net_conf = [
{
"type": "keras",
"layer": "Conv2D",
"filters": 32,
"kernel_size": 8,
"strides": 4,
"activation": "relu",
"padding": "valid",
"kernel_initializer": 'VarianceScaling',
"use_bias": False,
},
{
"type": "keras",
"layer": "Conv2D",
"filters": 64,
"kernel_size": 4,
"strides": 2,
"activation": "relu",
"padding": "valid",
"kernel_initializer": 'VarianceScaling',
"use_bias": False,
},
{
"type": "keras",
"layer": "Conv2D",
"filters": 64,
"kernel_size": 3,
"strides": 1,
"activation": "relu",
"padding": "valid",
"kernel_initializer": 'VarianceScaling',
"use_bias": False,
},
{
"type": "flatten",
},
{
"type": "keras",
"layer": "Dense",
"units": 512,
"activation": "relu",
"use_bias": False,
"kernel_initializer": 'VarianceScaling',
}
]
With preprocessing and exploration set up as:
preproc = [
{
"type": "image",
"width": 50,
"height": 50,
"grayscale": True
},
{
"type": "sequence",
"length": 4,
"concatenate": True
}
]
st_exp = dict(type='decaying', unit='timesteps', decay='polynomial', decay_steps=1000000, initial_value=1.0,
final_value=EXPLORATION, power=1.0)
The actual agent creation is defined as:
agent = Agent.create(agent='dqn',
environment=env,
states=env.states(),
batch_size=32,
preprocessing=dict(
state=preproc,
reward=dict(type="clipping", upper=1.0)
),
learning_rate=LR,
memory=100000,
start_updating=50000,
discount=DISC,
exploration=st_exp,
network=keras_net_conf,
update_frequency=4,
target_sync_frequency=10000,
summarizer=summarizer,
huber_loss=1.0,
name='DQN_agent')
The learning rate is set to 1e-5 and the discount factor to 0.99. The other parameters such as memory size, max_ep_steps, start_updating etc. are all taken from other implementations that do not use Tensorforce but have managed to achieve scores comparable to the original paper.
So I am wondering whether somebody has come across this issue and, if so, managed to get it to learn properly and reach higher rewards.
Regards.
Hi, I'm trying to understand how to use Tensorforce, but I think I am missing something. For example, why is it that when I try to run
runner = Runner(
agent="ppo",
environment="CartPole-v1",
num_parallel=2
)
runner.run(num_episodes=300)
it works fine, but if I try
runner = Runner(
agent="a2c",
environment="CartPole-v1",
num_parallel=2
)
runner.run(num_episodes=300)
it raises
tensorforce.exception.TensorforceError: Invalid value for agent argument update given parallel_interactions > 1: {'unit': 'timesteps', 'batch_size': 10}.
What am I missing here?