Hi, I'm trying to understand how to use Tensorforce, but I think I am missing something. For example, when I run

from tensorforce import Runner

runner = Runner(
    agent="ppo",
    environment="CartPole-v1",
    num_parallel=2
)
runner.run(num_episodes=300)

it works fine, but if I try

runner = Runner(
    agent="a2c",
    environment="CartPole-v1",
    num_parallel=2
)
runner.run(num_episodes=300)

it raises

tensorforce.exception.TensorforceError: Invalid value for agent argument update given parallel_interactions > 1: {'unit': 'timesteps', 'batch_size': 10}.

What am I missing here?
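The exception quotes A2C's default update spec (unit 'timesteps', batch_size 10), which Tensorforce rejects whenever parallel_interactions > 1; PPO's default update is per episode, which is presumably why the first snippet runs. A minimal sketch of a variant that avoids the check, simply by dropping the parallel environments for A2C (assuming the same 0.6-era Tensorforce API as above):

from tensorforce import Runner

# Same A2C setup, but without num_parallel, so parallel_interactions stays at 1
runner = Runner(
    agent="a2c",
    environment="CartPole-v1"
)
runner.run(num_episodes=300)
runner.close()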
Some explanation of what state() and actions() give would help. Are they just dicts that describe the type of the return value? What would it look like if the states or actions are not discrete, but continuous? A little more explanation in the comments of the example class would be helpful. The documentation for these methods has information on what I asked above, but again, a little language around the intent of these fields and examples exercising these options would be helpful.
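For what it's worth, here is a minimal sketch of those spec dicts for a continuous setup, loosely following the custom-environment pattern from the Tensorforce docs (the class name and dimensions are illustrative, not from the thread):

import numpy as np
from tensorforce import Environment

class ContinuousDemoEnv(Environment):
    # states(): spec dict describing the observation, here a continuous 4-vector
    def states(self):
        return dict(type='float', shape=(4,))

    # actions(): continuous instead of discrete, i.e. type 'float' with bounds
    # rather than type 'int' with num_values
    def actions(self):
        return dict(type='float', shape=(1,), min_value=-1.0, max_value=1.0)

    def reset(self):
        return np.zeros(shape=(4,), dtype=np.float32)

    def execute(self, actions):
        next_state = np.random.uniform(size=(4,)).astype(np.float32)
        terminal = bool(np.random.random() < 0.05)  # toy termination condition
        reward = float(-np.abs(actions).sum())
        return next_state, terminal, reward

It would then be created via Environment.create(environment=ContinuousDemoEnv, max_episode_timesteps=100), the same way as the built-in environments.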
I also don't see the __init__() definition of Environment when I run a dir() on my created object.
The level argument is for standard OpenAI Gym envs. What I meant was that I am following the same structure as OpenAI Gym, but made my custom env with the same abstract class (which you can check here). So my doubt is: how do I make it work with Tensorforce?
My env class is mmx, and I tried environment = OpenAIGym(level=mmx).
Does your environment derive from the gym.Env base class? Your environment class could be class MMX(gym.Env): ..., and if you then pass it to the Tensorforce Gym interface, it should be compatible: env = Environment.create(environment='gym', level=MMX, ...). Or have you tried this before? The level argument should certainly accept custom gym.Env subclass objects, and in fact also instances.
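A rough self-contained sketch of that suggestion; the body of MMX below is invented purely for illustration, only the level=MMX usage comes from the thread (and it assumes the pre-0.26 gym step API):

import gym
import numpy as np
from tensorforce import Environment

class MMX(gym.Env):
    def __init__(self):
        # Hypothetical observation/action spaces, just to make the sketch runnable
        self.observation_space = gym.spaces.Box(low=-1.0, high=1.0, shape=(4,), dtype=np.float32)
        self.action_space = gym.spaces.Discrete(2)

    def reset(self):
        return np.zeros(4, dtype=np.float32)

    def step(self, action):
        observation = self.observation_space.sample()
        reward = 1.0 if action == 1 else 0.0
        done = False
        return observation, reward, done, {}

# Pass the class (or an instance) as level to the 'gym' environment type
environment = Environment.create(environment='gym', level=MMX, max_episode_timesteps=100)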
You need to call environment.reset() before starting to execute. Apart from that, you shouldn't need to add attributes or so when using Environment.create(...) (which, I'd say, is the preferred way of initializing an env). I will also add attribute forwarding for the wrapper; however, it will be read-only, which I think should be enough (environment logic should go into the env implementation itself).
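As a sketch of the order of calls being described here, following the standard Tensorforce act-observe loop (the PPO agent and CartPole level are just placeholders):

from tensorforce import Agent, Environment

environment = Environment.create(environment='gym', level='CartPole-v1', max_episode_timesteps=500)
agent = Agent.create(agent='ppo', environment=environment, batch_size=10)

states = environment.reset()  # reset() must come before the first execute()
terminal = False
while not terminal:
    actions = agent.act(states=states)
    states, terminal, reward = environment.execute(actions=actions)
    agent.observe(terminal=terminal, reward=reward)

agent.close()
environment.close()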
examples/ :-)
recent is a simple buffering mechanism which samples the latest timesteps; replay randomly samples from a usually bigger pool of timesteps, as known from DQN. But this is not what you're looking for.
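As a rough illustration of the replay variant (assuming the 0.6-era API, where the DQN agent takes the memory capacity and batch_size directly):

from tensorforce import Agent, Environment

environment = Environment.create(environment='gym', level='CartPole-v1', max_episode_timesteps=500)

# DQN draws random batches from a replay memory of past timesteps
agent = Agent.create(agent='dqn', environment=environment, memory=10000, batch_size=32)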
internal_lstm and the internals arguments are related to that. They give the agent an internal state, and consequently the ability to remember what happened earlier in an episode (in theory -- and yes, many DRL models don't have this).
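A sketch of how those internals are threaded through independent act() calls, following the documented initial_internals() pattern (the agent configuration is a placeholder; the internals dict stays empty unless the policy actually has a recurrent layer):

from tensorforce import Agent, Environment

environment = Environment.create(environment='gym', level='CartPole-v1', max_episode_timesteps=500)
agent = Agent.create(agent='ppo', environment=environment, batch_size=10)

states = environment.reset()
internals = agent.initial_internals()  # the agent's recurrent state, carried across timesteps
terminal = False
while not terminal:
    actions, internals = agent.act(states=states, internals=internals, independent=True)
    states, terminal, reward = environment.execute(actions=actions)

agent.close()
environment.close()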
Good to know that internal_lstm exists, and I'll keep it in mind, but I think I have to do a lot more practice on just the basics before I get to that. So I'm going to try and do as you suggested, and just explicitly expose that state myself. I think I will be able to contribute this as an example too, but it really won't be much different than my other example, and you probably want some examples that exercise other features of the framework...