Some explanation of what state() and actions() give would help. Are they just dicts that describe the type of the return value? What would it look like if the states or actions are not discrete, but continuous? A little more explanation in the comments of the example class would be helpful. The documentation for these methods has information on what I asked above, but again, a little language around the intent of these fields, and examples exercising these options, would be helpful. I do see the __init__() definition of Environment when I run a dir() on my created object.
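For illustration, a minimal sketch of a custom environment (class name and dynamics are made up here), showing that states() and actions() just return specification dicts, and what a continuous action spec looks like:

import numpy as np
from tensorforce.environments import Environment

class MyEnv(Environment):

    def __init__(self):
        super().__init__()
        self._state = np.zeros(shape=(4,), dtype=np.float32)

    def states(self):
        # State spec: a continuous 4-dimensional vector.
        return dict(type='float', shape=(4,))

    def actions(self):
        # Continuous action spec: a bounded 2-dimensional float vector.
        # A discrete alternative would be dict(type='int', num_values=5).
        return dict(type='float', shape=(2,), min_value=-1.0, max_value=1.0)

    def reset(self):
        self._state = np.zeros(shape=(4,), dtype=np.float32)
        return self._state

    def execute(self, actions):
        # actions arrives in the format declared by actions() above.
        self._state[:2] += 0.1 * actions
        reward = -float(np.sum(self._state ** 2))
        terminal = False
        return self._state, terminal, reward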
level is for standard OpenAI Gym envs. What I meant was that I am following the same structure as OpenAI Gym, but made my custom env with the same abstract class (which you can check here). So my doubt is: how do I make it work with Tensorforce?
With my env mmx, I tried: environment = OpenAIGym(level=mmx)
Does your environment use the gym.Env base class? Your environment class could be class MMX(gym.Env): ..., and if you then pass it to the Tensorforce Gym interface, it should be compatible: env = Environment.create(environment='gym', level=MMX, ...). Or have you tried this before? The level argument should certainly accept custom gym.Env subclass objects, and in fact also instances.
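A rough sketch of the wiring (the MMX spaces and dynamics below are placeholders, not your actual env):

import gym
import numpy as np
from gym import spaces
from tensorforce.environments import Environment

class MMX(gym.Env):

    def __init__(self):
        self.observation_space = spaces.Box(low=-1.0, high=1.0, shape=(4,), dtype=np.float32)
        self.action_space = spaces.Discrete(3)
        self.state = np.zeros(4, dtype=np.float32)

    def reset(self):
        self.state = np.zeros(4, dtype=np.float32)
        return self.state

    def step(self, action):
        reward = float(action)  # placeholder dynamics
        done = False
        return self.state, reward, done, {}

# Pass the class (or an instance) as level to the Gym interface:
environment = Environment.create(environment='gym', level=MMX, max_episode_timesteps=200)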
You need to call environment.reset() before starting to execute. Apart from that, you shouldn't need to add attributes or so when using Environment.create(...) (which, I'd say, is the preferred way of initializing an env). I will also add attribute forwarding for the wrapper; however, it will be read-only, which I think should be enough (environment logic should go into the env implementation itself).
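For example, a minimal act-observe loop with the reset-before-execute pattern (using the bundled random agent and CartPole purely for illustration):

from tensorforce.agents import Agent
from tensorforce.environments import Environment

environment = Environment.create(environment='gym', level='CartPole-v1')
agent = Agent.create(agent='random', environment=environment)

states = environment.reset()  # reset before the first execute()
terminal = False
while not terminal:
    actions = agent.act(states=states)
    states, terminal, reward = environment.execute(actions=actions)
    agent.observe(terminal=terminal, reward=reward)

agent.close()
environment.close()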
examples/ :-)
recent is a simple buffering mechanism which samples the latest timesteps; replay randomly samples from a usually bigger pool of timesteps, as known from DQN. But this is not what you're looking for.
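For context, a hedged sketch of where these memory types show up in an agent spec (capacity and batch size are placeholders; parameter names as I understand them from the docs):

from tensorforce.agents import Agent

# DQN-style agent: a replay pool that is sampled randomly, as described above.
agent = Agent.create(
    agent='dqn',
    environment=environment,
    memory=10000,   # replay capacity
    batch_size=32,
)

# The generic 'tensorforce' agent takes an explicit memory spec instead,
# e.g. memory=dict(type='replay', capacity=10000) or memory=dict(type='recent').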
What you're looking for is internal_lstm, and the internals arguments are related to that. They give the agent an internal state, and consequently the ability to remember what happened earlier in an episode (in theory -- and yes, many DRL models don't have this).
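As a sketch (assuming the internal_lstm layer can simply be appended to a layer-stack network; sizes and batch size are placeholders):

from tensorforce.agents import Agent

agent = Agent.create(
    agent='ppo',
    environment=environment,
    network=[
        dict(type='dense', size=64),
        # Carries hidden state across timesteps within an episode.
        dict(type='internal_lstm', size=64),
    ],
    batch_size=10,
)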
Good to know that internal_lstm exists, and I'll keep it in mind, but I think I have to do a lot more practice on just the basics before I get to that. So I'm going to try to do as you suggested, and just explicitly expose that state myself. I think I will be able to contribute this as an example too, but it really won't be much different from my other example, and you probably want some examples that exercise other features of the framework...
auto network now...
You can declare multiple states as states=dict(state1=dict(type='int', shape=()), state2=dict(type='float', shape=())). However, in that case you can't use a network as a simple stack of layers. Two options: the auto network can take care of it (it will just internally create a reasonable simple network, with some modification options), or you specify a "multi-input" network yourself. You can specify networks as a "list of lists", where each of the inner lists is a layer stack, and the special register and retrieve layers are used to combine these stacks into a full network. I realise there is no good example currently; will need to add one to the docs. Roughly: [[dict(type='retrieve', tensors='state1'), ..., dict(type='register', tensor='state1-embedding')], [same for state2], [dict(type='retrieve', tensors=['state1-embedding', 'state2-embedding'], aggregation='concat'), ...]]
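Spelled out a bit more, such a multi-input spec might look like this (layer sizes and the '-embedding' tensor names are placeholders):

network = [
    [   # stack processing state1 (an int state; assumes num_values is set in its spec so an embedding applies)
        dict(type='retrieve', tensors=['state1']),
        dict(type='embedding', size=32),
        dict(type='register', tensor='state1-embedding'),
    ],
    [   # stack processing state2 (a float state)
        dict(type='retrieve', tensors=['state2']),
        dict(type='dense', size=32),
        dict(type='register', tensor='state2-embedding'),
    ],
    [   # final stack combining the two registered embeddings
        dict(type='retrieve', tensors=['state1-embedding', 'state2-embedding'], aggregation='concat'),
        dict(type='dense', size=64),
    ],
]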
float is fine for generic numbers; however, if your int really represents a fixed finite set of choices, then turning it into a float is not a good idea, I'd say. A better way is to use an embedding layer to map each of the finite choices to a trainable embedding vector, similar to how words are treated in natural language processing.
(This is also how the auto network treats int inputs, given that num_values specifies the number of embeddings required.)
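A small sketch of that combination (value counts and sizes are placeholders):

# An int state declares how many distinct values it can take...
states = dict(type='int', shape=(), num_values=7)

# ...so that an embedding layer (or the auto network) can map each of the
# 7 choices to a trainable vector, NLP-style.
network = [
    dict(type='embedding', size=16),
    dict(type='dense', size=32),
]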