Matt Pettis
@mpettis
And here is how my RL model learned to control the temperature to be within the band, with a single-neuron sigmoid activation function learned via the REINFORCE algorithm (or, the one laid out in Geron). It shows a bunch of different episodes starting from different initial temperatures: https://ibb.co/LSt3rrQ
Alexander Kuhnle
@AlexKuhnle
Yes, that looks like a nice starter example. I assume the "getting started" guide covers sufficiently how you can either use an existing (e.g. Gym) environment or define your own so that other parts of Tensorforce can interface with it. It also covers what the basic training loop looks like and how to use the Runner utility. Would you say there are things missing here? Otherwise, I think the main gap is that the docs currently offer almost no guidance on how to choose an algorithm and its hyperparameters. Is that where you think more could be added? I can certainly add a bit here; it would probably be based on PPO for now, as I've used it most and many users end up using it. I've also been planning for a while to write more about the full Tensorforce configuration, but shamefully haven't done so yet. >.< Contributions to the examples folder, problem-based tutorial walkthroughs, etc. are always very welcome -- it's certainly a part which I'm not covering well enough. :-)
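For context, a minimal sketch of the setup described here, assuming an existing Gym environment and a PPO agent; the level name and hyperparameters are illustrative, not recommendations:

```python
from tensorforce import Agent, Environment, Runner

# Existing Gym environment; level and hyperparameters are illustrative
environment = Environment.create(
    environment='gym', level='CartPole-v1', max_episode_timesteps=500
)
agent = Agent.create(agent='ppo', environment=environment, batch_size=10)

# The Runner utility handles the episode loop
runner = Runner(agent=agent, environment=environment)
runner.run(num_episodes=200)
runner.close()

agent.close()
environment.close()
```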
Matt Pettis
@mpettis
I'd say the example defining the CustomEnvironment could use some more explanation, and maybe another example or two. Some more description of what the return values of states() and actions() give would help. Are they just dicts that describe the type of the return value? What would it look like if the states or actions are not discrete, but continuous? A little more explanation in the comments of the example class would be helpful. The documentation for these methods has information on what I asked above, but again, a little language around the intent of these fields, and examples exercising these options, would be helpful.
And as you mentioned, if you were to do a simple thermostatic controller example, a quick explanation of why you chose the agent you did (among the choices of possible agents) would be good. It doesn't have to be a full-blown RL tutorial, as you can expect users to at least be familiar with the general concept, but maybe not with each of the different choices for agent types and what their practical implications are. They (I) have likely read about them briefly and are coming to this software to try them out for the first time. Or at least, a good subset are...
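To illustrate the question about states() and actions(), a hedged sketch of what those specification dicts can look like in a custom Environment subclass (the names and bounds here are made up; continuous values use type 'float' with optional min/max bounds, discrete ones use type 'int' with num_values):

```python
from tensorforce.environments import Environment

class CustomEnvironment(Environment):

    def states(self):
        # Continuous observation, e.g. a single temperature reading
        return dict(type='float', shape=(1,), min_value=0.0, max_value=40.0)

    def actions(self):
        # Discrete action, e.g. heater off/on
        return dict(type='int', num_values=2)
```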
Alexander Kuhnle
@AlexKuhnle
Makes sense. :+1:
MAYANK GULATI
@MakGulati
Is it possible to write a Tensorforce wrapper for OpenAI Gym-style custom environments?
I have been trying hard to get it to work; if you can share example code, that would be highly appreciated. Thanks a bunch 😊.
Alexander Kuhnle
@AlexKuhnle
Hi, yes that's possible; in fact, it shouldn't be necessary to write any wrapper yourself if you implemented the Gym interface. If you pass your gym.Env class as the level argument to the Tensorforce gym environment, it should work.
If that's not quite what you want to do, can you specify what exactly you tried that didn't work?
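A hedged sketch of what this can look like (MyGymEnv is a hypothetical custom Gym environment class):

```python
import gym
from tensorforce import Environment

class MyGymEnv(gym.Env):
    # hypothetical custom environment implementing the Gym interface
    ...

# Pass the class (or an instance) as the level argument
environment = Environment.create(
    environment='gym', level=MyGymEnv, max_episode_timesteps=100
)
```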
Matt Pettis
@mpettis
I put a dead simple example of a room temperature that responds (with thermal resistance) to a heater at the following link. It has the formula for simple exponential resistance, and a plot of what the room temperature looks like when the heater is on vs. off: https://github.com/mpettis/shared-scratchpad/blob/master/thermostat-response/thermostatic-response-function.ipynb
Matt Pettis
@mpettis
Another place a tutorial example would be good... I'm defining an environment, and I want to, say, initialize my environment with a starting temperature. The sample isn't clear on how I pass initialization parameters to my environment creation. I also want to augment the class to have custom attributes in addition to what the base environment class has, but I can't see how to do that. Environment.create() seems to return an EnvironmentWrapper class, and it doesn't have the custom attributes I set in the __init__() definition of Environment when I run dir() on my created object.
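A hedged sketch of one way this might work, assuming that extra keyword arguments to Environment.create are forwarded to the environment's __init__ (the ThermostatEnvironment name and initial_temperature parameter are hypothetical):

```python
from tensorforce.environments import Environment

class ThermostatEnvironment(Environment):
    # hypothetical custom environment; states(), actions(), reset(), execute() omitted

    def __init__(self, initial_temperature=20.0):
        super().__init__()
        self.initial_temperature = initial_temperature

# Extra kwargs are passed through to the environment constructor
environment = Environment.create(
    environment=ThermostatEnvironment,
    max_episode_timesteps=100,
    initial_temperature=15.0
)
```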
MAYANK GULATI
@MakGulati
@AlexKuhnle Hi, I believe level is for standard OpenAI Gym envs. What I meant was that I'm following the same structure as OpenAI Gym, but I made my custom env from the same abstract class (which you can check here). So my question is how do I make it work with Tensorforce.
MAYANK GULATI
@MakGulati
I guess you mean I should use this. Could you please write out the syntax for creating the environment, assuming the Gym-based custom env is named mmx?
Should it be environment = OpenAIGym(level=mmx)?
Matt Pettis
@mpettis
OK, I made a stab at creating a bang-bang heater environment by extending the Environment class. I think I did it mostly correctly, though I had to add things like incrementing a timestep. I think this (or a version cleaned up by someone who knows more) could be a good example of implementing an Environment from the base class. https://github.com/mpettis/shared-scratchpad/blob/master/thermostat-response/thermostat-environment.ipynb
Alexander Kuhnle
@AlexKuhnle
@mpettis I'll have a look at the issue and notebook. Would you be happy to add some more information to the docs in the process of sorting out the problem? That would be great. :-)
@MakGulati If you're using the same interface, is there a reason why you don't just use the gym.Env base class? Your environment class could be class MMX(gym.Env): ..., and if you then pass it to the Tensorforce Gym interface, it should be compatible: env = Environment.create(environment='gym', level=MMX, ...). Or have you tried this before? The level argument should certainly accept custom gym.Env subclasses, and in fact also instances.
Alexander Kuhnle
@AlexKuhnle
@mpettis As mentioned in your issue, the only thing missing -- which is not clear right now from the exception, but that will change -- is the environment.reset() before starting to execute. Apart from that, you shouldn't need to add attributes when using Environment.create(...) (which, I'd say, is the preferred way of initializing an env). I will also add attribute forwarding to the wrapper; however, it will be read-only, which I think should be enough (environment logic should go into the env implementation itself).
It would also be great if you would be willing to contribute your little environment plus agent training script under examples/ :-)
(Plus appropriate acknowledgement, if you're happy to do so, of course)
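For anyone following along, a hedged sketch of the explicit act/observe loop, with the environment.reset() call mentioned above (agent and environment creation omitted; the episode count is arbitrary):

```python
# Explicit training loop; note the reset() before the first execute()
for _ in range(100):
    states = environment.reset()
    terminal = False
    while not terminal:
        actions = agent.act(states=states)
        states, terminal, reward = environment.execute(actions=actions)
        agent.observe(terminal=terminal, reward=reward)
```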
MAYANK GULATI
@MakGulati
@AlexKuhnle After following this,
I'm getting:

tensorforce.exception.TensorforceError: Unknown Gym space.

Alexander Kuhnle
@AlexKuhnle
Hmm, are you using the Gym space objects to specify your state and action space?
Like CartPole, for instance, specifies it here. If you do the same, can you post your observation/action space?
MAYANK GULATI
@MakGulati
I'm doing two levels of inheritance:
class Envir(gym.Env):
then class MMX(Envir):
and for Tensorforce: env = Environment.create(environment='gym', level=MMX, ...)
MAYANK GULATI
@MakGulati
Thanks a lot for the help. I think there is a problem with my MDP formulation, because I tried the same approach as above with CartPole and it works. So I need to fix that MDP stuff. Cheers :)
Matt Pettis
@mpettis
@AlexKuhnle I will work on a custom environment and agent training loop, and you are more than welcome to include the example. Let me know how you want it contributed -- I would guess as a pull request? I'll probably need some guidance on it; let me know whether this channel, issues on your git repo, or a fork of it that I would make would be the easiest venue for you.
Alexander Kuhnle
@AlexKuhnle
@MakGulati Okay, let me know if the problem seems to be from Tensorforce side, but what you describe sounds like it should work.
@mpettis Yes, a PR is best. You can fork the repo, commit to your fork, and from there on GitHub create the PR to the main repo. Happy to help along the way. Generally, re issue vs. Gitter, I would say: if you think it's a problem which others may encounter too, an issue is preferred, so others can search for it. If it's more of a discussion, a question on a PR, or suggestions, I would say here is fine (or private message).
Matt Pettis
@mpettis
@AlexKuhnle I'll be doing a pull request, but before I do, here is the proposed full example. It is a Jupyter notebook because I wanted to expand on the example and make some inline charts to help with explanation; I'm not sure if you want examples in notebook form. I can change it to a Python script if that would be preferred. But you can take a preliminary look at it here: https://github.com/mpettis/tensorforce/blob/master/examples/temperature-controller.ipynb
Alexander Kuhnle
@AlexKuhnle
Notebook is fine, in fact it's probably the best way of presenting tutorial examples with code and explanation in one place. And the notebook looks great. :-) The framework could really use a few such examples.
Matt Pettis
@mpettis
@AlexKuhnle I made a pull request with the notebook. You are obviously welcome to it, and I'd love to see it in the project. I put my authorship at the top, as I'd like to be able to refer to it as examples of work I've done. Thanks.
Alexander Kuhnle
@AlexKuhnle
Thanks very much, @mpettis, and the acknowledgement is of course perfectly fine. Feel free to also add an acknowledgement to the README here if you end up contributing more (which I would, obvs, encourage :-).
Schade77
@Schade77
Hello, thanks for your great work! I would like to know if you have more examples about multithreading. Everything seems to work well without multithreading; however, when I try to set num_parallel I get 'assert not isinstance(environment, Environment)'.
Alexander Kuhnle
@AlexKuhnle
Hi, could you post the relevant code and the last bit of the stack trace? Otherwise it's hard to say what exactly the problem is.
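A hedged guess at what the assertion points to: when num_parallel is set, the Runner may expect an environment specification rather than an already-created Environment instance, so that it can create its own parallel copies. A sketch under that assumption (agent and environment specs are illustrative):

```python
from tensorforce import Runner

runner = Runner(
    agent=dict(agent='ppo', batch_size=10),
    environment=dict(environment='gym', level='CartPole-v1'),
    max_episode_timesteps=500,
    num_parallel=4
)
runner.run(num_episodes=100)
runner.close()
```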
Matt Pettis
@mpettis
For agents, is there documentation (or an explanation) of what "memory" does and how it is used? I'm trying to make an example that learns not to turn the heater on too frequently in a thermostat example (building on my previous example above) -- say, learning not to turn the heater on more than 3 times in 20 consecutive timesteps. It's not learning the way I think it should, when the reward signal is just the distance the temperature is away from the target temperature band, plus a large 10x-100x negative reward if it turns the heater on more than 3 times in 20 timesteps. I would assume that I would not have to expose a state back to the agent that tracks the heater-on actions available at the current step, but rather that the agent would learn such a policy. I am using a Tensorforce agent with a 'policy_gradient' objective, like in the 'getting started' section.
I am working on an example that I can link to that has more details if someone is interested in looking into this. But I think that my question about what's going on with memory can stand alone without the example.
Alexander Kuhnle
@AlexKuhnle
You've probably found the basic documentation, but otherwise it's here. Basically, memory is the mechanism for storing experience and for sampling batches of it for an update. recent is a simple buffering mechanism which samples the latest timesteps; replay randomly samples from a usually bigger pool of timesteps, as known from DQN. But this is not what you're looking for.
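A hedged sketch of how the memory type can be specified on a Tensorforce agent, roughly matching the 'policy_gradient' setup mentioned above (capacity, batch size, learning rate, and horizon values are arbitrary placeholders):

```python
from tensorforce import Agent

agent = Agent.create(
    agent='tensorforce',
    environment=environment,
    policy=dict(network='auto'),
    # 'recent' buffers the latest timesteps; 'replay' samples randomly
    # from a (usually larger) pool, as in DQN
    memory=dict(type='replay', capacity=10000),
    update=dict(unit='timesteps', batch_size=64),
    optimizer=dict(type='adam', learning_rate=3e-4),
    objective='policy_gradient',
    reward_estimation=dict(horizon=20)
)
```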
Matt Pettis
@mpettis
ok, thanks, I'll read that.
Alexander Kuhnle
@AlexKuhnle
It's actually not uncommon that people indeed expose a specially preprocessed state back to the agent -- and if it's possible and helps, why not. However, this is obvs unsatisfying, and the way I would say this should be solved is by using an RNN layer which is unrolled over the sequence of timesteps.
In Tensorforce there is, for instance, internal_lstm, and the internals arguments are related to that. They give the agent an internal state, and consequently the ability to remember what happened earlier in an episode (in theory -- and yes, many DRL models don't have this).
So that's what I would try. I would definitely be interested to hear how it goes. Unfortunately it's a rather involved feature, so it may not be super-easy to get it to train properly.
Alexander Kuhnle
@AlexKuhnle
If you use the auto network, for instance, you can check the argument internal_rnn, otherwise add an internal_lstm layer.
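A hedged sketch of both options (argument names may differ slightly between Tensorforce versions; sizes and horizons are illustrative):

```python
# Option 1: auto network with its recurrent part enabled
policy = dict(network=dict(type='auto', size=32, depth=2, internal_rnn=10))

# Option 2: custom layer stack including an internal_lstm layer
network = [
    dict(type='dense', size=32),
    dict(type='internal_lstm', size=32)
]
```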
Matt Pettis
@mpettis
That makes a lot of sense, and it matches the way I was thinking about it (though I was unsure whether some newer model in this catalog could take care of not having to expose external state); knowing that others have to do the same state exposure is helpful. It's good to know that internal_lstm exists, and I'll keep it in mind, but I think I have to do a lot more practice on just the basics before I get to that. So I'm going to try what you suggested and just explicitly expose that state myself. I think I will be able to contribute this as an example too, but it really won't be much different from my other example, and you probably want some examples that exercise other features of the framework...
... or I may try the auto network now...
Last question for now... there's no facility for mixed dtypes for state, right? In my example, I can see that I have two dimensions -- one is the current temperature, a real number, and the second is a count, which is an integer. I'm casting the integer to a real for now, and it should work, but I was curious if I was missing something.
Alexander Kuhnle
@AlexKuhnle
There is, actually. Mixed states can be specified via a "dict of dicts", so e.g. states=dict(state1=dict(type='int', shape=()), state2=dict(type='float', shape=())). However, in that case you can't use a network that is a simple stack of layers. Two options: the auto network can take care of it (it will just internally create a reasonably simple network, with some modification options), or you specify a "multi-input" network yourself. You can specify networks as a "list of lists", where each of the inner lists is a layer stack, and the special register and retrieve layers are used to combine these stacks into a full network. I realise there is no good example currently. Will need to add one to the docs.
Something like [[dict(type='retrieve', tensors='state1'), ..., dict(type='register', tensor='state1-embedding')], [same for state2], [dict(type='retrieve', tensors=['state1-embedding', 'state2-embedding'], aggregation='concat'), ...]]
Hope that illustrates how the stacks are stitched together via register and retrieve
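Written out, a hedged sketch of the full multi-input setup described above (layer types and sizes are illustrative; the register/retrieve tensor names must match up):

```python
# Mixed-type states specified as a dict of dicts
states = dict(
    state1=dict(type='int', shape=(), num_values=5),
    state2=dict(type='float', shape=(4,))
)

# "List of lists" network: one layer stack per input, combined at the end
network = [
    [
        dict(type='retrieve', tensors='state1'),
        dict(type='embedding', size=16),
        dict(type='register', tensor='state1-embedding')
    ],
    [
        dict(type='retrieve', tensors='state2'),
        dict(type='dense', size=16),
        dict(type='register', tensor='state2-embedding')
    ],
    [
        dict(type='retrieve', tensors=['state1-embedding', 'state2-embedding'],
             aggregation='concat'),
        dict(type='dense', size=32)
    ]
]
```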
Alexander Kuhnle
@AlexKuhnle
Note that float is fine for generic numbers; however, if your int really represents a fixed finite set of choices, then turning it into a float is not a good idea, I'd say. A better way is to use an embedding layer to map each of the finite choices to a trainable embedding vector, similar to how words are treated in natural language processing.
(That's how auto treats int inputs, given that num_values specifies the number of embeddings required)
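Applied to the thermostat example, a hedged sketch of what that state specification might look like (the state names and num_values are hypothetical):

```python
states = dict(
    # generic real-valued quantity: fine as a float
    temperature=dict(type='float', shape=(1,)),
    # fixed finite set of choices: declare as int with num_values,
    # which the auto network maps through a trainable embedding
    heater_on_count=dict(type='int', shape=(), num_values=4)
)
```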
I should spend some time over the weekend to add more info to the docs...