emrebulbul23
@emrebulbul23
Hi,

I wonder something: while constructing a custom environment, can we return different action values? For example, [1,2,3] in one state and [2,3,4] in another, or do I have to handle available actions in execute()?

Thanks for the answer in advance.

Alexander Kuhnle
@AlexKuhnle
Hi, it sounds like you may be interested in action masking, which allows you to temporarily mask out some of the generally available categorical actions for some timesteps. If so, your environment's execute() can return additional values "[int action]_mask" with Boolean masks for the categorical action space. See here for a minimal example. Does that do what you want?
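Roughly, the pattern looks like this (a minimal sketch; the state shape and mask logic are just illustrative):

import numpy as np
from tensorforce import Environment


class MaskedEnvironment(Environment):

    def states(self):
        return dict(type='float', shape=(3,))

    def actions(self):
        # Categorical action with three choices; the Boolean masks below refer to these.
        return dict(type='int', shape=(), num_values=3)

    def reset(self):
        state = np.random.random_sample(size=(3,))
        # Mask returned alongside the state: here, action 2 starts out unavailable.
        return dict(state=state, action_mask=np.asarray([True, True, False]))

    def execute(self, actions):
        state = np.random.random_sample(size=(3,))
        # The mask can change every timestep, e.g. depending on the last action taken.
        mask = np.asarray([True, actions != 1, True])
        terminal = False
        reward = float(np.random.random_sample())
        return dict(state=state, action_mask=mask), terminal, reward
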
emrebulbul23
@emrebulbul23
Thanks a lot, it looks like what I need.
LinuNils
@LinuNils

Hi, I have been tinkering with the DQN agent on the BreakoutDeterministic-v4 environment, but I am running into the problem of the agent receiving low rewards and plateauing at around an episode reward of 2-6 after running 10k-40k episodes.
The network config I am currently using is:

keras_net_conf = [
    {
        "type": "keras",
        "layer": "Conv2D",
        "filters": 32,
        "kernel_size": 8,
        "strides": 4,
        "activation": "relu",
        "padding": "valid",
        "kernel_initializer": 'VarianceScaling',
        "use_bias": False,
    },
    {
        "type": "keras",
        "layer": "Conv2D",
        "filters": 64,
        "kernel_size": 4,
        "strides": 2,
        "activation": "relu",
        "padding": "valid",
        "kernel_initializer": 'VarianceScaling',
        "use_bias": False,
    },
    {
        "type": "keras",
        "layer": "Conv2D",
        "filters": 64,
        "kernel_size": 3,
        "strides": 1,
        "activation": "relu",
        "padding": "valid",
        "kernel_initializer": 'VarianceScaling',
        "use_bias": False,
    },
    {
        "type": "flatten",
    },
    {
        "type": "keras",
        "layer": "Dense",
        "units": 512,
        "activation": "relu",
        "use_bias": False,
        "kernel_initializer": 'VarianceScaling',
    }
]

With preprocessing and exploration set up as:

preproc = [
        {
            "type": "image",
            "width": 50,
            "height": 50,
            "grayscale": True
        },
        {
            "type": "sequence",
            "length": 4,
            "concatenate": True
        }
    ]

st_exp = dict(type='decaying', unit='timesteps', decay='polynomial', decay_steps=1000000, initial_value=1.0,
              final_value=EXPLORATION, power=1.0)

The actual agent creation is defined as:

            agent = Agent.create(agent='dqn',
                                 environment=env,
                                 states=env.states(),
                                 batch_size=32,
                                 preprocessing=dict(
                                                    state=preproc,
                                                    reward=dict(type="clipping", upper=1.0)
                                                    ),
                                 learning_rate=LR,
                                 memory=100000,
                                 start_updating=50000,
                                 discount=DISC,
                                 exploration=st_exp,
                                 network=keras_net_conf,
                                 update_frequency=4,
                                 target_sync_frequency=10000,
                                 summarizer=summarizer,
                                 huber_loss=1.0,
                                 name='DQN_agent')

The learning rate is set to 1e-5 and the discount factor to 0.99. The other parameters such as memory size, max_ep_steps, start_update etc. are all set from other implementations that do not use Tensorforce but have managed to achieve scores comparable to the original paper.

So I am wondering whether somebody has come across this issue and, if so, managed to get it to learn properly and reach higher rewards.

Regards.

Alexander Kuhnle
@AlexKuhnle
I haven't worked with DQN for Breakout before, but the specification looks alright. Could you share the other implementations, just to check their hyperparams? (Sometimes they are implemented somewhat differently.)
LinuNils
@LinuNils
Hi Alex.
The ones I have checked out are the following:
https://github.com/dennybritz/reinforcement-learning/blob/master/DQN/dqn.py
https://github.com/fg91/Deep-Q-Learning/blob/master/DQN.ipynb
The second one gives quite an extensive explanation of how he did it, as he had similar issues with the reward stalling; however, in that case it happened at around 35.
Alexander Kuhnle
@AlexKuhnle
Thanks, I can check over the weekend whether there may be subtle differences in the meaning of the hyperparameters, or any other potential problem.
Federico Galatolo
@galatolofederico

Hi, I'm trying to understand how to use Tensorforce but I think I am missing something. For example, when I try to run

from tensorforce import Runner

runner = Runner(
    agent="ppo",
    environment="CartPole-v1",
    num_parallel=2
)

runner.run(num_episodes=300)

it works fine, but if I try

runner = Runner(
    agent="a2c",
    environment="CartPole-v1",
    num_parallel=2
)

runner.run(num_episodes=300)

it raises

tensorforce.exception.TensorforceError: Invalid value for agent argument update given parallel_interactions > 1: {'unit': 'timesteps', 'batch_size': 10}.

What am I missing here?

Shall I open an issue or is it supposed to behave this way?
Federico Galatolo
@galatolofederico
I have opened tensorforce/tensorforce#667
Alexander Kuhnle
@AlexKuhnle
Hi, yes, let's discuss in the issue. This is not a bug, but a (minor) problem with the parallelization feature.
Matt Pettis
@mpettis
Just starting on Tensorforce -- does the community have any guidance as to where I can find a bunch of examples to look through? I saw the one in "Getting Started", but could do with some more examples. Thanks!
Alexander Kuhnle
@AlexKuhnle
Hi, there are not many, unfortunately. However, the code would look similar in most cases, and instead the variation will mainly be in the agent arguments, and for this it's hard to provide a lot of proven guidelines (although there could be more, admittedly). For which aspect would you like to see more examples? Maybe I can add some or at least point you somewhere (sometimes the unittests can be useful, for instance).
Matt Pettis
@mpettis
Thanks Alexander! I'm trying to start small, to make an environment analogous to a room, heater and thermostat. In Keras, I took the REINFORCE model from Geron's Hands-On book and made an environment that has a room that decays over time to 0 degF (with a time constant) when the heater is off, rises over time to 10 degF (with the same time constant) when the heater is on, and a reward function that is 0 between 4 and 6 degF and decreases linearly as the temperature moves outside that band. The heater can either be on or off, so the RL agent will be turning the heater on and off as the temperature gets too high or too low. I made a simple NN to learn that, and I'd like to see an example of how I would implement the same in Tensorforce. I see the general layout in the "Getting started" part, and think I can start down the path, but it would be nice to have an example with some more specifics filled in to follow. I think this would be a decent prototype of a non-videogame sort, and a reasonable model in an industrial setting to start to wrap our heads around. Thank you for your offer to help, and I am willing to do what I can -- if you have a skeleton you think you could put together, I'd be happy to make more embellishments and contribute it back as an example if it could be put into an example library or something.
Matt Pettis
@mpettis
Here's an image of what it looks like when the heater is on, then off. It shows characteristic decay and lag in approaching max temp. The scale is 0-50 here, but I can scale down to 0-10: https://ibb.co/k8P6GCz
And here is how my RL model learned to control the temperature to be within the band, with a single-neuron sigmoid activation function learned via the REINFORCE algorithm (or, the one laid out in Geron). It shows a bunch of different episodes starting from different initial temperatures: https://ibb.co/LSt3rrQ
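In code, the dynamics and reward I described look roughly like this (the time constant and the penalty slope here are illustrative, not the exact values I used):

import math


def step_temperature(temp, heater_on, tau=10.0, dt=1.0):
    # Exponential approach toward 10 degF when the heater is on, toward 0 degF when off.
    target = 10.0 if heater_on else 0.0
    return target + (temp - target) * math.exp(-dt / tau)


def reward(temp, low=4.0, high=6.0):
    # Zero inside the comfort band, decreasing linearly the further outside it.
    if low <= temp <= high:
        return 0.0
    return -(low - temp) if temp < low else -(temp - high)
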
Alexander Kuhnle
@AlexKuhnle
Yes, that looks like a nice starter example. I assume the "getting started" guide covers sufficiently how you can either use an existing e.g. Gym environment, or define your own so that other parts of Tensorforce can interface with it. It also covers what the basic training loop looks like and/or how to use the Runner utility. Would you say there are things missing here? Otherwise, I think the main part is that the docs currently offer almost no guidance on how to choose which algorithm and what hyperparameters. Is that where you think more could be added? I can certainly add a bit here; it would probably be based on PPO for now, as I have used it most and many users end up using it. I've also been planning for a while to write more about the full Tensorforce configuration, but shamefully haven't done so yet. >.< Contributions for the examples folder, problem-based tutorial walkthroughs, etc. are always very welcome -- it's certainly a part which I'm not covering well enough. :-)
Matt Pettis
@mpettis
I'd say the example defining the CustomEnvironment could use some more explanation, and maybe another example or two. Some more description of what the returns of states() and actions() give would help. Are they just dicts that describe the type of the return value? What would it look like if the states or actions are not discrete, but continuous? A little more explanation in the comments of the example class would be helpful. The documentation for these methods has information on what I asked above, but again, a little language around the intent of these fields, and examples exercising these options, would be helpful.
And as you mentioned, if you were to do a simple thermostatic controller example, a quick explanation of why you chose the agent you did (among the choices of possible agents) would be good. It doesn't have to be a full-blown RL tutorial, as you can expect users to at least be familiar with the general concept, but maybe not with each of the different choices for agent types and what their practical implications are. They (I) have likely read about them briefly and are coming to this software to try them out for the first time. Or at least, a good subset are...
Alexander Kuhnle
@AlexKuhnle
Makes sense. :+1:
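For what it's worth, those returns are indeed just specification dicts; a quick illustration (the names and bounds below are made up, not from the docs):

# Discrete (categorical) action: e.g. heater off (0) or on (1).
actions_spec = dict(type='int', shape=(), num_values=2)

# Continuous alternative: a bounded float action, e.g. a heater power level.
continuous_actions_spec = dict(type='float', shape=(1,), min_value=0.0, max_value=1.0)

# Continuous state: e.g. the current room temperature in degF.
states_spec = dict(type='float', shape=(1,), min_value=0.0, max_value=10.0)
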
MAYANK GULATI
@MakGulati
Is it possible to write a Tensorforce wrapper for OpenAI Gym-style custom environments?
I have been trying hard to get it to work; if you can share example code, that would be highly appreciated. Thanks a bunch 😊.
Alexander Kuhnle
@AlexKuhnle
Hi, yes that's possible; in fact, it shouldn't be necessary to write any wrapper yourself if you implemented the Gym interface. If you pass your gym.Env class as the level argument to the Tensorforce Gym environment, it should work.
If that's not quite what you want to do, can you specify what exactly you tried and didn't work?
Matt Pettis
@mpettis
I put a dead simple example of a room temperature that responds (with thermal resistance) to a heater at the following link. It has the formula for simple exponential resistance, and a plot of what the room temperature looks like when the heater is on vs. off: https://github.com/mpettis/shared-scratchpad/blob/master/thermostat-response/thermostatic-response-function.ipynb
Matt Pettis
@mpettis
Another place a tutorial example would be good: I'm defining an environment, and I want to, say, initialize my environment with a starting temperature. The sample isn't clear on how I pass initialization parameters to my environment creation. I also want to augment the class with custom attributes in addition to what the base environment class has, but I can't see how to do that. Environment.create() seems to return an EnvironmentWrapper object, and it doesn't have the custom attributes I set in my environment's __init__() definition when I run dir() on the created object.
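Concretely, I'm hoping for something like the following (ThermostatEnv and starting_temp are my own made-up names), but I'm not sure whether extra keyword arguments to Environment.create() get forwarded to my __init__():

import numpy as np
from tensorforce import Environment


class ThermostatEnv(Environment):

    def __init__(self, starting_temp=5.0):
        super().__init__()
        self.starting_temp = starting_temp

    def states(self):
        return dict(type='float', shape=(1,))

    def actions(self):
        return dict(type='int', shape=(), num_values=2)

    def reset(self):
        self.temperature = self.starting_temp
        return np.asarray([self.temperature])

    def execute(self, actions):
        # Toy dynamics, just for illustration (the notebook has the real ones).
        target = 10.0 if actions == 1 else 0.0
        self.temperature += 0.1 * (target - self.temperature)
        reward = 0.0 if 4.0 <= self.temperature <= 6.0 else -abs(self.temperature - 5.0)
        return np.asarray([self.temperature]), False, reward


# Is starting_temp forwarded to __init__ here?
env = Environment.create(
    environment=ThermostatEnv, max_episode_timesteps=200, starting_temp=5.0
)
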
MAYANK GULATI
@MakGulati
@AlexKuhnle hi, I believe level is for standard OpenAI Gym envs. What I meant was that I am following the same structure as OpenAI Gym but made my custom env with the same abstract class (which you can check here). So my doubt is how do I make it work with Tensorforce.
MAYANK GULATI
@MakGulati
I guess you mean I should use this. Could you please simply write out the syntax for the environment, assuming the Gym-based custom env is named mmx?
Should it be environment = OpenAIGym(level=mmx)?
Matt Pettis
@mpettis
OK, I made a stab at creating a bang-bang heater environment by extending the Environment class. I think I did it mostly correctly, though I had to add things like incrementing a timestep. I think this (or a cleaned-up version from someone who knows more) could be a good example of implementing an Environment from the base class. https://github.com/mpettis/shared-scratchpad/blob/master/thermostat-response/thermostat-environment.ipynb
Alexander Kuhnle
@AlexKuhnle
@mpettis I'll have a look at the issue and notebook. Would you be happy to add some more information to the docs in the process of sorting out the problem? That would be great. :-)
@MakGulati If you're using the same interface, is there a reason why you don't just use the gym.Env base class? Your environment class could be class MMX(gym.Env): ..., and if you then pass it to the Tensorforce Gym interface, it should be compatible: env = Environment.create(environment='gym', level=MMX, ...). Or have you tried this before? The level argument should certainly accept custom gym.Env subclass objects, and in fact also instances.
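For illustration, something along these lines should be compatible (the spaces below are made up):

import gym
import numpy as np
from tensorforce import Environment


class MMX(gym.Env):

    def __init__(self):
        # Spaces are specified via the standard Gym space objects.
        self.observation_space = gym.spaces.Box(low=-1.0, high=1.0, shape=(4,), dtype=np.float32)
        self.action_space = gym.spaces.Discrete(3)

    def reset(self):
        return np.zeros(shape=(4,), dtype=np.float32)

    def step(self, action):
        observation = np.random.uniform(low=-1.0, high=1.0, size=(4,)).astype(np.float32)
        reward, done, info = 0.0, False, dict()
        return observation, reward, done, info


# Pass the class (or an instance) as the level argument:
env = Environment.create(environment='gym', level=MMX, max_episode_timesteps=100)
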
Alexander Kuhnle
@AlexKuhnle
@mpettis As mentioned in your issue, the only thing missing -- which is not clear right now from the exception, but will change -- is the environment.reset() before starting to execute. Apart from that, you shouldn't need to add attributes when using Environment.create(...) (which, I'd say, is the preferred way of initializing an env). I will also add attribute forwarding to the wrapper; however, it will be read-only, which I think should be enough (environment logic should go into the env implementation itself).
It would also be great if you would be willing to contribute your little environment plus agent training script under examples/ :-)
(Plus appropriate acknowledgement, if you're happy to do so, of course)
MAYANK GULATI
@MakGulati
@AlexKuhnle after following this
I am getting

tensorforce.exception.TensorforceError: Unknown Gym space.

Alexander Kuhnle
@AlexKuhnle
Hmm, are you using the Gym objects to specify your state and action space?
CartPole, for instance, specifies it here. If you do the same, can you post your observation/action space?
MAYANK GULATI
@MakGulati
I am doing two levels of inheritance:
class Envir(gym.Env):
then class MMX(Envir):
and for Tensorforce, env = Environment.create(environment='gym', level=MMX, ...)
MAYANK GULATI
@MakGulati
Thanks a lot for the help. I think there is a problem with my MDP formulation, because I tried the same approach as above with CartPole and it works. So I need to fix that MDP stuff. Cheers :)
Matt Pettis
@mpettis
@AlexKuhnle I will work on a custom environment and agent training loop, and you are more than welcome to include the example. Let me know how you want it contributed -- I would guess as a pull request? I'll probably need some guidance on it; let me know whether this channel, issues on your git repo, or a fork I would make would be the easiest venue for you.
Alexander Kuhnle
@AlexKuhnle
@MakGulati Okay, let me know if the problem seems to be from the Tensorforce side, but what you describe sounds like it should work.
@mpettis Yes, a PR is best. You can fork the repo, commit to your fork, and then on GitHub create the PR from there to the main repo. Happy to help along the way. Generally, re issue vs. Gitter, I would say: if you think it's a problem which others may encounter too, an issue is preferred, so others can search for it. If it's more of a discussion, a question on a PR, or suggestions, I would say here is fine (or a private message).
Matt Pettis
@mpettis
@AlexKuhnle I'll be doing a pull request, but before I do, here is the proposed full example. It is a Jupyter notebook because I wanted to expand on the example and make some inline charts to help with the explanation; I'm not sure if you want examples in notebook form, so I can change it to a Python script if that would be preferred. But you can take a preliminary look at it here: https://github.com/mpettis/tensorforce/blob/master/examples/temperature-controller.ipynb
Alexander Kuhnle
@AlexKuhnle
Notebook is fine, in fact it's probably the best way of presenting tutorial examples with code and explanation in one place. And the notebook looks great. :-) The framework could really use a few such examples.
Matt Pettis
@mpettis
@AlexKuhnle I made a pull request with the notebook. You are obviously welcome to it, and I'd love to see it in the project. I put my authorship at the top, as I'd like to be able to refer to it as an example of work I've done. Thanks.
Alexander Kuhnle
@AlexKuhnle
Thanks very much, @mpettis, and the acknowledgement is of course perfectly fine. Feel free to also add an acknowledgement to the README here if you end up contributing more (which I would, obvs, encourage :-).
Schade77
@Schade77
Hello, thanks for your great work! I would like to know if you have more examples about multithreading? Everything seems to work well without multithreading; however, when I try to set num_parallel I get 'assert not isinstance(environment, Environment)'.
Alexander Kuhnle
@AlexKuhnle
Hi, could you post the relevant code and the last bit of the stacktrace? Otherwise it's hard to say what exactly the problem is.
Matt Pettis
@mpettis
For agents, is there documentation (or an explanation) of what "memory" does and how it is used? I'm trying to make an example that learns not to turn the heater on too frequently in a thermostat example (building on my previous example above), say, learning not to turn the heater on more than 3 times in 20 consecutive timesteps. It's not learning the way I think it should when the reward signal is just the distance of the temperature from the target temperature band, plus a large 10x-100x negative reward if it turns the heater on more than 3 times in 20 timesteps. I would have assumed that I would not need to expose a state back to the agent that tracks how many heater-on actions are still available at the current step, but rather that the agent would learn such a policy. I am using a Tensorforce agent with a 'policy_gradient' objective, like in the 'getting started' section.