Leo
@ChipChap1_gitlab
@AlexKuhnle : Thank you very much for your response. Yes, the goal would be to have an online update. How can this be realised in tensorforce?
Qiao.Zhang
@qZhang88
@AlexKuhnle Hi, I found this parameter while creating most kinds of agents. What is update_frequency used for? I have checked for frequency in the code but didn't find where it is used. Thanks for answering my question!
update_frequency ("never" | parameter, long > 0) – Frequency of updates (default: batch_size).
Qiao.Zhang
@qZhang88
@AlexKuhnle I saw your answer to another question above and am curious: could update_frequency not equal batch_size? I thought the update is called once enough data has been collected. I am using PPO in tensorforce; update_frequency should not be timestep-based, right?
1) DQN, like every other agent, updates automatically; the update(...) function doesn't usually need to be called. You can specify how frequently the update should happen via the update_frequency argument, or implicitly via batch_size (if update_frequency is None, then update_frequency = batch_size). These numbers are timestep-based, so independent of episodes (since DQN is generally largely agnostic to episodes).
Alexander Kuhnle
@AlexKuhnle
Hi @qZhang88, update_frequency always has the same unit as batch_size; both are specified as part of update (in TensorforceAgent). So in the case of PPO it can't be timestep-based. As you've probably read, update_frequency specifies how frequently an update is scheduled: update_frequency > batch_size doesn't make sense, since some experience would simply be ignored; update_frequency = batch_size is the default; and it can make sense to experiment with scheduling updates more often via update_frequency < batch_size.
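For illustration, a minimal sketch of how these two arguments fit together when using the 'ppo' shortcut agent (the values and the CartPole environment are stand-ins, not from the thread):

from tensorforce import Agent, Environment

environment = Environment.create(environment='gym', level='CartPole-v1')

# For PPO, batch_size and update_frequency are both counted in episodes;
# leaving update_frequency unset defaults it to batch_size.
agent = Agent.create(
    agent='ppo',
    environment=environment,
    batch_size=10,       # episodes per update batch
    update_frequency=5,  # schedule an update every 5 episodes (<= batch_size)
    learning_rate=1e-3,
)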
@ChipChap1_gitlab , a simple way is just to choose a very small memory size, but better is to use the more general TensorforceAgent, replicate DQN's internal configuration and change the bits that you'd like to adapt. In this case I think the only change necessary is memory = dict(type='recent') instead of DQN's replay and custom capacity.
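As a rough sketch of the first ("very small memory") option only (values are illustrative; env is assumed to be an existing Environment instance):

from tensorforce import Agent

# Approximate online updates by keeping only fairly recent experience
# in the replay memory and updating every timestep.
agent = Agent.create(
    agent='dqn',
    environment=env,
    memory=10000,        # relatively small replay capacity, so sampled experience stays close to on-policy
    batch_size=32,
    update_frequency=1,  # schedule an update after every timestep
)

The more faithful route is the TensorforceAgent configuration with memory=dict(type='recent') described above.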
Qiao.Zhang
@qZhang88
@AlexKuhnle Thanks, one more question: we are doing a plane flight game, and the Env can take multiple actions at the same time, like changes of pitch, yaw, roll and boost. Pitch, yaw and roll each have 3 choices (1, -1 and 0), but boost has only 1 or 0. So does Tensorforce support mixed actions? How should I create a PPO agent for that? Should the action param be dict(type=int, shape=[3,3,3,2], num_values=??)
Alexander Kuhnle
@AlexKuhnle
Instead of using a single-action dict a la dict(type=..., shape=...), in general you can specify a nested action dict like dict(action1=dict(type=..., shape=...), action2=dict(type=..., shape=...), ...). Your environment (if you implement the Environment class) can just return this for actions(), and/or your agent can receive this as actions argument.
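Applied to the pitch/yaw/roll/boost example, the environment's actions() might look roughly like this (a sketch; encoding {-1, 0, 1} as three categorical values is an assumption):

def actions(self):
    # One named action per control; pitch/yaw/roll each pick one of three
    # categorical values (mapped back to -1/0/1 inside the environment),
    # while boost is binary.
    return dict(
        pitch=dict(type='int', num_values=3),
        yaw=dict(type='int', num_values=3),
        roll=dict(type='int', num_values=3),
        boost=dict(type='int', num_values=2),
    )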
Qiao.Zhang
@qZhang88

Hi @qZhang88, hope the following explanation clarifies your question: PPO, like many other standard policy gradient algorithms, uses complete rollouts (episodes) for reward estimation. In Tensorforce this means that batch_size defines the number of episodes (each consisting of many timesteps) per update batch. Moreover, the way the PPO update works according to the paper is that it actually performs multiple updates based on randomly subsampled timestep-minibatches (since the entire batch of n episodes is quite big). So subsampling_fraction specifies what fraction of the full batch is subsampled for each minibatch, and optimization_steps specifies how often these mini-updates should happen.

I still have some questions here. Let's say batch size is 10, max timesteps is 1000, and subsampling_fraction is 0.2; so each update batch size is still 10 and the timesteps would be less than 200, right? And could the optimization steps be increased to take full advantage of the whole episodes?

Would the update run in parallel, or will the sampling process wait until the update is finished?
Alexander Kuhnle
@AlexKuhnle
Batch-size 10 and max-timesteps 1k means overall up to 10k timesteps per update. PPO subsampling-fraction 0.2 means it subsamples 2k from this batch per "mini-update", and repeats this optimization-steps times. In particular, it doesn't make sense to choose subsampling-fraction and optimization-steps such that their product is < 1, as then you wouldn't make use of the full batch, so you may as well decrease batch-size and increase the other parameters accordingly (batch-size 10, fraction 0.2, steps 3 is equivalent to batch-size 6, fraction 0.33, steps 3, but the latter is less memory-consuming).
The update is not run in parallel and is generally not optimized in a special way. It would be possible to do a few things there, and I think this was part of the motivation for this design in PPO (not sure), but Tensorforce doesn't do that currently (and it's not really a primary focus in general, although I would be curious to investigate what could be done, if I had time :-).
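A quick back-of-the-envelope version of the numbers above (the value of optimization_steps is made up purely for illustration):

batch_size = 10                # episodes per update batch
max_timesteps = 1000           # upper bound on timesteps per episode
subsampling_fraction = 0.2
optimization_steps = 3         # illustrative

batch_timesteps = batch_size * max_timesteps                  # up to 10,000 timesteps
minibatch_timesteps = subsampling_fraction * batch_timesteps  # ~2,000 per mini-update
batch_coverage = subsampling_fraction * optimization_steps    # 0.6 < 1: at most ~60% of the
                                                              # batch is touched, so a smaller
                                                              # batch_size would be equivalent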
emrebulbul23
@emrebulbul23
Hi,

I wonder: while constructing a custom environment, can we return different action values, for example [1,2,3] in one state and [2,3,4] in another, or do I have to handle available actions in execute()?

Thanks for the answer in advance.

Alexander Kuhnle
@AlexKuhnle
Hi, it sounds like you may be interested in action masking, which allows you to temporarily mask out some of the generally available categorical actions for some timesteps. If so, your environment's execute can return additional values "[int action]_mask" with Boolean masks for the categorical action space. See here for a minimal example. Does that do what you want?
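A minimal sketch of that idea (not the linked example itself; it assumes the single default action is named "action", so the mask entry is "action_mask", and the masking rule is invented for illustration):

import numpy as np
from tensorforce.environments import Environment

class MaskedEnvironment(Environment):

    def states(self):
        return dict(type='float', shape=(4,))

    def actions(self):
        return dict(type='int', num_values=3)

    def reset(self):
        observation = np.zeros(shape=(4,))
        # Only actions 0 and 2 are available in the initial state.
        return dict(state=observation, action_mask=np.asarray([True, False, True]))

    def execute(self, actions):
        observation = np.random.random(size=(4,))
        # Illustrative rule: the action just taken becomes unavailable next step.
        mask = np.asarray([a != actions for a in range(3)])
        next_states = dict(state=observation, action_mask=mask)
        terminal = False
        reward = 1.0
        return next_states, terminal, reward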
emrebulbul23
@emrebulbul23
Thanks a lot, it looks like what I need.
LinuNils
@LinuNils

Hi, I have been tinkering with the DQN agent on the BreakoutDeterministic-v4 environment, but I am running into the problem of the agent receiving low rewards; it plateaus at around an episode reward of 2-6 after running 10k-40k episodes.
The network config I am currently using is:

keras_net_conf = [
    {
        "type": "keras",
        "layer": "Conv2D",
        "filters": 32,
        "kernel_size": 8,
        "strides": 4,
        "activation": "relu",
        "padding": "valid",
        "kernel_initializer": 'VarianceScaling',
        "use_bias": False,
    },
    {
        "type": "keras",
        "layer": "Conv2D",
        "filters": 64,
        "kernel_size": 4,
        "strides": 2,
        "activation": "relu",
        "padding": "valid",
        "kernel_initializer": 'VarianceScaling',
        "use_bias": False,
    },
    {
        "type": "keras",
        "layer": "Conv2D",
        "filters": 64,
        "kernel_size": 3,
        "strides": 1,
        "activation": "relu",
        "padding": "valid",
        "kernel_initializer": 'VarianceScaling',
        "use_bias": False,
    },
    {
        "type": "flatten",
    },
    {
        "type": "keras",
        "layer": "Dense",
        "units": 512,
        "activation": "relu",
        "use_bias": False,
        "kernel_initializer": 'VarianceScaling',
    }
]

With preprocessing and exploration set up as:

preproc = [
        {
            "type": "image",
            "width": 50,
            "height": 50,
            "grayscale": True
        },
        {
            "type": "sequence",
            "length": 4,
            "concatenate": True
        }
    ]

st_exp = dict(type='decaying', unit='timesteps', decay='polynomial', decay_steps=1000000, initial_value=1.0,
              final_value=EXPLORATION, power=1.0)

The actual agent creation is defined as:

agent = Agent.create(
    agent='dqn',
    environment=env,
    states=env.states(),
    batch_size=32,
    preprocessing=dict(
        state=preproc,
        reward=dict(type="clipping", upper=1.0)
    ),
    learning_rate=LR,
    memory=100000,
    start_updating=50000,
    discount=DISC,
    exploration=st_exp,
    network=keras_net_conf,
    update_frequency=4,
    target_sync_frequency=10000,
    summarizer=summarizer,
    huber_loss=1.0,
    name='DQN_agent'
)

The learning rate is set to 1e-5 and the discount factor to 0.99. The other parameters, such as memory size, max_ep_steps, start_update, etc., are all taken from other implementations that do not use Tensorforce but have managed to achieve scores comparable to the original paper.

So I am wondering whether somebody has come across this issue and, if so, managed to get it to learn properly and reach higher rewards.

Regards.

Alexander Kuhnle
@AlexKuhnle
I haven't worked with DQN for Breakout before, but the specification looks alright. Could you share the other implementation, just to check their hyperparams? (Sometimes they are implemented somewhat differently)
LinuNils
@LinuNils
Hi Alex.
The ones I have checked out are the following:
https://github.com/dennybritz/reinforcement-learning/blob/master/DQN/dqn.py
https://github.com/fg91/Deep-Q-Learning/blob/master/DQN.ipynb
The second one gives quite an extensive explanation of how he did it, as he had similar issues with the reward stalling; however, in that case it happened at around 35.
Alexander Kuhnle
@AlexKuhnle
Thanks, I can check over the weekend whether there may be subtle differences in the meaning of hyperparameters, or any other potential problem.
Federico Galatolo
@galatolofederico

Hi, I'm trying to understand how to use tensorforce but I think I am missing something. For example, when I try to run

runner = Runner(
    agent="ppo",
    environment="CartPole-v1",
    num_parallel=2
)

runner.run(num_episodes=300)

it works fine, but if I try

runner = Runner(
    agent="a2c",
    environment="CartPole-v1",
    num_parallel=2
)

runner.run(num_episodes=300)

it raises

tensorforce.exception.TensorforceError: Invalid value for agent argument update given parallel_interactions > 1: {'unit': 'timesteps', 'batch_size': 10}.

What am I missing here?

Shall I open an issue, or is it supposed to behave this way?
Federico Galatolo
@galatolofederico
I have opened tensorforce/tensorforce#667
Alexander Kuhnle
@AlexKuhnle
Hi, yes, let's discuss in the issue. This is not a bug, but a (minor) problem with the parallelization feature.
Matt Pettis
@mpettis
Just starting on tensorforce -- does the community have any guidance as to where I can find a bunch of examples to look through for tensorforce? I saw the one in "Getting Started", but could do with some more examples. Thanks!
Alexander Kuhnle
@AlexKuhnle
Hi, there are not many, unfortunately. However, the code would look similar in most cases, and instead the variation will mainly be in the agent arguments, and for this it's hard to provide a lot of proven guidelines (although there could be more, admittedly). For which aspect would you like to see more examples? Maybe I can add some or at least point you somewhere (sometimes the unittests can be useful, for instance).
Matt Pettis
@mpettis
Thanks Alexander! I'm trying to start small, to make an environment analogous to a room, heater and thermostat. In Keras, I took the REINFORCE model from Geron's Hands-On book and made an environment that has a room that decays over time to 0 degF (with a time constant) when the heater is off, rises over time to 10 degF (with the same time constant) when the heater is on, and a reward function that is 0 between 4 and 6 degF, with a linearly decreasing reward as the temperature moves outside of that band. The heater can either be on or off, so the RL agent will be turning the heater on and off as the temperature gets too high or too low. I made a simple NN to learn that, and I'd like to see an example of how I would implement the same in tensorforce. I see the general layout in the "Getting started" part, and think I can start down the path, but it would be nice to have an example that has some more specifics filled in to follow. I think this would be a decent prototype of a non-videogame sort, and a reasonable model in the industrial setting to start to wrap our heads around. Thank you for your offer to help, and I am willing to do what I can -- if you have a skeleton you think you could put together, I'd be happy to make more embellishments and contribute back as an example if it could be put into an example library or something.
Matt Pettis
@mpettis
Here's an image of what it looks like when the heater is on, then off. It shows characteristic decay and lag in approaching max temp. The scale is 0-50 here, but I can scale down to 0-10: https://ibb.co/k8P6GCz
And here is how my RL model learned to control the temperature to be within the band, with a single-neuron sigmoid activation function learned via the REINFORCE algorithm (or, the one laid out in Geron). It shows a bunch of different episodes starting from different initial temperatures: https://ibb.co/LSt3rrQ
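For reference, a rough sketch of how that room/heater dynamic could be expressed with the Tensorforce Environment interface (the time constant, step size and random initial temperature are assumptions, not values from this thread):

import numpy as np
from tensorforce.environments import Environment

class ThermostatEnvironment(Environment):
    # Room temperature decays toward 0 degF (heater off) or rises toward
    # 10 degF (heater on) with a first-order time constant; reward is 0
    # inside the 4-6 degF band and decreases linearly outside it.

    def __init__(self, tau=3.0):
        super().__init__()
        self.tau = tau
        self.temp = 5.0

    def states(self):
        return dict(type='float', shape=(1,))

    def actions(self):
        return dict(type='int', num_values=2)  # 0 = heater off, 1 = heater on

    def reset(self):
        self.temp = np.random.uniform(low=0.0, high=10.0)
        return np.array([self.temp])

    def execute(self, actions):
        target = 10.0 if actions == 1 else 0.0
        # One step of exponential approach toward the target temperature.
        self.temp += (target - self.temp) * (1.0 - np.exp(-1.0 / self.tau))
        if 4.0 <= self.temp <= 6.0:
            reward = 0.0
        else:
            # Linearly decreasing reward with distance from the band.
            reward = -min(abs(self.temp - 4.0), abs(self.temp - 6.0))
        terminal = False
        return np.array([self.temp]), terminal, reward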
Alexander Kuhnle
@AlexKuhnle
Yes, that looks like a nice starter example. I assume the "getting started" guide covers sufficiently how you can either use an existing e.g. Gym environment, or define your own so that other parts of Tensorforce can interface with it. It also covers what the basic training loop looks like and/or how to use the Runner utility. Would you say there are things missing here? Otherwise, I think the main part is that the docs currently offer almost no guidance on how to choose which algorithm and what hyperparameters. Is that where you think more could be added? I can certainly add a bit here; it would probably be based on PPO for now, as I've used it most and many users end up using it. I've also been planning for a while to write more about the full Tensorforce configuration, but shamefully haven't done so yet. >.< Contributions for the examples folder, problem-based tutorial walkthroughs, etc. are always very welcome -- it's certainly a part which I'm not covering well enough. :-)
Matt Pettis
@mpettis
I'd say the example defining the CustomEnvironment could use some more explanation, and maybe another example or two. Some more description of what the return values of states() and actions() give would help. Are they just dicts that describe the type of the return value? What would it look like if the states or actions are not discrete, but continuous? A little more explanation in the comments of the example class would be helpful. The documentation for these methods has information on what I asked above, but again, a little language around the intent of these fields, and examples exercising these options, would be helpful.
And as you mentioned, if you were to do a simple thermostatic controller example, a quick explanation of why you chose the agent you did (among the choices of possible agents) would be good. It doesn't have to be a full-blown RL tutorial, as you can expect users to at least be familiar with the general concept, but maybe not with each of the different choices for agent types and what their practical implications are. They (I) have likely read about them briefly and are coming to this software to try them out for the first time. Or at least, a good subset are...
Alexander Kuhnle
@AlexKuhnle
Makes sense. :+1:
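To illustrate the discrete-vs-continuous point raised above (a sketch, not taken from the docs), states() and actions() simply return spec dicts along these lines:

# Discrete: an integer action with two possible values (e.g. heater off/on).
discrete_actions = dict(type='int', num_values=2)

# Continuous: a bounded float action (e.g. heater power between 0 and 1).
continuous_actions = dict(type='float', shape=(1,), min_value=0.0, max_value=1.0)

# States use the same pattern, e.g. a single bounded temperature reading:
states = dict(type='float', shape=(1,), min_value=0.0, max_value=10.0)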
MAYANK GULATI
@MakGulati
Is it possible to write a tensorforce wrapper for OpenAI Gym-style custom environments?
I have been trying hard to get it to work; if you can share example code, that would be highly appreciated. Thanks a bunch 😊.
Alexander Kuhnle
@AlexKuhnle
Hi, yes that's possible, in fact it shouldn't be necessary to write any wrapper yourself if you implemented the Gym interface. If you pass your gym.Env class as the level argument to the Tensorforce gym environment, it should work.
If that's not quite what you want to do, can you specify what exactly you tried and didn't work?
Matt Pettis
@mpettis
I put a dead simple example of a room temperature that responds (with thermal resistance) to a heater at the following link. It has the formula for simple exponential resistance, and a plot of what the room temperature looks like when the heater is on vs. off: https://github.com/mpettis/shared-scratchpad/blob/master/thermostat-response/thermostatic-response-function.ipynb
Matt Pettis
@mpettis
Another place a tutorial example would be good... I'm defining an environment, and I want to, say, initialize my environment with a starting temperature. The sample isn't clear on how I pass initialization parameters to my environment creation. I also want to augment the class to have custom attributes in addition to what the base environment class has, but I can't see how to do that. Environment.create() seems to return an EnvironmentWrapper class, and it doesn't have the custom attributes I set in the __init__() definition of Environment when I run dir() on my created object.
MAYANK GULATI
@MakGulati
@AlexKuhnle Hi, I believe level is for standard OpenAI Gym envs. What I meant was that I am following the same structure as OpenAI Gym but made my custom env with the same abstract class (which you can check here). So my doubt is: how do I make it work with tensorforce?
MAYANK GULATI
@MakGulati
I guess you mean I should use this. Could you please write out the syntax of the environment, assuming the gym-based custom env is named mmx?
Should it be environment = OpenAIGym(level=mmx)?
Matt Pettis
@mpettis
OK, I made a stab at creating a bang-bang heater environment by extending the Environment class. I think I did it mostly correctly, and had to add things like incrementing a timestep. I think this (or a cleaned-up version from someone who knows more) could be a good example of implementing an Environment from the base class. https://github.com/mpettis/shared-scratchpad/blob/master/thermostat-response/thermostat-environment.ipynb
Alexander Kuhnle
@AlexKuhnle
@mpettis I'll have a look at the issue and notebook. Would you be happy to add some more information to the docs in the process of sorting out the problem? That would be great. :-)
@MakGulati If you're using the same interface, is there a reason why you don't just use the gym.Env base class? Your environment class could be class MMX(gym.Env): ..., and if you then pass it to the Tensorforce Gym interface, it should be compatible: env = Environment.create(environment='gym', level=MMX, ...). Or have you tried this before? The level argument should certainly accept custom gym.Env subclass objects, and in fact also instances.
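For instance, a sketch along those lines (the MMX internals here are placeholders, not the actual environment):

import gym
import numpy as np
from gym import spaces
from tensorforce import Environment

class MMX(gym.Env):

    def __init__(self):
        # State and action spaces must be standard Gym space objects.
        self.observation_space = spaces.Box(low=-1.0, high=1.0, shape=(3,), dtype=np.float32)
        self.action_space = spaces.Discrete(2)

    def reset(self):
        return np.zeros(3, dtype=np.float32)

    def step(self, action):
        observation = np.random.uniform(low=-1.0, high=1.0, size=3).astype(np.float32)
        reward, done, info = 0.0, False, {}
        return observation, reward, done, info

# Pass the class (or an instance) as the level of the Tensorforce Gym interface.
env = Environment.create(environment='gym', level=MMX, max_episode_timesteps=100)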
Alexander Kuhnle
@AlexKuhnle
@mpettis As mentioned in your issue, the only thing missing -- which is not clear right now from the exception, but will change -- is the environment.reset() before starting to execute. Apart from that, you shouldn't need to add attributes when using Environment.create(...) (which, I'd say, is the preferred way of initializing an env). I will also add attribute forwarding to the wrapper; however, it will be read-only, which I think should be enough (environment logic should go into the env implementation itself).
It would also be great if you would be willing to contribute your little environment plus agent training script under examples/ :-)
(Plus appropriate acknowledgement, if you're happy to do so, of course)
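For completeness, the kind of loop the missing reset() refers to, as a minimal sketch (reusing the hypothetical ThermostatEnvironment sketched earlier in this thread; agent and episode counts are illustrative):

from tensorforce import Agent, Environment

environment = Environment.create(
    environment=ThermostatEnvironment, max_episode_timesteps=200
)
agent = Agent.create(agent='ppo', environment=environment, batch_size=10)

for _ in range(100):
    states = environment.reset()  # must be called before execute()
    terminal = False
    while not terminal:
        actions = agent.act(states=states)
        states, terminal, reward = environment.execute(actions=actions)
        agent.observe(terminal=terminal, reward=reward)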
MAYANK GULATI
@MakGulati
@AlexKuhnle After following this
I'm getting

tensorforce.exception.TensorforceError: Unknown Gym space.

Alexander Kuhnle
@AlexKuhnle
Hmm, are you using the Gym objects to specify your state and action space?
CartPole, for instance, specifies it here. If you do the same, can you post your observation/action space?
MAYANK GULATI
@MakGulati
I'm doing two levels of inheritance:
class Envir(gym.Env):
then class MMX(Envir):
and for tensorforce env = Environment.create(environment='gym', level=MMX, ...)