Steven Tobias
@stobias123
getting this error when running training
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot update variable with shape [2] using a Tensor with shape [4], shapes must be equal.
Steven Tobias
@stobias123
found the problem. I was trying to load old saved checkpoints/tensorboard data. just had to clear that
Matt Pettis
@mpettis

Hey @mpettis, I added some documentation for multi-input networks here (and multi-state/action specification here). All very minimal, but a start... :-)

Hi @AlexKuhnle , looking closer at the multi-input documentation you updated here for my benefit... when reading, you state in the documentation you use the special layers Register and Retrieve, but in the example, it looks like you are only referencing retrieve... is that correct? Am I missing something?

1 reply
Matt Pettis
@mpettis
@AlexKuhnle Also, where can I find documentation on the 'states' and 'policy' argument in that example? I'm looking at the main definition, and best that I can tell, they may be passed on in **kwargs to the Agent.create() method? Looking here: https://tensorforce.readthedocs.io/en/latest/agents/agent.html#tensorforce.agents.TensorforceAgent.create
2 replies
Matt Pettis
@mpettis
For anyone... if I create my environment with Environment.create(), I get an EnvironmentWrapper object. In my execute() method of the Environment class I make, what is the best way to access the timestep attribute of the wrapper? I'm a bit rusty on my python, and I think I can tell that the wrapper holds a reference to my environment object as an attribute of the wrapper instance, and the timestep is another attribute of the wrapper instance. And I'm struggling with how to access that timestep value from my Environment class definition. I'm currently cheating and keeping track of my own timestep, but that's not optimal.
Matt Pettis
@mpettis
To be honest -- I don't think you can. Since an EnvironmentWrapper "has a" environment (in particular, the one I create) as a member, the environment can't inspect its peer members within the wrapper, which it would need to with the current architecture.
Alexander Kuhnle
@AlexKuhnle
Yes, you can't access the wrapper attributes from the internal environment. Not sure whether there is a better way. However, I wouldn't say it's cheating to keep track of it yourself: some environments need to do this, others don't, and the ones which do should explicitly do it in their implementation. The wrapper keeps track of it for other reasons (to obey "max_episode_timesteps" if set).
So I would say, what you're doing is what is recommended to happen. It's a different question if you want to access the attribute externally, which is currently not well supported. If that would be good to support, I'm happy to think about how to make it happen, probably just something like environment.get_additional_info() or so.
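
A minimal sketch of this kind of explicit bookkeeping inside a custom environment; the class name, state/action specs and values are illustrative placeholders:

    from tensorforce import Environment

    class MyEnvironment(Environment):

        def __init__(self):
            super().__init__()
            self.timestep_counter = 0

        def states(self):
            return dict(type='float', shape=(4,))

        def actions(self):
            return dict(type='int', num_values=2)

        def reset(self):
            # Reset the environment's own counter at the start of each episode.
            self.timestep_counter = 0
            return [0.0, 0.0, 0.0, 0.0]

        def execute(self, actions):
            # Explicit timestep bookkeeping, as discussed above.
            self.timestep_counter += 1
            next_state = [0.0, 0.0, 0.0, 0.0]
            terminal = False
            reward = 0.0
            return next_state, terminal, reward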
yanghaoxie
@yanghaoxie
Hi, everyone.
yanghaoxie
@yanghaoxie

I have a problem with setting network_spec.
I set network_spec as network_spec = [{"type": "dense", "size": 100, "activation": "relu"}].
However, I get the error message TensorforceError: Invalid value for Module.add_variable argument shape: 0,100.
After some experiments, I found out that this error occurs if the states spec is a nested dict; otherwise, it doesn't.
For example, the following states definition will cause this error:

    def states(self):
        states = {}
        states['foo'] = dict(type='int', shape=(3, ), num_values=6)
        states['bar'] = dict(type='int', shape=(3, ), num_values=6)
        return states

and the following definition will not cause the error:

    def states(self):
        return dict(type='float', shape=(8,))

Could you please help me?

Alexander Kuhnle
@AlexKuhnle
Hi @yanghaoxie, the exception message is not very informative; I will need to check whether it can be improved (or it may just be an artifact of the point at which the inconsistency in the specification causes actual problems). But here are two points you should look into:
The first state consists of two components, so a simple sequential network will not work (Tensorforce doesn't implicitly concatenate inputs or something like that). What you can do in this case is to use the "extended" multi-input network specification feature, which plugs together sequential "components" and retrieves state components via special "register" and "retrieve" layers.
Moreover, the first state consists of integers, which cannot be processed by a dense layer (again, Tensorforce doesn't do anything implicitly to take care of it). The simple way to address this problem is to use an embedding layer first (see here), to map each of the finite values to a corresponding embedding (equivalent to encoding as one-hot vectors and then applying a dense layer).
Hope that helps!
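
A rough sketch of how these two points could be combined for the 'foo'/'bar' states above, using embedding plus register/retrieve layers; layer arguments such as the embedding size are illustrative and may differ between versions:

    network_spec = [
        [
            # Retrieve the 'foo' state component and embed its integer values.
            dict(type='retrieve', tensors=['foo']),
            dict(type='embedding', size=32),
            dict(type='register', tensor='foo-embedding')
        ],
        [
            # Same for the 'bar' state component.
            dict(type='retrieve', tensors=['bar']),
            dict(type='embedding', size=32),
            dict(type='register', tensor='bar-embedding')
        ],
        [
            # Concatenate the registered embeddings and process them further.
            dict(type='retrieve', tensors=['foo-embedding', 'bar-embedding'], aggregation='concat'),
            dict(type='dense', size=100, activation='relu')
        ]
    ]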
yanghaoxie
@yanghaoxie
@AlexKuhnle Thank you so much for your help :). I will investigate what you told me.
danthedolphin
@danthedolphin
Hi all, I'm trying to replicate the DQN paper by Mnih et al (2015) on Atari games and am trying to extract the Q values for each action but I'm not sure how to get them. I've already got the agent and environment training and everything but this is the last step I need. Is there a way to somehow get the Q values for each action every time I call agent.act(states=states)?
Alexander Kuhnle
@AlexKuhnle
Hi, you can retrieve additional tensors via the query argument -- 'action-distribution-values' (or alternatively your action name as first part) should work. Have a look here for an example in the unittests.
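
A rough sketch, assuming agent.act accepts the query argument described above and returns the requested tensors alongside the actions; the exact signature may differ between versions:

    # 'states' comes from environment interaction; 'action-distribution-values'
    # requests the per-action values of the categorical distribution (the
    # Q-values for a DQN-style agent).
    actions, queried = agent.act(states=states, query='action-distribution-values')
    q_values = queried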
danthedolphin
@danthedolphin
@AlexKuhnle Thanks for the update! This module has helped me a lot
Alexander Kuhnle
@AlexKuhnle
No problem :-)
Qiao.Zhang
@qZhang88
@AlexKuhnle I found this API TensorforceAgent.experience, is this for training with existing states, actions, rewards traces? How is the training process controlled? Should I rather pass a batch of episodes if I use PPO, so that I know for sure there would be one update?
Alexander Kuhnle
@AlexKuhnle
Hi, experience() stores a batch of timesteps into memory but doesn't trigger updates. There's also update() which triggers an update (but doesn't take any data). You can think of it that way: in the usual act() and observe() cycle, observe() is a combination of experience() (store timestep) and every now and then update() (depending on algo/config). Instead, you could basically use act(..., independent=True) (which is "independent" of the act-observe cycle, so doesn't register anything), and experience/update (however, it's not exactly equivalent, so it's not recommended to do). Potential use case: fill replay memory with experience at the beginning, or use it as a pretraining method. However, pretraining probably won't work well, as there are known problems with this simplistic approach. More use cases may come up e.g. with the implementation of better pretraining methods.
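
A rough sketch of this experience/update workflow, where collect_episode() is a hypothetical helper returning one episode of data in the format the agent expects:

    # Gather one episode of data (hypothetical helper, not part of Tensorforce).
    episode_states, episode_actions, episode_terminal, episode_reward = collect_episode()

    # Store the episode timesteps in the agent's internal memory; no update is triggered.
    agent.experience(
        states=episode_states, actions=episode_actions,
        terminal=episode_terminal, reward=episode_reward
    )

    # Explicitly trigger an update, which samples from the memory filled above.
    agent.update()

    # Acting outside the act-observe cycle, without registering anything:
    action = agent.act(states=single_state, independent=True)  # 'single_state' is illustrative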
Pedro Chinen
@chinen93

Hi, I'm trying to use the runner.run() but I cannot understand what the progress bar is telling me. I tried to search in the documentation but couldn't find it.

"Episodes: 100%|█████ 400/400 [1:25:00, reward=-24.55, ts/ep=3294, sec/ep=14.10, ms/ts=4.3, agent=36.7%, comm=71.6%]"

I think it means the following, but I'm not sure:
ts/ep -> timesteps per episode
sec/ep -> seconds per episode
ms/ts -> milliseconds per timestep
agent -> percentage of agent computation
comm -> percentage of environment computation

Can anyone tell me if I get it wrong?

Alexander Kuhnle
@AlexKuhnle
Almost correct. The last value, "comm" is the relative time spent on remote communication. However, I realised that the value currently wrongly shows the time spent on agent+env, so all but communication. I'll fix that and add more info to the use_tqdm argument. Plus, note that the numbers currently may not perfectly reflect "parallel performance" in a parallelized remote setup.
Pedro Chinen
@chinen93
thanks for the answer
Pedro Chinen
@chinen93

I have another question: I started my project with a simple training-evaluation loop that took between 6 and 8 seconds per episode. After I changed to the Runner class, the time jumped to ~14 seconds. Is there something that I can change to improve this performance?

I changed because I wanted to try using the parallel runner.

Alexander Kuhnle
@AlexKuhnle
Can you post both versions? There's a little overhead in the runner, but I wouldn't expect it to show...
杨子信
@yzx20160815_twitter
May I know what update_frequency means in the PPO agent?
Alexander Kuhnle
@AlexKuhnle
Hey, the update_frequency is the frequency of updates :-) Seriously: PPO does episode-based updates, so batch_size determines how many episodes are used for the update batch. The update_frequency, by default = batch_size, specifies how frequently an update should happen. So: batch_size=4, update_frequency=2 means after every second episode an update happens with a batch of 4 episodes. Note that update_frequency > batch_size doesn't make much sense, and that, technically, update_frequency < batch_size makes some of the batch data slightly off-policy. However, in practice that usually doesn't matter much.
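
For example, a sketch of a PPO agent configured as described; the environment and other hyperparameters are placeholders:

    from tensorforce import Agent

    agent = Agent.create(
        agent='ppo', environment=environment,  # 'environment' created elsewhere
        batch_size=4,        # each update uses a batch of 4 episodes
        update_frequency=2   # an update is triggered after every second episode
    )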
杨子信
@yzx20160815_twitter
@AlexKuhnle thks
杨子信
@yzx20160815_twitter
@AlexKuhnle May I know when the Memory module frees memory? I use the PPO agent and the default memory, and my training always stops because of OOM.
Alexander Kuhnle
@AlexKuhnle
Can you post your config? Are you running on GPU?
杨子信
@yzx20160815_twitter
Yes, on GPU.
image.png
Alexander Kuhnle
@AlexKuhnle
Hmm, looks okay in principle. Depending on your input size, the input and the two RNNs may take quite some space, particularly considering that both the policy and the critic use this architecture. What if you, say, remove one RNN layer or reduce the sizes (even if you reduce them a lot, just to check whether it works at all)?
杨子信
@yzx20160815_twitter
@AlexKuhnle I'll try, thank you
Pedro Chinen
@chinen93
What are some ways to improve the training speed? I can only think of a smaller network size (which might ruin the agent) and parallel execution. Are there any others?
Alexander Kuhnle
@AlexKuhnle
Are you using the config above? The summarizer is definitely expensive, and maybe you can improve learning by increasing the optimisation steps (or other parameters), but other than that the parallelization you mention is a good option.
Tobias Oberrauch
@tobiasoberrauch
Hi guys. I want to use an existing OpenAI Gym environment. Can you show me an example of how to do that?
Alexander Kuhnle
@AlexKuhnle
Hey, this guide should help: https://tensorforce.readthedocs.io/en/latest/basics/getting-started.html Let me know if anything is unclear...
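
Following that guide, a minimal sketch for wrapping an existing Gym environment; the level name 'CartPole-v1' and the timestep limit are just examples:

    from tensorforce import Environment

    environment = Environment.create(
        environment='gym', level='CartPole-v1', max_episode_timesteps=500
    )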
bennyfri
@bennyfri
Hi there, I'm trying to use a custom environment and a PPO agent, but when I run the runner, I get an error: ValueError: Cannot feed value of shape (1,) for Tensor 'agent/state-input:0', which has shape '(None, 64, 2)'. The shape (64, 2) is OK since this is the shape of my states, but I'm not sure why it tried to feed data with a scalar.
Alexander Kuhnle
@AlexKuhnle
Hey, it sounds like this may be due to your custom environment implementation. Can you share it, or at least some relevant parts like state spec and example state value, etc...?
bennyfri
@bennyfri
Sure:

    import tensorflow as tf
    from tensorforce import Agent, Environment, Runner
    import numpy as np

    # Downloaded from http://portal.rafael.co.il/mlchallenge2019/Documents/Interceptor_V2.py
    from Interceptor_V2 import Init, Draw, Game_step

    MOVE_LEFT = 0
    DO_NOTHING = 1
    MOVE_RIGHT = 2
    FIRE = 3

    ROCKET_ARRAY_LENGTH = 40
    INTERCEPTOR_ARRAY_LENGTH = 20
    MIN_STEPS = 1000

    class InterceptorEnvironment(Environment):

        def __init__(self):
            Init()
            self.curr_score = 0
            self.visualize = False
            super().__init__()

        def states(self):
            return dict(type='int', shape=(64, 2), num_values=10000)

        def actions(self):
            return dict(type='int', num_values=4)

        # Optional, should only be defined if environment has a natural maximum
        # episode length
        def max_episode_timesteps(self):
            return super().max_episode_timesteps()

        # Optional
        def close(self):
            super().close()

        def reset(self):
            Init()
            return DO_NOTHING

        def make_size(self, a, size):
            if len(a) > size:
                return np.split(a, [size])[0]
            return np.append(a, np.zeros([size - len(a), 2]), axis=0)

        def execute(self, actions):
            r_locs, i_locs, c_locs, ang, score = Game_step(actions)
            rockets = self.make_size(r_locs, ROCKET_ARRAY_LENGTH)
            interceptors = self.make_size(i_locs, INTERCEPTOR_ARRAY_LENGTH)
            angles = np.zeros([2, 2])
            angles[1, 0] = ang
            next_state = np.concatenate((rockets, interceptors, c_locs, angles))
            terminal = False
            reward = score - self.curr_score
            self.curr_score = score
            if self.visualize:
                Draw()
            return next_state, terminal, reward

    environment = Environment.create(environment=InterceptorEnvironment, max_episode_timesteps=MIN_STEPS)

    agent = Agent.create(agent='ppo', environment=environment, batch_size=10, learning_rate=1e-3)

    runner = Runner(agent=agent, environment=environment)
    environment.environment.visualize = True
    runner.run(num_episodes=100)
    runner.close()

1 reply
Schade77
@Schade77
This message was deleted
Alexander Kuhnle
@AlexKuhnle
Okay, great :-)
IbraheemNofal
@IbraheemNofal
Hello there,
I'm interested in the functionality of the recorder for pre-training, but the docs don't provide enough explanation. In my case, I want to record the actions taken by a "benchmark-algorithm" and use that to pre-train the agent before it starts exploring. Is there a way to do this? I'm thinking maybe I could use an action-mask to mask out all actions except for the one taken by the "benchmark-algorithm", and then I'd use that to pre-train the agent. Does that sound like a reasonable approach?
Alexander Kuhnle
@AlexKuhnle
Hey, pretraining coincidentally came up in an issue very recently, see tensorforce/tensorforce#700. My last post from yesterday shows a small script which works for me, using the agent.pretrain function. I want to improve the docs and also implementation, in particular improve and extend the arguments of the function -- the basic pretrain approach is simple (kind of behavior cloning), but more sophisticated approaches won't be as straightforward to implement.
agent.pretrain() does the following: for num_iterations times, it loads num_traces trace files (which each contain recorder['frequency'] episodes), feeds them via agent.experience() to the agent's internal memory, and then triggers num_updates updates, which use experience sampled from the memory, so what was fed in this (and potentially previous) iterations.
Alexander Kuhnle
@AlexKuhnle
Currently, it's probably best to use it with "expert" demonstrations as recorded by a "perfect" agent (e.g. using recorder['start']). There are 1-2 small things I may add/rename/improve soon, plus better docs.
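
A rough sketch of this record-then-pretrain workflow, assuming the recorder and agent.pretrain arguments mentioned above; names and defaults may differ between versions:

    # "Expert" agent records its episodes as trace files into the 'traces' directory.
    expert = Agent.create(
        agent='ppo', environment=environment, batch_size=10,
        recorder=dict(directory='traces', frequency=1)
    )
    # ... run the expert via the usual act/observe or Runner loop ...

    # New agent is pretrained from the recorded traces: each iteration loads trace
    # files, feeds them via experience(), and triggers updates.
    agent = Agent.create(agent='ppo', environment=environment, batch_size=10)
    agent.pretrain(directory='traces', num_iterations=30, num_traces=1, num_updates=1)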
IbraheemNofal
@IbraheemNofal
Alright, I think I understand the pretraining aspect of it, but how exactly do you use the recorder to record "expert" traces? Checking the example script, it seems like it automatically records traces of observations and actions.