Alexander Kuhnle
@AlexKuhnle
Hi, you can retrieve additional tensors via the query argument -- 'action-distribution-values' (or alternatively your action name as first part) should work. Have a look here for an example in the unittests.
danthedolphin
@danthedolphin
@AlexKuhnle Thanks for the update! This module has helped me a lot
Alexander Kuhnle
@AlexKuhnle
No problem :-)
Qiao.Zhang
@qZhang88
@AlexKuhnle I found this API, TensorforceAgent.experience. Is this for training with existing traces of states, actions, and rewards? How is the training process controlled? Would it be better to pass a batch of episodes if I use PPO, so that I know for sure there will be one update?
Alexander Kuhnle
@AlexKuhnle
Hi, experience() stores a batch of timesteps into memory but doesn't trigger updates. There's also update(), which triggers an update (but doesn't take any data). You can think of it this way: in the usual act() and observe() cycle, observe() is a combination of experience() (store timestep) and, every now and then, update() (depending on algo/config). Instead, you could basically use act(..., independent=True) (which is "independent" of the act-observe cycle, so doesn't register anything), plus experience/update (however, it's not exactly equivalent, so it's not recommended). Potential use cases: fill the replay memory with experience at the beginning, or use it as a pretraining method. However, pretraining probably won't work well, as there are known problems with this simplistic approach. More use cases may come up, e.g. with the implementation of better pretraining methods.
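A rough sketch of the two patterns described above (CartPole and the PPO hyperparameters are just placeholders, and internals handling is omitted, i.e. this assumes a policy without internal/RNN states):

'''
from tensorforce import Agent, Environment

environment = Environment.create(environment='gym', level='CartPole-v1', max_episode_timesteps=500)
agent = Agent.create(agent='ppo', environment=environment, batch_size=2)

# Usual cycle: observe() stores the timestep and every now and then triggers an update.
states = environment.reset()
terminal = False
while not terminal:
    actions = agent.act(states=states)
    states, terminal, reward = environment.execute(actions=actions)
    agent.observe(terminal=terminal, reward=reward)

# Alternative (not exactly equivalent, as noted above): act independently, then feed
# the collected episode via experience() and trigger an update explicitly.
episode_states, episode_actions, episode_terminal, episode_reward = [], [], [], []
states = environment.reset()
terminal = False
while not terminal:
    actions = agent.act(states=states, independent=True)  # not registered by the agent
    episode_states.append(states)
    episode_actions.append(actions)
    states, terminal, reward = environment.execute(actions=actions)
    episode_terminal.append(terminal)
    episode_reward.append(reward)
agent.experience(
    states=episode_states, actions=episode_actions,
    terminal=episode_terminal, reward=episode_reward
)
agent.update()
'''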
Pedro Chinen
@chinen93

Hi, I'm trying to use runner.run(), but I cannot understand what the progress bar is telling me. I tried to search the documentation but couldn't find it.

"Episodes: 100%|█████ 400/400 [1:25:00, reward=-24.55, ts/ep=3294, sec/ep=14.10, ms/ts=4.3, agent=36.7%, comm=71.6%]"

I think it means the following, but I'm not sure:
ts/ep -> timesteps per episode
sec/ep -> seconds per episode
ms/ts -> milliseconds per timestep
agent -> percentage of agent computation
comm -> percentage of environment computation

Can anyone tell me if I got it wrong?

Alexander Kuhnle
@AlexKuhnle
Almost correct. The last value, "comm", is the relative time spent on remote communication. However, I realised that the value currently (wrongly) shows the time spent on agent+env, i.e. everything but communication. I'll fix that and add more info to the use_tqdm argument. Also, note that the numbers currently may not perfectly reflect "parallel performance" in a parallelized remote setup.
Pedro Chinen
@chinen93
thanks for the answer
Pedro Chinen
@chinen93

I have another question: I started my project with a simple training/evaluation loop, with times between 6-8 sec per episode. After I changed to the Runner class, the time jumped to ~14 sec. Is there something I can change to improve this performance?

I changed because I wanted to try using the parallel runner.

Alexander Kuhnle
@AlexKuhnle
Can you post both versions? There's a little overhead in the runner, but I wouldn't expect it to show...
杨子信
@yzx20160815_twitter
May I know what update_frequency means in the PPO agent?
Alexander Kuhnle
@AlexKuhnle
Hey, the update_frequency is the frequency of updates :-) Seriously: PPO does episode-based updates, so batch_size determines how many episodes are used for the update batch. The update_frequency, by default = batch_size, specifies how frequently an update should happen. So: batch_size=4, update_frequency=2 means after every second episode an update happens with a batch of 4 episodes. Note that update_frequency > batch_size doesn't make much sense, and that, technically, update_frequency < batch_size makes some of the batch data slightly off-policy. However, in practice that usually doesn't matter much.
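For illustration, a minimal config along these lines (the environment and learning rate are just placeholders):

'''
from tensorforce import Agent, Environment

environment = Environment.create(environment='gym', level='CartPole-v1', max_episode_timesteps=500)

# Update every 2 episodes, each update using a batch of the last 4 episodes
# (so part of each batch is slightly off-policy).
agent = Agent.create(
    agent='ppo', environment=environment,
    batch_size=4, update_frequency=2, learning_rate=1e-3
)
'''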
杨子信
@yzx20160815_twitter
@AlexKuhnle thks
杨子信
@yzx20160815_twitter
@AlexKuhnle May I know when the Memory module frees memory? I'm using the PPO agent and the default memory, and my training always stops because of OOM.
Alexander Kuhnle
@AlexKuhnle
Can you post your config? Are you running on GPU?
杨子信
@yzx20160815_twitter
Yes, on GPU.
(attached image: agent config)
Alexander Kuhnle
@AlexKuhnle
Hmm, looks okay in principle. Depending on your input size, the input and the two RNNs may take quite some space, particularly considering that both the policy and the critic use this architecture. What if you, say, remove one RNN layer or reduce the sizes (even if you reduce them a lot, just to check whether it works at all)?
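Purely as an illustrative sketch of what "reduce the sizes" might look like (the 'auto' network and its size/depth/rnn parameters here are assumptions from memory, so check them against the networks documentation for your Tensorforce version; the environment is a placeholder):

'''
from tensorforce import Agent, Environment

# Placeholder environment; substitute your own.
environment = Environment.create(environment='gym', level='CartPole-v1', max_episode_timesteps=500)

# Hypothetical downsized config: a single small recurrent layer with a short horizon,
# just to check whether the original network sizes are what causes the OOM.
agent = Agent.create(
    agent='ppo', environment=environment, batch_size=10,
    network=dict(type='auto', size=32, depth=1, rnn=8)
)
'''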
杨子信
@yzx20160815_twitter
@AlexKuhnle I'll try, thank you
Pedro Chinen
@chinen93
What are some ways to improve the training speed? I can only think of a smaller network size (which might ruin the agent) and parallel execution. Are there any others?
Alexander Kuhnle
@AlexKuhnle
Are you using the config above? The summarizer definitely is expensive, and maybe you can improve learning by increasing the optimisation steps (or other parameters), but other than that, the parallelization you mention is a good option.
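A sketch of the parallel option (num_parallel and remote='multiprocessing' are taken from recent Runner versions, so treat the exact arguments as assumptions; the agent/environment specs are placeholders):

'''
from tensorforce import Runner

# Run several copies of the environment in parallel worker processes.
runner = Runner(
    agent=dict(agent='ppo', batch_size=10, learning_rate=1e-3),
    environment=dict(environment='gym', level='CartPole-v1'),
    max_episode_timesteps=500,
    num_parallel=4, remote='multiprocessing'
)
runner.run(num_episodes=400)
runner.close()
'''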
Tobias Oberrauch
@tobiasoberrauch
Hi guys. I want to use an existing OpenAI Gym environment. Can you show me an example of how to do that?
Alexander Kuhnle
@AlexKuhnle
Hey, this guide should help: https://tensorforce.readthedocs.io/en/latest/basics/getting-started.html Let me know if anything is unclear...
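In short, following that guide (the agent hyperparameters here are just placeholders):

'''
from tensorforce import Agent, Environment, Runner

# Wrap an existing OpenAI Gym environment by its registered id.
environment = Environment.create(
    environment='gym', level='CartPole-v1', max_episode_timesteps=500
)
agent = Agent.create(agent='ppo', environment=environment, batch_size=10)

runner = Runner(agent=agent, environment=environment)
runner.run(num_episodes=200)
runner.close()
'''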
bennyfri
@bennyfri
Hi there, I'm trying to use a custom environment and a PPO agent, but when I run the runner, I get an error: ValueError: Cannot feed value of shape (1,) for Tensor 'agent/state-input:0', which has shape '(None, 64, 2)'. The shape (64, 2) is OK since this is the shape of my states, but I'm not sure why it tried to feed data with a scalar.
Alexander Kuhnle
@AlexKuhnle
Hey, it sounds like this may be due to your custom environment implementation. Can you share it, or at least some relevant parts like state spec and example state value, etc...?
bennyfri
@bennyfri
Sure:

'''
import numpy as np
import tensorflow as tf
from tensorforce import Agent, Environment, Runner

# Downloaded from http://portal.rafael.co.il/mlchallenge2019/Documents/Interceptor_V2.py
from Interceptor_V2 import Init, Draw, Game_step

MOVE_LEFT = 0
DO_NOTHING = 1
MOVE_RIGHT = 2
FIRE = 3

ROCKET_ARRAY_LENGTH = 40
INTERCEPTOR_ARRAY_LENGTH = 20
MIN_STEPS = 1000


class InterceptorEnvironment(Environment):

    def __init__(self):
        Init()
        self.curr_score = 0
        self.visualize = False
        super().__init__()

    def states(self):
        return dict(type='int', shape=(64, 2), num_values=10000)

    def actions(self):
        return dict(type='int', num_values=4)

    # Optional, should only be defined if environment has a natural maximum
    # episode length
    def max_episode_timesteps(self):
        return super().max_episode_timesteps()

    # Optional
    def close(self):
        super().close()

    def reset(self):
        Init()
        return DO_NOTHING

    def make_size(self, a, size):
        if len(a) > size:
            return np.split(a, [size])[0]
        return np.append(a, np.zeros([size - len(a), 2]), axis=0)

    def execute(self, actions):
        r_locs, i_locs, c_locs, ang, score = Game_step(actions)
        rockets = self.make_size(r_locs, ROCKET_ARRAY_LENGTH)
        interceptors = self.make_size(i_locs, INTERCEPTOR_ARRAY_LENGTH)
        angles = np.zeros([2, 2])
        angles[1, 0] = ang
        next_state = np.concatenate((rockets, interceptors, c_locs, angles))
        terminal = False
        reward = score - self.curr_score
        self.curr_score = score
        if self.visualize:
            Draw()
        return next_state, terminal, reward


environment = Environment.create(environment=InterceptorEnvironment, max_episode_timesteps=MIN_STEPS)

agent = Agent.create(agent='ppo', environment=environment, batch_size=10, learning_rate=1e-3)

runner = Runner(agent=agent, environment=environment)
environment.environment.visualize = True
runner.run(num_episodes=100)
runner.close()
'''

Schade77
@Schade77
This message was deleted
Alexander Kuhnle
@AlexKuhnle
Okay, great :-)
IbraheemNofal
@IbraheemNofal
Hello there,
I'm interested in the functionality of the recorder for pre-training, but the docs don't provide enough explanation. In my case, I want to record the actions taken by a "benchmark-algorithm" and use that to pre-train the agent before it starts exploring. Is there a way to do this? I'm thinking maybe I could use an action-mask to mask out all actions except for the one taken by the "benchmark-algorithm", and then I'd use that to pre-train the agent. Does that sound like a reasonable approach?
Alexander Kuhnle
@AlexKuhnle
Hey, pretraining coincidentally came up in an issue very recently, see tensorforce/tensorforce#700. My last post from yesterday shows a small script which works for me, using the agent.pretrain function. I want to improve the docs and also implementation, in particular improve and extend the arguments of the function -- the basic pretrain approach is simple (kind of behavior cloning), but more sophisticated approaches won't be as straightforward to implement.
agent.pretrain() does the following: for num_iterations times, it loads num_traces trace files (each of which contains recorder['frequency'] episodes), feeds them via agent.experience() to the agent's internal memory, and then triggers num_updates updates, which use experience sampled from the memory, i.e. what was fed in this (and potentially previous) iterations.
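A minimal sketch of that flow (num_iterations/num_traces/num_updates as described above; the directory argument names and recorder config are assumptions, and the environment/hyperparameters are placeholders):

'''
from tensorforce import Agent, Environment, Runner

environment = Environment.create(environment='gym', level='CartPole-v1', max_episode_timesteps=500)

# "Expert" agent whose episodes are written to trace files on disk.
expert = Agent.create(
    agent='ppo', environment=environment, batch_size=10,
    recorder=dict(directory='traces', frequency=10)
)
runner = Runner(agent=expert, environment=environment)
runner.run(num_episodes=100)
expert.close()

# Fresh agent pretrained from the recorded traces before regular training.
agent = Agent.create(agent='ppo', environment=environment, batch_size=10)
agent.pretrain(directory='traces', num_iterations=30, num_traces=1, num_updates=1)
'''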
Alexander Kuhnle
@AlexKuhnle
Currently, it's probably best to use it with "expert" demonstrations as recorded by a "perfect" agent (e.g. using recorder['start']). There are 1-2 small things I may add/rename/improve soon, plus better docs.
IbraheemNofal
@IbraheemNofal
Alright, I think I understand the pretraining aspect of it, but how exactly do you use the recorder to record "expert" traces? Checking the example script, it seems like it automatically records traces of observations and actions.
I'm thinking that if it records automatically, then perhaps I can get the "expert demonstrator" to take the action, which would be the only unmasked action when the agent observes the current state, so that it "mirrors" the expert's actions and records those for use in pretraining. The thing is, for my use case the "perfect" agent doesn't exist, but there is a benchmark algorithm I'm using which, as of now, performs better than the best agents I've trained.
Alexander Kuhnle
@AlexKuhnle
Yes, the recorder only works if an agent is serving as the expert (the idea at the time of adding this feature was that you could bootstrap agents in cases where environment execution and consequently proper training from scratch is expensive).
But it shouldn't be a problem to provide your own traces: the recorder uses np.savez_compressed(**experiences), where experiences are arrays of states, actions, rewards, terminals -- plus, the files currently need to be prefixed by trace- (this arbitrary requirement will likely be removed and replaced by filtering for the .npz extension, or so).
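For example, a hypothetical hand-written trace file (the exact array keys should be double-checked against files produced by the recorder; the shapes, dtypes, and random data below are placeholders):

'''
import numpy as np

num_timesteps = 100

# Placeholder data; replace with the states/actions/rewards produced by your
# benchmark algorithm. Key names follow the description above
# (states, actions, terminal, reward).
experiences = dict(
    states=np.random.random_sample(size=(num_timesteps, 8)).astype(np.float32),
    actions=np.random.randint(4, size=(num_timesteps,)),
    terminal=np.zeros(shape=(num_timesteps,), dtype=bool),
    reward=np.random.random_sample(size=(num_timesteps,))
)
experiences['terminal'][-1] = True  # last timestep ends the episode

# File name currently needs the trace- prefix, as mentioned above.
np.savez_compressed('trace-000000000.npz', **experiences)
'''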
Alexander Kuhnle
@AlexKuhnle
There shouldn't be any need to mask actions -- in fact, this will probably prevent learning (since masked-out actions are ignored, so the agent has no choice other than to pick the only available action at each step).
If you want to discuss more how and why pretraining works, maybe best to do that in an issue, so that others can find it, too... Feel free to open one :-) Happy to help to get it to work, and improve the pretrain feature if there's something which currently is not working well / not well-documented. It would also be great if we could add a simple script to the examples folder.
IbraheemNofal
@IbraheemNofal
Not entirely sure if my use case would result in a usable script that can be copy-pasted into the examples folder, as I'm currently using a custom environment and runner implementation to run multiple agents in one environment, but I've gone and opened an issue in the hope that it might be beneficial for others down the line. Here is the link: tensorforce/tensorforce#708
Alexander Kuhnle
@AlexKuhnle
Thanks, and no worries if an example is difficult ;-)
charles sanders
@qorrect
I've got a newbie question. I am working with this code https://gist.github.com/qorrect/7ae5b5ebfeaf98b908c08dc6f165d518, which is loading and saving fine, but when I try to set visualize=True after I load the trained model, I'm not getting any visual results. Thoughts?
charles sanders
@qorrect
Never mind, I see my error.
Steven Tobias
@stobias123
Hi all - I've got a probably simple question...
I've written a custom environment based on OpenAI Gym, but I'm having problems loading it into a Tensorforce agent. I get "Unknown Gym Space".
I was able to load it just fine into a simple random agent, but can't get it to load into a Tensorforce agent... Any tips?
Alexander Kuhnle
@AlexKuhnle
Hey, it sounds like you're using an unknown structure when specifying the state/action space. Can you post these definitions? They should consist of objects from gym.spaces.XXX (but it might be that Tensorforce doesn't cover all options).
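For reference, a minimal placeholder environment with standard gym.spaces definitions of the kind the wrapper should recognize (MyCustomEnv and its spaces are purely illustrative):

'''
import numpy as np
import gym
from gym import spaces

class MyCustomEnv(gym.Env):

    def __init__(self):
        super().__init__()
        # Standard space objects: a Box observation space and a Discrete action space.
        self.observation_space = spaces.Box(low=-1.0, high=1.0, shape=(8,), dtype=np.float32)
        self.action_space = spaces.Discrete(3)

    def reset(self):
        return self.observation_space.sample()

    def step(self, action):
        next_state = self.observation_space.sample()
        reward, done, info = 0.0, False, {}
        return next_state, reward, done, info
'''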
Pedro Chinen
@chinen93

Hi, I'm getting this error message when running with parallel envs:

File "~/.virtualenv/env/lib/python3.6/site-packages/tensorforce/environments/environment.py", line 500, in receive
raise TensorforceError(message='{}: {}'.format(etype, value)).with_traceback(traceback)
TypeError: traceback must be a traceback or None

I think it might be something in my env; however, I can't see the traceback... Is there something I can do to see the actual traceback?

Pedro Chinen
@chinen93
Yeah, it was a problem in my env. I found it. :)
Alexander Kuhnle
@AlexKuhnle
Good :-)