Marius
@MariusHolm
@AlexKuhnle Thank you. That definitely helps :)
杨子信
@yzx20160815_twitter
Why does updated = self.model_observe(parallel=parallel, **kwargs) always return False?
@AlexKuhnle
Alexander Kuhnle
@AlexKuhnle
Ideally, it shouldn't always return False, but True whenever an update was performed (which may happen very infrequently). Could you check whether it really never returns True? What is your agent config?
杨子信
@yzx20160815_twitter
[image attachment: agent config]
@AlexKuhnle
杨子信
@yzx20160815_twitter
Sometimes it returns True.
Qiao.Zhang
@qZhang88
@AlexKuhnle Does Tensorforce support distributed training? That is, several sampling machines and one parameter server: the PS does the gradient updates, and the other machines pull the most recent parameters and only run in exploration and sampling mode?
Alexander Kuhnle
@AlexKuhnle
@yzx20160815_twitter Does that mean the problem is solved and it does sometimes return True?
Alexander Kuhnle
@AlexKuhnle
@qZhang88 Yes and no. The parallelization mode which Tensorforce currently supports is based on one agent with multiple parallel input streams which interact with "remote" environments (via Python's multiprocessing or socket), instead of multiple remote worker agents and one central update agent.
Alexander Kuhnle
@AlexKuhnle
So the result is kind of the same, but the communication content is somewhat different. We've been using this approach in the context of computationally expensive simulations and it worked very well (see here, particularly diagrams on page 6).
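To make that concrete, here is a rough sketch of such a setup (the CartPole environment and the exact argument names, e.g. num_parallel and remote='multiprocessing', are assumptions based on the current Runner interface, not something stated in this conversation):

    from tensorforce.execution import Runner

    # One agent, several environment copies running as "remote" worker processes
    # (assumption: Runner supports num_parallel together with remote='multiprocessing').
    runner = Runner(
        agent=dict(agent='ppo', batch_size=10),  # example agent spec
        environment='CartPole-v1',               # example Gym environment
        max_episode_timesteps=500,
        num_parallel=4,
        remote='multiprocessing'
    )
    runner.run(num_episodes=300)
    runner.close()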
杨子信
@yzx20160815_twitter
@AlexKuhnle Yes, thanks.
Qiao.Zhang
@qZhang88

@qZhang88 Yes and no. The parallelization mode which Tensorforce currently supports is based on one agent with multiple parallel input streams which interact with "remote" environments (via Python's multiprocessing or socket), instead of multiple remote worker agents and one central update agent.

What is the sync mechanism of Tensorforce for multiprocessing or socket?

Alexander Kuhnle
@AlexKuhnle
There is the option to batch calls to the agent, or to do them unbatched as they're requested. Depending on the speed of the env, one or the other may be better. Is that what you mean?
Steven Tobias
@stobias123
Anyone around who can help a newbie with using a custom OpenAI Gym environment?
I think my actions/states aren't being passed properly from the OpenAI Gym environment to the Tensorforce environment, and I'm not sure why.
I'm getting this error when running training:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot update variable with shape [2] using a Tensor with shape [4], shapes must be equal.
Steven Tobias
@stobias123
Found the problem. I was trying to load old saved checkpoints/TensorBoard data; just had to clear that.
Matt Pettis
@mpettis

Hey @mpettis, I added some documentation for multi-input networks here (and multi-state/action specification here). All very minimal, but a start... :-)

Hi @AlexKuhnle, looking closer at the multi-input documentation you updated here for my benefit... you state in the documentation that you use the special layers Register and Retrieve, but in the example it looks like you are only referencing Retrieve. Is that correct? Am I missing something?

Matt Pettis
@mpettis
@AlexKuhnle Also, where can I find documentation on the 'states' and 'policy' arguments in that example? I'm looking at the main definition, and as best I can tell, they may be passed on in **kwargs to the Agent.create() method? Looking here: https://tensorforce.readthedocs.io/en/latest/agents/agent.html#tensorforce.agents.TensorforceAgent.create
Matt Pettis
@mpettis
For anyone... if I create my environment with Environment.create(), I get an EnvironmentWrapper object. In the execute() method of the Environment class I make, what is the best way to access the timestep attribute of the wrapper? I'm a bit rusty on my Python, but I can tell that the wrapper holds a reference to my environment object as an attribute of the wrapper instance, and that the timestep is another attribute of the wrapper instance. I'm struggling with how to access that timestep value from my Environment class definition. I'm currently cheating and keeping track of my own timestep, but that's not optimal.
Matt Pettis
@mpettis
To be honest -- I don't think you can. Since an EnvironmentWrapper "has a" environment (in particular, the one I create) as a member, the environment can't inspect its peer members within the wrapper, which it would need to with the current architecture.
Alexander Kuhnle
@AlexKuhnle
Yes, you can't access the wrapper attributes from the internal environment. Not sure whether there is a better way. However, I wouldn't say it's cheating to keep track of it yourself: some environments need to do this, others don't, and the ones which do should explicitly do it in their implementation. The wrapper keeps track of it for other reasons (to obey "max_episode_timesteps" if set).
So I would say, what you're doing is what is recommended to happen. It's a different question if you want to access the attribute externally, which is currently not well supported. If that would be good to support, I'm happy to think about how to make it happen, probably just something like environment.get_additional_info() or so.
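For reference, a minimal sketch of that pattern with a made-up toy environment (names, shapes and rewards are purely illustrative):

    import numpy as np
    from tensorforce.environments import Environment

    class CountingEnvironment(Environment):
        """Toy environment that keeps track of its own timestep counter."""

        def __init__(self, max_timesteps=100):
            super().__init__()
            self.max_timesteps = max_timesteps
            self._timestep = 0

        def states(self):
            return dict(type='float', shape=(8,))

        def actions(self):
            return dict(type='int', num_values=4)

        def reset(self):
            # The environment's own counter; the EnvironmentWrapper keeps a separate
            # one internally to enforce max_episode_timesteps.
            self._timestep = 0
            return np.random.random(size=(8,))

        def execute(self, actions):
            self._timestep += 1
            next_state = np.random.random(size=(8,))
            terminal = self._timestep >= self.max_timesteps
            reward = 0.0
            return next_state, terminal, reward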
yanghaoxie
@yanghaoxie
Hi, everyone.
yanghaoxie
@yanghaoxie

I have a problem with setting network_spec.
I set network_spec as network_spec = [{"type": "dense", "size": 100, "activation": "relu"}].
However, I get the error message TensorforceError: Invalid value for Module.add_variable argument shape: 0,100.
After some experiments, I found out that this error occurs if the states definition is a nested dict; otherwise, it doesn't.
For example, the following states definition will cause this error:

    def states(self):
        states = {}
        states['foo'] = dict(type='int', shape=(3, ), num_values=6)
        states['bar'] = dict(type='int', shape=(3, ), num_values=6)
        return states

and the following definition will not cause the error:

    def states(self):
        return dict(type='float', shape=(8,))

Could you please help me?

Alexander Kuhnle
@AlexKuhnle
Hi @yanghaoxie, the exception message is not very informative; I will need to check whether it can be improved (or whether it's just an artifact of the point at which the inconsistency in the specification causes actual problems). But here are two points you should look into:
The first state consists of two components, so a simple sequential network will not work (Tensorforce doesn't implicitly concatenate inputs or something like that). What you can do in this case is to use the "extended" multi-input network specification feature, which plugs together sequential "components" and retrieves state components via special "register" and "retrieve" layers.
Moreover, the first state consists of integers, which cannot be processed by a dense layer (again, Tensorforce doesn't do anything implicitly to take care of it). The simple way to address this problem is to use an embedding layer first (see here), to map each of the finite values to a corresponding embedding (equivalent to encoding as one-hot vectors and then applying a dense layer).
Hope that helps!
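As a rough sketch for the 'foo'/'bar' states above (layer sizes are arbitrary, and the exact layer arguments are worth double-checking against the layer documentation):

    network_spec = [
        [
            dict(type='retrieve', tensors=['foo']),
            dict(type='embedding', size=32),   # maps the 6 integer values to vectors
            dict(type='flatten'),
            dict(type='register', tensor='foo-embedding')
        ],
        [
            dict(type='retrieve', tensors=['bar']),
            dict(type='embedding', size=32),
            dict(type='flatten'),
            dict(type='register', tensor='bar-embedding')
        ],
        [
            dict(type='retrieve', tensors=['foo-embedding', 'bar-embedding'], aggregation='concat'),
            dict(type='dense', size=100, activation='relu')
        ]
    ]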
yanghaoxie
@yanghaoxie
@AlexKuhnle Thank you so much for your help :). I will investigate what you told me.
danthedolphin
@danthedolphin
Hi all, I'm trying to replicate the DQN paper by Mnih et al. (2015) on Atari games and am trying to extract the Q-values for each action, but I'm not sure how to get them. I've already got the agent and environment training and everything, but this is the last step I need. Is there a way to somehow get the Q-values for each action every time I call agent.act(states=states)?
Alexander Kuhnle
@AlexKuhnle
Hi, you can retrieve additional tensors via the query argument -- 'action-distribution-values' (or alternatively your action name as first part) should work. Have a look here for an example in the unittests.
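A rough sketch of how that might look (this assumes that passing a query list makes act() return the queried tensors alongside the actions; the tensor name may need the action's name as a prefix, and the linked unittests are the authoritative reference):

    from tensorforce import Agent, Environment

    # Example setup (environment and hyperparameters are placeholders).
    environment = Environment.create(environment='CartPole-v1', max_episode_timesteps=500)
    agent = Agent.create(agent='dqn', environment=environment, memory=10000, batch_size=32)

    states = environment.reset()
    # Assumption: with a `query` list, act() returns (actions, queried_tensors).
    actions, queried = agent.act(
        states=states,
        query=['action-distribution-values']
    )
    q_values = queried  # per-action distribution values, i.e. the Q estimates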
danthedolphin
@danthedolphin
@AlexKuhnle Thanks for the update! This module has helped me a lot
Alexander Kuhnle
@AlexKuhnle
No problem :-)
Qiao.Zhang
@qZhang88
@AlexKuhnle I found this API, TensorforceAgent.experience. Is this for training with existing states/actions/rewards traces? How is the training process controlled? Would it be better to pass a batch of episodes if I use PPO, so that I know for sure there would be one update?
Alexander Kuhnle
@AlexKuhnle
Hi, experience() stores a batch of timesteps into memory but doesn't trigger updates. There's also update(), which triggers an update (but doesn't take any data). You can think of it this way: in the usual act() and observe() cycle, observe() is a combination of experience() (store timestep) and, every now and then, update() (depending on algo/config). Instead, you could basically use act(..., independent=True) (which is "independent" of the act-observe cycle, so doesn't register anything) plus experience/update (however, it's not exactly equivalent, so it's not recommended). Potential use case: fill the replay memory with experience at the beginning, or use it as a pretraining method. However, pretraining probably won't work well, as there are known problems with this simplistic approach. More use cases may come up, e.g. with the implementation of better pretraining methods.
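As a rough sketch of that experience/update flow (a Gym CartPole environment and a PPO agent are used purely for illustration, and, as noted above, this is not exactly equivalent to the regular act/observe cycle):

    from tensorforce import Agent, Environment

    environment = Environment.create(environment='CartPole-v1', max_episode_timesteps=500)
    agent = Agent.create(agent='ppo', environment=environment, batch_size=10)

    # Collect a batch of episodes without registering them in the act-observe cycle,
    # store them via experience(), then explicitly trigger an update().
    for _ in range(10):  # matches batch_size above
        episode_states, episode_actions, episode_terminal, episode_reward = [], [], [], []
        states = environment.reset()
        terminal = False
        while not terminal:
            actions = agent.act(states=states, independent=True)
            episode_states.append(states)
            episode_actions.append(actions)
            states, terminal, reward = environment.execute(actions=actions)
            episode_terminal.append(terminal)
            episode_reward.append(reward)
        agent.experience(
            states=episode_states, actions=episode_actions,
            terminal=episode_terminal, reward=episode_reward
        )
    agent.update()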
Pedro Chinen
@chinen93

Hi, I'm trying to use the runner.run() but I cannot understand what the progress bar is telling me. I tried to search in the documentation but couldn't find it.

"Episodes: 100%|█████ 400/400 [1:25:00, reward=-24.55, ts/ep=3294, sec/ep=14.10, ms/ts=4.3, agent=36.7%, comm=71.6%]"

I think it means the following, but I'm not sure:
ts/ep -> timesteps per episode
sec/ep -> seconds per episode
ms/ts -> milliseconds per timestep
agent -> percentage of agent computation
comm -> percentage of environment computation

Can anyone tell me if I got it wrong?

Alexander Kuhnle
@AlexKuhnle
Almost correct. The last value, "comm", is the relative time spent on remote communication. However, I realised that the value currently wrongly shows the time spent on agent+env, i.e. everything but communication. I'll fix that and add more info to the use_tqdm argument. Also, note that the numbers currently may not perfectly reflect "parallel performance" in a parallelized remote setup.
Pedro Chinen
@chinen93
thanks for the answer
Pedro Chinen
@chinen93

I have another question: I started my project with a simple training/evaluation loop, with times between 6-8 sec per episode. After I changed to the Runner class, the time jumped to ~14 sec. Is there something I can change to improve this performance?

I changed because I wanted to try using the parallel runner.

Alexander Kuhnle
@AlexKuhnle
Can you post both versions? There's a little overhead in the runner, but I wouldn't expect it to show...
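For context, the two setups being compared usually look roughly like this (a generic sketch with a CartPole environment, not the actual code in question):

    from tensorforce import Agent, Environment
    from tensorforce.execution import Runner

    environment = Environment.create(environment='CartPole-v1', max_episode_timesteps=500)
    agent = Agent.create(agent='ppo', environment=environment, batch_size=10)
    num_episodes = 100

    # Version 1: manual act/observe training loop.
    for _ in range(num_episodes):
        states = environment.reset()
        terminal = False
        while not terminal:
            actions = agent.act(states=states)
            states, terminal, reward = environment.execute(actions=actions)
            agent.observe(terminal=terminal, reward=reward)

    # Version 2: the same training driven by the Runner utility.
    runner = Runner(agent=agent, environment=environment)
    runner.run(num_episodes=num_episodes)
    runner.close()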
杨子信
@yzx20160815_twitter
May I know what update_frequency means in the PPO agent?
Alexander Kuhnle
@AlexKuhnle
Hey, the update_frequency is the frequency of updates :-) Seriously: PPO does episode-based updates, so batch_size determines how many episodes are used for the update batch. The update_frequency, by default = batch_size, specifies how frequently an update should happen. So: batch_size=4, update_frequency=2 means after every second episode an update happens with a batch of 4 episodes. Note that update_frequency > batch_size doesn't make much sense, and that, technically, update_frequency < batch_size makes some of the batch data slightly off-policy. However, in practice that usually doesn't matter much.
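For instance, a sketch of such a PPO configuration (the environment and the remaining hyperparameters are just illustrative placeholders):

    from tensorforce import Agent, Environment

    environment = Environment.create(environment='CartPole-v1', max_episode_timesteps=500)

    # batch_size=4: each update uses a batch of 4 episodes.
    # update_frequency=2: an update is triggered after every 2nd finished episode.
    agent = Agent.create(
        agent='ppo',
        environment=environment,
        batch_size=4,
        update_frequency=2,
        learning_rate=1e-3
    )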
杨子信
@yzx20160815_twitter
@AlexKuhnle Thanks.
杨子信
@yzx20160815_twitter
@AlexKuhnle May I know when the Memory module frees memory? I use the PPO agent with the default memory, and my training always stops because of OOM.
Alexander Kuhnle
@AlexKuhnle
Can you post your config? Are you running on GPU?
杨子信
@yzx20160815_twitter
Yes, on GPU.
[image attachment: agent config]
Alexander Kuhnle
@AlexKuhnle
Hmm, looks okay in principle. Depending on your input size, the input and the two RNNs may take quite some space, particularly considering that both the policy and the critic use this architecture. What if you, say, remove one RNN layer or reduce the sizes (even if you reduce them a lot, just to check whether it works at all)?