Alexander Kuhnle
@AlexKuhnle
Hi @HYDesmondLiu, setting a reward horizon for PPO is "not possible", since PPO as policy gradient algorithm is episode-based, not n-step. That's at least the reason why this option is not offered as configuration for the PPO agent. However, one could, in principle, configure a "PPO-like variant" using a shorter estimation horizon. How: by replicating the PPO config using the more general Tensorforce agent (the parent of all agents in Tensorforce), and then modifying the corresponding argument. I was planning to add these configs for the users who want to modify agent types beyond their "intended domain", if that would help.
HYDesmondLiu
@HYDesmondLiu
Hi @AlexKuhnle , Thanks a lot for the response, sorry I did not figure it out clearly. Your solution looks good. (seems like I cannot reply in the thread?)
Alexander Kuhnle
@AlexKuhnle
@HYDesmondLiu , I've added the example config for PPO based on the more general Tensorforce agent in benchmarks/configs/ppo_tensorforce.json. It is equivalent to the config ppo.json.
HYDesmondLiu
@HYDesmondLiu
@AlexKuhnle Thanks a lot will try it out.
May I know what is the easiest way to record reward vs. episode?
I cannot tee the progress sometimes that is output from TensorForce.
Alexander Kuhnle
@AlexKuhnle
Have you tried the TensorBoard summaries?
Benno Geißelmann
@GANdalf2357

Hi @GANdalf2357 , I think this issue was recently resolved, so updating to the latest Github master should help (or if you're using the pip version, changing to the Github version -- of course, sooner or later there will be a new pip version, maybe should do that soon).

Hi @AlexKuhnle, I have now switched to master to work around this issue, but now the reset() call fails. Do I have to change something in my state with the new master? I now get this error in reset(): tensorforce.exception.TensorforceError: Environment.reset: invalid type <class 'tuple'> != float for state.

till now I was on 0.6.2 which worked fine with my code
Benno Geißelmann
@GANdalf2357
my state is a tuple about like this (0,0,0,0,0.34243,0.5424,0.4211)
on a first look: if i remove the check which leads to this error the training seems to run fine again.
HYDesmondLiu
@HYDesmondLiu

Have you tried the TensorBoard summaries?

@AlexKuhnle Thanks very much, will try !

HYDesmondLiu
@HYDesmondLiu
Does the "update_frequency" mean how frequently the reward is updated? Like "temporal difference update"?
HYDesmondLiu
@HYDesmondLiu
For DQN agent setup, how should we set the 'memory'?
Alexander Kuhnle
@AlexKuhnle
@GANdalf2357 Yes, I think I've recently added some tests to catch "invalid" inputs, which otherwise might later lead to obscure errors. It's not covering this case properly, but if you try it with the latest commit in a few min, hopefully the problem should be gone.
@HYDesmondLiu update_frequency is not about the reward, but about the optimization step. So for each update you have a certain batch-size, say 10 episodes, or 64 timesteps, and unless specified differently, that will also be the update_frequency. However, you may want to update more frequently than that, so you may set update_frequency to something between 1 and batch_size (> batch_size doesn't really make sense).
Alexander Kuhnle
@AlexKuhnle
Regarding DQN agent and memory, it's typically a number >> batch_size, at least 100x or much more. Moreover, it's typical particularly here to set the update_frequency lower than batch_size, say 8 vs 64. But it's a parameter that can be tuned, at least to get the rough magnitude right.
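The rules of thumb above can be written down as a small sanity check. This is only a sketch: `check_dqn_sizes` is a hypothetical helper, not part of Tensorforce, and the factor of 100 is just the rough lower bound mentioned in the message, not a hard requirement.

```python
def check_dqn_sizes(memory, batch_size, update_frequency):
    """Return a list of warnings for a DQN-style config (hypothetical helper)."""
    warnings = []
    # update_frequency is expected to lie between 1 and batch_size.
    if not 1 <= update_frequency <= batch_size:
        warnings.append("update_frequency should be in [1, batch_size]")
    # memory is typically much larger than batch_size (roughly 100x or more).
    if memory < 100 * batch_size:
        warnings.append("memory is usually >= 100 * batch_size")
    return warnings


# Illustrative values: update every 8 timesteps with a batch of 64.
print(check_dqn_sizes(memory=10000, batch_size=64, update_frequency=8))
# A small memory relative to the batch size triggers the second warning.
print(check_dqn_sizes(memory=300, batch_size=32, update_frequency=1))
```

The thresholds are starting points for tuning, in line with the remark that the goal is to get the rough magnitude right.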
HYDesmondLiu
@HYDesmondLiu
@AlexKuhnle Thanks so much.
I met another problem while running DQN, where the action space should be discrete so the actions should be int types and min_value, max_value, and num_value should be specified. However if I set these three I got an error File "/home/hsinyu/hyliu_Python/lib/python3.6/site-packages/tensorforce/core/utils/tensor_spec.py", line 45, in __init__ name='TensorSpec', argument='min/max_value', condition='num_values specified' tensorforce.exception.TensorforceError: Invalid TensorSpec argument min/max_value given num_values specified. As I trace the source code it seems like these three cannot be all set at the same time?
For example one of my action is $x_1 \in [58,80]$ and the other is $x_2 \in [0, 1500]$ should I set the num_values for $x_1$ as $80-58+1 = 23$ and $x_2$ as $1500-0=1500$?
What I do not understand is: since we have already set min_value and max_value, why not let the code calculate num_values itself?
Alexander Kuhnle
@AlexKuhnle
Hmm, yes, it could -- the idea is that min_/max_value are for float types as lower and upper bound, whereas num_values is for int types. The fact that it also specifies min_/max_value implicitly shouldn't matter, so you shouldn't need to specify all three, just the two or one, depending on the type. (The idea to still add it is, I think, they internally share some asserts via min/max-value, plus there could potentially be layer types which want to know min/max-bounds... but it really doesn't matter right now.)
So in short: if I understand correctly, you set all three values -- so the solution should be to only set num_values in case of int.
I will change the exception you pointed out so that it will always complain about the right thing being specified invalidly.
HYDesmondLiu
@HYDesmondLiu
@AlexKuhnle Thanks for prompt reply. But if I only set num_values how do agents know what the max_value and min_value are? I don't want to mess up my system.
Alexander Kuhnle
@AlexKuhnle
They will always be zero-based, so taking your example of num_values=23 produces values 0, ..., 22. Your environment can then just add the offset to actions, e.g. 58, or subtract the offset from states.
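A minimal sketch of the offset idea, assuming both bounds of each range are inclusive. The helper names `num_values_for_range` and `to_env_value` are made up for illustration and would live in the custom environment, not in Tensorforce.

```python
def num_values_for_range(min_value, max_value):
    """Number of discrete options for an inclusive integer range."""
    return max_value - min_value + 1


def to_env_value(action, min_value):
    """Map a zero-based agent action to the environment's own range."""
    return action + min_value


# x_1 in [58, 80]: 23 options, so the agent emits 0, ..., 22.
x1_num = num_values_for_range(58, 80)
# x_2 in [0, 1500]: 1501 options if both ends are included.
x2_num = num_values_for_range(0, 1500)

print(x1_num, to_env_value(0, 58), to_env_value(x1_num - 1, 58))
print(x2_num)
```

Inside the environment's execute(), each incoming action would be passed through `to_env_value` before it is applied.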
HYDesmondLiu
@HYDesmondLiu
@AlexKuhnle Thanks a lot, it works~!
HYDesmondLiu
@HYDesmondLiu
Hi @AlexKuhnle, it's me again. I just tried setting only num_values, without max_value and min_value, and it worked.
However, I then got this error and I am not sure how to debug it. Could you please give me some hints?
 Traceback (most recent call last):
File "reinforcement_learning_continuous.py", line 221, in <module>
save_best_agent=True
File "/home/hsinyu/hyliu_Python/lib/python3.6/site-packages/tensorforce/execution/runner.py", line 548, in run
self.handle_observe(parallel=n)
File "/home/hsinyu/hyliu_Python/lib/python3.6/site-packages/tensorforce/execution/runner.py", line 659, in handle_observe
terminal=self.terminals[parallel], reward=self.rewards[parallel], parallel=parallel
File "/home/hsinyu/hyliu_Python/lib/python3.6/site-packages/tensorforce/agents/agent.py", line 511, in observe
terminal=terminal_tensor, reward=reward_tensor, parallel=parallel_tensor
File "/home/hsinyu/hyliu_Python/lib/python3.6/site-packages/tensorforce/core/module.py", line 128, in decorated
output_args = function_graphs[str(graph_params)](*graph_args)
File "/home/hsinyu/hyliu_Python/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 784, in __call__
result = self._call(*args, **kwds)
File "/home/hsinyu/hyliu_Python/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 818, in _call
results = self._stateful_fn(*args, **kwds)
File "/home/hsinyu/hyliu_Python/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 2972, in __call__
filtered_flat_args, captured_inputs=graph_function.captured_inputs)  # pylint: disable=protected-access
File "/home/hsinyu/hyliu_Python/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 1948, in _call_flat
ctx, args, cancellation_manager=cancellation_manager))
File "/home/hsinyu/hyliu_Python/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 561, in call
ctx=ctx)
File "/home/hsinyu/hyliu_Python/lib/python3.6/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
inputs, attrs, num_outputs)
**tensorflow.python.framework.errors_impl.InvalidArgumentError:  Invalid gradient: contains inf or nan. : Tensor had NaN values**
[[{{node agent/StatefulPartitionedCall/agent/cond_1/then/_311/agent/cond_1/StatefulPartitionedCall/agent/StatefulPartitionedCall_7/policy_optimizer/StatefulPartitionedCall/policy_optimizer/VerifyFinite/CheckNumerics}}]] [Op:__inference_observe_5452]

Function call stack:
observe
Alexander Kuhnle
@AlexKuhnle
Phew, that could have many reasons. While there are a few assertions to catch inf/nan inputs, it's hard to say what causes inf/nan gradients. I'm a bit confused by where this exception is thrown, since there is a gradient inf/nan check as well. Can you post the agent config? How quickly does this come up? I assume it trains okay for a while, before throwing this exception? Can you check whether the agent is always choosing the same action, or whether there's variation?
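Two of the checks asked about above can be done outside the agent with plain Python. This sketch does not locate the source of NaN gradients; it only helps rule out non-finite inputs and a policy that always picks the same action. Both helper names are hypothetical.

```python
import math
from collections import Counter


def all_finite(values):
    """True if no value is inf or NaN."""
    return all(math.isfinite(v) for v in values)


def action_counts(actions):
    """How often each action was chosen; a single key means no variation."""
    return Counter(actions)


# A state tuple like the one quoted earlier in the chat.
state = (0, 0, 0, 0, 0.34243, 0.5424, 0.4211)
print(all_finite(state))
print(all_finite([1.0, float("nan")]))

# Actions collected over some timesteps (illustrative values).
counts = action_counts([2, 2, 2, 2])
print(len(counts) == 1)
```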
HYDesmondLiu
@HYDesmondLiu
@AlexKuhnle Sorry for the late reply, it does not happen now.
HYDesmondLiu
@HYDesmondLiu
Hi @AlexKuhnle I think I am a bit confused. What are the algorithms in Tensorforce that are "model-based"?
Alexander Kuhnle
@AlexKuhnle
Hi, there are no model-based algorithms in Tensorforce, and there probably won't be for the foreseeable future. The framework focuses on the typical model-free algorithm classes, in particular Q-learning and policy gradient.
R Puttkammer
@rpgit12
@AlexKuhnle - I'm using PPO for an AI Gym-like problem (network=auto, batch_size=100, state=float(77), action=int(4)) and am surprised to see exploration default to 0. Isn't exploration>0 required during training? And shouldn't it be reset to 0 afterwards? Couldn't find specifics in the docs or sample code. Thanks!
HYDesmondLiu
@HYDesmondLiu
@AlexKuhnle Thanks a lot. That makes sense.
Alexander Kuhnle
@AlexKuhnle
@rpgit12 Exploration is only typical for deterministic policies like DQN or DPG. Other policy gradient algorithms sample an action from the policy distribution, so they have exploration kind of built in. Moreover, I would say it depends to some degree on the randomness of your environment, whether the agent needs to be encouraged to explore or will be "forced to explore" due to a random environment.
(the latter point is secondary, though)
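The difference can be illustrated without any RL library. In this sketch, `sample_from_policy` stands in for a stochastic policy that samples from its action distribution, and `epsilon_greedy` stands in for a greedy policy that needs exploration added explicitly. Both functions and all numbers are illustrative assumptions, not Tensorforce internals.

```python
import random


def sample_from_policy(probabilities, rng):
    """Stochastic policy: draw an action index according to its probability."""
    return rng.choices(range(len(probabilities)), weights=probabilities, k=1)[0]


def epsilon_greedy(q_values, epsilon, rng):
    """Greedy policy with explicit exploration: random action with prob. epsilon."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


rng = random.Random(0)

# Sampling from the distribution already visits several actions.
samples = [sample_from_policy([0.1, 0.2, 0.3, 0.4], rng) for _ in range(1000)]
print(sorted(set(samples)))

# With epsilon = 0 the greedy policy never leaves the best-valued action.
greedy = [epsilon_greedy([0.1, 0.9, 0.3, 0.2], 0.0, rng) for _ in range(1000)]
print(sorted(set(greedy)))
```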
HYDesmondLiu
@HYDesmondLiu

@AlexKuhnle I got this error, TypeError: __init__() got multiple values for keyword argument 'optimizer', while setting up A2C like this:

    elif args.agent=="a2c":
        agent = Agent.create(
            agent='a2c',
            environment=environment,
            max_episode_timesteps=5,
            batch_size=32,
            network=[
                dict(type='dense', size=128, activation='tanh'),
                dict(type='dense', size=64, activation='tanh')
            ],
            optimizer=dict(
                multi_step=10, subsampling_fraction=64, linesearch_iterations=5,
                doublecheck_update=True
            ),
            critic=[
                dict(type='dense', size=64, activation='tanh'),
                dict(type='dense', size=64, activation='tanh')
            ],
        )

However, as far as I understand what is described, we are supposed to set the optimizer in the 'optimizer' section.

Alexander Kuhnle
@AlexKuhnle
Currently, the A2C agent only has an argument learning_rate, whereas the rest of optimizer is implicitly specified as ADAM. You would need to move to the tensorforce agent to have all arguments available. The idea behind the various agent sub-classes is to provide a "standard" interface with only the typical arguments (and things like subsampling_fraction, linesearch are not typical for A2C), however, maybe I'll change this at some point.
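Following that explanation, a reduced argument set for the A2C agent might look like the sketch below. It only builds the keyword arguments as a plain dict so it runs without Tensorforce installed. The learning_rate value is an assumed placeholder, and the critic settings from the question are left out because this chat does not confirm their argument name.

```python
# Keyword arguments intended for Agent.create(environment=environment, **a2c_kwargs).
a2c_kwargs = dict(
    agent='a2c',
    max_episode_timesteps=5,
    batch_size=32,
    network=[
        dict(type='dense', size=128, activation='tanh'),
        dict(type='dense', size=64, activation='tanh'),
    ],
    # Only the learning rate is exposed; the optimizer itself is implicit.
    learning_rate=1e-3,  # assumed value, tune as needed
)

# Passing 'optimizer' here is what triggered the TypeError above.
print('optimizer' in a2c_kwargs)
```

Anything beyond this, such as subsampling_fraction or linesearch settings, would need the more general tensorforce agent, as noted above.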
HYDesmondLiu
@HYDesmondLiu
@AlexKuhnle Thanks, this is very helpful.
R Puttkammer
@rpgit12
@AlexKuhnle - thanks indeed!
HYDesmondLiu
@HYDesmondLiu

Hi @AlexKuhnle, regarding your previous questions about the DQN gradient assertion, here are my responses:
Agent config:

    agent = Agent.create(
        agent="dqn",
        environment=environment,
        memory=300,
        batch_size=32,
        network="auto",
        update_frequency=1,
        learning_rate=1e-5,
        discount=0.9
    )

2.&3. Very quick, running on the first episode.
4.&5. There is only one action since it happened on the first episode.

amirrezaheidari
@amirrezaheidari

Hi
I need to train my agent for many iterations. My agent interacts with TRNSYS as environment and calls it at each action to calculate the next state. Therefore, it takes a long time to train it (8 hours). I need to reduce this time.

1- Do you think if I perform parallel computing I will get fast enough operation?

2- I tried the following way to test if I can do parallel computing but I get the following error:

###### #

    Runner(
        agent=r'C:\Python_TRNSYS_integration\Parallelization test\agent.json', environment=HotWaterEnvironment,
        num_parallel=4
    )
    runner.run(num_episodes=100, batch_agent_calls=True)

###### #

TypeError: __init__() got an unexpected keyword argument 'internals'

###### #

2- And if the above code works and is fast enough, I need to parallelize the following code, which is the same training as above except that I store some values in each iteration. Can you show me on the following code how I can parallelize it?

    # Training
    States_train_target=[]
    Actions_train_target=[]
    Rewards_train_target=[]
    Energy_train_target=[]

    Reward_total_train_allepisodes_target=[]
    Reward_energy_train_allepisodes_target=[]
    Reward_comfort_train_allepisodes_target=[]
    Reward_hygiene_train_allepisodes_target=[]

    Energy_train_allepisodes_target=[]

    for episode in range(int(Train_weeks)):

        print(episode)

        Reward_total_train_eachepisode_target=[]
        Reward_energy_train_eachepisode_target=[]
        Reward_comfort_train_eachepisode_target=[]
        Reward_hygiene_train_eachepisode_target=[]

        Energy_train_eachepisode_target=[]

        states = environment.reset()
        environment.timestep=episode*episode_times

        terminal = False
        while not terminal:
            actions = agent.act(states=states)
            states, terminal, reward = environment.execute(actions=actions)
            states=tuple(states)
            agent.observe(terminal=terminal, reward=reward)

            Reward_total_train_eachepisode_target.append(reward)
            #Reward_energy_train_eachepisode_target.append(reward[1])
            #Reward_comfort_train_eachepisode_target.append(reward[2])
            #Reward_hygiene_train_eachepisode_target.append(reward[3])
            #Energy_train_eachepisode_target.append(energy)
            States_train_target.append(states)
            Actions_train_target.append(actions)
            Rewards_train_target.append(reward)
            #Energy_train_target.append(energy)

        Reward_total_train_allepisodes_target.append(Reward_total_train_eachepisode_target)
        #Reward_energy_train_allepisodes_target.append(Reward_energy_train_eachepisode_target)
        #Reward_comfort_train_allepisodes_target.append(Reward_comfort_train_eachepisode_target)
        #Reward_hygiene_train_allepisodes_target.append(Reward_hygiene_train_eachepisode_target)
        #Energy_train_allepisodes_target.append(Energy_train_eachepisode_target)
###### #

I do not have access to NVIDIA to work with GPU so I guess the only way would be to parallelize the above code.

Thanks

SurferZergy
@SurferZergy
Hi, was wondering if you can have 'complex obs and action spaces' in a custom env? like:
    def states(self):
        d = {
            'board': dict(type='ndarray', num_values=[17,28]),
            'state': dict(type='float', num_values=50),
            'procedures': dict(type='float', num_values=23),
            'available-action-types': dict(type='float', num_values=43),
        }

        return d

    def actions(self):
        d = {
            'action-type': dict(type='int', num_values=43),
            'x': dict(type='int', num_values=28),
            'y': dict(type='int', num_values=17),
        }

        return d
Alexander Kuhnle
@AlexKuhnle
@SurferZergy short answer: yes, you can. :-)
@amirrezaheidari If the environment computations are slow, parallelization should definitely help (if environments run on different servers/cores via socket/multiprocessing). You should try whether batch_agent_calls True or False is better. Regarding the exception: can you post the stacktrace? And are you on the latest Github version?
Finally, regarding the second question about recording values: could you not just record them inside your custom environment implementation at every timestep? You should have all the values available there...
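One way to read the suggestion about recording values: keep the bookkeeping next to execute(), so it works the same whether episodes are driven by a manual loop or by a runner. The sketch below uses a wrapper class for brevity. `RecordingEnvironment` and `CountdownEnvironment` are hypothetical stand-ins written without Tensorforce, and they only assume the reset()/execute() shape used in the training loop above. In a real setup the same lines could go directly into the custom environment's own reset() and execute().

```python
class RecordingEnvironment:
    """Wraps an environment and logs every transition (hypothetical helper)."""

    def __init__(self, environment):
        self.environment = environment
        self.episodes = []  # one list of (actions, states, reward) per episode

    def reset(self):
        # Start a fresh record for the new episode.
        self.episodes.append([])
        return self.environment.reset()

    def execute(self, actions):
        states, terminal, reward = self.environment.execute(actions=actions)
        self.episodes[-1].append((actions, states, reward))
        return states, terminal, reward


class CountdownEnvironment:
    """Toy stand-in for a real environment: terminates after three steps."""

    def reset(self):
        self.remaining = 3
        return (float(self.remaining),)

    def execute(self, actions):
        self.remaining -= 1
        terminal = self.remaining == 0
        reward = 1.0 if terminal else 0.0
        return (float(self.remaining),), terminal, reward


env = RecordingEnvironment(CountdownEnvironment())
states = env.reset()
terminal = False
while not terminal:
    states, terminal, reward = env.execute(actions=0)

print(len(env.episodes), len(env.episodes[0]))
```

With parallel environments, each environment instance would keep its own record, so the per-instance lists would need to be collected afterwards.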
SurferZergy
@SurferZergy
@AlexKuhnle niceee! was wondering if there happens to be some documentation or examples for this? (esp the ndarray stuff)
Alexander Kuhnle
@AlexKuhnle
Gym environments or so tend to be single-state, but most unittests use multi-component state and action spaces, see here. What exactly do you want to know about it? Oh, and I didn't see the type='ndarray' in your example, but it should always be int or float as type. Then you can always specify shape (which is where you mistakenly seem to use num_values), and in case of int you should specify num_values (which gives the number of options per value, so [0, ..., n-1]), and in case of float it's recommended to specify min_value and max_value bounds.
Hope that helps.
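Applied to the spec from the question, the rules above might lead to something like the sketch below. The float type chosen for 'board' is an assumption, since the intended type was not stated, and `check_spec` is a hypothetical helper that only mirrors the rules in this message. The specs are plain dicts so the snippet runs without Tensorforce.

```python
states_spec = {
    # Assumed to be float; shape replaces the misplaced num_values=[17, 28].
    'board': dict(type='float', shape=(17, 28)),
    'state': dict(type='float', shape=(50,)),
    'procedures': dict(type='float', shape=(23,)),
    'available-action-types': dict(type='float', shape=(43,)),
}

actions_spec = {
    # int components give the number of options, i.e. values 0, ..., n-1.
    'action-type': dict(type='int', num_values=43),
    'x': dict(type='int', num_values=28),
    'y': dict(type='int', num_values=17),
}


def check_spec(spec):
    """Minimal consistency check for the rules above (hypothetical helper)."""
    for name, component in spec.items():
        assert component['type'] in ('int', 'float'), name
        if component['type'] == 'int':
            assert 'num_values' in component, name
        else:
            # num_values only applies to int components.
            assert 'num_values' not in component, name


check_spec(states_spec)
check_spec(actions_spec)
print(sorted(actions_spec))
```

For the float components, min_value and max_value bounds could be added once the actual ranges are known.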
SurferZergy
@SurferZergy
thanks! that helps!
amirrezaheidari
@amirrezaheidari

Hi @AlexKuhnle
So the Runner utility trains my agent in parallel, right?
Here is the full stacktrace:

###### #

    TypeError Traceback (most recent call last)

    <ipython-input-38-8bdeda9d84be> in <module>
    1 Runner(
    2 agent=r'C:\Python_TRNSYS_integration\Parallelization test\agent.json', environment=HotWaterEnvironment,
    ----> 3 num_parallel=8
    4 )
    5 runner.run(num_episodes=1, batch_agent_calls=True)

    C:\Python\Anaconda3\lib\site-packages\tensorforce\execution\runner.py in __init__(self, agent, environment, max_episode_timesteps, evaluation, num_parallel, environments, remote, blocking, host, port)
    173 self.agent = Agent.create(
    174 agent=agent, environment=environment,
    --> 175 parallel_interactions=(num_parallel - int(self.evaluation))
    176 )
    177 else:

    C:\Python\Anaconda3\lib\site-packages\tensorforce\agents\agent.py in create(agent, environment, **kwargs)
    117 with open(agent, 'r') as fp:
    --> 119 return Agent.create(agent=agent, environment=environment, **kwargs)
    120
    121 elif '.' in agent:

    C:\Python\Anaconda3\lib\site-packages\tensorforce\agents\agent.py in create(agent, environment, **kwargs)
    110 agent = kwargs.pop('agent', kwargs.pop('type', 'default'))
    111
    --> 112 return Agent.create(agent=agent, environment=environment, **kwargs)
    113
    114 elif isinstance(agent, str):

    C:\Python\Anaconda3\lib\site-packages\tensorforce\agents\agent.py in create(agent, environment, **kwargs)
    129 # Keyword specification
    130 agent = tensorforce.agents.agents[agent]
    --> 131 return Agent.create(agent=agent, environment=environment, **kwargs)
    132
    133 else:

    C:\Python\Anaconda3\lib\site-packages\tensorforce\agents\agent.py in create(agent, environment, **kwargs)
    96
    97 if isinstance(agent, type) and issubclass(agent, Agent):
    ---> 98 agent = agent(**kwargs)
    99 assert isinstance(agent, Agent)
    100 else:

    C:\Python\Anaconda3\lib\site-packages\tensorforce\agents\dqn.py in __init__(self, states, actions, memory, batch_size, max_episode_timesteps, network, update_frequency, start_updating, learning_rate, huber_loss, horizon, discount, predict_terminal_values, target_update_weight, target_sync_frequency, state_preprocessing, reward_preprocessing, exploration, variable_noise, l2_regularization, entropy_regularization, parallel_interactions, config, saver, summarizer, recorder, estimate_terminal, **kwargs)
    215 state_preprocessing=state_preprocessing, reward_preprocessing=reward_preprocessing,
    216 exploration=exploration, variable_noise=variable_noise,
    --> 217 saver=saver, summarizer=summarizer, **kwargs
    218 )

    TypeError: __init__() got an unexpected keyword argument 'internals'

###### #

And I am on this Github: https://github.com/tensorforce/tensorforce/blob/master/examples/parallelization.py

Can you please show me with an example how can I record the states and rewards inside of my environments as you suggested? Following is my custom environment: