Hi @GANdalf2357 , I think this issue was recently resolved, so updating to the latest Github master should help (or if you're using the pip version, changing to the Github version -- of course, sooner or later there will be a new pip version, maybe should do that soon).
Hi @AlexKuhnle I now switched to master to overcome this issue, but now the reset() call fails. Do I have to change something in my state with the new master? I now get this error in reset(): tensorforce.exception.TensorforceError: Environment.reset: invalid type <class 'tuple'> != float for state.
update_frequency is not about the reward, but about the optimization step. So for each update you have a certain batch-size, say 10 episodes, or 64 timesteps, and unless specified differently, that will also be the update_frequency. However, you may want to update more frequently than that, so you may set update_frequency to something between 1 and batch_size (> batch_size doesn't really make sense).
The memory should be >> batch_size, at least 100x or much more. Moreover, it's typical particularly here to set the update_frequency lower than batch_size, say 8 vs 64. But it's a parameter that can be tuned, at least to get the rough magnitude right.
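For concreteness, here is a minimal sketch of those magnitudes, assuming the ">> batch_size" remark refers to the replay memory of a DQN-style agent (all values are placeholders, not a recommendation):

from tensorforce import Agent

# memory >> batch_size (here roughly 150x), update_frequency < batch_size (8 vs 64)
agent = Agent.create(
    agent='dqn',
    environment=environment,  # an already created Environment instance
    memory=10000,             # replay memory, much larger than batch_size
    batch_size=64,            # timesteps sampled per update
    update_frequency=8,       # perform an update every 8 timesteps
)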
For int types, min_value, max_value, and num_values should be specified. However, if I set all three I get an error:
File "/home/hsinyu/hyliu_Python/lib/python3.6/site-packages/tensorforce/core/utils/tensor_spec.py", line 45, in __init__
name='TensorSpec', argument='min/max_value', condition='num_values specified'
tensorforce.exception.TensorforceError: Invalid TensorSpec argument min/max_value given num_values specified.
As I trace the source code, it seems these three cannot all be set at the same time?
So should I set num_values for x1 as 80−58+1=23 and for x2 as 1500−0=1500?
And if I already specify min_value and max_value, why not let the code calculate num_values itself?
min_/max_value are for float types as lower and upper bound, whereas num_values is for int types. The fact that it also specifies min_/max_value implicitly shouldn't matter, so you shouldn't need to specify all three, just the two or one, depending on the type. (The idea to still add it is, I think, that they internally share some asserts via min/max-value, plus there could potentially be layer types which want to know min/max-bounds... but it really doesn't matter right now.)
So you only need num_values in the case of int.
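To make the int/float distinction concrete, a minimal sketch using the x1/x2 example from above (whether x2 is best modeled as int or float is my assumption, just to show both cases):

# int state: only num_values is needed; values are then 0, ..., num_values - 1,
# so x1 in [58, 80] becomes 23 options (shifting by 58 inside the environment).
x1_spec = dict(type='int', num_values=23)

# float state: specify min_value/max_value bounds instead of num_values
# (treating x2 as a float in [0, 1500] here).
x2_spec = dict(type='float', min_value=0.0, max_value=1500.0)

states_spec = dict(x1=x1_spec, x2=x2_spec)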
I set num_values without setting max_value and min_value and it worked, but then I got this error:
Traceback (most recent call last):
File "reinforcement_learning_continuous.py", line 221, in <module>
save_best_agent=True
File "/home/hsinyu/hyliu_Python/lib/python3.6/site-packages/tensorforce/execution/runner.py", line 548, in run
self.handle_observe(parallel=n)
File "/home/hsinyu/hyliu_Python/lib/python3.6/site-packages/tensorforce/execution/runner.py", line 659, in handle_observe
terminal=self.terminals[parallel], reward=self.rewards[parallel], parallel=parallel
File "/home/hsinyu/hyliu_Python/lib/python3.6/site-packages/tensorforce/agents/agent.py", line 511, in observe
terminal=terminal_tensor, reward=reward_tensor, parallel=parallel_tensor
File "/home/hsinyu/hyliu_Python/lib/python3.6/site-packages/tensorforce/core/module.py", line 128, in decorated
output_args = function_graphs[str(graph_params)](*graph_args)
File "/home/hsinyu/hyliu_Python/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 784, in __call__
result = self._call(*args, **kwds)
File "/home/hsinyu/hyliu_Python/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 818, in _call
results = self._stateful_fn(*args, **kwds)
File "/home/hsinyu/hyliu_Python/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 2972, in __call__
filtered_flat_args, captured_inputs=graph_function.captured_inputs) # pylint: disable=protected-access
File "/home/hsinyu/hyliu_Python/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 1948, in _call_flat
ctx, args, cancellation_manager=cancellation_manager))
File "/home/hsinyu/hyliu_Python/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 561, in call
ctx=ctx)
File "/home/hsinyu/hyliu_Python/lib/python3.6/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
inputs, attrs, num_outputs)
**tensorflow.python.framework.errors_impl.InvalidArgumentError: Invalid gradient: contains inf or nan. : Tensor had NaN values**
[[{{node agent/StatefulPartitionedCall/agent/cond_1/then/_311/agent/cond_1/StatefulPartitionedCall/agent/StatefulPartitionedCall_7/policy_optimizer/StatefulPartitionedCall/policy_optimizer/VerifyFinite/CheckNumerics}}]] [Op:__inference_observe_5452]
Function call stack:
observe
@AlexKuhnle I got this TypeError: __init__() got multiple values for keyword argument 'optimizer'
while setting up A2C like this:
elif args.agent == "a2c":
    agent = Agent.create(
        agent='a2c',
        environment=environment,
        max_episode_timesteps=5,
        batch_size=32,
        network=[
            dict(type='dense', size=128, activation='tanh'),
            dict(type='dense', size=64, activation='tanh')
        ],
        optimizer=dict(
            optimizer='adam', learning_rate=1e-3, clipping_threshold=1e-2,
            multi_step=10, subsampling_fraction=64, linesearch_iterations=5,
            doublecheck_update=True
        ),
        critic=[
            dict(type='dense', size=64, activation='tanh'),
            dict(type='dense', size=64, activation='tanh')
        ],
    )
However, this is how it is described: we are supposed to set the optimizer via the 'optimizer' argument.
The a2c agent only exposes learning_rate, whereas the rest of optimizer is implicitly specified as ADAM. You would need to move to the tensorforce agent to have all arguments available. The idea behind the various agent sub-classes is to provide a "standard" interface with only the typical arguments (and things like subsampling_fraction, linesearch are not typical for A2C), however, maybe I'll change this at some point.
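If I read that correctly, with the a2c sub-class you would drop the optimizer dict and pass only the learning rate; a minimal sketch based on the snippet above:

agent = Agent.create(
    agent='a2c',
    environment=environment,
    max_episode_timesteps=5,
    batch_size=32,
    network=[
        dict(type='dense', size=128, activation='tanh'),
        dict(type='dense', size=64, activation='tanh')
    ],
    learning_rate=1e-3,  # only the learning rate is exposed; Adam is used internally
    critic=[
        dict(type='dense', size=64, activation='tanh'),
        dict(type='dense', size=64, activation='tanh')
    ],
)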
Hi @AlexKuhnle, to your previous questions regarding the DQN gradient assertion, here are my responses:
1. Agent config.:
agent = Agent.create(
    agent="dqn",
    environment=environment,
    memory=300,
    batch_size=32,
    network="auto",
    update_frequency=1,
    learning_rate=1e-5,
    discount=0.9
)
2.&3. Very quick, running on the first episode.
4.&5. There is only one action since it happened on the first episode.
Hi
I need to train my agent for many iterations. My agent interacts with TRNSYS as the environment and calls it at each action to calculate the next state. Therefore, it takes a long time to train (8 hours). I need to reduce this time.
1- Do you think that if I use parallel computing, the training will be fast enough?
2- I tried the following way to test whether I can do parallel computing, but I get the following error:
runner = Runner(
    agent=r'C:\Python_TRNSYS_integration\Parallelization test\agent.json',
    environment=HotWaterEnvironment,
    num_parallel=4
)
runner.run(num_episodes=100, batch_agent_calls=True)
TypeError: __init__() got an unexpected keyword argument 'internals'
3- And if the above code works and turns out to be fast enough, I need to parallelize the following code, which is the same training as above but stores some values in each iteration. Can you show me, using the following code, how I can parallelize it?
States_train_target=[]
Actions_train_target=[]
Rewards_train_target=[]
Energy_train_target=[]
Reward_total_train_allepisodes_target=[]
Reward_energy_train_allepisodes_target=[]
Reward_comfort_train_allepisodes_target=[]
Reward_hygiene_train_allepisodes_target=[]
Energy_train_allepisodes_target=[]
for episode in range(int(Train_weeks)):
    print(episode)
    Reward_total_train_eachepisode_target=[]
    Reward_energy_train_eachepisode_target=[]
    Reward_comfort_train_eachepisode_target=[]
    Reward_hygiene_train_eachepisode_target=[]
    Energy_train_eachepisode_target=[]
    states = environment.reset()
    environment.timestep=episode*episode_times
    terminal = False
    while not terminal:
        actions = agent.act(states=states)
        states, terminal, reward = environment.execute(actions=actions)
        states=tuple(states)
        agent.observe(terminal=terminal, reward=reward)
        Reward_total_train_eachepisode_target.append(reward)
        #Reward_energy_train_eachepisode_target.append(reward[1])
        #Reward_comfort_train_eachepisode_target.append(reward[2])
        #Reward_hygiene_train_eachepisode_target.append(reward[3])
        #Energy_train_eachepisode_target.append(energy)
        States_train_target.append(states)
        Actions_train_target.append(actions)
        Rewards_train_target.append(reward)
        #Energy_train_target.append(energy)
    Reward_total_train_allepisodes_target.append(Reward_total_train_eachepisode_target)
    #Reward_energy_train_allepisodes_target.append(Reward_energy_train_eachepisode_target)
    #Reward_comfort_train_allepisodes_target.append(Reward_comfort_train_eachepisode_target)
    #Reward_hygiene_train_allepisodes_target.append(Reward_hygiene_train_eachepisode_target)
    #Energy_train_allepisodes_target.append(Energy_train_eachepisode_target)
I do not have access to an NVIDIA GPU, so I guess the only way would be to parallelize the above code.
Thanks
def states(self):
    d = {
        'board': dict(type='ndarray', num_values=[17,28]),
        'state': dict(type='float', num_values=50),
        'procedures': dict(type='float', num_values=23),
        'available-action-types': dict(type='float', num_values=43),
    }
    return d

def actions(self):
    d = {
        'action-type': dict(type='int', num_values=43),
        'x': dict(type='int', num_values=28),
        'y': dict(type='int', num_values=17),
    }
    return d
Whether batch_agent_calls=True or False is better, you would have to try. Regarding the exception: can you post the stacktrace? And are you on the latest Github version?
You use type='ndarray' in your example, but it should always be int or float as type. Then you can always specify shape (which is where you mistakenly seem to use num_values), and in case of int you should specify num_values (which gives the number of options per value, so [0, ..., n-1]), and in case of float it's recommended to specify min_value and max_value bounds.
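Applied to the states()/actions() definition above, that advice would give roughly the following sketch (treating 'board' and the other observations as plain float tensors is my assumption; only the shapes are taken from the original):

def states(self):
    return {
        # shape describes the tensor dimensions; num_values is not used for float
        'board': dict(type='float', shape=(17, 28)),
        'state': dict(type='float', shape=(50,)),
        'procedures': dict(type='float', shape=(23,)),
        'available-action-types': dict(type='float', shape=(43,)),
    }

def actions(self):
    # int actions keep num_values: the number of options per value, i.e. 0, ..., n-1
    return {
        'action-type': dict(type='int', num_values=43),
        'x': dict(type='int', num_values=28),
        'y': dict(type='int', num_values=17),
    }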
Hi @AlexKuhnle
So the Runner utility trains my agent with parallel computation, right?
Here is the full stacktrace:
TypeError Traceback (most recent call last)
<ipython-input-38-8bdeda9d84be> in <module>
1 Runner(
2 agent=r'C:\Python_TRNSYS_integration\Parallelization test\agent.json', environment=HotWaterEnvironment,
----> 3 num_parallel=8
4 )
5 runner.run(num_episodes=1, batch_agent_calls=True)
C:\Python\Anaconda3\lib\site-packages\tensorforce\execution\runner.py in __init__(self, agent, environment, max_episode_timesteps, evaluation, num_parallel, environments, remote, blocking, host, port)
173 self.agent = Agent.create(
174 agent=agent, environment=environment,
--> 175 parallel_interactions=(num_parallel - int(self.evaluation))
176 )
177 else:
C:\Python\Anaconda3\lib\site-packages\tensorforce\agents\agent.py in create(agent, environment, **kwargs)
117 with open(agent, 'r') as fp:
118 agent = json.load(fp=fp)
--> 119 return Agent.create(agent=agent, environment=environment, **kwargs)
120
121 elif '.' in agent:
C:\Python\Anaconda3\lib\site-packages\tensorforce\agents\agent.py in create(agent, environment, **kwargs)
110 agent = kwargs.pop('agent', kwargs.pop('type', 'default'))
111
--> 112 return Agent.create(agent=agent, environment=environment, **kwargs)
113
114 elif isinstance(agent, str):
C:\Python\Anaconda3\lib\site-packages\tensorforce\agents\agent.py in create(agent, environment, **kwargs)
129 # Keyword specification
130 agent = tensorforce.agents.agents[agent]
--> 131 return Agent.create(agent=agent, environment=environment, **kwargs)
132
133 else:
C:\Python\Anaconda3\lib\site-packages\tensorforce\agents\agent.py in create(agent, environment, **kwargs)
96
97 if isinstance(agent, type) and issubclass(agent, Agent):
---> 98 agent = agent(**kwargs)
99 assert isinstance(agent, Agent)
100 else:
C:\Python\Anaconda3\lib\site-packages\tensorforce\agents\dqn.py in __init__(self, states, actions, memory, batch_size, max_episode_timesteps, network, update_frequency, start_updating, learning_rate, huber_loss, horizon, discount, predict_terminal_values, target_update_weight, target_sync_frequency, state_preprocessing, reward_preprocessing, exploration, variable_noise, l2_regularization, entropy_regularization, parallel_interactions, config, saver, summarizer, recorder, estimate_terminal, **kwargs)
215 state_preprocessing=state_preprocessing, reward_preprocessing=reward_preprocessing,
216 exploration=exploration, variable_noise=variable_noise,
--> 217 saver=saver, summarizer=summarizer, **kwargs
218 )
TypeError: __init__() got an unexpected keyword argument 'internals'
And I am on this Github: https://github.com/tensorforce/tensorforce/blob/master/examples/parallelization.py
Can you please show me with an example how I can record the states and rewards inside of my environment, as you suggested? The following is my custom environment:
from tensorforce import Environment

class HotWaterEnvironment(Environment):

    def __init__(self):
        ## Some initializations. Will eventually parameterize this in the constructor.
        self.timestep = 0
        self.Tstorage= 60
        self.Tamb=6
        self.Hours_from_superheat=0 #Total number of hours from the last superheat
        self.Hour=0
        self.Day=0
        self.Weekday=1
        self.lags=[1,2,3,4,5,6]
        self.Demandintervals_history=[0 for i in self.lags]
        #self.Tout_predicted=[18.20, 17.85, 17.15, 16.40, 15.65, 14.95, 14.20, 14.60, 15.60, 16.50, 17.80, 18.85]
        self.Demand=0 #This is not state
        self.Energy=0 #this is not state
        super().__init__()

    def states(self):
        return dict(type='float', shape=(11,))

    def actions(self):
        """Action 0 means no heater, temperature approaches 0.0. Action 1 means
        the heater is on and the room temperature approaches 1.0.
        """
        return dict(type='int', num_values=2)

    # Optional, should only be defined if environment has a natural maximum
    # episode length
    def max_episode_timesteps(self):
        return super().max_episode_timesteps()

    # Optional
    def close(self):
        super().close()

    def reset(self):
        """Reset state.
        """
        if self.timestep==0:
            self.timestep = 0
            self.Tstorage= 60
            self.Tamb=6
            self.Hours_from_superheat=0 #Total number of hours from the last superheat
            self.Hour=0
            self.Day=0
            self.Weekday=1
            self.lags=[1,2,3,4,5,6]
            self.Demandintervals_history=[0 for i in self.lags]
            #self.Tout_predicted=[18.20, 17.85, 17.15, 16.40, 15.65, 14.95, 14.20, 14.60, 15.60, 16.50, 17.80, 18.85]
        else:
            self.timestep=0
            super().__init__()
        states_list=[self.Tstorage,self.Hours_from_superheat,self.Hour, self.Day, self.Tamb]+self.Demandintervals_history
        return states_list

    def response(self,action):
        power,Ttank_end=run(signal=action, demand=self.Demand, Ttank_start=self.Tstorage, Tamb=self.Tamb)
        outputs=[power,Ttank_end]
        return outputs

    def reward_compute(self):
        Demand=self.Demand
        Energy=self.Energy
        Tstorage=self.Tstorage
        Hours_from_superheat=self.Hours_from_superheat
        #more accurate reward design
        R_energy=-a*Energy
        R_comfort=-b*max(40-Tstorage,0)
        R_hygiene=-c*max(Hours_from_superheat-24,0)
        #A simple reward design
        """if Demand==0:
            R_energy=-a*Energy
            R_comfort=0
            if Hours_from_superheat>24:
                R_hygiene=-b
            else:
                R_hygiene=0
        if Demand>0:
            R_energy=-a*Energy
            if Hours_from_superheat>24:
                R_hygiene=-b
            else:
                R_hygiene=0
            if Tstorage<40:
                R_comfort=-c
            else:
                R_comfort=0"""
        R_total=R_energy+R_hygiene+R_comfort
        #[R_total,R_energy,R_comfort,R_hygiene]
        return R_total

    def execute(self, actions):
        ## Check the action is either 0 or 1 -- heater on or off.
        assert act
Since you have next_state, reward, terminal at the end of execute() (and states at the end of reset()), you can just add them to a list which you can retrieve later: self.states_history.append(next_state) etc... Plus, at the beginning of execute() you can record the action which was taken: self.actions_history.append(actions).
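A minimal sketch of that suggestion applied to the environment above (the history list names are made up, and the execute() body is only outlined since the original message was cut off):

def reset(self):
    # ... existing reset logic from above ...
    states_list = [self.Tstorage, self.Hours_from_superheat, self.Hour,
                   self.Day, self.Tamb] + self.Demandintervals_history
    # start fresh history buffers for this episode
    self.states_history = [states_list]
    self.actions_history = []
    self.rewards_history = []
    return states_list

def execute(self, actions):
    # record the action which was taken, at the beginning of execute()
    self.actions_history.append(actions)
    # ... existing simulation step (TRNSYS call via self.response, state updates) ...
    reward = self.reward_compute()
    next_state = [self.Tstorage, self.Hours_from_superheat, self.Hour,
                  self.Day, self.Tamb] + self.Demandintervals_history
    terminal = False  # placeholder: use the real termination condition here
    # record the resulting state and reward at the end of execute()
    self.states_history.append(next_state)
    self.rewards_history.append(reward)
    return next_state, terminal, reward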
linear_decay = dict(
    type='decaying', unit='timesteps', decay="linear", num_steps=2168,
    initial_value=0.9, final_value=0.001
)  # f(t) = C - rt

Name = 'double_dqn'
Learning_rate = 3e-4
Batch_size = 24
Update_frequency = 12
Memory = 20724

agent = Agent.create(
    agent=Name,
    learning_rate=Learning_rate,
    environment=environment,
    batch_size=Batch_size,
    update_frequency=Update_frequency,
    exploration=linear_decay,
    network="auto",
    memory=Memory,
)
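As a sanity check on the exploration schedule above, a linear decay with those numbers behaves like this small helper (my own illustration of the f(t) = C - rt comment, not Tensorforce code):

def linear_decay_value(t, initial_value=0.9, final_value=0.001, num_steps=2168):
    # exploration value after t timesteps under a linear decay
    frac = min(t, num_steps) / num_steps
    return initial_value - (initial_value - final_value) * frac

print(linear_decay_value(0))     # 0.9 at the start
print(linear_decay_value(1084))  # ~0.45 halfway through
print(linear_decay_value(2168))  # 0.001 once num_steps is reached (and afterwards)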