Alexander Kuhnle
@AlexKuhnle
(So to be clear: it's not cleared, otherwise frequency and batch size would be equivalent)
nasrashvilg1
@nasrashvilg1
hi, is there a way to enable Tensorforce to use FP16 precision rather than the default FP32 to run faster? Along the lines of this: https://www.tensorflow.org/guide/mixed_precision
Alexander Kuhnle
@AlexKuhnle
Hey @nasrashvilg1, there is a way of changing the TF dtype used within Tensorforce, as currently illustrated in the "precision unittest", but as the comment there says, the TF optimizers seem to expect float32 or float64 and not to work with float16. Currently, Tensorforce just re-uses the TF versions of the typical first-order optimizers, but obviously one could re-implement them without this constraint. If you only need lower precision for deployment, I would expect that the SavedModel format makes it possible to convert/export to a float16 version of the model.
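For the deployment-only case, one possible route (a sketch, not Tensorforce-specific, assuming the policy has already been exported as a SavedModel; 'saved_model_dir' is a placeholder path) is TensorFlow Lite's post-training float16 conversion:

import tensorflow as tf

# Sketch: convert an exported SavedModel to a TFLite model with float16 weights.
# 'saved_model_dir' is a placeholder, not something from this chat.
converter = tf.lite.TFLiteConverter.from_saved_model('saved_model_dir')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
tflite_model = converter.convert()
with open('model_fp16.tflite', 'wb') as f:
    f.write(tflite_model)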
nasrashvilg1
@nasrashvilg1
@AlexKuhnle thanks for your response on this! Quick question: after the 3rd episode, my agent, which is based on a custom gym environment, starts executing the same actions (the sequence of actions is the same for every episode from the third or fourth episode onwards, to the end of model training). What could be the issue, if you or anyone else has run into this before?
Pedro Chinen
@chinen93
What is your agent configuration? In my experience, adding entropy regularization and exploration helped prevent my agent from overfitting into always taking the same actions.
nasrashvilg1
@nasrashvilg1
@chinen93 agent = Agent.create(
agent='tensorforce',
environment=environment, # alternatively: states, actions, (max_episode_timesteps)
memory=10000,
update=dict(unit='timesteps', batch_size=64),
optimizer=dict(type='adam', learning_rate=3e-4),
policy=dict(network='auto'),
objective='policy_gradient',
reward_estimation=dict(horizon=20)
) - can you please guide me on how I can add the entropy regularization and exploration? The agent starts consistently taking the same action after a few episodes - I think it might be prematurely converging to an optimal policy when more learning needs to be done.
Pedro Chinen
@chinen93

Take a look at this doc link about the "tensorforce" agent: https://tensorforce.readthedocs.io/en/latest/agents/tensorforce.html

You can add entropy_regularization, l2_regularization, exploration and other hyperparameters to try to improve your agent. However, it depends on your action space.

nasrashvilg1
@nasrashvilg1
@chinen93 my action space is spaces.Discrete(3)
Pedro Chinen
@chinen93
@nasrashvilg1 so try to create your agent as:
Agent.create(
agent='tensorforce',
environment=environment, # alternatively: states, actions, (max_episode_timesteps)
memory=10000,
update=dict(unit='timesteps', batch_size=64),
optimizer=dict(type='adam', learning_rate=3e-4),
policy=dict(network='auto'),
objective='policy_gradient',
reward_estimation=dict(horizon=20),
exploration=3e-4,
entropy_regularization=1e-4,
l2_regularization=1e-4
)
And change the last configurations as needed.
nasrashvilg1
@nasrashvilg1
@chinen93 ok thanks - will try that and run some experiments! :-)
in the last configurations, you mean exploration, entropy_regularization and l2_regularization?
Pedro Chinen
@chinen93
@AlexKuhnle How should I approach making my RL agent learn from an expert before trying things for itself? Make a simple supervised learning environment and just import the model into the RL loop, is this the right way?
Alexander Kuhnle
@AlexKuhnle
You can either use the pretrain function (if you have data in the right format, see the example), or more manually use the experience and update functions. A supervised learning environment probably won't work as well.
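A rough sketch of the experience-and-update route, assuming recorded expert transitions are available as NumPy arrays (environment, the expert_* variables and the PPO settings below are placeholders, not from this chat):

from tensorforce import Agent

agent = Agent.create(agent='ppo', environment=environment, batch_size=10)

# Feed recorded expert transitions into the agent's memory...
agent.experience(
    states=expert_states, actions=expert_actions,
    terminal=expert_terminal, reward=expert_reward
)
# ...then perform updates on that stored experience.
agent.update()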
Benno Geißelmann
@GANdalf2357
Hi, if I have a saved agent (saved as npz and json files), what is the right way to load this model and use it only for inference/prediction?
Pedro Chinen
@chinen93
@GANdalf2357, you can use https://github.com/tensorforce/tensorforce/blob/master/examples/save_load_agent.py to see some examples of how to save/load a model. The runner part can be made explicit with a while-loop if you need more control.
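A sketch of loading a saved agent and running an explicit inference-only loop, assuming the agent was saved in 'numpy' format (matching the .npz files mentioned above); the directory name and environment are placeholders:

from tensorforce import Agent

agent = Agent.load(directory='saved-agent', format='numpy', environment=environment)

# Explicit while-loop instead of Runner, acting only (no training updates):
states = environment.reset()
terminal = False
while not terminal:
    actions = agent.act(states=states, independent=True, deterministic=True)
    states, terminal, reward = environment.execute(actions=actions)
agent.close()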
Pedro Chinen
@chinen93
@AlexKuhnle I still do not understand how the pretrain works. PPO is on-policy, right? How can it learn from a sequence of state-actions that is not from the current policy?
Alexander Kuhnle
@AlexKuhnle
Hi @GANdalf2357, in addition to what @chinen93 said: inference-only can be done e.g. by using evaluation=True when using Runner, or act with independent=True and deterministic=True (see here).
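For the Runner route, a minimal sketch (agent and environment are assumed to already exist):

from tensorforce.execution import Runner

runner = Runner(agent=agent, environment=environment)
runner.run(num_episodes=100, evaluation=True)  # evaluation-only, no training updates
runner.close()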
Alexander Kuhnle
@AlexKuhnle
@chinen93, the combination of experience and update is a bit like supervised learning, so train the policy distribution to output the corresponding action per state according to the data. This consideration more or less ignores the theory around policy gradients, on-policy etc., just looking at the problem from a supervised angle, and this can work (but can also not lead anywhere). The current Tensorforce interface is not ideal, a bit too generic and hence may be used wrongly, but I also don't have too much experience with pretraining, behavioral cloning, etc.
Alexander Kuhnle
@AlexKuhnle
The pretrain function is a bit specific and assumes interaction traces including reward, so basically the recorded data of another agent, as in the pretrain example. Experience and update give more flexibility, but might not be obvious. What data do you have? Individual data points of "expert" state->action decisions? Demonstration traces of state-action pairs? Or full demo traces including reward? Or potentially even more "random" trajectory data, not "expert"/"demonstration"? Depending on which case applies, there are different possibilities.
Benno Geißelmann
@GANdalf2357
@chinen93 @AlexKuhnle thanks for your help! this is what I was looking for.
nasrashvilg1
@nasrashvilg1
is there a lot of value in having a discretized space for a PPO agent, or any other agent in general? I have a continuous state space with 5 readings, which can range from 0.0 to 50.0 (float numbers). I can also have all 5 readings discretized - just trying to decide if it's worth the effort?
Alexander Kuhnle
@AlexKuhnle
Hey, I don't think you need to discretize these values -- in fact, I would expect it to work less well (but depends on the problem characteristics of course)
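As a sketch of keeping the readings continuous, a states/actions specification along these lines should suffice (ranges taken from the question above; exact values are an assumption):

# Continuous 5-reading state, no discretization; discrete 3-way action space
# matching the spaces.Discrete(3) mentioned earlier.
states = dict(type='float', shape=(5,), min_value=0.0, max_value=50.0)
actions = dict(type='int', num_values=3)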
Chris Hinrichs
@chris405_gitlab
Hi all, I'm having a problem with nested tensor specifications. I specified a nested block called "MID_1_bid_0", with a float element "qty", but when I try to run the network I get this error:
ValueError: 'MID_1_bid_0/qty_preprocessing' is not a valid module name. Module names must be valid Python identifiers (e.g. a valid class name).
This is in the state space of the agent.
From looking at the source it appears that the naming convention is indeed to separate parent and child objects with a '/', but then that name containing a '/' gets passed as a module name to tf.Module, which is illegal.
Chris Hinrichs
@chris405_gitlab
The tf error is in
site-packages/tensorflow/python/module/module.py", line 113, in __init__ "identifiers (e.g. a valid class name)." % name)
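A minimal reproduction of that check, using the name from the error message above:

import tensorflow as tf

# tf.Module rejects names that are not valid Python identifiers, so a '/'
# in the name triggers the ValueError quoted above.
tf.Module(name='MID_1_bid_0/qty_preprocessing')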
Alexander Kuhnle
@AlexKuhnle
Hi @chris405_gitlab, that looks indeed like a bug, I will check that.
Alexander Kuhnle
@AlexKuhnle
@chris405_gitlab , can you again check it with the latest Github master version? I think the handling should be improved now.
Chris Hinrichs
@chris405_gitlab
@AlexKuhnle Thanks! I'll take a look.
Chris Hinrichs
@chris405_gitlab
@AlexKuhnle That does appear to have fixed it - I'm continuing debugging where I left off, so I'm not running end-to-end yet, but it does look like the next error is mine. Thanks again.
Chris Hinrichs
@chris405_gitlab
I'm getting this warning when I start a runner, and it's taking 2-3 minutes for the preprocessing to complete before running. Is that normal, and is there a known misuse that would cause this?
/site-packages/tensorflow/python/framework/indexed_slices.py:433: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
Chris Hinrichs
@chris405_gitlab
@AlexKuhnle I think I have another bug for you. In Agent.py, the first thing fn_act() does is separate any variable named name + '_mask' from the state spec, where name is any name found in the action_spec. It does this recursively using fmap(), calling this function:
        # Separate auxiliaries
        def function(name, spec):
            auxiliary = ArrayDict()
            if self.config.enable_int_action_masking and spec.type == 'int' and \
                    spec.num_values is not None:
                if name is None:
                    name = 'action'
                # Mask, either part of states or default all true
                auxiliary['mask'] = states.pop(name + '_mask', np.ones(
                    shape=(num_parallel,) + spec.shape + (spec.num_values,), dtype=spec.np_type()
                ))
            return auxiliary

However, when I run it I get a KeyError exception, where the key is a root name from the action space. I instrumented the line from nested_dict.pop() that threw the error, like so:

        elif '/' in key:
            key, subkey = key.split('/', 1)
            if not key in self:
                print(f"pop {key} {subkey}")
                import pprint
                pprint.pprint(self)
            value = super().__getitem__(key)
            assert isinstance(value, self.__class__)
            return value.pop(subkey, default)

This is what it printed:
pop MID_1_counter_0 promise_date_mask {'MID_1_bid_0/price': array([1.15]), 'MID_1_bid_0/promise_date': array([4]), 'MID_1_bid_0/qty': array([44135.2]), 'MID_1_bid_0/supplier_tier': array([0]), ...

Chris Hinrichs
@chris405_gitlab
Note that in my scenario, names with _bid_ in them are state variables, and names with _counter_ in them are action variables.
Now, at the top level, pop() takes a default value; however, when a nested variable is encountered, the handling for that case doesn't consult the default value, it just does value = super().__getitem__(key)
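A simplified, standalone sketch (not the actual Tensorforce NestedDict) of what consulting the default for nested keys could look like:

class NestedDict(dict):
    def pop(self, key, default=None):
        if '/' in key:
            key, subkey = key.split('/', 1)
            if key not in self:
                return default  # previously this path raised KeyError
            value = super().__getitem__(key)
            assert isinstance(value, NestedDict)
            return value.pop(subkey, default)
        return super().pop(key, default)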
Alexander Kuhnle
@AlexKuhnle
Hey, first, the warning Converting sparse IndexedSlices to a dense Tensor of unknown shape. comes up if you use embeddings (used by "auto" network if state is int), and maybe in other situations as well. I've read a bit about it a while ago, and it doesn't seem to be critical, if e.g. the number of embeddings (num_values of int state) is reasonable. Model initialization may take a while if the network is bigger -- is this the case for you?
And regarding the second issue: I will look into it soon.
Chris Hinrichs
@chris405_gitlab
@AlexKuhnle Thanks for the tip. Meanwhile, I modified the pop() code in NestedDict to return the default if the super key is not found (instead of printing debug info), but it led to an invalid-shape error. I think the problem is that if the action is nested then there won't be a shape argument for parent nodes (only leaf nodes have a shape). Given that, what I would like to do is set self.config.enable_int_action_masking to False, but I don't see a way to do that... The config object explicitly overrides __setattr__ and I wasn't able to pass it as a constructor arg to the agent. So, what's the right way to do that?
Alexander Kuhnle
@AlexKuhnle
Setting enable_int_action_masking can be done via the config argument of any agent (docs here). That should hopefully work.
I've also made the change you suggested to NestedDict -- would you mind posting the shape exception, since I don't know why that would come up?
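A sketch of passing it at creation time, reusing the configuration from earlier in this chat (everything except the config argument is taken from the example above):

from tensorforce import Agent

agent = Agent.create(
    agent='tensorforce',
    environment=environment,
    memory=10000,
    update=dict(unit='timesteps', batch_size=64),
    optimizer=dict(type='adam', learning_rate=3e-4),
    policy=dict(network='auto'),
    objective='policy_gradient',
    reward_estimation=dict(horizon=20),
    config=dict(enable_int_action_masking=False)  # disable int-action masking
)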
Chris Hinrichs
@chris405_gitlab
Here is the stack trace:
  File "train_rl_agent.py", line 77, in run_agent
    runner.run(num_episodes=sim_config["train_episodes"])
  File "/home/hinrichs/build/tensorforce/tensorforce/execution/runner.py", line 545, in run
    self.handle_act(parallel=n)
  File "/home/hinrichs/build/tensorforce/tensorforce/execution/runner.py", line 579, in handle_act
    actions = self.agent.act(states=self.states[parallel], parallel=parallel)
  File "/home/hinrichs/build/tensorforce/tensorforce/agents/agent.py", line 388, in act
    deterministic=deterministic
  File "/home/hinrichs/build/tensorforce/tensorforce/agents/recorder.py", line 267, in act
    num_parallel=num_parallel
  File "/home/hinrichs/build/tensorforce/tensorforce/agents/agent.py", line 415, in fn_act
    states = self.states_spec.to_tensor(value=states, batched=True, name='Agent.act states')
  File "/home/hinrichs/build/tensorforce/tensorforce/core/utils/tensors_spec.py", line 57, in to_tensor
    value=value[name], batched=batched, recover_empty=recover_empty
  File "/home/hinrichs/build/tensorforce/tensorforce/core/utils/tensors_spec.py", line 57, in to_tensor
    value=value[name], batched=batched, recover_empty=recover_empty
  File "/home/hinrichs/build/tensorforce/tensorforce/core/utils/tensor_spec.py", line 149, in to_tensor
    raise TensorforceError.value(name=name, argument='value', value=value, hint='shape')
tensorforce.exception.TensorforceError: Invalid value for TensorSpec.to_tensor argument value: 0 shape.
Chris Hinrichs
@chris405_gitlab
I disabled enable_int_action_masking and I'm still getting that error, so it's not related to the issue with pop() not defaulting. Thanks for the link showing how to do that.
Chris Hinrichs
@chris405_gitlab

@AlexKuhnle I've figured out what's happening, but I don't fully understand the cause. I instrumented the code where the exception is raised like so:

        # Check whether shape matches
        if value.shape[int(batched):] != self.shape:
            print(f"\nvalue {value} type {type(value)} batched {batched}")
            print(f"value shape {value.shape[int(batched):]} self shape {self.shape}")
            import pprint
            pprint.pprint(self)
            pprint.pprint(value)
            raise TensorforceError.value(name=name, argument='value', value=value, hint='shape')

and this is what it prints:

value [0] type <class 'numpy.ndarray'>
value shape () self shape (1,)
TensorSpec(type=int, shape=(1,), num_values=4)
array([0])

The reason is that batched is True, but the value shape doesn't have a batch dimension.

Alexander Kuhnle
@AlexKuhnle
I realise there is something missing which makes the exception message less useful/specific -- will fix that. But it looks like a subtle shape problem of some inputs, as if the value returned by the environment is not perfectly matching the shape of the states specification.
Ah, was just writing :-)
Chris Hinrichs
@chris405_gitlab
wow
I was just writing too - I tried removing batch_size from the agent params, but it tells me that one is required.
Alexander Kuhnle
@AlexKuhnle
That's what I thought: it seems your environment specifies the shape as (1,), whereas what it actually returns is of shape (). That could be the case, for instance, if the state value is returning a primitive Python type (which are of shape ()).
Tensorforce is very strict about these shapes, since TensorFlow and the computation graph are, too (but unlike e.g. NumPy, which is often very forgiving).
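A quick illustration of the mismatch:

import numpy as np

print(np.asarray(0).shape)    # () -- a scalar/0-d value, what the environment apparently returns
print(np.asarray([0]).shape)  # (1,) -- what a states spec with shape=(1,) expects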