Chris Hinrichs
@chris405_gitlab

However, when I run it I get a KeyError exception, where the key is a root name from the action space. I instrumented nested_dict.pop() around the line that threw the error, like so:

    elif '/' in key:
        key, subkey = key.split('/', 1)
        if not key in self:
            print(f"pop {key} {subkey}")
            import pprint
            pprint.pprint(self)
        value = super().__getitem__(key)
        assert isinstance(value, self.__class__)
        return value.pop(subkey, default)

This is what it printed:
    pop MID_1_counter_0 promise_date_mask
    {'MID_1_bid_0/price': array([1.15]), 'MID_1_bid_0/promise_date': array([4]), 'MID_1_bid_0/qty': array([44135.2]), 'MID_1_bid_0/supplier_tier': array([0]), ...

Note that in my scenario, names with _bid_ in them are state variables, and names with _counter_ in them are action variables.
Now, at the top level, pop() takes a default value; however, when a nested variable is encountered, the handling for that case doesn't consult the default, it just calls value = super().__getitem__(key).
Alexander Kuhnle
@AlexKuhnle
Hey, first: the warning "Converting sparse IndexedSlices to a dense Tensor of unknown shape" comes up if you use embeddings (used by the "auto" network if a state is int), and maybe in other situations as well. I read a bit about it a while ago, and it doesn't seem to be critical if, e.g., the number of embeddings (num_values of an int state) is reasonable. Model initialization may take a while if the network is bigger -- is this the case for you?
And regarding the second issue: I will look into it soon.
Chris Hinrichs
@chris405_gitlab
@AlexKuhnle Thanks for the tip. Meanwhile, I modified the pop() code in nested_dict to return the default if the super key is not found (instead of printing debug info), but it led to an invalid-shape error. I think the problem is that if the action is nested then there won't be a shape argument for parent nodes (only leaf nodes have a shape). Given that, what I would like to do is to set self.config.enable_int_action_masking to False, but I don't see a way to do that... The config object explicitly overrides __setattr__ and I wasn't able to pass it as a constructor arg to the agent. So, what's the right way to do that?
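Roughly, the change I made looks like this (a sketch of my local edit to nested_dict.pop(), not the upstream code):

    elif '/' in key:
        key, subkey = key.split('/', 1)
        if key not in self:
            # Parent key missing: fall back to the default instead of raising KeyError
            return default
        value = super().__getitem__(key)
        assert isinstance(value, self.__class__)
        return value.pop(subkey, default)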
Alexander Kuhnle
@AlexKuhnle
Setting enable_int_action_masking can be done via the config argument of any agent (docs here). That should hopefully work.
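For example, something along these lines should work (the agent type and batch_size here are just placeholders for your own setup):

    from tensorforce import Agent

    # 'environment' is assumed to be your existing Environment instance
    agent = Agent.create(
        agent='ppo', environment=environment, batch_size=10,
        config=dict(enable_int_action_masking=False)
    )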
I've also made the change you suggested to NestedDict -- would you mind posting the shape exception, since I don't know why that would come up?
Chris Hinrichs
@chris405_gitlab
Here is the stack trace:
  File "train_rl_agent.py", line 77, in run_agent
    runner.run(num_episodes=sim_config["train_episodes"])
  File "/home/hinrichs/build/tensorforce/tensorforce/execution/runner.py", line 545, in run
    self.handle_act(parallel=n)
  File "/home/hinrichs/build/tensorforce/tensorforce/execution/runner.py", line 579, in handle_act
    actions = self.agent.act(states=self.states[parallel], parallel=parallel)
  File "/home/hinrichs/build/tensorforce/tensorforce/agents/agent.py", line 388, in act
    deterministic=deterministic
  File "/home/hinrichs/build/tensorforce/tensorforce/agents/recorder.py", line 267, in act
    num_parallel=num_parallel
  File "/home/hinrichs/build/tensorforce/tensorforce/agents/agent.py", line 415, in fn_act
    states = self.states_spec.to_tensor(value=states, batched=True, name='Agent.act states')
  File "/home/hinrichs/build/tensorforce/tensorforce/core/utils/tensors_spec.py", line 57, in to_tensor
    value=value[name], batched=batched, recover_empty=recover_empty
  File "/home/hinrichs/build/tensorforce/tensorforce/core/utils/tensors_spec.py", line 57, in to_tensor
    value=value[name], batched=batched, recover_empty=recover_empty
  File "/home/hinrichs/build/tensorforce/tensorforce/core/utils/tensor_spec.py", line 149, in to_tensor
    raise TensorforceError.value(name=name, argument='value', value=value, hint='shape')
tensorforce.exception.TensorforceError: Invalid value for TensorSpec.to_tensor argument value: 0 shape.
Chris Hinrichs
@chris405_gitlab
I disabled enable_int_action_masking and I'm still getting that error, so it's not related to the issue with pop() not defaulting. Thanks for the link showing how to do that.
Chris Hinrichs
@chris405_gitlab

@AlexKuhnle I've figured out what's happening, but I don't fully understand the cause. I instrumented the code where the exception is raised like so:

    # Check whether shape matches
    if value.shape[int(batched):] != self.shape:
        print(f"\nvalue {value} type {type(value)} batched {batched}")
        print(f"value shape {value.shape[int(batched):]} self shape {self.shape}")
        import pprint
        pprint.pprint(self)
        pprint.pprint(value)
        raise TensorforceError.value(name=name, argument='value', value=value, hint='shape')

and this is what it prints:

    value [0] type <class 'numpy.ndarray'>
    value shape () self shape (1,)
    TensorSpec(type=int, shape=(1,), num_values=4)
    array([0])

The reason is that batched is True, but the value shape doesn't have a batch dimension.

Alexander Kuhnle
@AlexKuhnle
I realise there is something missing which makes the exception message less useful/specific -- will fix that. But it looks like a subtle shape problem with some inputs, as if the value returned by the environment doesn't perfectly match the shape in the states specification.
Ah, was just writing :-)
Chris Hinrichs
@chris405_gitlab
wow
I was just writing too - I tried removing batch_size from the agent params, but it tells me that one is required.
Alexander Kuhnle
@AlexKuhnle
That's what I thought: it seems your environment specifies the shape as (1,), whereas what it actually returns is of shape (). That could be the case, for instance, if the state value returned is a primitive Python type (which has shape ()).
Tensorforce is very strict about these shapes, since TensorFlow and the computation graph are, too (unlike e.g. NumPy, which is often very forgiving).
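To illustrate with a made-up scalar state (spec and values here are placeholders, not your actual setup):

    import numpy as np

    # Spec declares shape (1,), i.e. a length-1 vector:
    state_spec = dict(type='int', shape=(1,), num_values=4)

    # Returning a plain Python int then has shape () and doesn't match:
    state = 0                  # np.asarray(state).shape == ()

    # Matching alternatives: either return a length-1 array...
    state = np.array([0])      # shape (1,)
    # ...or declare the spec as shape=() and keep returning a scalar.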
Chris Hinrichs
@chris405_gitlab
Ah, ok so leave scalar types as ()
I put in a shape of (1,) for all scalars earlier to remove sources of variation while debugging another problem. I can try it now without that.
Alexander Kuhnle
@AlexKuhnle
Yes, that would be the fix.
Chris Hinrichs
@chris405_gitlab
Yes, it does seem to have fixed it. Thanks again.
Chris Hinrichs
@chris405_gitlab
I'm trying to run several environments in parallel, and as soon as I go from one agent to two, I get OSError: [Errno 12] Cannot allocate memory. I also note that when I first instantiate the agent, it takes about 3 minutes to complete the Agent.create() process. So far I've been using network='auto'. I printed out the state and action spec; my state space has a total of 59 scalar components, and the action space has a total of 80.
As discussed before, the integer-space embeddings could potentially be a cause. In the state space, there are 10 variables with 4 values and another 10 with 10 values. In the action space, there are 10 with 4, 10 with 10, and 10 with 20. I tried setting the 20-valued variable to use only 10 values in the action space, with the same result.
Chris Hinrichs
@chris405_gitlab
Does anyone have any experience with a model blowing up unexpectedly? Is 10 an unreasonable number of values for an int to have?
Is there a simple way to print out the auto-generated network specification?
Chris Hinrichs
@chris405_gitlab
I've created a pastebin of my state and action space in case that helps
https://pastebin.com/V4xTdYWv
Chris Hinrichs
@chris405_gitlab
Update:
I refactored the code to use a vector of N variables instead of having N nested sub-variables, and that is a LOT faster.
https://pastebin.com/3pwfjS6Y
Initializing the agent is about 10x faster, and training is also about 10x faster (running in 2 threads vs. 1 before).
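For a rough idea of the change (illustrative names and shapes, not my actual spec):

    # Before: N nested scalar sub-variables, each with its own entry
    states = {f'var_{i}': dict(type='int', shape=(), num_values=10) for i in range(10)}

    # After: one vector-valued variable covering all N components
    states = dict(var=dict(type='int', shape=(10,), num_values=10))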
Alexander Kuhnle
@AlexKuhnle
It's a good idea to provide some information about the network specification.
But generally each state gets its own "input head", which means that if you have 10 values as a single vector, there is only one head; if you split it up into 10 scalars, there will be 10 heads. While these could obviously be combined internally, it's unclear when one should do this and when one shouldn't, so it's left to the user to design the space.
Another helpful addition would probably be a simple warning.
Chris Hinrichs
@chris405_gitlab

In execution.Runner, the constructor takes an argument evaluation, and the comment says that if there are multiple environments it will run the last one only, but in Runner.run() it raises an exception like so --

    if evaluation and (self.evaluation or len(self.environments) > 1):
        raise TensorforceError.unexpected()

... and the comment says that it is an error to pass evaluation=True with multiple environments.
Which behavior is intended to be the standard? As it is, run() controls, because you have to call it. Also, why would it throw an error if evaluation and self.evaluation are both True?

Alexander Kuhnle
@AlexKuhnle
The documentation should probably be improved, but if I remember correctly, the reasoning is as follows: on the one hand, Runner.run() can be used multiple times in the "standard" use case of a single environment, in particular training runner.run(num_episodes=???) and subsequent evaluation runner.run(num_episodes=???, evaluation=True) (that's the run() evaluation argument); on the other hand, it provides an interface to parallel execution, but in that case you can't just switch from training run() to evaluation run() -- however, you can specify that one of the parallel environments is used for evaluation throughout (that's the constructor evaluation argument).
However, these constructor vs. run() arguments have not been separated in a principled way for a long time, and really a Runner should probably be a one-off specification of a run which cannot be re-used. Maybe Runner.run() should become a static method with all arguments there.
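So the two intended patterns look roughly like this (argument names from memory, episode counts are placeholders -- check the Runner docs for the exact signature):

    # Single environment: train, then evaluate via the run() argument
    runner = Runner(agent=agent, environment=environment)
    runner.run(num_episodes=1000)                    # training
    runner.run(num_episodes=100, evaluation=True)    # evaluation

    # Parallel environments: reserve one environment for evaluation via the constructor
    runner = Runner(agent=agent, environments=[env1, env2, env3], evaluation=True)
    runner.run(num_episodes=1000)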
Chris Hinrichs
@chris405_gitlab
I believe I have another bug for you. The scenario is this: I've trained an agent (and thanks again for all of your help with that), and I've created a Flask app to serve its outputs in a REST API so that I can better see what it's doing. In doing so, I create an Agent once (because it takes about a minute to load itself), and for every request I spin off a new Runner.
The first request always succeeds, but the second throws this error:
  File "socket_server.py", line 72, in handle_biotech
    max_episode_timesteps = weeks + 1
  File "/home/ubuntu/tensorforce/tensorforce/execution/runner.py", line 151, in __init__
    remote=remote, blocking=blocking, host=host[0], port=port[0]
  File "/home/ubuntu/tensorforce/tensorforce/environments/environment.py", line 139, in create
    environment=environment, max_episode_timesteps=max_episode_timesteps
  File "/home/ubuntu/tensorforce/tensorforce/environments/environment.py", line 344, in __init__
    print(f"environment.max_episode_timesteps() {environment.max_episode_timesteps()}")
TypeError: <lambda>() missing 1 required positional argument: 'self'
Note that the error is not a Tensorforce unexpected-error, it's a TypeError about a lambda missing a self argument. The reason is that immediately below, the Runner checks whether the environment's max_episode_timesteps() returns None, and if so it equips it with a lambda that takes a self argument, just like a class method. But, since it's a lambda assigned to the instance, it doesn't actually get passed that self argument. I changed the code to look like this, and it fixed the problem:
    if self._environment.max_episode_timesteps() is None:
        self._environment.max_episode_timesteps = (lambda: max_episode_timesteps)
Alexander Kuhnle
@AlexKuhnle
@chris405_gitlab , thanks, that looks correct, didn't consider that lambda attributes are treated differently from class functions. I'm surprised I never came across that... :-/
amirrezaheidari
@amirrezaheidari
Hello, I am applying the "tensorforce" and "dqn" agents to my problem, but tensorforce is performing better. May I ask what the algorithm behind this agent is?
amirrezaheidari
@amirrezaheidari
Also, I have another question about "episodes" which is not clear to me. In your example of a room temperature controller, you reset the environment at the beginning of each episode, then interact with the environment for a certain number of timesteps. This is repeated for 200 episodes. But assume that in my problem, I have a dataset of 1000 rows. What I assume is that I should keep, for example, 80% of this data for training. Then, in each episode, I should cycle through all of the rows. So I am cycling 200 times over the same training data. Am I right?
Alexander Kuhnle
@AlexKuhnle
Hi @amirrezaheidari, learning via RL from a fixed dataset does not follow the "standard setup". There are two options: first, you can wrap the dataset in a "pseudo-environment" and learn via the usual agent-environment episodes setup, but the question is how does this environment react to model actions that don't follow the dataset? One option is to introduce "episodes" here: you just terminate the episode when the agent does something "invalid" (according to your dataset), and potentially give a negative reward as well, if desired. Second, you can use behavioral cloning and other off-policy learning techniques. Tensorforce provides a basic behavioral-cloning-like approach, as illustrated e.g. here. Feel free to write me a message if you have more questions, or here in the channel if they're not very specific to your problem.
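A very rough sketch of the first option, a pseudo-environment wrapping a dataset (the state/action specs, validity check and rewards here are placeholders you'd need to adapt to your problem):

    import numpy as np
    from tensorforce.environments import Environment

    class DatasetEnvironment(Environment):

        def __init__(self, dataset):
            super().__init__()
            self.dataset = np.asarray(dataset)  # rows of your training split
            self.index = 0

        def states(self):
            return dict(type='float', shape=(self.dataset.shape[1],))

        def actions(self):
            return dict(type='int', num_values=4)  # placeholder action space

        def reset(self):
            self.index = 0
            return self.dataset[self.index]

        def execute(self, actions):
            # Placeholder logic: terminate (and penalize) when the action does not
            # follow the dataset, otherwise advance to the next row.
            valid = True  # replace with your own check against the dataset
            reward = 0.0 if valid else -1.0
            terminal = (not valid) or (self.index + 1 >= len(self.dataset))
            if not terminal:
                self.index += 1
            return self.dataset[self.index], terminal, reward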
And regarding the Tensorforce agent: it's basically the "parent" of all agents in Tensorforce, DQN and others are more specific configurations of this agent. So what algorithm it is depends on the arguments -- are you using the default arguments?
wushilan
@wushilan
Hello. I am trying to solve a high-dimensional action space problem. The action space has about 40 two-dimensional variables. I used Tensorforce's Dueling-DQN to solve this problem before, and achieved ideal results. I recently learned that DQN and related algorithms cannot be used for high-dimensional action space problems, but the Dueling-DQN algorithm in Tensorforce does solve it. Does Tensorforce optimize the Dueling-DQN algorithm for high-dimensional action spaces?
wushilan
@wushilan
This is my action spec (a dict): {'x_': {'type': 'int', 'shape': (2, 2, 4), 'num_values': 5}, 'y': {'type': 'int', 'shape': (10, 3, 4), 'num_values': 2}, 'z': {'type': 'int', 'shape': (10, 2, 4, 4), 'num_values': 2}}
Alexander Kuhnle
@AlexKuhnle
Hi @wushilan, great to hear that the DuelingDQN agent worked so surprisingly well! :-) Regarding high-dimensional action spaces, I think this "conflict" may be due to two versions of "high-dimensional": DQN doesn't scale well with the number of available actions (num_values), and if you look at your action space as a product space, (2,2,4) x 5 x (10,3,4) x 2 x (10,2,4,4) x 2, that's of course gigantic. However, Tensorforce splits this space into its factors, so it's (2,2,4) actions with 5 options, and (10,3,4) + (10,2,4,4) actions with two options, and so each individual action is actually quite "low-dimensional", and this factorization works well if the actions are not correlated in very complex ways -- which presumably they aren't. I'm not sure how common such a factorization is for other frameworks, but I would be surprised if this is very uncommon. Anyway, that's the only additional feature I can think of in Tensorforce which is beneficial in such a context (and this factorization may go particularly well with the "dueling" part, but not sure).
wushilan
@wushilan
Thank you for your answer. I think you mentioned two reasons. The second reason is that the actions are processed as x, y, and z respectively, which reduces the total dimension. So does Tensorforce need to build three neural networks separately, or does it divide the output of one neural network into three parts and then select x, y, and z from those parts? Can you elaborate on the first reason? I don't quite understand it yet. For example, for a (2,2,4) action with 5 options, is its dimension 5^(2x2x4)?
wushilan
@wushilan
My main question is: should it calculate 5^(2x2x4) Q-values or just 2x2x4x5 Q-values when processing a (2,2,4) action with 5 options? In other words, is this action processed jointly or independently internally?
Alexander Kuhnle
@AlexKuhnle
Regarding the first reason, that's my point. Standard DQN assumes a single discrete action, and a space like yours could be fit into this framework by looking at the product space, so 5^(2x2x4). However, Tensorforce factorizes such spaces into 2x2x4x5, as you suggest. Note the difference, e.g. between choosing an action via argmax over 5 Q-values for each of the 2x2x4 actions independently, and taking into account the effect of combinations. Of course, it makes sense to factorize, but I don't know how common it is in implementations, since standard DQN doesn't do that.
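Just to put numbers on it for your 'x_' component alone:

    # Joint (product-space) treatment: one Q-value per combination of choices
    5 ** (2 * 2 * 4)    # = 152587890625 Q-values

    # Factorized treatment: 5 Q-values per individual action
    2 * 2 * 4 * 5       # = 80 Q-values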
Alexander Kuhnle
@AlexKuhnle
Regarding the second reason: you're right, there is an additional hierarchical aspect in Tensorforce. First, each of the 2x2x4 etc. actions gets its own (independent!) linear layer mapping the network output embedding to 2/5 Q-values. This is implemented as one big matrix multiplication yielding the flattened action tensor, embedding -> 2*2*4 * {2,5}. Second, each "action component", so x, y, z, is implemented as a separate matrix multiplication. However, ultimately this just means: the network produces an output embedding, and for each action (with N alternatives) across all components we have an independent linear transformation embedding -> N Q-values.
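As a rough NumPy sketch of what that means for your 'x_' component (embedding size and random weights are purely illustrative, not the actual Tensorforce implementation):

    import numpy as np

    embedding_size = 64
    embedding = np.random.randn(embedding_size)      # network output embedding

    # One big linear layer for the whole component: embedding -> 2*2*4 * 5
    W = np.random.randn(embedding_size, 2 * 2 * 4 * 5)
    q_values = (embedding @ W).reshape(2, 2, 4, 5)   # 5 Q-values per action

    # Greedy selection is then an independent argmax over the last axis
    actions = q_values.argmax(axis=-1)               # shape (2, 2, 4)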
wushilan
@wushilan
Thank you for your detailed answers, and thank you for contributing such outstanding work to Tensorforce.