Alexander Kuhnle
@AlexKuhnle
Good :-)
Isaac0047
@Isaac0047
@AlexKuhnle Hey Alex, I'm confused about a simple question when defining the environment actions. You mentioned in your May 7th reply that it's better to normalize the actions to range between -1 and 1, but what if I have 3 potential actions to take? Will the code automatically assign the actions to be 3 evenly spaced float numbers? What is the logic behind constraining the range of action values? Besides, if I just define actions as dict(type='int', num_values=3), will the 3 action values be 0, 1 and 2?
Alexander Kuhnle
@AlexKuhnle
Hey. Normalizing actions refers to float/continuous actions, although more of it will happen automatically in recent updates if min_value and max_value are known (which it is definitely recommended to provide whenever possible). For int/discrete actions, the actions will always be 0, 1, ..., num_actions - 1, and it is up to the environment to map them accordingly (e.g. simply via ACTION_ALTERNATIVES = [a1, a2, ...]; action = ACTION_ALTERNATIVES[action]).
Isaac0047
@Isaac0047
So in this case, do we need to explicitly define values for each action, or would the code automatically assign action values based on some conditions?
If only given the range and number of actions, I suppose the code should be able to identify the values for each action by itself?
Alexander Kuhnle
@AlexKuhnle
What do you mean by "assigning action values based on some conditions"? And regarding range and number of actions: you should either have continuous actions with range dict(type='float', min_value=-1.0, max_value=1.0), or discrete actions with number of actions dict(type='int', num_values=4). The former will produce actions like 0.2376 and -0.712, whereas the latter will produce actions like 0, 2, 3. Does that help?
Isaac0047
@Isaac0047
Alex, I think what I'm confused about is this: let's say I have 4 actions specified with dict(type='int', num_values=4). Do I need to explicitly assign action_1=0, action_2=1, action_3=2 and action_4=3 when defining the actions? If I simply use dict(type='int', num_values=4), how does the code know the exact value for each action?
Since I want to design the reward as a function of the action values, I want to know how the code determines the exact value for each action.
Alexander Kuhnle
@AlexKuhnle
The actions in the int case will just always be 0, 1, ..., N with N = num_values - 1. Note that the agent doesn't learn them as "numbers", but just as discrete choices, so it doesn't understand that 1 is between 0 and 2. You can use these integer action outputs directly, or you can map them to any other set of discrete values, for instance 0 could refer to "go left", 1 to "go right", etc... The agent will of course learn the relation between these action options via their effect on the environment, but it's important to emphasise that the numeric value of 0, 1, 2,... has no meaning to the agent, they're just a set of alternative options. Hope that clarifies it?
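For illustration, a minimal sketch of that mapping (the alternatives below are made up; any set of domain-specific choices works):

# The agent's int action is just an index 0..num_values-1;
# the environment decides what each index means.
ACTION_ALTERNATIVES = ['left', 'right', 'up', 'down']  # corresponds to num_values=4

def apply_action(action_index):
    chosen = ACTION_ALTERNATIVES[action_index]
    print('Executing:', chosen)

apply_action(2)  # agent chose index 2 -> 'up'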
Isaac0047
@Isaac0047
Got it, thank you Alex
Thomas Pintaric
@pintaric
Quick question: How will Tensorforce (in particular the "auto" network) handle multiple observation subspaces (defined as gym.spaces.Tuple or gym.spaces.Dict)? Will it just flatten them to a single observation vector? Is there an easy way to define a hybrid network (e.g. a stack of layers with a CNN only for one particular subspace, then an MLP for the output and other subspace(s))...?
Alexander Kuhnle
@AlexKuhnle
Hi @pintaric, these spaces will be kept as a multi-state dict, so they require a non-sequential network architecture in the style you describe. The auto network automatically configures a simple such network: two layers per input, then all embeddings are concatenated and another dense layer is applied, before passing the result to the distribution (see also the config options for "auto" to change depth/sizes). Alternatively, you can define your own "multi-input" network using the special register and retrieve layers, as outlined here.
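As a rough sketch of that register/retrieve style (the state names 'image' and 'vector' and the layer sizes are placeholders, not from the original discussion):

network = [
    [   # branch 1: CNN for the image-like substate
        dict(type='retrieve', tensors=['image']),
        dict(type='conv2d', size=32),
        dict(type='flatten'),
        dict(type='register', tensor='image-embedding')
    ],
    [   # branch 2: MLP for the vector substate
        dict(type='retrieve', tensors=['vector']),
        dict(type='dense', size=32),
        dict(type='register', tensor='vector-embedding')
    ],
    [   # combine both embeddings and add a final dense layer
        dict(type='retrieve', tensors=['image-embedding', 'vector-embedding'], aggregation='concat'),
        dict(type='dense', size=64)
    ]
]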
bennyfri
@bennyfri
How do I access the state-action values? I have an agent (tried both ppo and a2c) that, after learning, keeps returning the same action at each step (it can be a different action every time it learns from scratch). I suspect the values are going to infinity, but I want to validate that. How do I access the values of the actions in a specific state?
Alexander Kuhnle
@AlexKuhnle
This is not easily possible right now, but you could see whether plotting the "distribution" label in Tensorboard (see summarizer argument here) shows anything interesting.
Alternatively, you could dig into the code, but returning additional values beyond the act/observe function interface is not straightforward.
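For reference, a summarizer configuration along those lines might look roughly like this (a sketch assuming the environment is already created and 0.5.x-style arguments; the exact label names can differ between versions, so check the docs for your version):

from tensorforce import Agent

agent = Agent.create(
    agent='ppo', environment=environment, batch_size=10,
    summarizer=dict(
        directory='summaries',             # TensorBoard log directory
        labels=['distribution', 'losses']  # which summary groups to record
    )
)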
bennyfri
@bennyfri
Thanks. Anything special that I need for distribution? It does not seem to record anything. I can see the loss, for example, but not the distribution label.
Alexander Kuhnle
@AlexKuhnle
Hmm, if it doesn't record anything, I need to check
Isaac0047
@Isaac0047
@AlexKuhnle Hi Alex, I have a quick question. While using Tensorforce, I occasionally get the error --- "InvalidArgumentError: Finite gradients check. : Tensor had NaN values [[{{node agent.observe/agent.core_observe/agent.core_update/agent.optimize/optimizer.minimize/optimizer.step/VerifyFinite_4/CheckNumerics}}]]". One interesting thing is that this never happens the first time I run the code; it only occurs when running the code a second or third time. Once this NaN value happens, I restart the kernel and it disappears again. Do you know what is going on and how to resolve it? FYI, I am using Python 3.6, tensorflow-gpu 2.0.0 and Tensorforce 0.5.5, and this issue appears even for the examples you provided on GitHub.
Steven Tobias
@stobias123
Probably a dumb question... How can I ensure my agents are loading a saved checkpoint?
I have 5 agents running, all saving to the same checkpoint directory... When I start a new job, how can I confirm they loaded the checkpoints? My TensorBoard summaries seem to reset every time I deploy a new batch of agents.
Alexander Kuhnle
@AlexKuhnle
@bennyfri I cannot confirm the problem with the distribution summary label. Which version of Tensorforce are you using, pypi or the latest master? And can you post your agent config / summarizer config?
Alexander Kuhnle
@AlexKuhnle
@Isaac0047 , really, this happens for the CartPole examples? Interesting... This came up every now and then recently, and I'm not too sure what's going on. Maybe this is due to some TF-internal problem in early TF2 versions? But in some situations it likely was due to rare, very large input values -- and I think they don't need to be all that large (100-1000) to start causing problems. So my recommendation is definitely to make sure the input values are bounded. In fact, I'm working towards making this more or less the default and implicit in version 0.6, so that bounded values are automatically normalized and unbounded spaces print a warning (worth switching to the GitHub master 0.6 beta version if these problems bother you). However, I've never experienced it with the examples in the repo.
Alexander Kuhnle
@AlexKuhnle
@stobias123 , how are you saving the agents, and how are you expecting them to load? Plus, how are these 5 agents supposed to coordinate the checkpoint saving? There are two save/load mechanisms in Tensorforce: one explicit (agent.save()/load()), and one implicit based on TensorFlow's CheckpointManager. The first one should be quite straightforward and definitely work; the latter has recently been added to replace the old tf.Saver -- if something seems to not work well with it, I may have to take a closer look at whether there's anything wrong.
thachhoang1909
@thachhoang1909

Hi @AlexKuhnle ,
I defined the agent network with Conv2d as below.
network = [
    [
        dict(type="retrieve", tensors=["observation"]),
        {"type": "keras", "layer": "Conv2D", "filters": 32, "kernel_size": 3, "strides": 1, "activation": "relu"},
        dict(type='flatten'),
        dict(type='register', tensor='obs-embedding')
    ],
    [
        dict(type='retrieve', tensors=['attributes']),
        dict(type='dense', size=64, activation="relu"),
        dict(type='register', tensor='attr-embedding')
    ],
    [
        dict(type='retrieve', aggregation='concat', tensors=['obs-embedding', 'attr-embedding'])
    ]
]
critic_optimizer = dict(type="adam", optimizer="adam", learning_rate=3e-3)
agent = Agent.create(
    agent='a2c', network=network, states=states, actions=actions,
    max_episode_timesteps=max_episode_timesteps, memory=memory,
    update_frequency=update_frequency, horizon=horizon, batch_size=batch_size,
    variable_noise=variable_noise, critic_network=network,
    entropy_regularization=entropy_regularization, exploration=exploration
)
But it yields an "'Operation' object is not iterable" error:

File "C:\Users\TT\Anaconda3\envs\new_tensorforce_2\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 1180, in cond
    return cond_v2.cond_v2(pred, true_fn, false_fn, name)
  File "C:\Users\TT\Anaconda3\envs\new_tensorforce_2\lib\site-packages\tensorflow\python\ops\cond_v2.py", line 92, in cond_v2
    op_return_value=pred)
  File "C:\Users\TT\Anaconda3\envs\new_tensorforce_2\lib\site-packages\tensorflow\python\framework\func_graph.py", line 986, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "d:\thach\fun\tensorforce\tensorforce\core\models\tensorforce.py", line 885, in apply_variable_noise
    with tf.control_dependencies(control_inputs=assignment):
  File "C:\Users\TT\Anaconda3\envs\new_tensorforce_2\lib\site-packages\tensorflow\python\framework\ops.py", line 5359, in control_dependencies
    return get_default_graph().control_dependencies(control_inputs)
  File "C:\Users\TT\Anaconda3\envs\new_tensorforce_2\lib\site-packages\tensorflow\python\framework\func_graph.py", line 347, in control_dependencies
    for c in control_inputs:
TypeError: 'Operation' object is not iterable

Hope to hear from you soon,
thank you

Pedro Chinen
@chinen93
Hi, I'm looking for a way to sample the action instead of using the argmax. Is it possible? I want to know the probabilities of each action. I'm using version 0.5.5. I tried to use the "deterministic" and "independent" arguments in Agent.act but without success. Is there a way? Should I get the TensorFlow model and use it directly? Thanks :)
Alexander Kuhnle
@AlexKuhnle
@thachhoang1909 That is a bug when using variable_noise, will be fixed soon but may take 1-2 days since I'm working on a change. In the meantime, you could just disable this feature (variable_noise = 0.0).
@chinen93 For most agents, this should be the default (only the Q-learning family doesn't sample, if I remember correctly). deterministic and independent both imply not sampling, I think (definitely deterministic, ofc). What agent are you using, and why do you think it doesn't sample (unless you use DQN or so)?
Pedro Chinen
@chinen93
I'm using PPO, in fact I just wanted to see the probabilities of each action.
Alexander Kuhnle
@AlexKuhnle
In version 0.5.5, it should be possible to get these via the query argument of agent.act(), something like query=['action-probabilities'], or replace action with the action name you're looking for. Does that work? (It should be possible to check what query vectors are available via agent.get_query_tensors(function='act').) However, as it currently stands, this feature won't be available in 0.6 anymore, need to figure out what to do or whether Tensorboard plotting is enough.
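Roughly, the usage would be something like the sketch below (for 0.5.5, from memory; the exact tensor name and return structure should be double-checked via get_query_tensors):

# List the tensor names that can be queried alongside act().
print(agent.get_query_tensors(function='act'))

# Request the action probabilities together with the sampled action.
actions, queried = agent.act(states=states, query=['action-probabilities'])
print(queried)  # per-action probabilities of the policy distribution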
thachhoang1909
@thachhoang1909
Hi @AlexKuhnle,
thanks for your response. I can run it now.
I want to load an act-only model to run on an edge machine; the checkpoint model seems too large (in storage).
I save and load the model via saved-model, but I face another problem when loading the saved-model format: I can't load the saved model.
Below is the assertion error and the code to save/load the model:
Agent.save('model_output', format='saved-model')
Agent.load('model_output', format='saved-model')
    636                     format = 'checkpoint'
    637                 else:
--> 638                     assert format == 'checkpoint'
    639                 if filename is None or \
    640                         not os.path.isfile(os.path.join(directory, filename + '.index')):

AssertionError:
Alexander Kuhnle
@AlexKuhnle
@thachhoang1909 The saved-model part is one of the new features in version 0.6, and unfortunately not quite figured out yet. Right now, there is no support for loading SavedModels from within Tensorforce. However, it is possible to load and use them, as illustrated e.g. here. Generally, I would use the other saving formats while you work in Python/Tensorforce, since there should be little if any benefit to using the SavedModel version; the SavedModel is for when you want to use the trained model somewhere else for deployment. I hadn't considered your use case, though, where you're happy to use Python/Tensorforce but want to reduce memory requirements. It should be possible to provide an act-only agent again, as there was in previous versions.
Two questions: what agent are you using? and you're saying that the checkpoint format is too big, but when you save it in saved-model format, the size is okay?
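For the explicit mechanism, a minimal sketch (assuming version 0.6, with agent an existing Agent instance and 'model-dir' a placeholder directory):

from tensorforce import Agent

# Explicit save: writes the model files into the given directory.
agent.save(directory='model-dir', format='checkpoint')

# Explicit load: restores the agent later from the same directory.
agent = Agent.load(directory='model-dir', format='checkpoint')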
thachhoang1909
@thachhoang1909
@AlexKuhnle, thanks for your response.
Currently I only have the saved-model format (~290MB), which is really big. I use an A2C model with network = critic_network = [conv2d(3x3x32), conv2d(2x2x16), dense(128)], input shape (11, 15, 6), and 6 discrete actions. I still wonder why it's so big.
Alexander Kuhnle
@AlexKuhnle
Okay, so the SavedModel is big as well... that's expected and due to a problem which hasn't been solved yet: for some reason, despite specifying only the act function to export, the exported SavedModel currently contains the full model. I need to understand more about how SavedModel export works and how this can be addressed.
But good to know that there is interest... I will give it a closer look very soon. If you have some experience with SavedModel and the related TensorFlow API, maybe you know what should be done differently (relevant code)?
Alexander Kuhnle
@AlexKuhnle
@thachhoang1909 later than hoped, but the problem with the SavedModel files being too big should now be solved. See also the new SavedModel example script.
Pedro Chinen
@chinen93

Hi, I just upgraded my Tensorforce version to 0.6 and tried to run the CartPole environment in Colab with a GPU, but couldn't. Is this the right place to post it, or should I create an issue on GitHub?

pip install Tensorforce==0.6

from tensorforce.execution import Runner
runner = Runner(
    agent=dict(
       type="ppo",
       batch_size=2
    ),
    environment=dict(environment='gym', level='CartPole'),
    max_episode_timesteps=500
)
runner.run(num_episodes=20)
runner.run(num_episodes=10, evaluation=True)
runner.close()

I got the following messages:

InvalidArgumentError: Cannot assign a device for operation agent/StatefulPartitionedCall/agent/Gather: Could not satisfy explicit device specification '' because the node {{colocation_node agent/StatefulPartitionedCall/agent/Gather}} was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'. All available devices [/job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0, /job:localhost/replica:0/task:0/device:XLA_GPU:0, /job:localhost/replica:0/task:0/device:GPU:0]. 
Colocation Debug Info:
Colocation group had the following types and supported devices: 
Root Member(assigned_device_name_index_=1 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
Identity: GPU CPU XLA_CPU XLA_GPU 
ResourceScatterAdd: CPU XLA_CPU XLA_GPU 
_Arg: GPU CPU XLA_CPU XLA_GPU 
ResourceGather: GPU CPU XLA_CPU XLA_GPU 

Colocation members, user-requested devices, and framework assigned devices, if any:
  agent_929 (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  agent/StatefulPartitionedCall/agent/Gather (ResourceGather) 
  agent/StatefulPartitionedCall/agent/ResourceScatterAdd (ResourceScatterAdd) 
  Func/agent/StatefulPartitionedCall/input/_90 (Identity) /job:localhost/replica:0/task:0/device:GPU:0

     [[{{node agent/Gather}}]] [Op:__inference_act_1063]
Alexander Kuhnle
@AlexKuhnle
Hmm, no idea why this is, so maybe better to open a GitHub issue. Quick fix: you could just turn the GPU off, since the performance benefits are probably not huge for RL anyway?
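For example, one way to force CPU-only execution in Colab (a sketch; the environment variable must be set before TensorFlow initializes the GPU):

import os

# Hide all CUDA devices so TensorFlow (and hence Tensorforce) runs on CPU only.
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'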
MarkTension
@MarkTension

Hi all, I got a question. I'm getting this error:

File "/Users/schmark/anaconda/envs/tensorforce/lib/python3.6/site-packages/tensorforce/core/layers/embedding.py", line 94, in initialize
    condition='input num_values is None'
tensorforce.exception.TensorforceError: Required Embedding argument num_embeddings given input num_values is None.

when initializing my agent like so

agent = Agent.create(
    agent='ppo', environment=environment, batch_size=10, learning_rate=1e-3
)

The environment I'm using is a custom environment, so I'm guessing the cause lies in my implementation? I followed the docs. Is this a common problem? Any pointers on where I can start looking for the source of the error?
If it helps, here is how I define my environment. Thanks in advance!

Alexander Kuhnle
@AlexKuhnle
Hey, the reason for this exception is that your state space specification is of type int, but missing num_values here. Why: the default network embeds discrete inputs and hence needs to know how many embeddings are required, and there is currently no default support for arbitrary integers (which is a non-trivial problem). By the looks of it, the state integers encode the type of each grid tile, so I assume there is a fixed set of types, which would be the value to choose for num_values.
(But the exception is not great, so will improve this, and maybe add something to the docs on int types.)
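For example, a state specification along these lines (a sketch; the grid shape and number of tile types are made up for illustration):

def states(self):
    # int states need num_values so the embedding layer knows how many
    # embeddings to create; here: a 16x16 grid with 5 possible tile types (0..4).
    return dict(type='int', shape=(16, 16), num_values=5)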
MarkTension
@MarkTension
Great thanks, that solves it:)
MarkTension
@MarkTension

Hi! I'm running into this error

  File "/Users/schmark/anaconda/envs/tensorforce/lib/python3.6/site-packages/tensorforce/core/layers/dense.py", line 87, in initialize
    is_trainable=self.vars_trainable, is_saved=True
  File "/Users/schmark/anaconda/envs/tensorforce/lib/python3.6/site-packages/tensorforce/core/module.py", line 511, in variable
    name='variable', argument='spec', value=spec, hint='underspecified'
tensorforce.exception.TensorforceError: Invalid value for variable argument spec: TensorSpec(type=float, shape=(0, 32)) underspecified.

Since I added the network argument and my own custom layers:

agent = Agent.create(
    agent='ppo',
    environment=environment,
    network=[
        dict(type='conv2d', window=5, stride=3, size=8, activation='elu'),
        dict(type='flatten'),
        dict(type='dense', size=32),
        dict(type='flatten', name="out"),
    ], #etc (extra flatten is probably not necessary)

What does underspecified mean in this case, and what could be the cause?

Alexander Kuhnle
@AlexKuhnle
Don't see an obvious problem. What's the state space specification here?
MarkTension
@MarkTension
It’s a float array of shape [32, 32]
def states(self):
    return dict(type='float', shape=(self.params.egoSize, self.params.egoSize))
Alexander Kuhnle
@AlexKuhnle
I think it could potentially be because a conv2d expects rank-3 inputs, so of the shape (x, y, c). Could you try whether (32, 32, 1) works? (In which case it's obvs again not a great exception message)
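That is, something like the following in the environment (a sketch; the trailing 1 is the single channel dimension, and the returned state arrays have to match, e.g. via np.expand_dims(obs, axis=-1)):

def states(self):
    # conv2d expects rank-3 input (height, width, channels),
    # so declare an explicit single-channel dimension.
    return dict(type='float', shape=(self.params.egoSize, self.params.egoSize, 1))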
MarkTension
@MarkTension
That's indeed the cause. Thanks again!
Drew Robinson
@l0phty

Hi folks. I'm new to Tensorforce and ML in general, and I'm running into an error using Tensorforce that I'm not sure how to debug:

  File "/Users/marco0009/.virtualenvs/puzzle_solver-j2SM-PkM/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError:  indices[0] = [0, 3000] does not index into shape [1,3000,6]
     [[{{node agent/StatefulPartitionedCall/agent/TensorScatterUpdate_1}}]] [Op:__inference_act_1218]

The 3000 I recognize as my custom Environment's max_step_per_episode, and the 6 I suspect is related to my env's actions:

    def actions(self):
        return {
            "make_move": dict(type="int", num_values=6),
        }

but I'm unsure as to what the cause of this exception actually is. Is there anywhere I should be looking in my Environment's configuration for issues?

Drew Robinson
@l0phty
I think I managed to find the problem. Instead of instantiating my custom env class and passing that to my agent, calling Environment.create and passing my custom class to it seems to have fixed it.
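For anyone hitting the same thing, the working pattern looks roughly like this (a sketch; MyPuzzleEnv stands in for the custom environment class and the agent settings are placeholders):

from tensorforce import Agent, Environment

# Wrap the custom class via Environment.create instead of instantiating it directly.
environment = Environment.create(
    environment=MyPuzzleEnv, max_episode_timesteps=3000
)
agent = Agent.create(agent='ppo', environment=environment, batch_size=10)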