Hadi Keramati
@hkeramat
Hi, I used the multi-actor method based on the multi-actor example in the repo. I'm a bit confused about the implementation, though. My framework is up and running, but I think it just ends up being a single agent with larger action and state arrays. Is it possible to have multiple agents with different policies?
Milind Dalvi
@mdalvi
Hello Everybody, I'm new to Tensorforce. Looking forward to using the library for some experiments and contributing to the community. Thank you.
Alexander Kuhnle
@AlexKuhnle
@hkeramat, the multi-actor setup in the repo does indeed use the same agent to simultaneously control multiple players (it still runs each player independently). It is provided because it may not be obvious how to achieve this with Tensorforce. Setting up a version with a different agent per player, on the other hand, should be more straightforward.
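A rough sketch (not from the repo's multi-actor example) of the per-player-agent variant: each player gets its own independently trained agent, so policies can differ. The multi-player environment interface, specs, and agent choices below are placeholder assumptions.

from tensorforce import Agent

player_state_spec = dict(type='float', shape=(8,))       # assumed per-player state spec
player_action_spec = dict(type='int', num_values=4)      # assumed per-player action spec

agents = [
    Agent.create(agent='ppo', states=player_state_spec, actions=player_action_spec,
                 max_episode_timesteps=200, batch_size=10),
    Agent.create(agent='dqn', states=player_state_spec, actions=player_action_spec,
                 max_episode_timesteps=200, memory=10000, batch_size=32),
]

states = env.reset()          # hypothetical multi-player env: returns one state per player
terminal = False
while not terminal:
    actions = [agent.act(states=s) for agent, s in zip(agents, states)]
    states, terminal, rewards = env.step(actions)        # hypothetical step signature
    for agent, r in zip(agents, rewards):
        agent.observe(terminal=terminal, reward=r)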
@mdalvi Welcome! :-)
Milind Dalvi
@mdalvi
image.png
Is it possible to view a Tensorforce agent in a Keras plot_model-style output?
Milind Dalvi
@mdalvi
There seems to be some kind of leakage in the library. It can be spotted if you run the temperature-controller example multiple times. I added agent.close() and environment.close() after the agent was trained, then initialized a new agent via Agent.create. Surprisingly, the new "untrained" agent behaves like a trained agent. @AlexKuhnle Let me know if you need a sample notebook for demonstration.
Alexander Kuhnle
@AlexKuhnle
Can't remember, but is the notebook using a saver configuration? If so, it probably saves the model at some point by default, and then when initialising the agent it loads the model as specified in saver.
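A minimal sketch of the behaviour described here (the saver keys and defaults are assumptions based on the docs; verify against your Tensorforce version): with a saver pointing at an existing checkpoint directory, re-creating the agent restores the saved weights instead of starting fresh.

from tensorforce import Agent, Environment

environment = Environment.create(environment='gym', level='CartPole-v1')

agent = Agent.create(
    agent='ppo', environment=environment, batch_size=10,
    saver=dict(directory='checkpoints', frequency=100),  # periodic checkpointing
)
# ... train, then agent.close() ...

# Same saver directory -> the latest checkpoint is restored, so the "new" agent
# behaves trained; use a fresh/empty directory to get a genuinely untrained agent.
new_agent = Agent.create(
    agent='ppo', environment=environment, batch_size=10,
    saver=dict(directory='checkpoints', frequency=100),
)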
Christina Schenk
@schenkch
Hi all, I am trying to set up a multi-objective problem. The objective is a trade-off of two things: minimizing one and maximizing the other one. I tried a weighted sum approach with several versions of the objective but so far without success. Now, I was thinking about treating them separately and using a 2D reward function. However, it seems that this is not supported yet. Do you have any suggestions on alternatives or getting this to work? Thank you!
Axel Vulsteke
@Axxeption
image.png
Hi, I want to save my best models during training using the save_best_agent parameter, but it does not work and I do not see why.
I created my own environment and have already tried parallel and serial execution, without success.
Am I missing something extra that needs to be set before it starts writing out the best model?
Alexander Kuhnle
@AlexKuhnle
@schenkch, unfortunately, as you identified, multi-reward is not supported. The alternative would be a combination such as a weighted sum, but since you've tried that already, that's not much help. :-/
@Axxeption, it might be that you need to use evaluation=True for runner.run(...). If you have more than one parallel environment, that should hopefully work. It's not clear from the docs, though.
Axel Vulsteke
@Axxeption
@AlexKuhnle if I put evaluation=True, it will not train anymore, because it will always use the solution trained so far. Or did I understand this incorrectly?
To put it differently: evaluation=True is what you do when you put the algorithm into production?
Thanks already for sharing your knowledge!
Alexander Kuhnle
@AlexKuhnle
@Axxeption, if you run multiple environments in parallel, then only one of the parallel runs should be used as the evaluation run, which runs throughout and is used to determine the "best" agent. At least that's how it should work -- can you try? I assume you will realise quickly if training doesn't happen.
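A rough sketch of the parallel-plus-evaluation setup being discussed; the exact placement of save_best_agent (here on runner.run) and the spec-style arguments are assumptions from the docs, and CartPole stands in for the custom environment.

from tensorforce import Runner

# Agent and environment given as specs so the Runner can replicate the environment
# across the parallel workers.
runner = Runner(
    agent=dict(agent='ppo', batch_size=10),
    environment=dict(environment='gym', level='CartPole-v1'),
    max_episode_timesteps=500,
    num_parallel=4,
)
# One of the parallel runs is reserved for evaluation and used to pick the "best" agent.
runner.run(num_episodes=1000, evaluation=True, save_best_agent='best-agent')
runner.close()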
12jr
@12jr
Hi, could you please explain what the difference between the ac and the a2c agent is? In particular, if a2c is a vanilla advantage actor-critic agent, what different baseline function does the ac agent use?
12jr
@12jr
Judging from the code, the main difference seems to be that a2c has estimate_advantage=True and ac sets it to False. However, I cannot make out what difference that makes.
Alexander Kuhnle
@AlexKuhnle
AC in Tensorforce is a policy-gradient actor plus a value critic, where the horizon plus the critic's estimate are used for the PG update (instead of unrolling and computing the actual episode return). A2C additionally uses not the state-value estimate but the advantage estimate. So (a toy numeric example follows below):
  • PG: discounted-sum(r_t, r_t+1, ...)
  • AC: r_t + disc * Critic(s_t+1) (for horizon=1, otherwise unroll the discounted sum further)
  • A2C: (r_t + disc * Critic(s_t+1)) - Critic(s_t)
I'm not entirely sure whether this is how AC vs A2C is usually defined.
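A small numeric illustration of the three targets above (a sketch, not Tensorforce code; disc is the discount factor and V holds illustrative critic value estimates):

disc = 0.99
rewards = [1.0, 0.5, 2.0]                # r_t, r_t+1, r_t+2
V = {'s_t': 1.8, 's_t+1': 2.1}           # critic estimates, made up for illustration

# PG: full discounted return of the (remaining) episode
pg_target = sum(disc ** i * r for i, r in enumerate(rewards))

# AC (horizon=1): bootstrap with the critic after one step
ac_target = rewards[0] + disc * V['s_t+1']

# A2C: same bootstrapped return, minus the current state's value (advantage)
a2c_target = (rewards[0] + disc * V['s_t+1']) - V['s_t']

print(pg_target, ac_target, a2c_target)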
khughes147
@khughes147
I've implemented a PPO agent and specified the input and output network layers; however, I'd like to better understand how the NN structure is defined when using network='auto'. Is there somewhere I can read up on this? I know I can use agent.get_architecture() to view the current architecture, but I'm more interested in how it is auto-defined.
Alexander Kuhnle
@AlexKuhnle
Besides the docs, which are probably not detailed enough for you, the best place is to look into the code directly; it should be fairly readable.
Lucas Pereira
@LucasMSpereira
is it possible to use tensorforce on windows?
Alexander Kuhnle
@AlexKuhnle
I'm using it on Windows via the Ubuntu Subsystem (WSL). I seem to remember that I've also tried it on Windows directly at some point, but not sure.
Lucas Pereira
@LucasMSpereira
I had a ton of problems trying to get it to work. I have that subsystem as well. Which version of python and packages should I use? And can I base my workflow around vscode?
Alexander Kuhnle
@AlexKuhnle
What were/are the problems you encountered? The Python version should be >=3.6. Minimum package versions are specified in the requirements file, so they should be installed automatically when installing this package. VSCode: I'm not using it myself, but I don't see why it wouldn't work.
Benno Geißelmann
@GANdalf2357
Hi @AlexKuhnle, will there be a new release of Tensorforce supporting the current TensorFlow version 2.8.0 available on PyPI? Thanks a lot!
Alexander Kuhnle
@AlexKuhnle
Hi @GANdalf2357, I'm currently not very active in maintaining the codebase, so it may take a while (particularly if it requires deeper changes). Any reason why the latest version would be desirable over the supported one?
Jackflyingzzz
@Jackflyingzzz
Hi @AlexKuhnle, I tried to use DDPG with an LSTM/GRU network, but it triggers an AssertionError at assert self.policy.max_past_horizon(on_policy=False) == 0. With PPO it works fine; is DDPG not compatible with RNNs? Many thanks!
Alexander Kuhnle
@AlexKuhnle
Indeed, that means RNNs are not compatible -- not in principle, but it hasn't been implemented. If you feel comfortable with it, you can replicate the DDPG configuration of the "parent agent type", the "tensorforce" agent, and then modify it to make it "on-policy" instead of "off-policy". But that's certainly getting into Tensorforce internals. Otherwise, is there any reason not to use PPO?
ProgerDreamer
@ProgerDreamer

Hi, everybody! I have a problem with the DDPG agent. If I run it with default critic parameters (i.e. without a critic) it works OK, but if I try to run it with a critic specification (network and optimizer), the following error is raised:

...
File "/home/mikhail/RL_10_21/venv/lib/python3.8/site-packages/tensorforce/core/optimizers/tf_optimizer.py", line 173, in step
assert len(gradients) > 0
AssertionError

Does anybody have any ideas?

Mario Guerriero
@marioguerriero
Hi everyone. I am having issues training an agent on an Apple M2 machine. Whenever I enter my training loop I hit the following error: InvalidArgumentError: Cannot assign a device for operation agent/VerifyFinite/CheckNumerics: Could not satisfy explicit device specification '' because the node {{colocation_node agent/VerifyFinite/CheckNumerics}} was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'. All available devices [/job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:GPU:0].. The full code and error message are here: https://pastebin.com/ktqYYSSv. Is anyone able to help?
salehisaeed
@salehisaeed

Hi all,

I have a C++ code environment that I want to couple with Tensorforce. I want to load my agent model directly within C++. This way, after each action I don't have to leave the C++ code and go to Python and back again. Therefore, I can run a full episode inside the C++ program and record everything because the C++ has access to the model directly.

So far, I have successfully loaded the agent model inside my C++ code and verified its performance. The code runs an entire episode and records all the required data (states, actions, rewards, terminals). Then, I load the data in Python to update the model. My understanding is that I should use an "act-experience-update" interface. However, I do not really have any "act" function as all the actions have already been carried out inside the C++ code. So, calling the experience function using the recorded data as:

agent.experience(states=states, actions=actions, reward=rewards, terminal=terminals)

throws:

Exception has occurred: TensorforceError
Invalid value for SINGLETON argument value shape:  != (1,).
  File "/home/saeed/learning/training.py", line 55, in main
    agent.experience(states=states, actions=actions, reward=rewards, terminal=terminals)
  File "/home/saeed/learning/training.py", line 89, in <module>
    main()

The same error is seen if the pretrain function is employed.
Any ideas? Has anyone successfully done such a coupling between C++ and Tensorforce?

Thanks,
Saeed
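For reference, a hedged sketch of the act-experience-update interface mentioned above; the batched one-entry-per-timestep shapes are an assumption, and agent stands for an already created Tensorforce agent whose specs match these placeholders.

import numpy as np

# Suppose the C++ episode produced T timesteps of a 4-dim state and a scalar action:
T = 100
states = np.random.rand(T, 4).astype(np.float32)    # placeholder recorded states
actions = np.random.rand(T).astype(np.float32)      # placeholder recorded actions
rewards = np.random.rand(T).astype(np.float32)      # one reward per timestep
terminals = np.zeros(T, dtype=bool)
terminals[-1] = True                                # episode ends at the last step

agent.experience(states=states, actions=actions, terminal=terminals, reward=rewards)
agent.update()                                      # apply the stored experience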

khughes147
@khughes147
Hi, is it expected that the input layer does not show up in Agent.get_architecture?
Ka Wing Kwok
@kwokkawinghk

Hi all,
I am using Python 3.8 and trying to build the A2C agent network below using the AutoNetwork:

network_spec = dict(type='auto', size=256, depth=3, final_size=256, final_depth=2, rnn=256, inputs_spec=tf.TensorSpec(shape=(276,49), dtype=float))

However, it gives me the error:

TensorforceError: Spec mismatch for argument inputs_spec: TensorSpec(shape=(276, 49), dtype=tf.float32, name=None) != TensorsSpec(SINGLETON=TensorSpec(type=float, shape=(276, 49))).

Does anyone have ideas on how to fix that? Is there anything wrong with my inputs_spec value? Many thanks in advance.

Martin Drašar
@drasha
Hi,
did anyone try to supply agents with correct solutions as part of their training? That is, to fill their replay buffers with correct combinations of state+action+reward. I know it is not part of the official API, but it seems it could be done somehow.
Thanks.
amirrezaheidari
@amirrezaheidari

Hi all,

I have a dataset of optimal control behaviour. I would like to first train the agent on this dataset, and then have it start interacting with the actual system to continue its learning process. I think offline reinforcement learning or imitation learning can do this, but I am not really familiar with either of them. So I would appreciate it if someone could help me with the following questions:

1- Should I use offline RL or imitation learning to pre-train the agent on the dataset?

2- Is it possible to do pre-training on the dataset with Tensorforce? Any examples?

3- I need to deal with continuous action spaces, so I would like to use the SAC agent. There is an AC agent in Tensorforce; is it the same as SAC?

Thanks

Martin Drašar
@drasha

Pretraining is possible, just check the documentation here: https://tensorforce.readthedocs.io/en/latest/agents/agent.html#tensorforce.agents.TensorforceAgent.pretrain

There are also code examples that are quite easy to follow.
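A hedged sketch of the record-and-pretrain flow linked above (argument names taken from a reading of the docs; double-check against your Tensorforce version, and CartPole stands in for your environment).

from tensorforce import Agent, Environment

environment = Environment.create(environment='gym', level='CartPole-v1', max_episode_timesteps=500)

# 1) Record traces while acting (e.g. replaying an optimal/scripted controller)
recording_agent = Agent.create(
    agent='ppo', environment=environment, batch_size=10,
    recorder=dict(directory='traces'),   # writes one .npz trace file per episode
)
# ... run episodes with recording_agent.act() / recording_agent.observe() ...

# 2) Pretrain a fresh agent from the recorded traces, then continue training online
agent = Agent.create(agent='ppo', environment=environment, batch_size=10)
agent.pretrain(directory='traces', num_iterations=30, num_traces=16, num_updates=4)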

lI-guy-Il
@lI-guy-Il
Where can I find tutorials for DDQN?
Martin Drašar
@drasha
General tutorials on how to use DDQN are all over the web. If you want to use one in Tensorforce, just use the general Agent.create(agent=DoubleDQNAgent, ...) interface and fill the parameters from the documentation: https://tensorforce.readthedocs.io/en/latest/agents/double_dqn.html
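For illustration, a hedged example of that interface using the string alias (parameter names per the linked double_dqn docs; the values are placeholders and CartPole stands in for your environment):

from tensorforce import Agent, Environment

environment = Environment.create(environment='gym', level='CartPole-v1')
agent = Agent.create(
    agent='double_dqn',        # or pass the agent class, as mentioned above
    environment=environment,
    memory=10000,              # replay-memory capacity
    batch_size=32,
    learning_rate=1e-3,
    exploration=0.1,
)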
lI-guy-Il
@lI-guy-Il
How do I know what parameters to set?
Like I made something, it's just an incredibly stupid AI
clearly I messed something up
I had similarly garbage results with other libraries such as Keras-RL. So the issue is in the setup, not the AI itself
Martin Drašar
@drasha
Set the required ones to some reasonable values and tweak the others when results are unsatisfactory. Start with a smaller observation space first. Try using PPO, as it converges better than DQN. And consider pretraining. But all of that really depends on your use case.
khughes147
@khughes147
Does anyone have a problem with agent evaluation? During training the agent converges on a near-optimal reward and a correct sequence of actions is continually executed, and the agent is saved regularly with the saver. Here are my own and another person's examples where this is the case: https://stackoverflow.com/questions/69931069/evaluation-stage-of-tensorforce-ppo-not-performing-as-expected https://stackoverflow.com/questions/73035682/tensorforce-agent-training-with-custom-environment
amirrezaheidari
@amirrezaheidari
I have three continuous actions. I did not find any example of how to deal with a continuous action space, but if I am not wrong, I should specify continuous actions as below. In my problem, though, each action has a different min and max; how should I specify that?
actions = dict(type='float', shape=(3,), min_value=-1.0, max_value=1.0)
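One way to do this (a suggestion, not from the thread): specify a dict of named actions, each with its own range. Array-valued min_value/max_value matching the shape may also work, but treat that as an assumption to verify.

import numpy as np

# Named actions, each with its own bounds (names are placeholders)
actions = dict(
    valve=dict(type='float', shape=(), min_value=0.0, max_value=1.0),
    temperature=dict(type='float', shape=(), min_value=15.0, max_value=30.0),
    flow=dict(type='float', shape=(), min_value=-2.0, max_value=2.0),
)

# Alternative (assumed to be supported): a single 3-vector with per-component bounds
actions_alt = dict(
    type='float', shape=(3,),
    min_value=np.array([0.0, 15.0, -2.0]),
    max_value=np.array([1.0, 30.0, 2.0]),
)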
tomseidel
@tomseidel
Hi all, merry Christmas :-) I would love some help with feature-importance analysis. I was thinking of using SHAP (Shapley values) based on the TensorFlow model; however, I am not sure how to save the agent's model so that it can be loaded with Keras as a TF model. Is that even possible? If not, what are other ways to get there? Thanks
tomseidel
@tomseidel
DDPG
amirrezaheidari
@amirrezaheidari
Is there any way for agent.act not to be immediately followed by agent.observe, while the action is still considered for training? I am using Tensorforce as a component of a simulation software (TRNSYS), which means that the agent selects the action, the action then goes to the other components, and the agent can only observe the state at the end of the timestep. In this case, the Python code included in that component should end with agent.act (to send the action to the system) and then start with agent.observe, so they cannot follow each other directly in my code.
Elias Anderlohr
@elianderlohr
Hello everybody, has anybody ever encountered the following error: tensorforce.exception.TensorforceError: Dense input mismatch for argument rank: 2 != 1.? I would highly appreciate any help. I get the error when I try to use a custom network, e.g. network = [dict(type="dense", size=32, activation="relu"), dict(type="dense", size=32, activation="relu")], for the PPO agent. Full code: https://github.com/elianderlohr/kniffel/blob/main/src/tensorforce_rl/optuna_rl.py#L609
amirrezaheidari
@amirrezaheidari
Hello, does anyone know how I can use an agent without needing an environment definition? My agent is part of a simulated system (in TRNSYS) that acts like a controller: it gives the action to the system and receives back the next state and reward. I am currently using only agent.act and agent.observe, but I am afraid I am missing some dependency on the environment, because the agent never converges.
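For reference, a hedged sketch of the environment-free act/observe usage being described, based on the act-observe pattern in the Tensorforce docs; the specs and the TRNSYS helper functions are hypothetical placeholders.

from tensorforce import Agent

agent = Agent.create(
    agent='ppo',
    states=dict(type='float', shape=(4,)),                      # state provided by the simulator
    actions=dict(type='float', shape=(2,), min_value=-1.0, max_value=1.0),
    max_episode_timesteps=100,                                  # needed since no Environment is given
    batch_size=10,
)

# Per simulator timestep: act on the latest state, apply the action externally,
# then observe the resulting reward (and terminal flag at the episode end).
state = get_state_from_trnsys()                                 # hypothetical helper
action = agent.act(states=state)
apply_action_in_trnsys(action)                                  # hypothetical helper
reward, terminal = get_feedback_from_trnsys()                   # hypothetical helper
agent.observe(terminal=terminal, reward=reward)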