    Mahmut Bulut
    @vertexclique
    hey @gvgramazio, thanks for the reply. Yes, I am testing in the same script, right after training, with a different dataset. I also tried with the same dataset I used for training, but it also repeated the same actions.
    gvgramazio
    @gvgramazio
    @vertexclique Except for the new environment, are you using keras-rl without substantial modifications? Could you link the script you're using? If it had performed badly in both training and testing, a lot of things could have caused that, but it is very strange that it performs badly only in the testing phase.
    @vertexclique I noticed just now that you mentioned the word dataset. What do you mean by that? DDQN is an algorithm for RL, not for supervised learning.
    Mahmut Bulut
    @vertexclique
    I am feeding in some data to set up the env properly. Different data means a different env. Interestingly, setting test_policy to BoltzmannQPolicy helped me get non-repetitive actions.
    I will send the notebook to you if you want.
    gvgramazio
    @gvgramazio
    @vertexclique What were you using before instead of BoltzmannQPolicy? Anyway, if you have a GitHub/GitLab repository you could post the link here, and I can take a look at your code as well as continue the discussion there.
    jarlva
    @jarlva
    Has anyone tried AutoML tools (e.g. Auto-Keras) to improve the model/parameters? Is it possible to do?
    yujia21
    @yujia21
    Hi, quick question: in lines 141-146 of core.py (the random start steps at the beginning of each episode in the agents' fit function), the action sampled from the environment is processed by the processor and then fed into the environment's step function. From what I understand, the processor should not be applied to an action sampled from the environment, since that action already belongs to the action space; it should instead be applied to the action that the agent outputs (as in lines 169-171). Is this a bug? @RaphaelMeudec @mirraaj
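    For reference, a minimal sketch of the Processor hook in question; the scaling rule here is made up purely for illustration, but the class and method name come from rl/core.py:

    from rl.core import Processor

    class ScaledActionProcessor(Processor):
        """Illustrative processor: the agent outputs actions in [-1, 1],
        but the environment expects them in [-2, 2]."""

        def process_action(self, action):
            # process_action is meant to translate an agent-produced action
            # into something env.step() accepts, which is why applying it to
            # an action sampled from the env itself looks redundant.
            return 2.0 * action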
    Stuart Farmer
    @StuartFarmer
    Question: training and testing a DQN in a single run works fine and visualizes properly,
    but loading weights into a DQN and testing produces no output and action = 0 always.
    How do I get around this?
    Stuart Farmer
    @StuartFarmer
    Hey everyone, I figured it out.
    DQNAgent takes a policy object. If you do not pass the same policy as the test_policy as well, the model won't work when you load the weights.
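    A minimal sketch of that fix, following the dqn_cartpole example pattern; the weights file name here is illustrative:

    import gym
    from keras.models import Sequential
    from keras.layers import Dense, Flatten
    from keras.optimizers import Adam
    from rl.agents.dqn import DQNAgent
    from rl.policy import BoltzmannQPolicy
    from rl.memory import SequentialMemory

    env = gym.make('CartPole-v0')
    nb_actions = env.action_space.n

    model = Sequential([
        Flatten(input_shape=(1,) + env.observation_space.shape),
        Dense(16, activation='relu'),
        Dense(16, activation='relu'),
        Dense(nb_actions, activation='linear'),
    ])

    policy = BoltzmannQPolicy()
    memory = SequentialMemory(limit=50000, window_length=1)

    # Pass the same policy object as both policy and test_policy so the
    # loaded weights are evaluated with the behaviour used in training.
    dqn = DQNAgent(model=model, nb_actions=nb_actions, memory=memory,
                   nb_steps_warmup=10, target_model_update=1e-2,
                   policy=policy, test_policy=policy)
    dqn.compile(Adam(lr=1e-3), metrics=['mae'])

    dqn.load_weights('dqn_cartpole_weights.h5f')  # illustrative file name
    dqn.test(env, nb_episodes=5, visualize=True)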
    gionic
    @giorgionicola
    Hi, I am trying to load the weights of a previously trained DDPGAgent. However, when I test it, the model does not behave correctly; it seems like the weights are random. What am I missing?
    JaMesLiMers
    @JaMesLiMers
    Hi everyone, I think I found a little bug when I was trying to use DDPG with my custom Processor class. The way DDPG adds metrics and metrics_names from the Processor actually appends the method itself to a list, which causes "TypeError: 'method' object is not subscriptable". I hope you will see this note and have a check; you may just add "()" at line 228 ("names += self.processor.metrics_names[:]") and line 299 ("metrics += self.processor.metrics").
    jamesm-vt
    @jamesm-vt

    Hi all. I just forked this repo to work on some contributions. I ran the tests to verify my environment before starting, and one test consistently fails:

    [gw0] [ 83%] FAILED tests/rl/agents/test_dqn.py::test_naf_layer_full
    Replacing crashed worker gw0

    Thought I'd check if anyone else was seeing the same before I dive into this.

    Yu Jia Cheong
    @yjcheong_gitlab
    @JaMesLiMers if the base class of your processor is the Processor defined in rl/core.py, then metrics_names and metrics have the @property decorator, so self.processor.metrics returns a list and not a function (likewise for metrics_names). If you override metrics_names and metrics in your custom class, add the @property decorator and it should work.
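    A minimal sketch of what that looks like in a custom processor; the metric itself is made up just for illustration:

    from rl.core import Processor

    class MyProcessor(Processor):
        def __init__(self):
            self._last_reward = 0.0

        def process_reward(self, reward):
            self._last_reward = reward
            return reward

        @property
        def metrics_names(self):
            # With @property, self.processor.metrics_names is a list,
            # so slicing it with [:] works as the agent code expects.
            return ['last_reward']

        @property
        def metrics(self):
            return [self._last_reward]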
    jamesm-vt
    @jamesm-vt
    @jamesm-vt It was an environment issue. Resolved now.
    Yahya
    @John-Almardeny
    Hi all, just a quick question please.
    What is the purpose of the new episode in the while loop in fit()? It just passes a zero reward to the agent.
    In other words, what is the intuition behind passing a zero reward and performing back-propagation on a zero reward?
    Ryan Kitson
    @rckitson
    @jamesm-vt what was the environment issue? I'm getting the same message
    Ryan Kitson
    @rckitson
    @rckitson export KMP_DUPLICATE_LIB_OK=TRUE resolved it
    maybeliuchuan
    @maybeliuchuan
    Hi all. Sorry to bother you. I have installed all the libraries needed to run the examples, but I get an AttributeError when I try out the simple example python examples/dqn_cartpole.py. The error is: "Tensor.op is meaningless when eager execution is enabled." What should I do? Thanks. TensorFlow: 2.0.0-beta, keras-rl: 0.4.2
    Stefan Schneider
    @stefanbschneider
    I don't think keras-rl works with tf2. I'm using tf1.15 and it works fine. There seems to be a version for tf2 as well: https://github.com/wau/keras-rl2
    maybeliuchuan
    @maybeliuchuan
    Sorry to bother you again. I get the same error when I use keras-rl2 with tf 2.0.0b1. Do I have to downgrade TF to 1.x to use keras-rl, or not?
    Stefan Schneider
    @stefanbschneider
    idk, I'd just try a lower version. There are a lot of deprecated tf functions used in keras-rl; they probably just don't work with tf2 anymore.
    maybeliuchuan
    @maybeliuchuan
    Thanks a lot! I'll try a lower version now.
    maybeliuchuan
    @maybeliuchuan
    Hello, there is another problem, please!
    I get another TypeError when I use tf 1.15 like you: TypeError: len is not well defined for symbolic Tensors. (activation_4/Identity:0) Please call x.shape rather than len(x) for shape information.
    This is an error I had met before, but when I used tf 2.0.0-beta it disappeared; the new error is the one I asked about before.
    Is there something wrong that I have done?
    Matthew Pocock
    @drdozer
    Hi - I don't know if this is the right room to ask in.
    We're training up an autoencoder. We feed inputs that are labelled as real or synthetic
    In the latent space, we have a discriminator that tells real from synthetic
    Then, in the output space, we compare the input with the reconstruction.
    We'd ideally like the reconstruction loss function to only look at the real examples and ignore the synthetic ones, since we don't care whether the autoencoder can reconstruct synthetic examples well. We want it to spend all its effort reconstructing real ones accurately.
    But we're not sure how to adjust the loss function to achieve this.
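    One way to make the reconstruction loss ignore synthetic examples is per-sample weighting: a 0/1 weight per input zeroes out the loss for synthetic samples. A minimal sketch under that assumption; the discriminator branch is omitted and the data is random just to keep it self-contained:

    import numpy as np
    from keras.models import Model
    from keras.layers import Input, Dense

    # Toy data: 1000 samples of dimension 20; is_real is 1.0 for real
    # inputs and 0.0 for synthetic ones.
    x = np.random.rand(1000, 20).astype('float32')
    is_real = np.random.randint(0, 2, size=(1000,)).astype('float32')

    # A tiny dense autoencoder standing in for the real model.
    inp = Input(shape=(20,))
    z = Dense(4, activation='relu')(inp)         # latent space
    out = Dense(20, activation='sigmoid')(z)     # reconstruction
    autoencoder = Model(inp, out)

    # sample_weight multiplies each sample's loss, so synthetic samples
    # (weight 0) contribute nothing to the reconstruction objective.
    autoencoder.compile(optimizer='adam', loss='mse')
    autoencoder.fit(x, x, sample_weight=is_real, epochs=5, batch_size=32)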
    Le o
    @ChipChap1_gitlab
    Hi everyone
    Would somebody know if it is possible to train a Deep Q-Network incrementally in an online setting, where new observations come in as a stream with a time delay? Concretely, I am looking for something like the "train_on_batch" function in Keras (in the online case with batch = 1).
    Jannes Klee
    @jannesklee
    Hi, does someone know if the package is still maintained? There are lots of pull requests that fix many of the issues mentioned above but have not been accepted.
    Stefan Schneider
    @stefanbschneider
    I don't think it is
    Grégoire Passault
    @Gregwar

    Hello,
    I have a question about this part of code:
    https://github.com/keras-rl/keras-rl/blob/master/rl/core.py#L201

    Why does reaching the maximum number of allowed steps trigger a terminal state?

    Shouldn't it just reset the environment and keep the originally returned done?
    This matters because, in DDPG for example, whether a state is terminal has an effect on the estimation of Q.
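    To make the concern concrete, here is the standard one-step TD target in a minimal sketch (illustrative only, not keras-rl's exact code): flagging a time-limit step as done drops the bootstrap term even though the state was not really terminal, which biases the Q estimate.

    def td_target(reward, done, q_next, gamma=0.99):
        # done=True removes the bootstrap term gamma * Q(s', a').
        return reward + gamma * (1.0 - float(done)) * q_next

    print(td_target(1.0, False, 5.0))  # 5.95: bootstraps from Q(s', a')
    print(td_target(1.0, True, 5.0))   # 1.0: treated as a true terminal state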
    Stefan Schneider
    @stefanbschneider
    I think it makes sense that the framework returns done=True before resetting an environment. If you reset in between (i.e., without done=True), the env will suddenly look completely different (e.g., when the cartpole is reset) and it will be very hard or impossible for the agent to learn why.
    Grégoire Passault
    @Gregwar
    In the current case, the environment appears to be terminal when it actually isn't, which is also not a good solution in my opinion.
    Resetting the environment and reaching a terminal state should be two very different things.
    In experience-replay-buffer approaches, it just means that you don't store the transition from a state that is reached after the env is reset, which I think is already the case.
    (What I meant by resetting is doing it in keras-rl itself, not in the env.)
    marcus1337
    @marcus1337
    hi, anyone online?
    Stefan Schneider
    @stefanbschneider
    Hi, does anyone have experience with graph neural networks and know whether/how it's possible to combine them with keras-rl? I have a graph that I would like to pass as the input/state to my RL agent, and I thought using a GNN would be most natural.
    Jan
    @jan-polivka

    Hi, I've run into an issue while trying to use the fit() function on a SARSAAgent. The issue appeared in callbacks.py, in the function on_step_begin. Specifically, the assert on metrics.shape failed because of the datatype (a comparison of the tuples (x,) and (y,z)), and then np.isnan(metrics).all() failed as well, since np couldn't handle metrics safely due to the wrong data type.

    I've fixed it for myself, but I was wondering if I should open an issue and submit a pull request. It's probably not well done, but I'm sure some Python guru can refactor it in a flash.

    I'm using Python 3.8.7 and NumPy 1.19.5. keras-rl should be the most recent version; I pulled it yesterday.

    Jan
    @jan-polivka
    Alright, say no more, I'll do it.
    yunpeng-ma
    @yunpeng-ma
    Hello, does anyone have experience using DDPG to control multiple actions? I built my own scripts, but the actor neural network always outputs the same value. I thought there might be something wrong in my code that I couldn't find. Then I tried keras-rl, and I see it's written in ddpg.py that "Actor "{}" has more than one output. DDPG expects an actor that has a single output." Does that mean we can only use DDPG to control one action? Any advice is appreciated.
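    For what it's worth, my reading of that check is that "single output" means a single output tensor, not a single scalar action; that one tensor can still cover several action dimensions. A minimal sketch of such an actor, with illustrative shapes and layer sizes:

    from keras.models import Model
    from keras.layers import Input, Flatten, Dense

    nb_actions = 3        # e.g. three continuous control signals
    obs_shape = (1, 8)    # illustrative: window_length=1, 8 observation features

    actor_in = Input(shape=obs_shape)
    h = Flatten()(actor_in)
    h = Dense(32, activation='relu')(h)
    h = Dense(32, activation='relu')(h)
    # One output tensor of size nb_actions: a single output from
    # DDPGAgent's point of view, yet it drives several actions.
    actor_out = Dense(nb_actions, activation='tanh')(h)
    actor = Model(inputs=actor_in, outputs=actor_out)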