Tom Breloff
@tbreloff
(i.e. MDP vs POMDP?)
Zachary Sunberg
@zsunberg
yeah
that is one way to describe it
anyway, say I wrote algorithm A that depends on step!(env, s, a) being Markov wrt s
Tom Breloff
@tbreloff
yeah so I think this api satisfies all needs, but it would be good to explicitly state that the s is really an o (observation) when the environment is partially observable
Zachary Sunberg
@zsunberg
then someone could throw an ENV problem at algorithm A, and they would think it was being solved
but it wouldn't actually solve it
so it would be nice to have some way to check or verify that a problem is actually a GM problem
for people who write algorithms
Tom Breloff
@tbreloff
could add an informative api to state that
like: ismdp(env)
Zachary Sunberg
@zsunberg
right
yeah
Tom Breloff
@tbreloff
default it to false
Zachary Sunberg
@zsunberg
I think that would be adequate
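
A minimal sketch of the idea discussed above, assuming a trait-style function named ismdp that defaults to false. The types AbstractEnvironment, GridWorld, NoisySensorWorld and the function solve_markov are hypothetical stand-ins defined here for illustration; they are not taken from Reinforce.jl.

```julia
# Hypothetical stand-ins for illustration; NOT the Reinforce.jl definitions.
abstract type AbstractEnvironment end

# Default: make no claim that the state is fully observable.
ismdp(env::AbstractEnvironment) = false

struct GridWorld <: AbstractEnvironment end          # fully observable
struct NoisySensorWorld <: AbstractEnvironment end   # partially observable

# A fully observable environment opts in explicitly.
ismdp(env::GridWorld) = true

# An algorithm that relies on the Markov property can guard its entry point.
function solve_markov(env::AbstractEnvironment)
    ismdp(env) || throw(ArgumentError(
        "$(typeof(env)) does not declare ismdp(env) == true"))
    return :solved   # placeholder for the real algorithm
end
```

With a false default, an environment that says nothing is treated as partially observable, so an algorithm that needs the Markov property fails loudly instead of silently returning a wrong answer.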
Tom Breloff
@tbreloff
any interest in contributing that?
should be quick
Zachary Sunberg
@zsunberg
Sure
Zachary Sunberg
@zsunberg
Also, how are you expecting people to manage random number generation? just embedding an rng in the environment if they want? (that seems like a reasonable way to do it to me)
Tom Breloff
@tbreloff
yeah... people can do anything they want with the environment
though i could see an api call to generically set an rng seed
(let's hold off on that for now)
Zachary Sunberg
@zsunberg
yeah
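
One way the "embed an rng in the environment" approach could look, as a rough sketch. CoinFlipEnv, flip! and reseed! are hypothetical names made up for this example and are not part of Reinforce.jl; only Random, AbstractRNG, MersenneTwister and Random.seed! come from the Julia standard library.

```julia
using Random

# Hypothetical environment type for illustration; not part of Reinforce.jl.
mutable struct CoinFlipEnv{R<:AbstractRNG}
    rng::R       # the environment owns its source of randomness
    heads::Int   # running count of heads
end

# Convenience constructor: build the rng from a seed.
CoinFlipEnv(seed::Integer) = CoinFlipEnv(MersenneTwister(seed), 0)

# All randomness is drawn from the embedded rng, never the global one.
function flip!(env::CoinFlipEnv)
    outcome = rand(env.rng, Bool)
    env.heads += outcome
    return outcome
end

# A generic "set the seed" call could simply reseed the embedded rng in place.
reseed!(env::CoinFlipEnv, seed::Integer) = (Random.seed!(env.rng, seed); env)
```

Two environments built from the same seed then produce the same sequence of outcomes, which makes experiments reproducible without touching global state.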
Tom Breloff
@tbreloff
hey zach.. just curious... are you planning to update the readme with info about MDP vs POMDP?
Zachary Sunberg
@zsunberg
I could - should I?
Tom Breloff
@tbreloff
also, you should do a git fetch; git rebase origin/master before submitting any PRs
to make sure you replay changes on my recent commits
Zachary Sunberg
@zsunberg
Ok, yeah, actually I am in the middle of some other work, so I probably won't get to it today
Tom Breloff
@tbreloff
ok... then i'll just throw it in quickly
wanted to give you the chance :)
Zachary Sunberg
@zsunberg
cool
yeah sure thanks
I'm still pretty slow at all this open source stuff, haha
I just wanted to make sure you understood what I was saying about the state, and it's clear that you do - I definitely think that this will be a good interface to have for us all to work with
Tom Breloff
@tbreloff
yeah i like to give people a chance to practice... another time
Tom Breloff
@tbreloff
JuliaML/Reinforce.jl@3f82b06
Zachary Sunberg
@zsunberg
cool - thanks for taking time to discuss it - I should be able to make some glue between POMDPs.jl and Reinforce.jl so that they are pretty easily interoperable
Tom Breloff
@tbreloff
that would be awesome
Maxim Egorov
@etotheipluspi
I'm coming over from POMDPs.jl, and was wondering what your plans are for adding new algorithms. Everyone is waiting for deep reinforcement learning implementations in Julia :D
Tom Breloff
@tbreloff
Hi @etotheipluspi, I will need some help to add new algos. There are a couple of policies that might give you some good ideas of how to approach it. A3C would be a great one to start with.
I don't have time to build out new functionality
Patrick Kofod Mogensen
@pkofod
If
Mark Saroufim
@msaroufim
Hey everyone, I wrote an intro to RL post based on reading the codebase of Reinforce.jl - hope you like it and are willing to share feedback https://medium.com/@marksaroufim/how-to-implement-a-reinforcement-learning-library-from-scratch-a-deep-dive-into-reinforce-jl-d1ec2a239924
Mark Saroufim
@msaroufim
on the front page of HN XD
@tbreloff
Maxwell Peterson
@allswellthatsmaxwell
In which function should I implement a temporal-difference-style update? I thought step! might be where to do it, but it doesn't take a subtype of AbstractPolicy as an argument, and I've got my value function in my Policy type. And doing it with the s, a, r, s' returned from run_episode doesn't work, because I want to update during the episode. I've been looking through code files but just cannot seem to find an update on a policy.
Maxwell Peterson
@allswellthatsmaxwell
Figured it out - it goes outside any of those, in some loop where I'm calling step! and action etc. etc.
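
A rough sketch of what "outside any of those, in some loop" could look like for a tabular TD(0) update. Everything here is a self-contained stand-in: ChainEnv, TDPolicy, finished, td_update! and run_td_episode! are hypothetical, and the step! and action defined below use simplified signatures assumed for this example rather than the Reinforce.jl ones.

```julia
# Hypothetical stand-ins for illustration; signatures are assumptions,
# not the Reinforce.jl API.
mutable struct ChainEnv
    state::Int   # current position, 1..n
    n::Int       # terminal state
end

struct TDPolicy
    V::Vector{Float64}   # tabular state-value estimates, kept in the policy
    α::Float64           # step size
    γ::Float64           # discount factor
end

# Always move one step right; a real policy would choose among actions.
action(p::TDPolicy, s) = 1

# Advance the environment and return (reward, next state).
function step!(env::ChainEnv, s, a)
    env.state = min(s + a, env.n)
    r = env.state == env.n ? 1.0 : 0.0
    return r, env.state
end

finished(env::ChainEnv) = env.state == env.n

# TD(0): V(s) ← V(s) + α (r + γ V(s′) − V(s)); terminal states bootstrap from 0.
function td_update!(p::TDPolicy, s, r, s′, terminal)
    target = r + (terminal ? 0.0 : p.γ * p.V[s′])
    p.V[s] += p.α * (target - p.V[s])
    return p
end

function run_td_episode!(env::ChainEnv, p::TDPolicy)
    env.state = 1
    s = env.state
    while !finished(env)
        a = action(p, s)
        r, s′ = step!(env, s, a)
        td_update!(p, s, r, s′, finished(env))   # update during the episode
        s = s′
    end
    return p
end
```

The update sits in the caller's loop because that is the only place where the policy, the transition (s, a, r, s′) and the termination flag are all in scope at the same time.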
Khizir Siddiqui
@khizirsiddiqui
Hi! Are there any issues a beginner could take?
Was looking to start contributing in Julia.