These are chat archives for ujh/iomrascalai

15th
Apr 2015
iopq
@iopq
Apr 15 2015 05:33
Older version of Mogo doesn't see the correct move either
Urban Hafner
@ujh
Apr 15 2015 06:34
OK, so it’s just that the bot is weak and can’t see the move.
BTW, I couldn’t see any difference between the two policies. Both won around 35% of their games against GnuGo (level 0).
iopq
@iopq
Apr 15 2015 07:38
let me give you the sgf
there are VERY few moves; it can see it, but it doesn't think it does anything useful
iopq
@iopq
Apr 15 2015 11:25
I sent an email to the mailing list about it
Urban Hafner
@ujh
Apr 15 2015 11:27
Yes, I saw that. Let’s hope that someone has tried it. Maybe the difference is small enough that it won’t be picked up with 100 games.
iopq
@iopq
Apr 15 2015 11:29
well it made a difference in one of my simulations
it should have won, but it threw the game away
Urban Hafner
@ujh
Apr 15 2015 11:30
You pitted the two versions against each other, right?
Differences in self play are generally bigger than when playing against other bots. Or so I’ve heard.
iopq
@iopq
Apr 15 2015 11:32
I really want to try a few more algorithms too, like BRUEic or something like that and see if it makes a difference
Urban Hafner
@ujh
Apr 15 2015 11:33
I’m not familiar with that one. Is it just a different formula, or is it as different from UCT as RAVE is?
iopq
@iopq
Apr 15 2015 11:33
it does different exploration and exploitation policies
Urban Hafner
@ujh
Apr 15 2015 11:34
What I meant is, is it easy to implement? RAVE is a bit more difficult as you have to update different parts of the tree.
iopq
@iopq
Apr 15 2015 11:35
I wish I could tell you, but academic papers are so hard to read
I couldn't even get through their explanation of UCT itself
Urban Hafner
@ujh
Apr 15 2015 11:35
I know. :)
iopq
@iopq
Apr 15 2015 11:35
even though I know how you implemented UCT
because you used variable names longer than one Greek character
Urban Hafner
@ujh
Apr 15 2015 11:36
That’s also why I haven’t tried to implement RAVE. And my UCT implementation is heavily based on the one on Sensei's Library.
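A minimal sketch of why RAVE touches more of the tree than plain UCT, assuming a hypothetical `ChildStats` layout (not iomrascalai's actual types): after a playout, the chosen child gets a normal UCT update, but every sibling whose move the same player made later in the simulation also gets an AMAF update. The blend uses the common schedule `beta = sqrt(k / (3n + k))`, where `k` is a tuning constant.

```rust
use std::collections::HashSet;

/// Per-child statistics: ordinary UCT counts plus AMAF (RAVE) counts.
/// Hypothetical layout, for illustration only.
#[derive(Default, Clone)]
struct ChildStats {
    mv: usize, // move index, e.g. a board point
    visits: u32,
    wins: u32,
    amaf_visits: u32,
    amaf_wins: u32,
}

/// Update a node's children after one playout.
/// `chosen` is the index of the child actually played from this node;
/// `later_moves` holds the moves the same player made later in the simulation.
fn update_node(children: &mut [ChildStats], chosen: usize, later_moves: &HashSet<usize>, won: bool) {
    for (i, c) in children.iter_mut().enumerate() {
        // Plain UCT: only the chosen child is updated.
        if i == chosen {
            c.visits += 1;
            if won {
                c.wins += 1;
            }
        }
        // RAVE: siblings whose move appeared later in the playout get credit too.
        if i == chosen || later_moves.contains(&c.mv) {
            c.amaf_visits += 1;
            if won {
                c.amaf_wins += 1;
            }
        }
    }
}

/// Blend the UCT and AMAF win rates; beta shrinks toward 0 as real visits grow.
fn rave_value(c: &ChildStats, k: f64) -> f64 {
    let n = c.visits as f64;
    let q = if c.visits > 0 { c.wins as f64 / n } else { 0.0 };
    let amaf = if c.amaf_visits > 0 {
        c.amaf_wins as f64 / c.amaf_visits as f64
    } else {
        0.0
    };
    let beta = (k / (3.0 * n + k)).sqrt();
    (1.0 - beta) * q + beta * amaf
}
```

With no real visits, `beta` is 1 and the value is pure AMAF; as visits accumulate the estimate moves toward the ordinary UCT win rate.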
iopq
@iopq
Apr 15 2015 11:36
I stole my UCB1-Tuned from a Python program
Urban Hafner
@ujh
Apr 15 2015 11:37
Yes, these things aren’t that difficult to implement when you see the code. But the CS way to describe them really makes it difficult.
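To illustrate the point that these formulas are short once written as code, here is a sketch of the UCB1-Tuned score from Auer, Cesa-Bianchi & Fischer (2002). It assumes 0/1 win rewards, so the mean of the squared rewards equals the win rate; the function name and parameters are hypothetical, not taken from either bot.

```rust
/// UCB1-Tuned score for one child, assuming 0/1 (win/loss) rewards.
fn ucb1_tuned(wins: u32, visits: u32, parent_visits: u32) -> f64 {
    if visits == 0 {
        return f64::INFINITY; // make sure every move is tried at least once
    }
    let n = visits as f64;
    let ln_parent = (parent_visits as f64).ln();
    let mean = wins as f64 / n;
    // Empirical variance of a 0/1 reward is mean - mean^2;
    // add the exploration term sqrt(2 ln N / n) on top of it.
    let variance_bound = mean - mean * mean + (2.0 * ln_parent / n).sqrt();
    // 1/4 is the largest possible variance of a 0/1 reward.
    mean + (ln_parent / n * variance_bound.min(0.25)).sqrt()
}
```

Compared with plain UCB1 (`mean + sqrt(2 ln N / n)`), the variance estimate narrows the exploration bonus for moves whose results are consistently good or consistently bad.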
Urban Hafner
@ujh
Apr 15 2015 11:40
To me the advantage is that it’s relatively easy to implement.
iopq
@iopq
Apr 15 2015 11:41
Well, but it doesn't really make the program that strong by itself
Urban Hafner
@ujh
Apr 15 2015 11:43
That is true. Compared to our AMAF it is quite a step up, but it’s still not THAT strong.
iopq
@iopq
Apr 15 2015 11:44
it's 1200 to 1400, I was actually disappointed
Urban Hafner
@ujh
Apr 15 2015 11:44
And as I’m playing around with reusing the subtree it really is obvious that it’s not that strong. It reused very little unfortunately.
iopq
@iopq
Apr 15 2015 11:44
what percentage does it reuse?
Urban Hafner
@ujh
Apr 15 2015 11:44
Assuming I’ve found all bugs. ;)
Generally single digit percentages.
iopq
@iopq
Apr 15 2015 11:45
Fuego gets double digits every time... maybe it just finds better moves
Urban Hafner
@ujh
Apr 15 2015 11:45
But I’m still not convinced that I’ve found all bugs.
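One way to measure the reuse percentage discussed above is sketched below, assuming a hypothetical `Node` type (not iomrascalai's actual tree): after our move and the opponent's reply, descend to the matching grandchild, make it the new root, and report its visits as a fraction of the old root's visits. A bug in move matching would show up as a reuse of 0 even for moves the search clearly explored.

```rust
/// Minimal search-tree node; hypothetical layout, for illustration only.
struct Node {
    mv: usize, // move that led to this node
    visits: u32,
    children: Vec<Node>,
}

/// Follow `moves` (e.g. [our_move, opponent_reply]) down from `root`.
/// Returns the subtree to reuse as the new root, plus the fraction of the
/// old root's playouts that survive. If any move was never explored, a
/// fresh empty root is returned and nothing is reused.
fn reuse_after(root: Node, moves: &[usize]) -> (Node, f64) {
    let total = root.visits.max(1) as f64;
    let mut node = root;
    for &mv in moves {
        match node.children.into_iter().find(|c| c.mv == mv) {
            Some(child) => node = child,
            None => return (Node { mv, visits: 0, children: Vec::new() }, 0.0),
        }
    }
    let reused = node.visits as f64 / total;
    (node, reused)
}
```

Because the search spreads playouts over many candidate replies, a result in the single-digit percentages is plausible for a weak policy; a stronger search that concentrates on the actual replies would reuse more.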
iopq
@iopq
Apr 15 2015 11:46
So from reading literature there's a few things that are a problem with UCT
  1. You don't care about cumulative regret, you only care about the final recommendation. Exploring the best move early doesn't buy you anything, as long as the search finds it in the end
Urban Hafner
@ujh
Apr 15 2015 11:47
I need to play more games, but it seems to play worse than the version that doesn’t reuse the subtree. That suggests that there are still bugs remaining.
iopq
@iopq
Apr 15 2015 11:47
  1. You actually want to explore more earlier in the tree when you still have time budget and explore less deeper in the tree when you don't have that much budget remaining
  1. Everything is 1. because gitter
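A sketch of the two points above, using hypothetical names. For point 1, since only the recommendation matters, the final move is usually picked by visit count (the "robust child") rather than by the exploration score. For point 2, the `exploration_constant` schedule below, which shrinks with depth, is an invented illustration of "explore more near the root", not a method from the literature mentioned here.

```rust
/// Statistics for one root child; hypothetical layout.
struct RootChild {
    mv: usize,
    visits: u32,
    wins: u32,
}

/// Point 1: recommend the most-visited child, not the one with the best
/// exploration score (or even the best raw win rate on few visits).
fn recommend(children: &[RootChild]) -> Option<usize> {
    children.iter().max_by_key(|c| c.visits).map(|c| c.mv)
}

/// Point 2 (hypothetical schedule): a larger exploration constant near the
/// root, a smaller one deeper in the tree.
fn exploration_constant(base: f64, depth: u32) -> f64 {
    base / (1.0 + depth as f64)
}
```

For example, a child with 9 wins out of 10 visits loses to one with 30 wins out of 50 visits, because the extra visits mean the search has more confidence in the second move.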
Urban Hafner
@ujh
Apr 15 2015 11:48
(you can switch the mode so that you enter multi line text. Then the counting works)
iopq
@iopq
Apr 15 2015 11:49
OK, another issue is that it doesn't consider the value of information. So when you backpropagate, you still keep all those wins you shouldn't have, because they came from simulating the wrong opponent responses
so later information is more valuable than earlier information but it's being treated the same
There are a lot of aspects you can improve just for general game playing, not even go-specific
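One simple way to treat later information as more valuable is sketched below. It is an assumption for illustration, not the specific technique from the papers mentioned: replace the plain running average (step size `1/n`) with a step size floored at `min_step`, which turns the node value into an exponential moving average so that early results from poorly chosen opponent replies fade out.

```rust
/// Node value where later playouts can count for more than earlier ones.
/// Hypothetical type, for illustration only.
struct WeightedValue {
    visits: u32,
    value: f64,
}

impl WeightedValue {
    /// Incorporate one playout result (1.0 = win, 0.0 = loss).
    /// With min_step = 0.0 this is an ordinary average; with min_step > 0.0
    /// the step never drops below min_step, so recent results dominate.
    fn update(&mut self, reward: f64, min_step: f64) {
        self.visits += 1;
        let step = (1.0 / self.visits as f64).max(min_step);
        self.value += step * (reward - self.value);
    }
}
```

After three losses and a win, the plain average is 0.25, while a `min_step` of 0.5 gives 0.5, because the most recent result carries half the weight.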
iopq
@iopq
Apr 15 2015 12:36
Should I make another pull request for a Rust upgrade?
because it's breaking the other pull request right now
Urban Hafner
@ujh
Apr 15 2015 12:37
No, just do it in your pull request. I’ll merge it right away anyway.