BGonline.org Forums
Self-correcting rollout algorithm?
Posted By: Timothy Chow
Date: Thursday, 10 September 2009, at 7:49 p.m.
If one could get away with it, rolling positions out at 0-ply would seem to be the way to go because it's fast. With the advent of parallel computers, cloud computing, etc., tens of millions of 0-ply trials is not unreasonable to think about.
The catch, of course, is that 0-ply makes mistakes. The standard response to this problem seems to be to pick some higher ply. However, it strikes me that this approach is unsatisfactory for a couple of reasons. There are some positions for which (say) 2-ply analysis is overkill, either because they are "easy" or because the result of the decision doesn't make a significant difference to the overall assessment of the initial position. On the flip side, there are some positions for which 2-ply is inadequate. Positions where the computer has repeated opportunities to make a play that might not be correct (e.g., running the back checker, breaking its board, or redoubling prematurely) pose a particular challenge. Mike Corbett's fascinating book Backgammon Problems has several examples of positions where Snowie can figure out that something is wrong if a long rollout is performed. However, because the bot is still playing incorrectly during the course of the rollout, increasing the number of trials does not cause us to converge to the truth.
Therefore it seems desirable to design a rollout algorithm that has better potential for self-correction of systematic errors. The basic idea is to allot computer time adaptively to positions that come up during the rollout. That is, we make the computer think harder about certain positions than others. In critical cases, we make the computer do a mini-rollout of a position that comes up to decide its checker play, instead of being locked into whatever its n-ply analysis says. The point is that if we are willing to do more trials, the trials will be used to increase the confidence that the bot is playing the rolled-out games correctly, instead of just increasing our confidence that we are getting an accurate estimate of the bot's (possibly flawed) opinion of the position.
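To make the idea concrete, here is a minimal Python sketch of what a single adaptive rollout trial might look like. The helper names (random_roll, legal_plays, zero_ply_best, is_critical, mini_rollout), the position object's methods, and the budget object that rations the extra effort are all hypothetical placeholders, not part of any existing bot's API:

    def play_one_game(position, budget):
        """Play out one rollout trial, spending extra effort only on
        decisions flagged as critical."""
        while not position.game_over():
            roll = random_roll()                 # hypothetical dice generator
            plays = legal_plays(position, roll)  # hypothetical move generator
            if is_critical(position, roll, plays) and budget.remaining() > 0:
                # Decide this play by a mini-rollout of the candidates,
                # rather than trusting the static n-ply ranking.
                play = mini_rollout(plays, trials=budget.draw())
            else:
                play = zero_ply_best(plays)      # cheap static evaluation
            position = position.apply(play)
        return position.outcome()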
The critical obstacle in implementing such a strategy is: how do we decide which positions merit more "thinking time"? Some criteria seem obvious, e.g., positions that arise on the very next turn are generally more critical than positions that will not arise until many plies into the future, simply because the latter positions have only a very small probability of coming up. However, other criteria are less obvious.
This question seems related to the question posed recently on this forum, of measuring the "complexity" of a position. One's first thought might be that if the best checker play is miles ahead of the next checker play on a 0-ply analysis, then we should make a judicious decision not to spend so much time analyzing it further. This seems reasonable. But simply spending a lot of time on checker-play decisions where the top few plays have very similar 0-ply equity is not a great idea, because the reason for the similar equity may just be that it doesn't make a lot of difference which move you make. If you're in a long contact-free race and you roll 2-1, there's no sense spending a lot of time thinking about how to play it.
So here is a suggestion (possibly the only original thought in this message): a checker-play decision is critical if it satisfies all of the following criteria: (1) it comes up early in the search tree; (2) the top couple of plays are close in equity; and (3) the associated temperature maps of these close plays are not close to each other. Point (3) here just means that the plays cause many of the possible next rolls for the opponent to play out very differently. This seems to be a reasonably good indication that the decision is an important one and merits further investigation.
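As a rough illustration of how the three tests might be combined, here is a version of the is_critical check from the earlier sketch, expressed in terms of quantities a 0-ply evaluation already produces. The thresholds are illustrative guesses, and a "temperature map" is represented simply as a dict mapping each of the opponent's 21 distinct rolls to the 0-ply equity after that roll is played:

    def is_critical(depth, equities, temp_maps,
                    max_depth=2, equity_gap=0.02, map_distance=0.05):
        """Flag a checker-play decision for a mini-rollout.

        equities:  0-ply equities of the candidate plays, best first.
        temp_maps: parallel list of temperature maps, one per play.
        """
        # (1) The decision must come up early in the search tree.
        if depth > max_depth:
            return False
        # (2) The top two plays must be close in 0-ply equity...
        if len(equities) < 2 or equities[0] - equities[1] > equity_gap:
            return False
        # (3) ...yet their temperature maps must differ, i.e. many of the
        # opponent's rolls play out quite differently after each play.
        a, b = temp_maps[0], temp_maps[1]
        avg_diff = sum(abs(a[r] - b[r]) for r in a) / len(a)
        return avg_diff > map_distance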
I'm not sure yet how to decide whether a cube decision is critical. I don't think temperature maps are quite the right tool, but this point requires further thought.
Anyway, supposing we have some decent way of assigning weights to different positions in the search tree, to indicate how much time we should spend on them, we can then use some kind of adaptive sampling algorithm to make the best use of the ten million trials (or however many) we have time to do.
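One simple scheme, sketched below, is to split the fixed budget of extra mini-rollout trials in proportion to those weights (whatever they are, e.g. reach probability times a criticality score); more sophisticated schemes that reallocate as equity estimates firm up would fit the same framework:

    def allocate_trials(weights, total_trials):
        """Divide a fixed trial budget across decision points in
        proportion to their weights."""
        total_weight = sum(weights.values())
        return {pos: int(round(total_trials * w / total_weight))
                for pos, w in weights.items()}

    # e.g. three decision points with weights 0.5, 0.3, 0.2 and a
    # budget of 10,000 extra mini-rollout trials:
    print(allocate_trials({"A": 0.5, "B": 0.3, "C": 0.2}, 10_000))
    # -> {'A': 5000, 'B': 3000, 'C': 2000}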
It seems likely that one could incorporate variance reduction into the picture, too, though this also requires further thought.