Cubeless vs cubeful rollouts for cube decisions
As I understand matters, most people use cubeful rollouts when trying to assess cube actions. That is, when simulating the outcomes generated by particular positions, they allow the bots to make doubling decisions. The obvious argument for this approach is that we are trying to estimate what would happen if one were to play out a position, and this could involve cube actions.
Despite this argument, Kit Woolsey opts for cubeless rollouts in his 'Backgammon Encyclopedia'. One reason, which is much less relevant today, is that cubeless rollouts are faster. A more important consideration is that the bots must make cube actions during the rollout based on a quick evaluation (e.g. 3 ply) and that these cube actions may be far from optimal. This consideration becomes especially important in situations when we don't trust 3 ply, which is exactly when we want a rollout in the first place!
Is there any evidence on whether cubeless or cubeful rollouts are more reliable? Moreover, is it even possible to obtain such evidence? One might think that any 'evidence' would itself be based on rollouts, which leads to circularity.
Update: I have just seen an article by Douglas Zare that claims that "There seems to be no accurate way to roll out a decision of whether to double or not". Is this pessimism really justified? If so, can we trust the bot's cube actions at all?
2 Replies
XG uses XG Roller+ for its cube actions in its standard "World Class" rollout settings.
If you do a cubeless rollout and then want the right answer (cubeful), the program has to adjust the cubeless results to account for the cube, and I think this is done in a kludgy manner, at least in part, though I don't know the details. So if you do a cubeful rollout, it takes out some of that kludgy-ness. For that reason, I'd tend to trust a cubeful rollout more. It does seem like there isn't a perfect way to get the right answer. But you can use stronger and stronger rollouts to see which quicker rollout methods they tend to line up with more. I do see how this is a bit circular, but it also seems pretty reasonable to trust rollouts with higher settings more than weaker rollouts, even if you don't trust the stronger rollout 100%.
In general, I agree with you that rollouts are questionable since the computer might be making systematic mistakes during the rollout. It can happen with checker plays too. Let's say one checker play by Black will cause big mistakes by White during the rollout and the other will allow White to play correctly. If the first play is only a little wrong in actuality, the rollout is going to favor it anyway. I think the position I posted in the "Guess the computer play" thread might be an example of this. (Haven't posted the rollout yet though.)
One idea is that the errors often "cancel out" -- for two different plays or cube actions being rolled out, the computer will make roughly the same number of errors for both plays, so in that sense, you end up with something close to the right result. But yeah, that's not always the case. So it's not perfect, but it's the best tool we've got, and it's certainly more reliable than an evaluation, which has even more issues.
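To make the "cancel out" idea concrete, here is a toy sketch in Python. It is not how XG or any other bot works: each "game" is just the play's true equity plus a fixed bias (standing in for systematic bot mistakes) plus random noise, and every name and number in it (rollout, estimated_difference, the equities, the biases) is made up for illustration.

```python
import random

def rollout(true_equity, bot_bias, trials, rng):
    """Toy rollout: average of noisy results centred on true_equity + bot_bias.

    bot_bias is a hypothetical stand-in for systematic errors the bot makes
    while playing out the position; it is not a real bot parameter.
    """
    total = 0.0
    for _ in range(trials):
        total += true_equity + bot_bias + rng.gauss(0.0, 1.0)
    return total / trials

def estimated_difference(equity_a, equity_b, bias_a, bias_b,
                         trials=20000, seed=1):
    """Estimated equity of play A minus play B, each from its own toy rollout."""
    rng = random.Random(seed)
    return (rollout(equity_a, bias_a, trials, rng)
            - rollout(equity_b, bias_b, trials, rng))

if __name__ == "__main__":
    # Play A is truly better by 0.10 (assumed numbers).
    same_bias = estimated_difference(0.30, 0.20, bias_a=-0.05, bias_b=-0.05)
    # The bot misplays the follow-up to play A much more than play B.
    uneven_bias = estimated_difference(0.30, 0.20, bias_a=-0.20, bias_b=0.0)
    print("equal bias, A minus B:  ", round(same_bias, 3))
    print("unequal bias, A minus B:", round(uneven_bias, 3))
```

When both plays get the same bias, the difference between the two rollouts stays close to the true 0.10 even though each individual number is off. When only one play triggers the bot's mistakes, the rollout can rank the plays the wrong way round, which is the checker-play scenario described above.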
@Z: OK, thanks for the thoughts! It is a shame that this all can't be settled more scientifically (though perhaps this leaves some mystery in the game...)