GTO in Rock Paper Scissors

Let's assume two players play Rock Paper Scissors for 72 rounds. If I'm not mistaken, the GTO way of playing it would be to play each option 1/3 of the time at random. If both players were to do that, this would be the outcome:


If, however, Player 2 were to choose Rock 100% of the time and Player 1 stuck to his GTO strategy, this would be the outcome:


I am a little confused, as both players break even in each of the scenarios. I would've expected Player 1 to win more and Player 2 to lose more in the latter scenario. Can someone point me to where I made a mistake?
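For reference, here is a minimal sketch of the expected results in the two scenarios described above, assuming each round is scored simply as a win, tie, or loss. The helper name `expected_counts` and the dictionaries are made up for illustration.

```python
from fractions import Fraction
from itertools import product

MOVES = ("rock", "paper", "scissors")
# Each key beats the move it maps to.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def expected_counts(p1, p2, rounds):
    """Expected (wins, ties, losses) for Player 1 over `rounds` rounds."""
    wins = ties = losses = Fraction(0)
    for a, b in product(MOVES, MOVES):
        weight = p1[a] * p2[b] * rounds  # expected number of (a, b) matchups
        if a == b:
            ties += weight
        elif BEATS[a] == b:
            wins += weight
        else:
            losses += weight
    return wins, ties, losses

uniform = {m: Fraction(1, 3) for m in MOVES}
always_rock = {"rock": Fraction(1), "paper": Fraction(0), "scissors": Fraction(0)}

print(expected_counts(uniform, uniform, 72))      # 1/3 each vs 1/3 each
print(expected_counts(uniform, always_rock, 72))  # 1/3 each vs 100% rock
```

Under these assumptions both scenarios come out to 24 expected wins, 24 ties, and 24 losses for Player 1, so breaking even in both cases is not an arithmetic mistake.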

04 August 2023 at 04:09 PM

3 Replies



Thank you for your answer, 3bet-Bravo. It made me curious as to what you define as a pure mistake.


This example points out why “Nash Equilibrium” is much better terminology than “game theory optimal”. GTO is NOT optimal, at least not against real players using real non-GTO strategies. The RPS example here demonstrates this. If player 2 only plays rock, the optimal strategy is obviously to only play paper. The 1/3 randomization strategy is certainly not optimal in this situation.
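To put numbers on that, here is a rough sketch comparing the per-round EV of the 1/3 mix with each pure strategy against an opponent who always plays rock. It assumes a win is worth +1, a loss -1, and a tie 0; the function names are just for illustration.

```python
from fractions import Fraction

MOVES = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def payoff(a, b):
    """Payoff to the player throwing `a` against `b` (+1 win, 0 tie, -1 loss)."""
    if a == b:
        return 0
    return 1 if BEATS[a] == b else -1

def ev(p1, p2):
    """Expected payoff per round for the player using mix `p1` against `p2`."""
    return sum(p1[a] * p2[b] * payoff(a, b) for a in MOVES for b in MOVES)

def pure(move):
    """Strategy that plays `move` 100% of the time."""
    return {m: Fraction(int(m == move)) for m in MOVES}

uniform = {m: Fraction(1, 3) for m in MOVES}
always_rock = pure("rock")

print("1/3 mix:", ev(uniform, always_rock))
for m in MOVES:
    print(m, ev(pure(m), always_rock))

# The exploitative best response is whichever pure move has the highest EV.
best = max(MOVES, key=lambda m: ev(pure(m), always_rock))
print("best response:", best)
```

The 1/3 mix nets 0 per round here, while always playing paper nets +1 per round. That gap is the value the equilibrium strategy leaves on the table against this particular opponent.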

The reason we learn GTO and strive to play it, though, is that our opponents' strategies are not always obvious, and they can change strategies if they think we are deviating from the NE strategy. As an analogy with the RPS example, how do we know villain is actually playing rock all the time? Did we see him play rock three times in a row? Five times? Either of these could result from a random strategy. They could also be an intentional attempt to mislead us into deviating from the NE strategy so he can exploit us. We could start throwing paper too much and see a whole lot of scissors coming back, costing us a good bit of value.
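As a quick check on how often a truly random player would show us those streaks, here is a small sketch. It assumes villain picks rock with probability 1/3 independently each round and looks at one specific run of n throws; `streak_probability` is a made-up helper name.

```python
from fractions import Fraction

def streak_probability(n, p_rock=Fraction(1, 3)):
    """Chance that n specific consecutive throws are all rock."""
    return p_rock ** n

for n in (3, 5):
    p = streak_probability(n)
    print(f"{n} rocks in a row: {p} (about {float(p):.2%})")
```

Three rocks in a row comes out to 1/27 and five in a row to 1/243 for any one fixed stretch of throws. Over a long session there are many such stretches, so seeing a streak somewhere is more likely than those numbers alone suggest.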

NE strategy for RPS is obvious and can be easily calculated mathematically. But it could also be derived in a different way. Suppose a player always does play rock. We exploit by playing paper. He is not totally dumb though and sees what we are doing, so he adjusts by playing scissors sometimes. We counteradjust and add rock to counter his scissors. Back and forth we go: he adjusts to our new strategy and we counter those adjustments. It seems like this could go on indefinitely, but that is not the case. Eventually we both will converge to the 1/3 randomization strategy and no further adjustments will give better results than that. This is the “equilibrium” referred to in the term Nash Equilibrium, and the same logic applies to poker.
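One way to model the back-and-forth described above is fictitious play: each round, both players pick the best response to the opponent's move frequencies so far. This is a swapped-in textbook procedure, not necessarily the exact adjustment process meant above, and the starting moves, tie-breaking rule, and function names are assumptions for illustration.

```python
MOVES = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def payoff(a, b):
    """Payoff to the player throwing `a` against `b` (+1 win, 0 tie, -1 loss)."""
    if a == b:
        return 0
    return 1 if BEATS[a] == b else -1

def best_response(opp_counts):
    """Move with the highest total payoff against the opponent's history.

    Ties are broken by order in MOVES (an arbitrary choice).
    """
    return max(MOVES, key=lambda a: sum(opp_counts[b] * payoff(a, b) for b in MOVES))

def fictitious_play(rounds):
    """Return both players' empirical move frequencies after `rounds` adjustments."""
    counts1 = {"rock": 0, "paper": 1, "scissors": 0}  # we open with paper
    counts2 = {"rock": 1, "paper": 0, "scissors": 0}  # villain opens with rock
    for _ in range(rounds):
        m1 = best_response(counts2)
        m2 = best_response(counts1)
        counts1[m1] += 1
        counts2[m2] += 1
    total = rounds + 1
    freq1 = {m: counts1[m] / total for m in MOVES}
    freq2 = {m: counts2[m] / total for m in MOVES}
    return freq1, freq2

freq1, freq2 = fictitious_play(100_000)
print(freq1)
print(freq2)
```

In this sketch the individual throws never settle down; they keep cycling through rock, paper, and scissors in longer and longer runs. What drifts toward 1/3 each is the long-run frequency of each move, which is the sense in which the adjustments lead to the equilibrium mix.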


RPS becomes more similar to poker if you add a 4th option that beats rock and loses to everything else.

Now the GTO strategy can actually profit from some (but not all) mistakes.

It still breaks even against any combination of rock/paper/scissors, but every time our opponent chooses the 4th option, we automatically gain EV.
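A small sketch of that four-option game, assuming +1 for a win, -1 for a loss, and 0 for a tie. The name "fourth" is a placeholder for the extra option, and the sketch takes the equilibrium strategy to still be 1/3 rock, 1/3 paper, 1/3 scissors, never playing the 4th option.

```python
from fractions import Fraction

MOVES = ("rock", "paper", "scissors", "fourth")
# Each key beats every move in its set; "fourth" beats only rock.
BEATS = {
    "rock": {"scissors"},
    "paper": {"rock", "fourth"},
    "scissors": {"paper", "fourth"},
    "fourth": {"rock"},
}

def payoff(a, b):
    """Payoff to the player throwing `a` against `b` (+1 win, 0 tie, -1 loss)."""
    if a == b:
        return 0
    return 1 if b in BEATS[a] else -1

# The 1/3 mix over rock/paper/scissors, never using the 4th option.
mix = {"rock": Fraction(1, 3), "paper": Fraction(1, 3),
       "scissors": Fraction(1, 3), "fourth": Fraction(0)}

# Our EV per round against each pure choice by the opponent.
ev_vs = {b: sum(mix[a] * payoff(a, b) for a in MOVES) for b in MOVES}
for b, value in ev_vs.items():
    print(b, value)
```

Against rock, paper, or scissors the mix breaks even, and against the 4th option it gains 1/3 per round. Since no opponent choice does better than break-even against the mix, this is consistent with the 1/3 mix still being the equilibrium here.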
