Is a Nash strategy relevant outside of the specific context it was solved for?


I'd like to think I have a decent understanding of poker theory, but I'm trying to tie up a few loose ends. I'm going to outline my current understanding and a situation I've been thinking about, in the hope that someone can either confirm it or help me see what I'm missing. Apologies for the length of the post; I tried to make it as short as possible while still being clear.

The promise of playing a Nash equilibrium strat is that you guarantee a minimum amount of EV vs an opponent who is also playing Nash, and if they deviate from Nash you never do worse than that minimum and usually capture more EV passively, with no changes to your strat. If the opponent deviates from Nash, there will be a higher-EV strategy you can play relative to Nash, but it will be more exploitable, and it will have lower EV than Nash if the opponent is able to adjust to your adjustment.
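To make the "guaranteed minimum" idea concrete, here is a minimal sketch in Python using a made-up 2x3 zero-sum matrix game rather than poker. The payoff numbers, the `PAYOFF` table and the `ev` helper are all assumptions chosen for illustration; the third column stands in for an opponent mistake that equilibrium play never uses.

```python
# Hypothetical zero-sum game (not poker): entries are OUR payoff.
# Rows are our two actions, columns are the opponent's three actions.
# Column 2 is a pure mistake for the opponent: it pays us 3 no matter what.
PAYOFF = [
    [2.0, -1.0, 3.0],
    [-1.0, 1.0, 3.0],
]

# Our equilibrium mix for this toy game: it makes the opponent indifferent
# between columns 0 and 1 (2p - (1 - p) = -p + (1 - p) gives p = 0.4).
NASH_ROW = [0.4, 0.6]


def ev(row_mix, col_mix):
    """Expected payoff to us when both players use mixed strategies."""
    return sum(
        row_mix[i] * col_mix[j] * PAYOFF[i][j]
        for i in range(len(row_mix))
        for j in range(len(col_mix))
    )


# Against either column the opponent uses at equilibrium we get the game value.
value_vs_col0 = ev(NASH_ROW, [1.0, 0.0, 0.0])
value_vs_col1 = ev(NASH_ROW, [0.0, 1.0, 0.0])

# If the opponent drifts into the mistake column, we gain without adjusting.
value_vs_leak = ev(NASH_ROW, [0.4, 0.4, 0.2])

# An exploit (always row 0) does better vs an opponent stuck on column 0,
# but worse than the equilibrium mix once the opponent switches to column 1.
exploit_vs_col0 = ev([1.0, 0.0], [1.0, 0.0, 0.0])
exploit_vs_col1 = ev([1.0, 0.0], [0.0, 1.0, 0.0])

print(value_vs_col0, value_vs_col1, value_vs_leak, exploit_vs_col0, exploit_vs_col1)
```

In this toy game the equilibrium mix earns the same amount against both sensible columns, earns more when the opponent leaks into the mistake column, and the pure exploit swings from better to worse depending on whether the opponent adjusts.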

Now here's my confusion: Nash strategies are calculated based on the parameters you set, namely the preflop ranges, bet sizings, etc. you give the solver for each player. So a Nash strategy is only Nash relative to the ranges it was solved with. If you were to memorize a Nash v Nash strategy postflop but then change the opponent's preflop range, our EV could drop below the minimum EV initially promised by the original Nash strategy.

So in an extreme example, if we took a spot like SB vs BTN 3BP and made it so the opponent only calls with AA and folds everything else, our postflop EV is going to plummet overall. However, if we continued to play the old Nash v Nash strategy, our EV would be much lower than if we re-solved for a "new Nash" based on the opponent's updated preflop range (which, let's just say, would likely end up x'ing range on most boards).

Now obviously we will still gain EV from an opponent playing this type of strategy, but it would be gained preflop, when they get 3b and fold almost their entire range, not postflop, where we get crushed. So this raises the question: if the strategy solved for Nash v Nash can have a lower EV than a strategy solved vs a non-Nash preflop range (but still Nash v Nash postflop), why are we ever attempting to play the original Nash strategy vs a player who is not playing the exact preflop range we assigned to generate it? It is both lower EV AND exploitable (since it's not the "new Nash", it is exploitable by definition, being a deviation from the Nash strat solved for the actual ranges). If the Nash v Nash strategy doesn't even deliver on its promise of unexploitability anymore, then doesn't it make more sense to try to approximate the "new Nash" instead? I have to assume it's because, depending on how the opponent's range differs, the original Nash strat still ends up performing relatively well, but with how fragile equilibrium is I could still see there being significant shifts in the optimal strategy.

17 December 2024 at 11:24 PM

9 Replies



The idea that “your postflop EV will plummet” is misleading. If SB plays a fixed Nash equilibrium strategy, it makes the same EV against BTN’s AA regardless of whether AA is the only hand BTN calls with. The difference is that SB collects more preflop EV because BTN folds everything else.

Second, when you adjust for non-standard ranges, you’re not solving a “new Nash”; you’re solving an exploitative strategy based on assumptions. If those assumptions are wrong, your strategy might backfire (like any exploit).

I break this down in more detail here: Timestamp 4:10.


by tombos21 k

The idea that “your postflop EV will plummet” is misleading. If SB plays a fixed Nash equilibrium strategy, it makes the same EV against BTN’s AA regardless of whether AA is the only hand BTN calls with. The difference is that SB collects more preflop EV because BTN folds everything else.

Second, when you adjust for non-standard ranges, you’re not solving a “new Nash”—you’re solving an exploitative strategy based on assumptions. If those assumptions are wrong, your strategy might backfire, (like

Thanks for the reply and for taking the time to help. I did watch through the video again, but I'm hoping you can clarify a few things for me. In the "they only call AA" scenario I made a typo: it was supposed to be a HU SB vs BB 3bp (not sure if this changes the conclusions at all). I actually did this in GTO Wiz AI and picked a random flop, and compared to Nash the BB's EV showing on the first flop node was WAY lower. So when you say SB makes the same EV against BB's AA, are you saying that although our EV can be much lower on a given board because BB's range (AA) has higher overall equity than it would otherwise, we still make it back across the rest of the game tree (on the boards where BB used to have the nuts but doesn't anymore, the times we make the nuts and get paid more often, etc.)? OR have I missed a step here, and that node SHOULD be displaying the same EV as it did when BB was playing an optimal preflop range?

For the second: so even if I change one player's preflop range and the solver still runs its iterative process, with both players maximizing EV until neither can anymore, the resulting strategy is not going to be considered Nash? And I'm not talking about a postflop nodelock here, strictly just a difference in the preflop ranges input (maybe the distinction is irrelevant). The conclusion seems to be that if we ever run a solve where one player has a preflop range that differs from optimal, the highest-EV strategy the solver spits out would still be exploitable (even under the assumption that the opponent was in fact playing the range we gave them). Is that right?


Thanks for the reply and taking the time to help, I did watch through the video again but hoping you can clarify a few things for me, in the "they only call AA" scenario I made a typo and was supposed to be a HU SB vs BB 3bp (not sure if this changes the conclusions at all) , I actually did this in GTO Wiz AI and picked a random flop, and compared to Nash the BB's EV showing on the first flop node was WAY lower. So when you say SB makes the same EV against BB's AA, are you saying that although our EV can be much lower on a given board due to BB range (AA) having overall higher equity than it would have otherwise, we still make it back across the rest of the game tree (on the boards where BB used to have nuts but doesn't anymore, the times we make nuts and get paid more often etc.) OR that I've missed a step here and actually that node SHOULD be displaying the same EV as it did on that first node as when BB was playing an optimal preflop range?

This happens because the EV displayed reflects BB’s entire defending range, not just the times they hold AA. Let’s walk through an example.

BB's EV against SB's entire range is 10.81bb.
When SB specifically holds AA, their EV is 33.37bb, which means BB’s EV is (20bb pot - 33.37bb) = -13.37bb.


In other words, when SB's AA is disguised within a full range, BB loses 13.37bb on average each time SB holds AA.

---

Now suppose SB's range is only AA. In this case, BB behaves as if SB's hand is face-up.


  • SB's EV with AA drops from 33.37bb to 15.62bb.
  • BB’s EV increases from -13.37bb to (20bb pot - 15.62bb) = +4.38bb per hand.

So, BB's EV improves because SB's AA is no longer disguised. When BB can adjust, they exploit this face-up strategy.

If BB instead continued playing GTO, SB’s AA would earn the same EV (33.37bb). This aligns with Nash Equilibrium: no player can unilaterally adjust to improve their payoff. Even if AA is the only hand that calls, it can't magically earn extra EV against GTO play.
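The arithmetic above can be checked in a few lines of Python. This is only a sketch of the bookkeeping, assuming (as in the example) that BB's EV in these matchups is whatever is left of the 20bb pot after SB's EV; the function and variable names are made up for illustration.

```python
POT = 20.0  # bb, the pot size used in the example above


def bb_ev_when_sb_has_aa(sb_ev_with_aa, pot=POT):
    """BB's EV in the matchups where SB holds AA: the pot minus SB's EV."""
    return pot - sb_ev_with_aa


# SB's AA hidden inside a full range vs SB's AA played face-up.
bb_ev_disguised = bb_ev_when_sb_has_aa(33.37)
bb_ev_face_up = bb_ev_when_sb_has_aa(15.62)

# How much BB gains, per AA matchup, from knowing SB's exact hand.
bb_gain_from_knowing = bb_ev_face_up - bb_ev_disguised

print(round(bb_ev_disguised, 2), round(bb_ev_face_up, 2), round(bb_gain_from_knowing, 2))
```

The gain only exists because BB changes strategy in response to the face-up range; if BB's strategy stays fixed, SB's AA keeps the 33.37bb figure.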

---

For the second, so even though if I were to change one player's preflop ranges and the solver is still going to run its iterative process for both players maximizing EV until it can't anymore, the resulting strategy is not going to be considered Nash? And I'm not talking about a postflop nodelock here, strictly just a difference in the preflop ranges input (maybe the distinction is irrelevant). This seems like the conclusion would be that if we ever run a solve where one player has a preflop range that differs from optimal, our highest EV strategy the solver spits out would still be exploitable (even under the assumption the opponent was in fact playing the range we gave them), is that right?

Yes, exactly. When you input suboptimal ranges, you’re solving for an exploitative strategy constrained by those assumptions. If those assumptions are wrong, your strategy is vulnerable to counter-exploitation, like any exploit.

For example, imagine solving a game where OOP’s range is “2+2.” Sure, the solver might produce a Nash solution for this toy game scenario, but it’s far from unexploitable in reality.


To achieve a true GTO solution, you must input optimal starting conditions. Exploiting opponents’ tendencies can be powerful, but it’s important to recognize these solves as exploitative simulations, which carry the inherent risks of any exploit.


Well, as I was writing up a long response, Tombos responded with a better explanation than the one I was giving. I will still post what I had written in case my somewhat different explanation is helpful:

I think what he means is that if you play a solver equilibrium strategy, your EV specifically against AA will be the same whether it's one small subset of his range or whether that's the only hand in his range.

The blinds are what incentivises us to play more hands than just AA. So if we play a solver strategy and our opponent only plays AA, we make more money all the times our opponent fails to defend and we pick up dead money. We will make more than if they played a solver strategy and defended properly.

Sure we will lose money when they have AA, but we will always lose money when they have AA whether it's their entire range or one small subset of it.

That's not to say we shouldn't alter our strategy to exploit our opponent's tendencies if we know they only play AA. But if we don't adjust, we will lose the same amount against AA as we always do when we run into AA, and we will make more all the times they overfold by passively exploiting their mistakes (we play GTO, they make mistakes, and we gain EV from their mistakes).
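Here is a rough Python sketch of that bookkeeping. Every number in it is hypothetical (the dead money, the per-hand EVs and the deal frequencies are invented just to show the structure), and `total_ev` and `ev_lost_to_aa` are names I made up. The point is that our EV against AA is a fixed term, while the rest of the total depends on how much the opponent folds.

```python
# All numbers here are hypothetical, chosen only to show the bookkeeping.
DEAD_MONEY = 2.5  # bb we win when the opponent folds preflop (assumed)

# Our per-hand EV when the opponent continues, under OUR fixed strategy.
# These do not depend on what else the opponent chooses to continue with.
EV_WHEN_CALLED = {"AA": -8.0, "other": 1.0}

# How often the opponent is dealt each class of hand (assumed, sums to 1).
DEALT = {"AA": 0.05, "other": 0.95}


def total_ev(continue_freq):
    """Our overall EV given how often the opponent continues with each class."""
    total = 0.0
    for hand, p_dealt in DEALT.items():
        p_call = continue_freq[hand]
        total += p_dealt * (
            p_call * EV_WHEN_CALLED[hand] + (1 - p_call) * DEAD_MONEY
        )
    return total


def ev_lost_to_aa(continue_freq):
    """The part of our overall EV that comes from running into AA."""
    return DEALT["AA"] * continue_freq["AA"] * EV_WHEN_CALLED["AA"]


balanced = {"AA": 1.0, "other": 0.5}  # opponent defends a wider range
aa_only = {"AA": 1.0, "other": 0.0}   # opponent folds everything except AA

print(ev_lost_to_aa(balanced), ev_lost_to_aa(aa_only))
print(total_ev(balanced), total_ev(aa_only))
```

With these made-up inputs the AA term is identical in both scenarios, and the overall EV is higher against the AA-only opponent because every fold hands us the dead money.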


I think what he means is that if you play a solver equilibrium strategy, your EV specifically against AA will be the same whether it's one small subset of his range or whether that's the only hand in his range.

Spot on


by tombos21 k

This looks just like another poker variant created by me.


by GreatWhiteFish k

I think what he means is that if you play a solver equilibrium strategy, your EV specifically against AA will be the same whether it's one small subset of his range or whether that's the only hand in his range.

The blinds are what incentivises us to play more hands than just AA. So if we play a solver strategy and our opponent only plays AA, we make more money all the times our opponent fails to defend and we pick up dead money. We will make more than if they played a solver strategy and defended

I have no confusion in regards to this part and completely agree

"For example, imagine solving a game where OOP’s range is “2+2.” Sure, the solver might produce a Nash solution for this toy game scenario, but it’s far from unexploitable in reality."

So the strategy is exploitable even if the players truly were locked to these preflop ranges? This is really the bulk of what I'm trying to understand, I guess. It's confusing to me because my understanding is that the solver iterates until neither player can gain any more EV. But if the strategy is always going to be exploitable, wouldn't it never finish solving and instead end up in an RPS (rock-paper-scissors) type of situation, where one player changes strat to gain EV, the other player changes to gain EV, and this repeats forever?


To be clear, it will create an unexploitable strategy for the toy game universe where those starting ranges (and other parameters) are locked in.
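On the "wouldn't it just cycle forever" worry, here is a small Python sketch of regret matching (the building block of the CFR family of algorithms) in self-play on rock-paper-scissors itself. I'm not claiming this is what any particular solver runs internally; the starting bias, the iteration count and all names are assumptions for illustration. The per-iteration strategies can keep chasing each other, but the average strategy is what gets reported, and in two-player zero-sum games that average converges toward equilibrium.

```python
# PAYOFF[a][b] = payoff to the player choosing a against b (0=R, 1=P, 2=S).
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def strategy_from_regrets(regrets):
    """Regret matching: play actions in proportion to positive regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / 3.0] * 3  # no positive regret yet: play uniformly


def run(iterations):
    # Player 0 starts biased toward rock (assumed) so the dynamics are visible.
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    strategy_sums = [[0.0] * 3, [0.0] * 3]
    strats = [[1.0 / 3.0] * 3, [1.0 / 3.0] * 3]
    for _ in range(iterations):
        # Both players pick their strategy before either one updates.
        strats = [strategy_from_regrets(r) for r in regrets]
        for p in (0, 1):
            opp = strats[1 - p]
            # Expected payoff of each pure action vs the opponent's current mix.
            action_values = [
                sum(PAYOFF[a][b] * opp[b] for b in range(3)) for a in range(3)
            ]
            current_ev = sum(strats[p][a] * action_values[a] for a in range(3))
            for a in range(3):
                regrets[p][a] += action_values[a] - current_ev
                strategy_sums[p][a] += strats[p][a]
    average = [[s / iterations for s in sums] for sums in strategy_sums]
    return average, strats


average_strategies, last_strategies = run(20000)
print(average_strategies)
print(last_strategies)
```

Comparing the two printed results shows the difference between the last iterate, which is free to wander, and the averaged strategy, which settles near one third for each action.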



Understood, thank you for the help
