The GTO Blackpill
I've been working on an experiment recently that I think will fundamentally change the way you look at GTO.
A while back, I posed this question to the community: "Does Playing a Less Exploitable Strategy Give You an Edge?" The gist of it was, if two players are both making unbiased mixing mistakes, will the less exploitable one have the edge, on average?
Tbh I wasn't certain what the correct answer was, so I set out to find it. In doing so, I discovered something that reframed the way I see poker.
The "Bad Reg" Strategy
Let's define a "Bad Reg" as a player who perfectly finds every pure move but just randomizes every mixed decision. They are the ultimate button-clicker.
- Pure action - GTO always takes one action (e.g. always calls this hand)
- Mixed actions - GTO mixes between actions (e.g. call 30%, fold 70%)
The Baseline: GTO vs. GTO
First, we need a baseline. We'll use a standard CO vs. BB single-raised pot, 100bb cash. Here are the baseline GTO EVs:


Experiment 1: Rounded OOP vs GTO IP (Known)
Now the fun begins. Let's round OOP's GTO strategy to the nearest 1/2 frequency. So for every decision point in their strategy, from flop to river:
- Low frequency actions become 0%
- Mid-frequency actions become 50%
- High frequency actions become 100%
This is the definition of the "Bad Reg" strategy. They are playing randomly when mixed, but the pure actions stay the same.
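For anyone who wants to play with the rounding rule outside a solver, here is a minimal sketch of it in Python. The function name `round_mix`, the renormalization step, and the fallback when everything rounds to zero are my own assumptions for illustration; I don't know how the solver handles those edge cases internally.

```python
import math

def round_mix(freqs, step=0.5):
    """Round each action frequency to the nearest multiple of `step`.

    `freqs` maps action name -> GTO frequency (summing to 1).
    Assumption: after rounding, frequencies are renormalized to sum to 1.
    """
    # Round half up explicitly (Python's built-in round() rounds halves to even).
    rounded = {a: math.floor(f / step + 0.5) * step for a, f in freqs.items()}
    total = sum(rounded.values())
    if total == 0:
        # Assumption: if every action rounds to zero, play the most frequent one.
        best = max(freqs, key=freqs.get)
        return {a: float(a == best) for a in freqs}
    return {a: r / total for a, r in rounded.items()}

print(round_mix({"call": 0.30, "fold": 0.70}))   # mid-frequency mix -> 50/50
print(round_mix({"bet": 0.92, "check": 0.08}))   # high/low frequency -> pure
print(round_mix({"bet": 1.0, "check": 0.0}))     # pure actions are untouched
```

Passing `step=1.0` gives the coarser 0%/100% rounding instead.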
Here's an example of what that looks like:

Important note: Do not re-solve after rounding. The point is to measure the original fixed equilibrium strategy vs this newly rounded strategy. After rounding, I simply click "calculate results" to see the EVs of this new matchup:

Results: The EVs are identical (technically the EVs shifted by 0.004 bb, but that's just solver noise; goes away if you solve baseline to higher accuracy).

This was not a surprise to me. In fact, it's an expected result from the laws of indifference, something I've written about before. In short, Nash implies that every action a hand mixes between has the same EV against the opponent's equilibrium strategy; otherwise it's not at equilibrium. Since rounding only changed the frequencies of indifferent decisions, OOP doesn't lose any EV. This is the classic "mixing mistake" (exploitable, but 0 EV loss vs. GTO) vs. a "pure mistake" (playing an action GTO never takes, which does lose EV vs GTO).
In other words, the "bad reg" loses nothing vs GTO.
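The indifference argument is easy to check on a tiny example. The sketch below uses a made-up 2x2 zero-sum matrix game (the payoffs are hypothetical, not from any poker solve) and shows that any mix of the row player's two actions earns the same EV against the column player's equilibrium strategy.

```python
from fractions import Fraction as F

# Hypothetical zero-sum game: PAYOFF[i][j] is the row player's payoff
# when row plays action i and column plays action j.
PAYOFF = [[F(2), F(-1)],
          [F(-1), F(1)]]

def ev(row_mix, col_mix):
    """Row player's expected payoff for a pair of mixed strategies."""
    return sum(row_mix[i] * col_mix[j] * PAYOFF[i][j]
               for i in range(2) for j in range(2))

# Equilibrium mixes for this particular matrix (each makes the other side indifferent).
GTO_ROW = [F(2, 5), F(3, 5)]
GTO_COL = [F(2, 5), F(3, 5)]

print(ev(GTO_ROW, GTO_COL))             # 1/5
for p in (F(0), F(1, 2), F(1)):
    # Any row mix, including the "rounded" 50/50 one, earns the same vs GTO.
    print(p, ev([p, 1 - p], GTO_COL))   # 1/5 every time
```

Exact fractions are used so the equality isn't blurred by floating-point noise.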
Up to this point, nothing is new. Pros and game theorists have known this for ages.
Experiment 2: Rounded OOP vs Rounded IP (Novel)
Okay, so what happens if we round both players' strategies? This is where it gets weird.
Now we have two rounded strategies facing each other. Crucially, the mixed actions are no longer indifferent. Facing GTO, some hand had the same EV regardless of whether it bet or checked. Facing bad reg, one of those actions has higher EV. The indifference argument no longer applies.
Both players are making a ton of mistakes, so one player should "accidentally" exploit the other just by chance, right?
Nope.

Result: The EV remains exactly the same as the GTO baseline.

I've run this experiment on a dozen different spots, in different formations and SPRs, varying levels of complexity, and the EVs stay locked to the GTO baseline.
This is very suspicious to me. Like, I would have expected mixing errors to somewhat cancel out just due to luck, but not this precisely. The mixing errors cancel out to like ~1/1000th of a big blind. Smaller than I can reliably measure. And the further down I solve my baseline, the smaller the error becomes. What in the F***? Why? How? How are two semi-random strategies just by chance arriving at the exact GTO vs GTO EVs? It is completely counterintuitive to me that this should be the case.
Analysis
We already know from Experiment 1 that a GTO player doesn't beat the Bad Reg:
- [GTO vs Bad Reg] EV = [GTO vs GTO] EV
The real blackpill is what Experiment 2 showed:
- [Bad Reg vs Bad Reg] EV = [GTO vs GTO] EV
That's the truly novel part. Unbiased mixing mistakes precisely cancel out in the aggregate.
So... Are Mixtures Useless? (Caveats)
No, but it shows that GTO mixing is 100% defensive.
The only reason to mix is to reduce exploitability. Our "Bad Reg" strategy is wildly exploitable (~15% of the pot in this case). An exploitative opponent will destroy this strategy. But even then, there are ways to reduce exploitability that don't require mixing.
Caveat 1: In reality, most opponents don't make unbiased mixing errors. They lean some direction (e.g. over-folding, over-calling, over-aggressing, etc). When mixing mistakes are biased, they presumably won't cancel out in the aggregate.
Caveat 2 (The Counter-Example): This "error canceling" effect isn't a universal law. I built a simple polarized river toy game (1 value, 3 bluffs, 1/2 pot shove behind), and rounding both strategies does change the EV. In a simple, asymmetrical toy game, the errors don't cancel out. But in the full game with millions of decision points, they seem to wash out perfectly.
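Here is a rough sketch of how a toy game like this can be worked out by hand. It is one reading of the setup, not the exact solver tree: the bettor holds 1 value combo and 3 bluff combos, the caller holds a pure bluff-catcher, the bet is half pot, value always bets, and a checked bluff loses at showdown. All names are hypothetical.

```python
from fractions import Fraction as F

POT, BET = F(1), F(1, 2)     # EVs in pot units; half-pot shove
VALUE, BLUFFS = 1, 3         # combos in the bettor's range

def bettor_ev(bluff_freq, call_freq):
    """Bettor's range EV given each bluff's betting frequency and the caller's calling frequency."""
    value_ev = call_freq * (POT + BET) + (1 - call_freq) * POT
    # A bluff that bets wins the pot on a fold and loses the bet on a call;
    # a bluff that checks is assumed to win nothing.
    bluff_ev = bluff_freq * ((1 - call_freq) * POT - call_freq * BET)
    return (VALUE * value_ev + BLUFFS * bluff_ev) / (VALUE + BLUFFS)

GTO_BLUFF, GTO_CALL = F(1, 9), F(2, 3)    # equilibrium frequencies for this model
RND_BLUFF, RND_CALL = F(0), F(1, 2)       # nearest-1/2 rounding of the above

print(bettor_ev(GTO_BLUFF, GTO_CALL))     # 1/3  GTO vs GTO
print(bettor_ev(RND_BLUFF, GTO_CALL))     # 1/3  rounded bettor vs GTO
print(bettor_ev(GTO_BLUFF, RND_CALL))     # 1/3  GTO vs rounded caller
print(bettor_ev(RND_BLUFF, RND_CALL))     # 5/16 rounded vs rounded
```

In this model, rounding either side alone changes nothing, but rounding both costs the bettor 1/48 of the pot, because there is only one mixed decision per player and nothing for the error to cancel against.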
The Blackpill GTO Study Guide (How to Actually Use This)
I've been calling this player a "Bad Reg," but this is actually a very high-level strategy lol. This is someone who loses no EV against a GTO trainer. This is a player who:
- 1. Finds every single pure action correctly.
- 2. Knows exactly what actions to mix between.
So, your first goal should be to become the Bad Reg. If you can get your baseline to that level, you're already playing at a very high level. Then you can focus on exploitation.
Level 1: Master the Pures. "This hand always bets here", "This hand always folds here". These pure actions are the load-bearing walls of your strategy. When you study solvers, focus on finding the pure actions first.
Level 2: Master the Classification. Forget the frequencies for now. Just learn to identify: "Okay, this hand is a mix between calling and folding" or "This one is a mix between checking and betting small". Everything else is noise.
Level 3: Exploit. Once your baseline is prepped, you can start deviating to exploit your opponents. Lean your mixtures in certain directions to exploit the pool and player-specific tendencies.
2+2 is dying but I'll keep posting some interesting threads here, every now and again.
If you want to see what others are saying about this, I also posted it in r/Poker_Theory which has a more active community.
I hope 2+2 or some kind of forum format continues and you keep posting always tombos. It does seem like your talents are wasted more often than not. But personally i find it a struggle with reddit/discord for anything approaching satisfying strategy debate. Too anon and transient but maybe i'm old now.
Genuine Q: why doesn't GTOw have its own strategy forums? That would be the perfect scenario imo. Folk get a format that's palatable, readable and more human. The site can mop up customers etc.
So, your first goal should be to become the Bad Reg.
Challenge accepted 🫡
That's a cool idea Ceres, I'll suggest it to them.
This bad reg would beat 5k+ 😃
Hypothesis why it cancels out
If your value hands lose some EV, your bluffs gain EV, and vice versa. OTR especially, IP's value hands are pure bets and only the bluffs mix, so in spots where IP has a lot of low-frequency bluffs and OOP has a bunch of low-frequency calls, IP will under-bluff and OOP will over-fold, but the opposite scenario is also possible. Given how many possible rivers there are, you can expect this to cancel out, especially because they don't exploit each other to the max.
For turn and flop the logic is similar; the difference is that there are almost no pure bets, so it's even less likely there will be an accidental exploit on a given street.
I'm surprised bad reg loses 15% of the pot vs max exploit.
I expected there to be some cancellation as the mixing mistakes are unbiased. But I figured one player should, just by chance, end up exploiting the other player a bit more, and thus the EVs would shift. But it cancels out very precisely. Like we're talking milli-bb levels of precision here. My flabbers are gasted.
Btw it's the same result if we make the rounding even coarser, 0% or 100% rounding on all streets:
- OOP EV goes from 1.8395 to 1.8343 (-0.0052 bb)
- IP EV goes from 3.2005 to 3.2057 (+0.0052 bb)
(The original GTO strategy has an exploitability of 0.005bb, so these minute shifts are within the margin of error)
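A quick sanity check of those numbers, as a sketch: the game is zero-sum, so whatever OOP loses IP should gain, and the two EVs should add up to the same total before and after rounding. The figures are the ones quoted above; the variable names are just for illustration.

```python
from decimal import Decimal as D

gto = {"OOP": D("1.8395"), "IP": D("3.2005")}       # GTO vs GTO EVs (bb)
rounded = {"OOP": D("1.8343"), "IP": D("3.2057")}   # 0%/100% rounding on both sides

# Per-player shift caused by rounding.
shift = {player: rounded[player] - gto[player] for player in gto}
print(shift["OOP"], shift["IP"])

# Zero-sum check: total EV is unchanged, so the shifts sum to zero.
print(sum(gto.values()), sum(rounded.values()))
print(sum(shift.values()))
```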

Despite both players being exploitable for half the pot, the EVs of this strategy matchup precisely match the original GTO vs GTO EVs.
have you tried badreg#1 vs badreg#2,
badreg#1 as above,
badreg#2 mixes radically oneway: any mixed action becomes 100% fixed, or some 80% / 20% split.
after changing OOP vs IP to badreg vs badreg, you've posted overall strat vs strat EVs, which stay constant.
can you show EVs of specific hands before and after locking new mixing splits?
edit: just saw your post above. can you keep badreg#1 fixed and only change badreg#2?
(quote) Facing GTO, some hand had the same EV regardless if it bet or checked. Facing bad reg, one of those actions has higher EV. The indifference argument no longer applies.
if you can show that EVs of specific hands stay the same, you've shown experimentally that indifference still applies, and that to exploit badreg#x one needs adjustments beyond mixing frequencies.
Hi zz666z,
I've done a version of this experiment here where one player gets coarse rounding and is more exploitable. The more exploitable player doesn't lose any EV overall.

However, the EV of individual hand actions changes significantly. There is no reason it should be indifferent, because villain's strategy is no longer balanced. I checked using the compare EV function over a few nodes and couldn't find even one example of a hand having two actions worth the same EV. I guess there probably are some just by chance.
That's what makes this an interesting experiment though. All those mistakes just seem to cancel out and we're left with the original GTO vs GTO EV.
oh that means that if we change only our mixing splits for a few hands vs badreg#x' strat, we might happen to exploit him?
well I guess most likely something like that will be happening:
say in one specific configuration certain hands gain EV with new mixing splits vs badreg's adapted strat. can you classify them somehow in your strat into a certain category, where solver is mixing similarly, and compare with the rest of the portion where solver also mixes? it gotta balance out somehow after all, so sort of some value vs bluff balancing or so.
what you mean with unbiased mixing? what does biased mixing mean exactly, and why would it lead to different results? I see no difference between biased and unbiased mixing, biased being just an edge case.
oh that means that if we change only our mixing splits for a few hands vs badreg#x' strat, we might happen to exploit him?
Yes, it creates hugely exploitable deviations, as shown in the exploitability% part.
what you mean with unbiased mixing? what does biased mixing mean exactly, and why would it lead to different results? I see no difference between biased and unbiased mixing, biased being just an edge case.
Unbiased mixing has no preference for certain actions. So it's not going to universally overfold for example. The rounding procedure is unbiased.
The reason unbiased is important is because it prevents players from systematically exploiting each other.
An example of a biased mixing mistake would be a player that leans towards folding whenever they are indifferent between folding and continuing. That creates a style that exploits some player types and gets exploited by others.
say in one specific configuration certain hands gain EV with new mixing splits vs badreg's adapted strat. can you classify them somehow in your strat into a certain category, where solver is mixing similarly, and compare with the rest of the portion where solver also mixes? it gotta balance out somehow after all, so sort of some value vs bluff balancing or so.
Hard to parse this sentence, I'm not sure what you mean?
I can set one player to the maximally exploitative strategy to destroy the rounded player.
I did not intend to be too specific with that, just expanding on the previous thought about where the balancing might occur. we see EVs balancing out, but we don't know yet how.
so I'd try to classify portions of ranges in different ways and see if patterns occur, where EV changes sum to zero.
that's what was suggested before, with value and bluff portions of ranges. Ive no opportunity to see it for myself, so Im bound to guess from afar.
so about biased and unbiased mixing, not sure why it's important here to note that we're not mixing biased, or don't want to mix biased, in the framework of our experiment. it's strategically different for sure, when it comes to exploitability in practice.
but in our theoretical considerations, you're not saying that biased mixing can be exploited by badreg#x, right? assuming same rounding procedure for whole range, not just few hands. same balancing of EVs will occur, and we'll end up with same overall EVs as before, for badreg#biased vs badreg#unbiased, if I interpret everything correctly.
punctuation marks don't transfer from new 2+2 to old 2+2
I expected there to be some cancellation as the mixing mistakes are unbiased. But I figured one player should, just by chance, end up exploiting the other player a bit more, and thus the EVs would shift. But it cancels out very precisely. Like we're talking milli-bb levels of precision here. My flabbers are gasted.
If OOP by chance exploits IP in 55% of nodes for 1% of the pot, and IP exploits OOP for the same amount in the remaining 45%, you'll end up with OOP accidentally exploiting for only 0.1%. Probably as the number of nodes goes up it cancels out more and more. That's my intuition at least.
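That intuition can be sketched as a coin-flip simulation. This is only a toy model under loud assumptions: every node carries equal weight, every accidental exploit is the same size, and the direction at each node is an independent 50/50 flip. None of that is established for real game trees; the function names are hypothetical.

```python
import random

def net_edge(frac_oop_wins, size):
    """Net OOP edge when OOP gains `size` in that fraction of nodes and loses `size` in the rest."""
    return frac_oop_wins * size - (1 - frac_oop_wins) * size

def mean_abs_net(nodes, size=0.01, trials=200, seed=0):
    """Average absolute net edge over many random trees with `nodes` equally weighted nodes."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Assumption: each node's exploit direction is an independent fair coin flip.
        wins = sum(rng.random() < 0.5 for _ in range(nodes))
        total += abs(net_edge(wins / nodes, size))
    return total / trials

print(net_edge(0.55, 0.01))        # the 55% / 45% example: about 0.1% of the pot
for n in (100, 2500, 10000):
    print(n, mean_abs_net(n))      # leftover edge shrinks as the node count grows
```

Under these assumptions the leftover edge shrinks roughly like one over the square root of the node count, which would not by itself explain cancellation down to milli-bb levels.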
as of now, haizemberg might be onto something. checking hard edge cases will clarify more.
so about biased and unbiased mixing, not sure why it's important here to note that we're not mixing biased, or don't want to mix biased, in the framework of our experiment. it's strategically different for sure, when it comes to exploitability in practice.
To make it as simple as possible:
- Unbiased mixing: No preference for specific actions, mistakes cancel out, player EVs = GTO
- Biased mixing: One player tends to exploit the other, mistakes do not perfectly cancel out, player EVs ≠ GTO
For example, if one player has a bias towards folding (nit), and the other has a bias towards betting (maniac), then the maniac will exploit the nit.
Ive no opportunity to see it for myself, so Im bound to guess from afar, not being into minutiae of solvers or game theory as of now.
If you want to try it yourself, download the free version of PioSolver. Solve the default tree. Then press Ctrl+D to apply rounding. Do not re-solve. Click "Calculate Results" to measure the new range vs range EVs. You can also explore the strategy and see how certain hands are exploiting each other at different nodes.
If OOP by chance exploits IP in 55% of nodes for 1% of the pot, and IP exploits OOP for the same amount in the remaining 45%, you'll end up with OOP accidentally exploiting for only 0.1%. Probably as the number of nodes goes up it cancels out more and more. That's my intuition at least.
Yeah it definitely feels like a law of large numbers effect
I'm proposing the "mistake cancellation" effect occurs when three conditions are met:
- Condition 1) Players are not making pure mistakes.
- Condition 2) Mixing mistakes are unbiased (no preference for certain actions).
- Condition 3) The game tree is big (millions of decision points).
It does not matter if one player is more exploitable than the other. If these three conditions are met, I believe the EVs will stay the same as they were in the original GTO simulation.
yeah I see.
so how come we have condition 3) now? how does the law of large numbers come into play?
imo it's still most likely structural if cancelling occurs, and there's probably a simple proof, based on general characteristics of gto strategies in our game, and the specific unbiased rounding procedure.
if the EV changes for specific hands through unbiased mixing deviations aren't balanced in some way, then it doesn't matter how many decision points or nodes we have. they just wont sum to zero. that's exactly what happens with biased mixing deviations, for example.
This is very suspicious to me. Like, I would have expected mixing errors to somewhat cancel out just due to luck, but not this precisely. The mixing errors cancel out to like ~1/1000th of a big blind. Smaller than I can reliably measure. And the further down I solve my baseline, the smaller the error becomes. What in the F***? Why? How?
I added condition 3 because this error cancelling effect isn't true in all cases. A simple counterexample is given in the OP:
Caveat 2 (The Counter-Example): This "error canceling" effect isn't a universal law. I built a simple polarized river toy game (1 value, 3 bluffs, 1/2 pot shove behind), and rounding both strategies does change the EV. In a simple, asymmetrical toy game, the errors don't cancel out. But in the full game with millions of decision points, they seem to wash out perfectly.
In this case the rounding procedure creates a bias: the aggressor stops bluffing and the defender over-folds, which means the defender is exploiting the aggressor.
But I think it's like Haizemberg said, when you have a lot of nodes all of these "accidental exploits" cancel out.
I wouldn’t have predicted that result either but tbh don’t see why it would be a breakthrough. Since there are no players who play perfect equilibrium or this unbiased mixing strategy your own frequencies do matter
@tombos
you said it's a biased rounding procedure in your toy game, that's a difference. cancelling may be simply linked to unbiased mixing, or unbiased rounding. dunno if the toy game represents all properties of gto in our game besides that accordingly, or if the complexity is too different, beyond the smaller number of nodes.
@charles
even if someone plays the unbiased mixing strategy, he's exploitable, so your own frequencies do matter, if you want to exploit him. if you only want to stay unexploitable vs him, you may play gto, or any unbiased rounding strategy.
but your own frequencies will still matter, since if you mix biased vs someone who mixes unbiased, you will be probably exploited.
@tombos
a lot of "accidental exploits" don't just lead to cancelling, or approach cancelling.
for example to compare,
limit[sum[(-1/2)^k, {k, 1, n}], n -> infinity] = -1/3 != 0.
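The series above can be checked numerically with a few lines of Python; a small sketch using exact fractions (the helper name `partial_sum` is just for illustration). The terms alternate in sign and shrink, yet the running total settles at -1/3 rather than 0.

```python
from fractions import Fraction as F

def partial_sum(n):
    """Sum of (-1/2)^k for k = 1..n, computed exactly."""
    return sum(F(-1, 2) ** k for k in range(1, n + 1))

for n in (1, 2, 3, 10, 40):
    s = partial_sum(n)
    print(n, float(s), float(s + F(1, 3)))   # partial sum and its distance from -1/3
```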
Does it converge to exactly gto vs gto ev with higher base accuracy (well, approximately), or is it just the case of relatively minor changes in strategy yielding relatively minor ev changes against an unaware opponent and then averaging out over a large amount of nodes? Have you looked at any aggregate reports? Sorry if this is a stupid question.
I have solved the default tree in pio to a higher accuracy (0.005%):

First results after "required accuracy reached" is just me clicking "calculate results" changing nothing. Others are respectively 1/1000, 1/100, 1/10, 1/5, 1/3, 1/2, 1/1 rounding. Looks like the rounding correlates with the EV changes. Is that not a big enough tree?
bigfishinsmallpond I think the issue is you are rounding, then re-rounding the already rounded strategies, which creates biases.
Try saving (full save to river) the original GTO strategy, then loading that up each time before you apply rounding.
GTO

1/2 Rounding

1/1 Rounding

Results:
The nash distance of the original solution is accurate to +/- 0.008 bb, so I consider these results to be equal within the margin of error.

I would also point out that this is a rather small tree. I suspect bigger trees tend to have more error cancelling.
I wouldn’t have predicted that result either but tbh don’t see why it would be a breakthrough. Since there are no players who play perfect equilibrium or this unbiased mixing strategy your own frequencies do matter
I mean you could apply the same logic to Nash strategies
"Why should GTO matter if no one plays it in practice?"
I think there's a big difference. If you can play NE you minimise the amount your strategy can be exploited by, which is why it's how a lot of high stakes regs try to play. But if you replicate this approach in this thread that only has merit against an imaginary opponent who doesn't exist and I'd be confident a bot doing it above microstakes would lose

