Good sample size for stats

Good day everyone. I'm interested in the answer to this question: what is a good sample size for a poker stat in NLH (in a HUD or pop-up)? I'm especially interested in HU NL cash, but the number will probably be the same for 6-max cash or tournaments. By "good sample size" I mean a sample size with a low average deviation, no more than 5% (lower is better, of course), so that if a specific stat on a player deviates by more than 5%, I can be sure this player is deviating in that spot. For example, say a good c-bet % in SRP HU is 65%. If I have the necessary number of samples on a player and his "Flop CB in SRP" stat is not between 60% and 70%, I can conclude that this player is probably under- or over-c-betting the flop.
My initial thought was that 100 samples is good, but that's only my own feeling. I tried to find information on this question online and found an article (whose content made a lot of sense to me) in which the author wrote that 30+ samples is enough, but only for spotting extreme deviations. For example, if the Flop CB stat is very high (28/30=93%) or very low (4/30=13%), that probably already says the opponent is deviating in that spot (ideal is about 20/30=67%), but if the stat is around 50% (15/30) or 77% (23/30), that probably doesn't say anything, because of the deviation at that sample size. From this we can conclude that the larger the sample, the smaller the deviation: in this example, with 30 samples the deviation might be ±15%, and for the sample size we are looking for, say 100 samples, the deviation would be ±5%. But these deviation numbers are just examples; I'm not claiming that 30 samples actually gives ±15% or that 100 gives ±5%.
So how can we find a good sample size with a deviation of 5% or lower? Maybe there is a math formula that can calculate this? I tried to find one, but all I found were calculators that compute the sample size for a survey from the population size, confidence level % and margin of error %, plus some articles about them. But that is probably unsuitable, or I just don't understand how to use it in our case.
Maybe somebody already knows what a good sample size is and/or how to calculate it, or just has thoughts about it? I would be grateful for any answers or thoughts; everything will be interesting to read!
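
For reference, the survey-style margin-of-error calculation mentioned above can be applied to a HUD stat if each opportunity is treated as an independent yes/no trial. This is only a minimal sketch: it assumes the normal approximation to the binomial and an effectively unlimited population, and the function name `required_samples` is made up for illustration.

```python
from math import ceil

def required_samples(p, margin, z=1.96):
    """Approximate number of opportunities needed so that a stat with
    true frequency p is measured to within +/- margin.
    z=1.96 corresponds to a 95% confidence level (normal approximation)."""
    return ceil(z**2 * p * (1 - p) / margin**2)

# 65% flop c-bet measured to within +/- 5 percentage points
print(required_samples(0.65, 0.05))   # 350
# Worst case (p = 50%) at the same margin
print(required_samples(0.50, 0.05))   # 385
```

Under these assumptions a ±5% margin needs roughly 350 opportunities (not hands), and with about 100 opportunities the margin on a 65% stat is closer to ±9%.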

05 January 2024 at 02:58 AM

17 Replies



I think it depends on the stat and your own ability to deal with uncertainty and revise your assumptions later on if the stats change as your sample grows.

Some plays or frequencies are unusual enough that even with a small sample it's statistically likely people have leaks in their game. But that's a poker knowledge thing rather than something a stats whizz can derive from a formula.

In Pokertracker you can move your mouse over a stat and see what the sample size is. So sure, if someone Cbets 100% of the time but the sample size is 4, you know nothing. If it's 20, I would start taking it seriously.

I think where people go wrong is in thinking that because you may need a six-figure sample of hands to tell whether you're a winning player or just running good, you therefore can't draw any reasonable assumptions about other stats from a few dozen or a few hundred hands on someone.

One open limp, or a VPIP/PFR way out of whack in the first 10 hands they play, is enough for me to mark them as a "potential fish" and I'm happy with that.


Yea, of course, I agree with your thoughts, this makes a lot of sense
Maybe someone else also has thoughts on this topic?


20 samples is where I personally start paying attention which for stats like VPIP and PFR is 20 hands.
10 samples is sufficient to determine which side of the spectrum a player resides.
Maniacs for example often don't need more than 10 hands to be identified.
But postflop statistics can be very difficult to get to converge even for players with 200+ hands.
Unless they are maniacs and their postflop stats are either below 10 or above 80.

An interesting case is where a statistic is zero for 15+ samples and then the player suddenly does the thing.
Like a guy doesn't raise for 15 hands and his PFR is 0 and then he suddenly raises.
How do you estimate his raise frequency?
The obvious thing is 1/(15+1) and then I multiply that by 2/3.
The 2/3 is a caution/regret factor I added after a couple of hands where I lost my stack but wouldn't have lost it if I had used a tighter preflop estimate.
And also if you have a small sample size you should just be cautious because maybe the player was just very unlucky or lucky the first 10 hands.


You can use math to estimate the statistical confidence of HUD stats.

I'll run through some examples below. Try guessing before clicking the spoiler to see how well your intuition aligns with reality. A few hours of these exercises will make you much better at interpreting HUD stats, knowing when they matter, and when to brush them off as noise.

Example 1: VPIP%

Let's say someone has VPIPed 8/20 hands (40% VPIP). Is this a big enough sample to confidently say they're a whale? (6-max)

Spoiler

It's close. There's a decent chance they're a whale, but they could still be a reg running hot. At this point, you should start deviating, but don't get too out of line yet.

The 95% confidence interval is (19.1% - 63.9%).


Let's test a hypothesis. This player has VPIPed 8/20 hands. How often would a player with a true VPIP of 25% enter at least 8/20 pots just by random luck?


Well it's only about a 10% chance. In other words, you can be somewhat confident this player is entering way too many pots, but can't rule out a reg on a hot-streak just yet.
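
The interval quoted above can be reproduced without any external tool. The sketch below is one way to compute a Clopper-Pearson interval using only the Python standard library. It finds each bound by bisection on the exact binomial tail, which is an implementation choice made here for simplicity (libraries typically use beta quantiles instead). The function names are illustrative and are not taken from the calculator used in this post.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def clopper_pearson(x, n, conf=0.95):
    """Exact (Clopper-Pearson) confidence interval for x successes in n trials."""
    alpha = 1 - conf

    def bisect(below_bound):
        # below_bound(p) is True for p under the bound we want, False above it
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if below_bound(mid):
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: the p at which P(X >= x) rises to alpha/2
    lower = 0.0 if x == 0 else bisect(lambda p: 1 - binom_cdf(x - 1, n, p) < alpha / 2)
    # Upper bound: the p at which P(X <= x) falls to alpha/2
    upper = 1.0 if x == n else bisect(lambda p: binom_cdf(x, n, p) > alpha / 2)
    return lower, upper

lo, hi = clopper_pearson(8, 20)
print(f"{lo:.1%} - {hi:.1%}")   # compare with the interval quoted above
```

Changing `x` and `n` to the opportunities of any other stat gives its interval in the same way.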

Example 2: Preflop 3-bet%

Let's say you're playing someone with a low preflop 3-bet% of 5% who's had 40 opportunities to 3-bet (note that I'm mentioning opportunities rather than hands played). Can we confidently claim this player is not 3-betting enough?

Spoiler

I would argue no. They've only 3-bet 2/40 spots, but could just be card dead. These low percentage stats take much longer to converge.

The 95% confidence interval of this stat is (0.6% - 16.9%).


Let's test a hypothesis. What is the probability that a player whose true preflop 3-bet was 10%, would 3-bet no more than 2/40 spots?


About 22%. In other words, there's still about a 1/4 chance this is a card-dead reg.

Let's test the Maniac hypothesis. What are the chances that an aggro player with a true 3-bet% of 15% would only 3-bet at most 2/40 spots?


About 5%. So we can't even rule out an aggro 3-bettor yet.

In conclusion, you don't have a big enough sample size. You could maybe tighten up a sminch vs 3-bet, but IMO shouldn't label this player a nitty raiser and start making huge adjustments against them. Not yet, anyway.
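
The two hypothesis tests in this example are lower-tail binomial probabilities, which can be checked with a few lines of standard-library Python. This is a sketch only; `prob_at_most` is an illustrative name, and it assumes every 3-bet opportunity is an independent trial with the same true frequency.

```python
from math import comb

def prob_at_most(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p): chance of x or fewer 3-bets in n spots."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

# How often would a true 10% or 15% 3-bettor show 2 or fewer 3-bets in 40 spots?
for true_rate in (0.10, 0.15):
    print(f"true 3-bet {true_rate:.0%}: P(<= 2 of 40) = {prob_at_most(2, 40, true_rate):.1%}")
```

The results should land near the 22% and 5% quoted above.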

Example 3: Flop c-bet%

Let's say you've played 800 hands against a reg, and in that time they've c-bet flop 80% of the time. Out of those 800 hands, they've had 100 opportunities to c-bet the flop. Is this sample size enough to say they're c-betting flop too wide?

Spoiler

Yes. You can confidently claim that this player is c-betting flops too wide (assuming they aren't super tight preflop).

The 95% confidence interval is (70.8% - 87.3%).


Let's test a hypothesis. OP claims that 65% is a good c-betting stat. What is the probability that a reg whose true flop c-bet% is 65% would have c-bet at least 80/100 flops?


0.078%. Basically 0%. We can confidently claim that a reg with 65% flop c-bet would not have c-bet this many flops. So their actual flop c-bet% must be higher. You can and should adjust your strategy facing their c-bet.
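
The "at least this many" tests used in Examples 1 and 3 are upper-tail binomial probabilities. A minimal standard-library sketch follows (`prob_at_least` is an illustrative name, and independent trials with a fixed true frequency are assumed):

```python
from math import comb

def prob_at_least(x, n, p):
    """P(X >= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x, n + 1))

# Example 3: 80+ c-bets in 100 opportunities from a true 65% c-bettor
print(f"{prob_at_least(80, 100, 0.65):.3%}")
# Example 1 for comparison: 8+ VPIPs in 20 hands from a true 25% VPIP player
print(f"{prob_at_least(8, 20, 0.25):.1%}")
```

The gap between the two outputs shows how much more 100 opportunities tell you than 20.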

Interpreting Confidence Intervals:

Spoiler

The thing to keep in mind is that some stats take way more hands to converge than others. The accuracy of some HUD stat has nothing to do with how many hands you play, and everything to do with how many opportunities they had to do the thing. Stats like VPIP converge very quickly, but something like river 3-bet% is worthless against almost everyone you play against.

There are different methods of estimating confidence intervals. Some converge faster, some do better around the super high or low stats, some offer more accuracy. I'm using a formula called the Clopper-Pearson Exact Method, which is said to be the most accurate. However, Bayesian methods may be even better if you plug in population data as a prior.

The way you interpret a confidence interval goes like this:

1) We assume the stat to be true, (e.g. 3-bet% = 4%).
2) If we ran 100 trials, we'd measure this stat to be between (0.5% - 13.7%), 95% of the time.

Then when it comes to hypothesis testing:

1) We assume some stat to be true, (e.g. reg's true preflop 3-bet% is 10%)
2) Measure how often the hypothesis would exceed or fall short of the measured stat (4%) after x samples.

Here's the tool I use to estimate HUD stats: https://homepage.divms.uiowa.edu/~mbogna...


by Mind-Zei

...
An interesting case is where a statistic is zero for 15+ samples and then he just does that.
Like a guy doesn't raise for 15 hands and his PFR is 0 and then he suddenly raises.
How do you estimate his raise frequency?...

The 95% confidence interval for a 0/15 HUD stat is between 0% and 21.8%.

However, it's skewed towards the lower end. For example, the probability of a 15% VPIP player folding 15/15 hands is less than 9%.
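
Both numbers for the 0/15 case have a closed form, because the chance of zero successes in n tries is simply (1 - p)^n. A short sketch, assuming independent hands and a two-sided 95% interval:

```python
# Zero successes in n tries: the exact 95% upper bound solves (1 - p)**n = alpha/2
n = 15
alpha = 0.05
upper = 1 - (alpha / 2) ** (1 / n)
print(f"95% CI for 0/{n}: 0% - {upper:.1%}")             # 0% - 21.8%

# Chance that a player with a true 15% frequency shows 0 of 15 purely by luck
p_zero = (1 - 0.15) ** n
print(f"P(0 of {n} | true rate 15%) = {p_zero:.1%}")     # 8.7%
```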


by tombos21

You can use math to estimate the statistical confidence of HUD stats.

I'll run through some examples below. Try guessing before clicking the spoiler to see how well your intuition aligns with reality. A few hours of these exercises will make you much better at interpreting HUD stats, knowing when they matter, and when to brush them off as noise.

Example 1: VPIP%

Let's say someone has VPIPed 8/20 hands (40% VPIP). Is this a big enough sample to confidently say they're a whale? (6-max)

Very informative. Thank you!


by Mind-Zei

Maniacs for example often don't need more than 10 hands to be identified.
But postflop statistics can be very difficult to get to converge even for player with 200+ hands.
Unless they are maniacs and their postflop stats either are 10- or 80+.

Good thoughts, thanks for the reply 👍


by tombos21

You can use math to estimate the statistical confidence of HUD stats.

This is pretty much what I was looking for, a mathematical formula that could be used to determine what sample size we need to be sure that according to a specific stat the opponent is deviating. This looks very interesting, thank you!
After reading the examples and explanations, I tried to describe the input and output parameters so that we can use them more clearly in our case, and also made some examples.

by tombos21

Here's the tool I use to estimate HUD stats: https://homepage.divms.uiowa.edu/~mb...p...

So, using this calculator, we can represent the input data as follows:
n = number of samples
p = required stat value
x = number of successes (the current stat value expressed as a count)

Example 1 (Confidence interval using the first example with VPIP 8/20 hands):

So, my interpretation of the confidence interval in that case:
1) If the required stat value (p) falls outside the CI limits (19-64%), then we can be 95%+ sure that the player's real stat value deviates from it (confident).
2) If the required stat value (p) falls inside the CI limits (19-64%), then we cannot be 95% sure that the player's real stat value deviates from it (not confident).

Example 2 (The required stat value is inside the range of 19-64%)

The probability of seeing 40% or more over 20 samples from a player whose real stat value is 25% is about 10%, i.e. we can be about 90% sure that his real stat value deviates from 25%.

Example 3 (The required stat value is outside the range of 19-64%)

The probability of seeing 40% or more over 20 samples from a player whose real stat value is 12% is about 0.14%, i.e. we can be about 99.86% sure that his real stat value deviates from 12%.

So, how many samples do you need to confidently say that your opponent is deviating?
For example, we see Fold vs Delay at 50%, and the optimal value is, say, 40%.
How many samples do we need to be 95%+ sure that our opponent is overfolding?
Next, I tested various sample sizes to find the desired value.

Test 1

The probability of seeing 50% or more over 20 samples when the real stat value is 40% is 24%, so we can be 76% sure that his real stat value deviates (not confident).

Test 2

The probability of seeing 50% or more over 50 samples when the real stat value is 40% is 9.8%, so we can be 90.2% sure that his real stat value is deviating (more confident).

Test 3

The probability of seeing 50% or more over 100 samples when the real stat value is 40% is 2.7%, so we can be 97.3% sure that his real stat value is deviating (sure).

Test 4 (large number of samples)

The probability of seeing 50% or more over 150 samples when the real stat value is 40% is 0.827%, so we can be 99.173% sure that his real stat value is deviating (very confident).
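
The four tests can be reproduced in one loop. Note the assumption: the sample sizes 20, 50, 100 and 150 used below are the ones for which the upper-tail probability comes out close to the quoted 24%, 9.8%, 2.7% and 0.83%; the original screenshots are not available to confirm them. `prob_at_least` is an illustrative helper name.

```python
from math import comb

def prob_at_least(x, n, p):
    """P(X >= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x, n + 1))

optimal = 0.40   # assumed "correct" fold frequency from the example
for n in (20, 50, 100, 150):
    x = n // 2   # observed 50% fold frequency at this sample size
    tail = prob_at_least(x, n, optimal)
    print(f"n={n:>3}: P(>= {x} folds | true rate 40%) = {tail:.2%}")
```

The tail probability shrinks as n grows even though the observed 50% stays the same, which is the pattern the tests above describe.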

So we can use Confidence interval this way:
n = number of samples that we are looking for (testing)
x = frequency we are testing
We look at CI and when the required stat value (p) is not in the interval, then this number of samples suits us (For X% confidence that the stat deviates).
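
Instead of trying sample sizes by hand, the search can be automated. The sketch below is one possible approach, not the calculator's method: it uses a one-sided upper-tail test rather than the two-sided confidence interval, assumes the observed frequency is above the baseline, and uses made-up names (`min_samples`, `observed`, `baseline`).

```python
from math import comb, ceil

def prob_at_least(x, n, p):
    """P(X >= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x, n + 1))

def min_samples(observed, baseline, alpha=0.05, max_n=2000):
    """Smallest n at which seeing `observed` (or more) from a player whose
    true frequency is `baseline` has probability below alpha.
    Assumes observed > baseline (one-sided test for over-doing the action)."""
    for n in range(1, max_n + 1):
        x = ceil(observed * n - 1e-9)   # successes needed to show `observed` at n
        if prob_at_least(x, n, baseline) < alpha:
            return n
    return None

# Fold vs Delay example: observed 50%, assumed optimal 40%
print(min_samples(0.50, 0.40))
```

Because the counts are whole numbers, the tail probability does not fall perfectly smoothly with n, so the first n that passes is a guide rather than a hard threshold. A smaller `alpha` (for example 0.01) gives the stricter "99%+" requirement.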

In general, did I understand everything correctly or did I make a mistake somewhere?
And several questions remain: as a result, we only find how confident we can be that the actual stat value deviates from the required one, but not by how much it deviates, right? For example, in test 4, we are 99% sure that his real stat value deviates from the required one, but it could deviate by 3% (which will not lead to big adjustments) or by 10-20% (which will lead to big adjustments). So we need, at a minimum, to decide what level of confidence we require before concluding that the stat deviates. I would guess 95% is good, 99%+ is perfect. What do you think?


RainDeath6, your process is good. Experimenting with the parameters to see how that changes the output is the right way to build intuition.


by tombos21

RainDeath6, your process is good. Experimenting with the parameters to see how that changes the output is the right way to build intuition.

Cool, it’s good that I understood everything correctly, I will experiment in this direction. Thanks a lot!
But what do you think about the last questions? I'll repeat them:
As a result of using the confidence interval and the binomial distribution, we only find how confident we can be that the actual stat value deviates from the required one, but not by how much it deviates, right? For example, in test 4, we are 99% sure that his real stat value deviates from the required one, but it could deviate by 3% (which will not lead to big adjustments) or by 10-20% (which will lead to big adjustments). So we need, at a minimum, to decide what level of confidence we require before concluding that the stat deviates, correct? I would guess 95%+ is good, 99%+ is perfect. What do you think?


I've shown this process to many poker players. The binomial calculator I've linked above is a powerful tool, but it's hard for non-stats people to use and lacks a visual component. As a layman without stats training, I struggled immensely when first researching this stuff.

I wanted a better method for interpreting HUD stats. Something that's easy for poker players to use, with a nice data visualization to make it intuitive to understand. This thread inspired me to build a spreadsheet tool to help interpret HUD stats. The easiest way to do this is to convert the binomial stat from discrete to continuous space using a tool called the beta distribution.

To use this tool:

  • Select File -> Make a copy so you can edit it
  • Enter data into grey boxes

Here's an example:

Villain has VPIP=30% after 50 hands. What does the probability distribution of their true HUD stat look like?

Most of the time (1 standard deviation), their true HUD stat will fall between [23% - 38%]. However, if you want to be very confident (2 SD, AKA 95%), then you need to extend the range to [18% - 45%].

Here's a visualization of their HUD stat.

  • x axis represents the HUD stat
  • y axis represents the density, or likelihood, of that HUD stat.
  • A wider curve means more uncertainty. A narrow curve suggests we are more confident about the HUD stat.


We can also run a test. Let's assume a good player would VPIP 25% or less. What is the probability that they are actually a reg with a sub25% VPIP?


  • Red area = probability that their true HUD stat is <= 25%
  • Blue area = probability that their true HUD stat is > 25%

The red area accounts for about 18.5% of the area of the curve, meaning there's an 18.5% chance that this is actually a "reg".
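
The shaded-area calculation can be approximated in a few lines. This is a sketch under an assumption: it models the true stat as Beta(x + 1, n - x + 1), i.e. a flat prior updated with x successes in n opportunities. The spreadsheet's exact parameters are not shown in this post, so its numbers may differ slightly. The function names are illustrative.

```python
from math import lgamma, log, exp

def beta_pdf(p, a, b):
    """Density of Beta(a, b) at p, for 0 < p < 1."""
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    return exp(log_norm + (a - 1) * log(p) + (b - 1) * log(1 - p))

def prob_true_stat_below(threshold, x, n, steps=10_000):
    """Area under the Beta(x + 1, n - x + 1) curve to the left of `threshold`,
    computed with a simple midpoint rule."""
    a, b = x + 1, n - x + 1
    width = threshold / steps
    return sum(beta_pdf((i + 0.5) * width, a, b) for i in range(steps)) * width

# 15 VPIPs in 50 hands (30%): chance the true VPIP is 25% or lower
print(f"{prob_true_stat_below(0.25, 15, 50):.1%}")
```

The output should be in the neighbourhood of the red-area figure quoted above. Passing a larger `n` with the same 30% rate shows how the area shrinks as the sample grows.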

Example 2:

Let's increase the sample size from 50 to 100. So this player has VPIP'd 30% over 100 hands.

Now we can see that the curve has become more narrow, and the confidence intervals have shortened, so we can be more confident about their actual HUD stat.

The probability that they are actually a sub25% VPIP "Reg" has gone down to 11.5%. So you can more confidently deviate against them as you gain more data.


Example 3 after 1000 hands:


You can play around with the graph and see how it changes as you plug in different parameters. This should hopefully provide a more intuitive way to interpret the data!

(reminder: opportunities ≠ hands played)

Technical notes

Spoiler

You'll get different results with this method. In particular, the Beta distribution tends to converge more quickly at low samples compared to the method I showed earlier, but it's less accurate. The reason is the transformation from a discrete distribution to a continuous one. The two methods converge to the same numbers over a big enough sample.

Binomial Distribution:

50% VPIP after 10 hands:

The probability distribution looks like this:


x axis: number of pots entered
y axis: probability of entering exactly x pots
Blue: Full probability distribution after 10 hands
Red: Probability of a 50% HUD stat entering 0, 1, 2, or 3 pots after 10 hands

As you can see, everything is modeled with these discrete 10% steps. If we sum up the area of the red rectangles we'd find the probability of entering 0-3 pots after 10 hands.
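
The discrete bars described above can be listed directly. A minimal sketch using only the standard library, assuming each of the 10 hands is an independent 50/50 decision:

```python
from math import comb

n, p = 10, 0.5
# Blue bars: probability of entering exactly k pots in 10 hands
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

for k, prob in enumerate(pmf):
    print(f"{k:>2} pots: {prob:.4f}")

# Red bars: entering 0, 1, 2 or 3 pots
red_area = sum(pmf[:4])
print(f"P(0-3 pots) = {red_area:.4f}")   # 176/1024 = 0.1719
```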

Beta Distribution:

Note that it's the same question, but we lose information about those discrete 10% steps. Instead, it's modelled as a continuous distribution. So we end up with a different estimate for the red area.



by RainDeath6

Cool, it’s good that I understood everything correctly, I will experiment in this direction. Thanks a lot!
But, what do you think about last questions? I'll duplicate:
As a result of using confidence interval and Binomial Distribution, we only find any confidence that the actual stat value deviates from the required one, but not how much the current stat value ultimately deviates, right? For example, in test 4, we are only 99% sure that his real stat value is deviating from the required one,

The level of certainty depends on the range of the stat, what you consider to be normal, above average or below average for that particular stat.

For example, a 10% change to someone's 3-bet% can swing your perception of a player from nit to maniac, so you need a lot more confidence about the 3-bet% because your strategic adjustments are more sensitive to small changes.
Conversely, you wouldn't alter your strategy that much against someone who c-bets 50% compared to 60%, so you don't need as much certainty because your strategic adjustments are less sensitive to changes in this value.

Ultimately, I recommend being flexible. Instead of a hard "confident/not confident" cutoff, I recommend deviating more or less depending on your level of confidence. If your confidence is low, then make small or no adjustments. If your confidence is much higher, then you can make bigger adjustments.


by tombos21

I wanted a better method for interpreting HUD stats. Something that's easy for poker players to use, with a nice data visualization to make it intuitive to understand. This thread inspired me to build a spreadsheet tool to help interpret HUD stats. The easiest way to do this is to convert the binomial stat from discrete to continuous space using a tool called the beta distribution.

The beta distribution method also looks very interesting, I’ll definitely experiment with it too, I’ve already made a copy of the spreadsheet 😀


by tombos21

The level of certainty depends on the range of the stat, what you consider to be normal, above average or below average for that particular stat.
....
Ultimately, I recommend being flexible. Instead of a hard "confident/not confident" cutoff, I recommend deviating more or less depending on your level of confidence. If your confidence is low, then make small or no adjustments. If your confidence is much higher, then you can make bigger adjustments.

Seems like a very smart answer, thanks. I'll look into all this and I hope I will come to confident conclusions on this topic. I’m glad that there is a person on the forum who is also interested in understanding all this in detail and is succeeding at it. Thank you very much for the answers again 😀


by tombos21

The easiest way to do this is to convert the binomial stat from discrete to continuous space using a tool called the beta distribution.

I am working on the beta distribution and have watched several videos on this topic, but I would like to ask one question: do I understand correctly that in example 1 the density is given by the probability density function (PDF), and that with the current stat value of 30% (50 samples) the most likely true value of this stat is exactly 30% (not with 100% certainty, but more likely than any other value)? Right? This is the so-called “probability of probability”. Because if you look at the PDF at a stat value of 30%, it is at its highest point, and in this case the PDF is 6.24:


And if in the same example we look at what the PDF will be when the stat value decreases or increases by 0.4 percentage points (to 29.6% or 30.4%), it decreases by about the same amount on either side.


Also, as the sample increases, for example to 100 samples, the PDF value at the 30% point (the maximum PDF value) also increases, which will proportionally increase the probability that the most likely value in this case will be 30%:


I understand that the answer is most likely “yes”, I immediately thought so, but as I delved deeper into the theory, my confidence dropped a little bit.


Higher density indicates a higher likelihood of that being the "true stat value," given the evidence so far. The highest point (most dense) will always equal the current stat value because we have no prior beliefs about what the stat "ought to be," so it's just distributed around the current value.

(I'm putting "true stat value" in quotations because there's a technicality here about what Frequentist Distributions are actually measuring, but I've already covered that at the end of post #5. In any case, it doesn't matter if you have no priors.)

The way to interpret "density" is to think of it as a multiplier of the stat. If you sum the area under the entire curve (take the integral), it adds to 100%. Measuring the area under only some section of the curve adds up to the probability that their stat is within that section. That's what the red part of the chart is measuring.
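
To make the density-versus-area distinction concrete, here is a small numerical sketch. It assumes the curve is Beta(x + 1, n - x + 1) (a flat prior updated with x successes in n opportunities), which reproduces the 6.24 density at 30% mentioned earlier in the thread; `beta_pdf` and `area` are illustrative names.

```python
from math import lgamma, log, exp

def beta_pdf(p, a, b):
    """Density of Beta(a, b) at p, for 0 < p < 1."""
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    return exp(log_norm + (a - 1) * log(p) + (b - 1) * log(1 - p))

def area(lo, hi, a, b, steps=10_000):
    """Midpoint-rule integral of the Beta(a, b) density between lo and hi."""
    width = (hi - lo) / steps
    return sum(beta_pdf(lo + (i + 0.5) * width, a, b) for i in range(steps)) * width

for x, n in ((15, 50), (30, 100)):
    a, b = x + 1, n - x + 1          # flat-prior assumption
    peak = beta_pdf(0.30, a, b)      # density at the observed 30%
    total = area(0.0, 1.0, a, b)     # whole curve
    near = area(0.28, 0.32, a, b)    # probability the true stat is within 28%-32%
    print(f"n={n}: density at 30% = {peak:.2f}, total area = {total:.3f}, "
          f"P(28%-32%) = {near:.1%}")
```

The density at the peak is well above 1, so it cannot be a probability on its own; only the areas are. With more samples the peak gets taller and the 28%-32% slice captures more of the total area.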

The more samples you have, the narrower the curve, indicating you are more confident in the actual value. The wider the curve, the less confident you are about its value.

It looks like you are also examining the slope of the curve, e.g., how much density changes with respect to the x-axis. The slope depends on what part of the curve you measure and your confidence. The exact formula is given by taking the derivative of the beta distribution, which is pure nightmare fuel. The beta distribution is close to, but not exactly equal to the normal distribution.


by tombos21

Higher density indicates a higher likelihood of that being the "true stat value," given the evidence so far. The highest point (most dense) will always equal the current stat value because we have no prior beliefs about what the stat "ought to be," so it's just distributed around the current value.

(I'm putting "true stat value" in quotations because there's a technicality here about what Frequentist Distributions are actually measuring, but I've already covered that at the end of post #5. In an

Everything looks very logical, from this angle it looks more understandable. Thanks for the help!
