How many hands for comprehensive RNG test?
Hey, I know nobody really posts in this section. If you see this and can advise me on the proper number of consecutive hands, please do so.
Let's say we're using a poker site and we want to get to a 95% confidence level regarding the RNG.
I thought 10k hands would be enough, but from what I'm reading it should be closer to 100k random hands dealt.
Does anyone know what the minimum number of hands dealt should be for a 95% confidence rating?
1 Reply
That is actually a very good and interesting question. It is also probably more complicated than you might think. It gets to the concept of the power of a statistical inference test. Power is exactly the notion you are looking for: how good is this test at distinguishing a result that is just typical variance from one caused by a nonrandom factor?
The complicated part is that there is no one right answer to the question. It will depend both on exactly how you intend to test the RNG and on how far the observed result deviates from what a truly random RNG would be expected to produce.
The first of these refers to the fact that “test the RNG” is too vague. We need to come up with a particular, specific hypothesis regarding the RNG that we can use observed data to test. For example, “the RNG gives AA at a frequency different from expected” or “the RNG causes aces to be dealt at a lower than expected frequency”. Different tests will require different amounts of data to give significant results.
A statistical test works by taking a hypothesis such as one of those above and assuming the null hypothesis, which is essentially the opposite of what we are testing. For the AA example, the null hypothesis would be that the RNG deals AA at the expected frequency (1 hand in 221). We then look at the data and calculate the probability, assuming the null hypothesis is true, of getting a deviation from the expected frequency at least as great as the one observed. We reject the null hypothesis if that probability is small enough (below 5% for a 95% confidence level). This is where power comes in: larger deviations from the expected frequency require less data to reject the null hypothesis, and smaller deviations require more. Very small data sets cannot support a conclusion either way. For N=1 we have a 1/221 chance of getting AA, and we don’t conclude that our RNG is faulty because we got AA on a given hand. If we looked at 1000 hands, though, and saw that we got AA 600 times, that would be more than enough data to reject the null hypothesis.
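To put rough numbers on how the required sample grows as the deviation shrinks, here is a minimal Python sketch. It assumes a two-sided test on the AA frequency using the normal approximation to the binomial, a 5% significance level, and 80% power. The 80% figure and the function name `hands_needed` are my own choices for illustration, so treat the output as a ballpark rather than a definitive answer.

```python
from math import ceil, sqrt
from statistics import NormalDist

P_AA = 1 / 221  # expected frequency of pocket aces under the null hypothesis


def hands_needed(p_true, p_null=P_AA, alpha=0.05, power=0.80):
    """Approximate hands needed for a two-sided z-test to detect p_true.

    Uses the standard normal-approximation sample-size formula for a
    one-sample proportion test. alpha=0.05 matches a 95% confidence level;
    power=0.80 is an assumed, conventional choice.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    spread = (z_alpha * sqrt(p_null * (1 - p_null))
              + z_beta * sqrt(p_true * (1 - p_true)))
    return ceil((spread / (p_true - p_null)) ** 2)


# How many hands to catch an RNG that deals AA `ratio` times as often as it should?
for ratio in (2.0, 1.5, 1.1, 1.01):
    n = hands_needed(ratio * P_AA)
    print(f"AA dealt {ratio:.2f}x as often as expected: about {n:,} hands")
```

Under these assumptions, an RNG that deals AA twice as often as it should would show up within a few thousand hands, while one that is off by only 1% would need well over ten million. So whether 10k or 100k hands is enough depends entirely on how large a bias you want to be able to catch.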
The pitfalls you will want to be aware of if you actually want to test a given RNG are these:
1. Make sure you are using a clear, testable hypothesis. “It just seems wrong” is not such a hypothesis.
2. When formulating a hypothesis, make sure you test correctly. The three hypotheses “It deals more hands with an ace than expected”, “It deals fewer hands with an ace than expected” and “the frequency of hands with an ace it deals is different than expected” all seem similar. However, proper statistical testing will give different probabilities for each of these using the exact same data set. Also make sure you are calculating probabilities correctly. In this case, it’s easy to assume that the expected frequency of hands containing an ace would be 2/13, but this is wrong, because it counts AA hands twice. The correct value is 1 - (48/52)(47/51) = 33/221.
3. There are many possible ways to test an RNG. There are 13 card rank distributions. There are 52 individual card distributions. There are 1326 different combination distributions (these would count AdAs and AhAs as different). There are 91 different hand distributions (these count all AA combos as the same, and similarly for other hands), and many other distributions you could test. Be careful using 95% as a confidence level if you make multiple such tests. Remember what that 95% means: a deviation large enough to be flagged will occur by pure chance with a probability of 5%. If you made 100 different tests of an RNG that is truly random, you would expect about 5 of them to give a “significant” deviation from randomness at the 95% confidence level. Use caution, and a more stringent significance level, if you do perform multiple tests.
4. Finally, beware of the difference between statistically significant and practically significant. For example, the expected frequency of being dealt AA is 0.452489%. If you did a test using, say, 10 billion hands, you might find that an observed frequency of 0.45265% comes out to be statistically significant. Remember that all this means is that the probability of a deviation of at least about 0.00016 percentage points occurring randomly over this many hands is lower than 5%. This level of deviation, though, would have no practical effect on any aspect of game play, nor would it make a noticeable difference in anybody’s win rate. A deviation that small is zero in practical terms.
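Pitfalls 2 through 4 can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the helper names are mine, the p-values use the normal approximation to the binomial, the count of 1,560 ace hands in 10,000 is a made-up data set, and dividing the significance level by the number of tests (the Bonferroni correction) is just one common way to be "more stringent".

```python
from fractions import Fraction
from math import sqrt
from statistics import NormalDist

# Chance that a two-card hand holds at least one ace: 1 - (48/52)(47/51).
P_ACE = 1 - Fraction(48, 52) * Fraction(47, 51)  # equals 33/221, not 2/13


def p_values(successes, hands, p_null):
    """Normal-approximation p-values for the three alternative hypotheses."""
    p0 = float(p_null)
    z = (successes - hands * p0) / sqrt(hands * p0 * (1 - p0))
    upper = 1 - NormalDist().cdf(z)  # "more than expected"
    lower = NormalDist().cdf(z)      # "fewer than expected"
    return {"more": upper, "fewer": lower, "different": 2 * min(upper, lower)}


def bonferroni_alpha(alpha, tests):
    """Per-test significance level that keeps the overall level near alpha."""
    return alpha / tests


def smallest_detectable_shift(hands, p_null, alpha=0.05):
    """Smallest deviation in frequency that a two-sided z-test would flag."""
    p0 = float(p_null)
    return NormalDist().inv_cdf(1 - alpha / 2) * sqrt(p0 * (1 - p0) / hands)


# Pitfall 2: the same (hypothetical) data gives three different answers.
result = p_values(successes=1560, hands=10_000, p_null=P_ACE)
for name, p in result.items():
    print(f"{name:>9}: p = {p:.4f}")

# Pitfall 3: with 100 tests, each one needs a much stricter threshold.
print("per-test alpha for 100 tests:", bonferroni_alpha(0.05, 100))

# Pitfall 4: with enough hands, even a negligible deviation is "significant".
for hands in (10_000, 1_000_000, 10_000_000_000):
    shift = smallest_detectable_shift(hands, Fraction(1, 221))
    print(f"{hands:>14,} hands: AA shift of {shift:.2e} is detectable")
```

With this made-up data set, the "more aces than expected" test falls below 5% while the "different than expected" test does not, which is why the hypothesis has to be fixed before looking at the data.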