Validity of the Poisson Distribution

A question regarding the Poisson distribution.

According to Feustel in Conquering Risk, the Poisson distribution can be used in a "good enough" sense if the number of trials divided by the probability of success is at least 200. This number, I assume, comes from Feustel's note that the Poisson can be used when the number of trials is at least 20 and the probability of success is no more than 10% (20 divided by 10% is 200).

In analyzing MLB hits per game, I have found that a player will see about 11 pitches and get a hit off 4% of those pitches. 11 divided by 4% yields 275, which is over that 200 line.

Based on these numbers and Feustel's rule of thumb, I believe the Poisson distribution can be used to estimate MLB player hits per game.
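
A quick way to sanity-check the rule of thumb is to put the exact binomial next to its Poisson approximation. The sketch below assumes, as in the post, 11 independent pitches with a 4% hit chance each, so the Poisson mean is 11 x 0.04 = 0.44 hits per game. The function names are illustrative only. Note that 11 trials falls short of the 20-trial minimum in the rule as quoted, so the comparison is worth running rather than assuming.

```python
from math import comb, exp, factorial

# Figures taken from the post; independence of pitches is an assumption.
N_PITCHES = 11
P_HIT = 0.04
LAM = N_PITCHES * P_HIT  # Poisson mean: expected hits per game


def binomial_pmf(k, n, p):
    """Exact probability of k hits in n independent pitches."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """Poisson probability of k hits with mean lam."""
    return exp(-lam) * lam**k / factorial(k)


def total_variation(n, p):
    """Total variation distance between Binomial(n, p) and Poisson(n * p)."""
    lam = n * p
    overlap_gap = sum(
        abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(n + 1)
    )
    # Poisson mass above n has no binomial counterpart.
    poisson_tail = 1 - sum(poisson_pmf(k, lam) for k in range(n + 1))
    return 0.5 * (overlap_gap + poisson_tail)


if __name__ == "__main__":
    print("hits  binomial  poisson")
    for k in range(5):
        b = binomial_pmf(k, N_PITCHES, P_HIT)
        q = poisson_pmf(k, LAM)
        print(f"{k:>4}  {b:.4f}    {q:.4f}")
    print(f"total variation distance: {total_variation(N_PITCHES, P_HIT):.4f}")
```

If the two columns agree to a couple of decimal places, the approximation itself is not the weak point; the independence assumption and the fixed pitch count would then be the things to question.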

Criticism and correction of this reasoning very much appreciated.

Thanks, all.

18 February 2024 at 10:24 PM

4 Replies



As the Poisson approximates a binomial distribution, it only works if the events are independent.

I don't think pitches are independent, but for your model maybe they are independent enough. For example, if a player whiffs on the first x pitches does that make it more likely the next pitch is a whiff?


by PokerHero77

As the Poisson approximates a binomial distribution, it only works if the events are independent.

I don't think pitches are independent, but for your model maybe they are independent enough. For example, if a player whiffs on the first x pitches does that make it more likely the next pitch is a whiff?

I would agree with this. In reality, yes, a very large number of variables make each pitch dependent on the last. However, as the saying goes, all models are wrong, but some are useful. Modeling how many hits a player would get to a "near perfect" standard, given the number of variables involved, would be computationally intractable. But I would argue that, given the assumptions outlined, the Poisson is "good enough," provided one is aware of the limitations.


I suggest you take the projected distribution and compare it with actual data from the past few years, where the hit rate per pitch is reasonably constant. I suspect the model will undershoot in games with more pitches thrown, and overshoot in games with fewer.
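
The undershoot/overshoot concern can be sketched without any game data. The example below assumes the 4% per-pitch hit rate and 11-pitch average from the original post, and treats pitches as independent; the pitch counts 7 and 15 are hypothetical values chosen only to illustrate a short and a long game. It compares the chance of at least one hit under a Poisson with a fixed mean of 0.44 against a binomial that uses the actual pitch count.

```python
from math import exp

# Figures from the original post; independence of pitches is an assumption.
P_HIT = 0.04
BASELINE_PITCHES = 11
FIXED_LAM = BASELINE_PITCHES * P_HIT  # 0.44 hits per game


def p_hit_fixed_poisson(lam=FIXED_LAM):
    """P(at least one hit) when every game uses the same Poisson mean."""
    return 1 - exp(-lam)


def p_hit_binomial(n_pitches, p=P_HIT):
    """P(at least one hit) given the pitches actually seen in that game."""
    return 1 - (1 - p) ** n_pitches


if __name__ == "__main__":
    fixed = p_hit_fixed_poisson()
    # 7 and 15 are hypothetical pitch counts, not observed data.
    for n in (7, BASELINE_PITCHES, 15):
        actual = p_hit_binomial(n)
        print(f"{n:>2} pitches: fixed Poisson {fixed:.3f}, binomial {actual:.3f}")
```

One possible remedy, if pitch counts are known or can be projected per game, is to scale the Poisson mean by the expected number of pitches rather than fixing it at 0.44.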

Another problem with your model is that it takes a non-binomial stat (a pitch) and applies a binomial property to it (hit or no hit). In fact, a pitch can result in a whole set of outcomes, some resembling a hit and others not resembling a hit at all. Perhaps it would be worthwhile to construct a separate Poisson for each of the possible hit results and use them accordingly.


I am assuming you would be using an interval closely approximating the number of pitches expected for each side in a game.
