Probability question: observed win rate above benchmark in a truncated 1–2 process
Hello,
I hope this is the correct section for this question.
Apologies in advance for my English; it is not my native language.
I am not posting this as a “gambling system sales pitch”, but as a probability / statistical interpretation question.
I have been analyzing a filtered process derived from a large sample of live Baccarat shoe data, but I would like to focus only on the probabilistic side of the problem.
Simplified setup
Consider a repeated process in which each “cycle” is a two-step (1–2) progression, strictly truncated after the second step, and ends in one of two outcomes:
a win worth approximately +0.975 units
a full loss capped at -3 units
This gives a rough baseline: the break-even win rate p, at which 0.975·p = 3·(1 − p), is approximately:
p = 3 / (3 + 0.975) ≈ 0.7547, i.e. about 75.47%
Now suppose that after applying a fixed set of entry filters, I observe the following out of 4,831 cycles:
Wins: 3,719
Losses: 1,112
Observed win rate: 76.98%
Net result: +578.63 units
Max drawdown: -35.57 units
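To put the max drawdown figure in context, here is a minimal simulation sketch. It assumes every cycle is independent and pays exactly +0.975 or -3 (the simplified payoffs from the setup, which will not reproduce the real per-cycle results exactly), and it uses the observed win rate as the true win probability. The function names, the number of simulated paths, and the seed are illustrative choices, not part of the original analysis.

```python
import random

def max_drawdown(pnl_steps):
    """Largest peak-to-trough drop of the running total, as a positive number."""
    equity = peak = worst = 0.0
    for step in pnl_steps:
        equity += step
        peak = max(peak, equity)
        worst = max(worst, peak - equity)
    return worst

def simulate_drawdowns(n_cycles, p_win, win=0.975, loss=3.0, n_paths=300, seed=1):
    """Simulate n_paths independent sequences and return their sorted max drawdowns."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_paths):
        steps = (win if rng.random() < p_win else -loss for _ in range(n_cycles))
        results.append(max_drawdown(steps))
    return sorted(results)

if __name__ == "__main__":
    n_cycles, wins = 4831, 3719
    observed_dd = 35.57
    dds = simulate_drawdowns(n_cycles, wins / n_cycles)
    share_as_small = sum(d <= observed_dd for d in dds) / len(dds)
    print(f"median simulated max drawdown: {dds[len(dds) // 2]:.1f} units")
    print(f"share of paths with max drawdown <= {observed_dd}: {share_as_small:.2%}")
```

If a drawdown of 35.57 units or less is common in the simulated paths, the low drawdown adds little beyond the win rate itself; if it is rare, that may point to dependence between cycles or to payoffs that differ from the simplified model.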
My current interpretation
Relative to the rough random baseline of 75.47%, the observed process is ahead by about:
+1.51 percentage points
roughly +96 extra wins versus expectation
Using a simplified binomial approximation, this gives me something around:
Z ≈ 3.1 to 3.2
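The binomial approximation can be sketched as follows. It treats the 4,831 cycles as independent trials with a fixed null win probability. Two candidate null values are compared: the break-even rate 3 / (3 + 0.975), and 75%, which is the chance of at least one win in two independent 50/50 steps. The second value is an assumption introduced here for comparison; the function and variable names are illustrative.

```python
import math

def binomial_z(wins: int, n: int, p0: float):
    """Return (excess wins, z-score, one-sided p-value) under H0: P(win) = p0."""
    expected = n * p0
    sd = math.sqrt(n * p0 * (1.0 - p0))
    z = (wins - expected) / sd
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper tail of the standard normal
    return wins - expected, z, p_value

WIN, LOSS = 0.975, 3.0
N, WINS = 4831, 3719

break_even = LOSS / (LOSS + WIN)  # about 0.7547
fair_two_step = 1.0 - 0.5 ** 2    # 0.75, assumed: two independent 50/50 steps

for label, p0 in [("break-even", break_even), ("fair two-step", fair_two_step)]:
    excess, z, p_value = binomial_z(WINS, N, p0)
    print(f"{label:14s} p0={p0:.4f} excess={excess:+.1f} z={z:.2f} one-sided p={p_value:.5f}")
```

Against the 75.47% break-even rate this gives roughly +73 wins and z ≈ 2.4. The figures of about +96 wins and Z ≈ 3.1 to 3.2 are reproduced with the 75% null instead, so the two baselines should not be mixed when stating the significance.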
So my question is not whether this “beats Baccarat”, but whether this deviation should be taken seriously from a probabilistic point of view.
My real questions
1. Is this baseline / null hypothesis even the right one to use here, or is it too simplified to say anything meaningful?
2. If the filters were originally developed on historical data, how much should I discount this apparent significance because of selection bias, data snooping, multiple testing, and overfitting?
3. Does the very low max drawdown relative to total cycles add any useful information here, or is it mostly irrelevant once selection bias is admitted?
4. If you saw this result in isolation, would you consider it statistically interesting, or still fully compatible with noise plus model selection?
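One rough way to quantify the multiple-testing discount is to ask how often the best of k filter variants would reach a given z-score by chance alone. The sketch below assumes k independent variants, each with a standard normal z under the null; this is essentially the Šidák adjustment. The values of k are hypothetical, since the number of variants actually tried is not stated, and real filter variants are usually correlated, so the effective k is smaller than the raw count.

```python
import math

def upper_tail(z: float) -> float:
    """One-sided p-value P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def prob_best_of_k_exceeds(z: float, k: int) -> float:
    """Chance that the largest of k independent null z-scores is at least z."""
    return 1.0 - (1.0 - upper_tail(z)) ** k

if __name__ == "__main__":
    for z in (3.1, 3.2):              # the range quoted for the observed Z
        for k in (1, 10, 100, 1000):  # hypothetical numbers of filter variants tried
            print(f"z={z:.1f} k={k:5d} P(best >= z) = {prob_best_of_k_exceeds(z, k):.4f}")
```

With k = 1 this is the ordinary one-sided p-value; as k grows, a z-score in this range quickly stops being surprising, which is why the number of filter combinations explored matters as much as the final Z.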
Important clarification
I am fully aware that:
any “edge” found in historical gambling data can disappear,
and that out-of-sample validation is the real test.
I am only trying to understand whether the observed deviation from the benchmark is even worth respecting, or whether I am likely fooling myself mathematically.
Any technical criticism is welcome.
Thanks in advance.