Is this the Correct Algorithm for Vanilla CFR?
Is this the Correct Algorithm for Vanilla CFR?

Is this the Correct Algorithm for Vanilla CFR?

I'm trying to build a solver, starting with only the river and have got the program to correctly build a gametree and calculate EVs and reach probabilities. But the output of the solver is wrong so I think there's a mistake in my CFR algorithm. If someone with the right knowledge can notice anything wrong with the below algorithm would be very grateful (can provide the code too if needed):

initialise each possible action to be taken an equal frequency for every hand at every node, and all cumulative regrets to 0

loop for maximum iterations or until target exploitability reached:
- update the reach probabilities and calculate the EVs for every hand in every node
- for each hand at each node:
- calculate expected utility = sum of (each action EV * frequency that action taken)
- set regrets of each action = EV of that action - expected utility
- add these regrets to cumulative regrets
- calculate positive regrets = max(r, 0) for each regret
- if sum of positive regrets is 0, set each action to an equal frequency
- otherwise, set the action frequency to their positive regret / sum of positive regrets

- output the average strategy across all iterations

15 April 2025 at 03:50 PM
Reply...

4 Replies



I'm not sure to exactly understand what you mean by "- update the reach probabilities and calculate the EVs for every hand in every node"
Anyway, the pseudo code of CFR should look like this :

- Initialize a uniform strategy for each decision point (node).

For each iteration:

- Sample private hands for each player according to the defined card distribution (chance sampling).
- Traverse the game tree: compute the utility for each terminal node (i.e., when the game ends).
- Backpropagate the utilities to calculate the expected utilities at internal nodes using counterfactual values.
- Compute regrets for each [information set ; action ; hand], based on the difference between the action taken and the politics of the tree.
- Accumulate regrets over time (iteration after iteration) for each [information set ; action ; hand]
- Update strategies (policies) using regret-matching, based on the accumulated regrets.

I've written an in-depth article about this AI in poker here if you mind :


by Serge NAKACHE m

I'm not sure to exactly understand what you mean by "- update the reach probabilities and calculate the EVs for every hand in every node"Anyway, the pseudo code of CFR should look like this : - Initialize a uniform strategy for each decision point (node).For each iteration:- Sample private hands for each player according to the defined card distribution (chance sampling).- Trav

I appreciate the response, and that article looks really good. So my main mistake was trying to run each iteration with the whole range of hands, rather than sampling one hand per player?


by Fishreg1 m

I appreciate the response, and that article looks really good. So my main mistake was trying to run each iteration with the whole range of hands, rather than sampling one hand per player?

I think that's an implementation choice.

Monte Carlo CFR samples players hands (this method is often used in preflop solvers). Vanilla CFR updates all the ranges at once.


by tombos21 m

I think that's an implementation choice.

Monte Carlo CFR samples players hands (this method is often used in preflop solvers). Vanilla CFR updates all the ranges at once.

Got it, thanks.

I found the problem was from not realising you have to weight the new regrets by the probability the opponent takes actions to reach that point before adding to the cumulative regrets, so all fixed now

Reply...