TwoPlusTwo

Is this the Correct Algorithm for Vanilla CFR?

centurion • 7 posts • Joined Mar 2023

1yr ago

I'm trying to build a solver, starting with only the river and have got the program to correctly build a gametree and calculate EVs and reach probabilities. But the output of the solver is wrong so I think there's a mistake in my CFR algorithm. If someone with the right knowledge can notice anything wrong with the below algorithm would be very grateful (can provide the code too if needed):

initialise each possible action to be taken an equal frequency for every hand at every node, and all cumulative regrets to 0

loop for maximum iterations or until target exploitability reached:
- update the reach probabilities and calculate the EVs for every hand in every node
- for each hand at each node:
- calculate expected utility = sum of (each action EV * frequency that action taken)
- set regrets of each action = EV of that action - expected utility
- add these regrets to cumulative regrets
- calculate positive regrets = max(r, 0) for each regret
- if sum of positive regrets is 0, set each action to an equal frequency
- otherwise, set the action frequency to their positive regret / sum of positive regrets

- output the average strategy across all iterations

15 April 2025 at 03:50 PM

Reply...

4 Replies

stranger • 15 posts • Joined Mar 2025

1yr ago

I'm not sure to exactly understand what you mean by "- update the reach probabilities and calculate the EVs for every hand in every node"
Anyway, the pseudo code of CFR should look like this :

- Initialize a uniform strategy for each decision point (node).

For each iteration:

- Sample private hands for each player according to the defined card distribution (chance sampling).
- Traverse the game tree: compute the utility for each terminal node (i.e., when the game ends).
- Backpropagate the utilities to calculate the expected utilities at internal nodes using counterfactual values.
- Compute regrets for each [information set ; action ; hand], based on the difference between the action taken and the politics of the tree.
- Accumulate regrets over time (iteration after iteration) for each [information set ; action ; hand]
- Update strategies (policies) using regret-matching, based on the accumulated regrets.

I've written an in-depth article about this AI in poker here if you mind :

https://pub.towardsai.net/rebel-the-ai-that-learned-to-bluff-775818ace0be

Medium

ReBeL, the AI That Learned to Bluff

1 like

centurion • 7 posts • Joined Mar 2023

1yr ago

by Serge NAKACHE m

I'm not sure to exactly understand what you mean by "- update the reach probabilities and calculate the EVs for every hand in every node"Anyway, the pseudo code of CFR should look like this : - Initialize a uniform strategy for each decision point (node).For each iteration:- Sample private hands for each player according to the defined card distribution (chance sampling).- Trav

I appreciate the response, and that article looks really good. So my main mistake was trying to run each iteration with the whole range of hands, rather than sampling one hand per player?

Head coach of GTO Wizard • 447 posts • Joined Sep 2018

1yr ago

by Fishreg1 m

I appreciate the response, and that article looks really good. So my main mistake was trying to run each iteration with the whole range of hands, rather than sampling one hand per player?

I think that's an implementation choice.

Monte Carlo CFR samples players hands (this method is often used in preflop solvers). Vanilla CFR updates all the ranges at once.

2 likes

centurion • 7 posts • Joined Mar 2023

1yr ago

by tombos21 m

I think that's an implementation choice.

Monte Carlo CFR samples players hands (this method is often used in preflop solvers). Vanilla CFR updates all the ranges at once.

Got it, thanks.

I found the problem was from not realising you have to weight the new regrets by the probability the opponent takes actions to reach that point before adding to the cumulative regrets, so all fixed now

$30M GTD Battle of Malta Online Series Join the Action on CoinPoker July 12 - Aug 3 $1M CoinMillion • $2.5M Main Event • $250k Added Value Daily Rewards

Reply...

Is this the Correct Algorithm for Vanilla CFR?

4 Replies

Search in Thread

About OP