Noam Brown on GTO Strategies in tournaments.
Considering the current prevalence of ICM, I thought it was interesting to hear Noam say that computing GTO solutions for tournaments is possible. He doesn't go into much detail, but it involves modelling the tournament through self-play within the same kind of system that Libratus used, only with the extra layers that tournaments involve. I know that back in 2017, when he was developing his bot, the compute to do this sort of thing probably wasn't available, or at least would have been very, very expensive. Given how much compute has increased and come down in cost in the years since, it's surprising we haven't seen better models appear for tournaments.
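For anyone unfamiliar with what "self-play" means here, below is a toy sketch of regret matching, the building block that CFR-style solvers are built on, run on rock-paper-scissors. To be clear, this is not Libratus's code or anything from the video. The game, the function names, the starting regrets and the iteration count are all my own assumptions for illustration.

```python
# Toy illustration only: regret matching in self-play on rock-paper-scissors.
# NOT Libratus's code; every name and number here is an assumption.

# PAYOFF[a][b] = payoff for playing a against b (0=rock, 1=paper, 2=scissors).
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def strategy_from_regrets(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total == 0:
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in positive]


def action_values(opponent_strategy):
    """Expected payoff of each pure action against a mixed opponent."""
    return [
        sum(PAYOFF[a][b] * opponent_strategy[b] for b in range(3))
        for a in range(3)
    ]


def self_play(iterations=100_000):
    # Lopsided starting regrets so both players begin far from equilibrium.
    regrets = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    strategy_sums = [[0.0] * 3, [0.0] * 3]
    for _ in range(iterations):
        strategies = [strategy_from_regrets(r) for r in regrets]
        for player in (0, 1):
            values = action_values(strategies[1 - player])
            expected = sum(s * v for s, v in zip(strategies[player], values))
            for a in range(3):
                # Regret = how much better action a would have done.
                regrets[player][a] += values[a] - expected
                strategy_sums[player][a] += strategies[player][a]
    # The AVERAGE strategy is what approaches equilibrium, not the last one.
    return [[s / iterations for s in sums] for sums in strategy_sums]


average_strategies = self_play()
```

The average strategies drift towards one third on each action, which is the equilibrium of this toy game. A tournament solver would need a vastly larger game tree and payoffs measured in dollars rather than chips, which is the hard part being discussed.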
Noam - "The modelling ends up not being a big problem in practice when you use neural nets and we know how to address it"
Timestamped - https://youtu.be/ceCg90Q9N6Y?si=F6TvdEyu...
Am I just out of the loop and people are using these methods?
11 Replies
Cash games chips = $
Tournament games chips != $
Yes, currently available software utilizes non-cEV modeling for the suggested "solutions" given the human inputs.
It has also been known since way before solvers that "ICM" is in play from the first hand that you play in a tournament. I seem to remember years ago Phil Hellmuth saying that he wouldn't take an aipf spot with AA HU vs another player on the first hand of a tournament, implying that he would be giving up EV by doing so... you can't make a statement like that without having an understanding of the asymmetrical dollar value of chips for the entirety of the tournament. White Magic or something like that.
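To make "tournament chips != $" concrete, here is a minimal sketch of the standard Malmuth-Harville ICM calculation. The stacks and payouts are made-up numbers, the function name is my own, and it assumes every stack is positive. It is a brute-force version that is only practical for a handful of players.

```python
def icm_equities(stacks, payouts):
    """Malmuth-Harville ICM: dollar equity of each stack for the given payouts.

    Assumes all stacks are positive. The chance of taking the next place is
    proportional to your share of the chips still in contention.
    """
    equities = [0.0] * len(stacks)

    def assign(remaining, place, prob):
        # remaining: players not yet placed; place: index of the next payout.
        if place >= len(payouts) or not remaining:
            return
        total = sum(stacks[i] for i in remaining)
        for i in remaining:
            p = prob * stacks[i] / total
            equities[i] += p * payouts[place]
            assign([j for j in remaining if j != i], place + 1, p)

    assign(list(range(len(stacks))), 0, 1.0)
    return equities


# Hypothetical three-handed spot: 50% / 30% / 20% of the chips,
# with a 50 / 30 / 20 payout structure.
equities = icm_equities([5000, 3000, 2000], [50.0, 30.0, 20.0])
```

In this example the chip leader holds half the chips but well under half of the prize pool in equity, which is the asymmetry the Hellmuth comment relies on.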
I think he specifically said this for the main event bc it's so soft.
I know we have different ways of looking at tournaments. From my understanding, most of it is doing some ICM-adjusted black magic. ICM was first developed back in the 1980s for horse racing, after which it was adapted for poker.
I am asking whether we have seen any new approaches to non-cEV modeling similar to what Noam talks about in the video.
Will his computer program take into account things like someone at another table being down to two big blinds when you are one out of the money with a small stack and are facing a bet that puts you all in?
That is data that needs to be modelled within the system. I don't think modelling that type of information would be particularly hard if the system were purely looking at one static decision within a tournament, i.e. what a person should do at a given point. The problem is that, to give this some kind of meaning, you need to model it within the context of the whole tournament, which, given what I know about these methods, would mean an exponential blow-up in the tree. He doesn't explain in any great detail how you would go about solving tournaments, other than to say that with current methods and compute it should be possible. It could be a flippant comment where he hasn't thought too hard about the problem. He currently works at OpenAI on reasoning in LLMs.
I would have thought we could at least model late-stage tournaments using newer techniques, given that the later you start modelling the less the tree should expand, and it's where most of the dollars are being won or lost.
Since every strategy decision depends on the combination of stack sizes among the players who have already folded as well as those left to act, and the number of combinations is easily a quintillion even when it's down to the final table, it shouldn't be necessary to "think hard" to realize that coming up with precise GTO strategies would require a lot more computer time. This reminds me of when Edward Thorp, asked why the basic strategy for decks without tens or pictures gives the player an edge, essentially replied "because that's what the computer said" rather than instantly realizing that it must be because you often (actually always) hit hard 17.
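As a rough sanity check on the size of that stack-configuration space, here is a back-of-the-envelope count using the stars-and-bars formula. The table size, chip totals and rounding granularity below are assumptions picked for illustration, not numbers from any real tournament, and the count ignores who has folded or is left to act, which would only multiply it further.

```python
from math import comb


def stack_configurations(total_units, players):
    """Ways to split total_units indivisible chip units among seated players,
    with every player holding at least one unit (stars and bars)."""
    return comb(total_units - 1, players - 1)


# Hypothetical nine-handed final table with 450 big blinds in play.
whole_bb = stack_configurations(450, 9)    # stacks rounded to whole big blinds
half_bb = stack_configurations(900, 9)     # stacks rounded to half big blinds
```

Even at the coarse whole-big-blind rounding the count is already beyond ten quadrillion, and halving the rounding step pushes it past a quintillion. This is presumably why practical approaches generalize across stack sizes (for example with neural nets, as Noam mentions) rather than solving each configuration separately.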
Sorry, that is my fault for putting "GTO strategies" in the title. The conversation is about solving towards that, but obviously modelling it exactly is very difficult. I think what he means is that the system could achieve superhuman performance compared with current methods. How far that is from optimal could still be some distance.
If the tournament (or even a multiplayer cash game) includes many bad players and a human expert, the bot's performance will not be as good as the expert's. (The previous six-handed demonstrations did not include "live ones" as far as I know.)
Who cares? What we would like to know is a theoretical baseline from which we can then deviate to take advantage. Even when we look at HU, we know the system will perform worse than an expert human against a really bad player, as it isn't trying to exploit them. Knowing some kind of theoretical baseline, though, makes the job of figuring out the exploitation much easier.
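The baseline-versus-exploitation point can be shown with a toy example. Rock-paper-scissors stands in for poker here, and the "bad player" frequencies are invented for the sketch. The equilibrium strategy breaks even against anyone, while knowing the baseline makes it obvious where the opponent deviates and how to punish it.

```python
# Toy example only: rock-paper-scissors standing in for poker.
# PAYOFF[a][b] = payoff for playing a against b (0=rock, 1=paper, 2=scissors).
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def expected_payoff(mine, theirs):
    """Expected payoff of mixed strategy `mine` against mixed strategy `theirs`."""
    return sum(
        mine[a] * theirs[b] * PAYOFF[a][b]
        for a in range(3)
        for b in range(3)
    )


def best_response(theirs):
    """Pure action with the highest expected payoff against `theirs`."""
    values = [sum(PAYOFF[a][b] * theirs[b] for b in range(3)) for a in range(3)]
    best_action = max(range(3), key=lambda a: values[a])
    return best_action, values[best_action]


equilibrium = [1 / 3, 1 / 3, 1 / 3]
bad_player = [0.6, 0.2, 0.2]  # hypothetical opponent who plays rock too often

baseline_ev = expected_payoff(equilibrium, bad_player)   # breaks even
exploit_action, exploit_ev = best_response(bad_player)   # always play paper
```

Against this opponent the equilibrium strategy earns nothing while always playing paper earns 0.4 per round. The catch is that the exploit is itself exploitable, which is why the baseline is the natural starting point for deviations.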
They released multiple papers on the depth-limited solving approach along with lectures giving all the technical details. Just to note as well, Pluribus was made 5 years ago.
"The blueprint strategy for Pluribus was computed in 8 days on a 64-core server for a total of 12,400 CPU core hours. It required less than 512 GB of memory. At current cloud computing spot instance rates, this would cost about $144 to produce."
There have been enormous breakthroughs in the field, along with huge increases in computational ability, since then.
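A quick check of the arithmetic in the quoted Pluribus figures, using only the numbers in the quote. The variable names are mine, and the per-core-hour rate is simply what the quoted totals imply, not a price taken from any cloud provider.

```python
# Figures taken from the quoted passage above.
days = 8
cores = 64
quoted_core_hours = 12_400
quoted_cost_usd = 144

# 8 days on 64 cores is roughly consistent with the quoted core-hour total.
wall_clock_core_hours = days * 24 * cores

# Spot price per core-hour implied by the quoted cost (a derived figure).
implied_usd_per_core_hour = quoted_cost_usd / quoted_core_hours
```

That works out to 12,288 core-hours from the wall-clock figures, close to the quoted 12,400, and an implied rate of a little over one cent per core-hour.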
It's funny, Noam said they never released the code for this bot because they thought it would do enormous damage to the poker ecosystem. If someone could have explained to him that all this does is enable a private market to gain an advantage over the public one, and the reasons this is bad, he might have made it available.
I've been watching an interview with Eric Steinberger, which was released only a few hours ago. He is one of the founders of a company called Magic Dev. He says in the interview that he worked on building a poker bot using the latest technology, which he sold privately to investors. He believes it is still the best multi-player, dynamic-stack-size, full-ring agent for poker.
Stay safe out there kids.
Wanted to point out that if you do watch the interview above with him, the fact that he created some winning poker bot should be an afterthought compared to his view that AGI will happen within this decade. If we can make cheap, abundant intelligence at scale like this, then it's hard to know whether money would even matter moving forward. It would change our world in ways that are impossible to comprehend.