This is the second in a short series of posts about the Annual Computer Poker Competition (ACPC) taking place at the AAAI conference in Toronto July 22-26, 2012. My name is Richard Gibson and I’m a member of the Computer Poker Research Group (CPRG) at the University of Alberta. In my previous post, I discussed the history and present state of the competition, as well as the six events currently being played. Here, I will talk about our programs from previous years and explain our new programs for this year’s competition.
The CPRG’s poker programs, named “Hyperborean” in the ACPC, are constructed quite differently compared to programs for “perfect information games” such as chess. For example, the chess program Deep Blue was based on a technique called alpha-beta search, which compares sequences of moves until one sequence is found to be superior. On Deep Blue’s turn to move, search was performed on-line from the current game state for up to several minutes before the best estimated move was taken. In contrast, during a match, our poker programs play instantaneously. All computation for decision-making is performed off-line for several days before the competition and a final strategy profile is written to disk. Then, during an actual match, actions are chosen simply through table lookups in the precomputed profile.
For the heads-up (two-player) instant-runoff events, the CPRG’s approach, as well as the approach taken by several other teams, has been to compute an approximate Nash equilibrium strategy profile. One can think of a Nash equilibrium as a perfect defensive strategy; that is, in a fair 2-player game, the best an opponent can do in the long run against an equilibrium strategy is tie. Since the most losing player is sequentially eliminated in the instant-runoff divisions of the ACPC, playing a Nash equilibrium strategy here would be optimal. However, designing a program that can play Texas Hold’ Em at an exact equilibrium is currently infeasible due to the game’s massive size. The smallest game played at the ACPC, heads-up limit (fixed bet sizes) Texas Hold ’Em, has approximately 1018 game states.
To compute approximate Nash equilibria, the CPRG uses Counterfactual Regret Minimization [pdf], a state-of-the-art iterative algorithm that closely resembles a self-play procedure. As CFR’s memory requirements are linear in the size of the game, we cannot apply CFR to full Texas Hold ’Em. Instead, we run CFR on a smaller, abstract game that we construct by merging similar card dealings into “buckets.” This makes hands that fall into the same bucket indistinguishable to the program [pdf]. For example, a pair of aces and a pair of kings are both strong hands and one might want to play them similarly, and so the two hands might be combined into a single bucket.
In no-limit games, actions must also be abstracted. Rather than being able to raise any number of chips, only a few bet sizes, such as “pot” (a raise equal to the number of chips in the pot) or “all-in” (bet all remaining chips), are allowed by our programs to keep the game size manageable. An abstract game equilibrium can then be computed with CFR and used to play the real game. Unfortunately, even in limit games, abstraction provides no guarantee [pdf] for how far the generated profile is from equilibrium in the real game. In practice, however, well-designed abstractions with many buckets tend to yield the programs closest to a real equilibrium [pdf].
We also use a modified version of CFR for the two-player games where the winner is determined by “total bankroll.” Modifications tend to change from year to year in an attempt to improve our program's chances of taking more money from weaker players. In addition, our 3-player entries are also built using CFR, even though CFR no longer guarantees an approximate Nash equilibrium with the addition of a third player.
While there are a number of other teams that also use CFR or other equilibrium-finding algorithms, some teams have used different techniques to succeed in the competition. For example, the programs named “Sartre” from the University of Auckland use case-based reasoning [pdf] derived from hand history data of ACPC results from previous years. At each turn to act, Sartre searches for scenarios similar to its current situation and picks an action that was most successful in the previous scenarios.
For the 2012 competition, we incorporated some exciting new techniques from our recent research into our programs. While our instant-runoff entry for heads-up limit is a traditional approximate equilibrium computed with CFR, our total bankroll program for heads-up limit is quite different. Rather than using a single strategy profile, we instead dynamically generate a strategy profile on each hand by mixing a small collection or “portfolio” of profiles. Estimates of the expected reward for each of the portfolio’s profiles are used to generate the mixture.
For the no-limit and 3-player games, we are submitting just one program per game to play in both the instant-runoff and total bankroll divisions. In no-limit, we used a new algorithm to allow a different bet size [pdf] at every decision point in conjunction with the fold, call, and all-in actions. Many no-limit poker programs only properly understand a small number of traditional bet sizes, such as “pot,” but our new program frequently uses non-standard bets. Because of this, we found that the new program, at least when played against our previous no-limit entries, won a lot of chips by pushing its opponent “off-tree.”
In the last blog of this series, I will discuss the outcome of this year's computer poker competition and report on the Poker Symposium that was introduced at AAAI this year.
Image: Rubén Hidalgo / iStockphoto