The October 2022 issue of IEEE Spectrum is here!

Close bar

Meet Hyperborean, the Poker-Playing AI

What strategies help a computer program win at Texas Hold ’Em?

4 min read
Computer keyboard new key with poker icon.
Image: Rubén Hidalgo / iStockphoto

Computer keyboard new key with poker icon.Image: Rubén Hidalgo / iStockphoto

This is the second in a short series of posts about the Annual Computer Poker Competition (ACPC) taking place at the AAAI conference in Toronto July 22-26, 2012. My name is Richard Gibson and I’m a member of the Computer Poker Research Group (CPRG) at the University of Alberta. In my previous post, I discussed the history and present state of the competition, as well as the six events currently being played. Here, I will talk about our programs from previous years and explain our new programs for this year’s competition.

The CPRG’s poker programs, named “Hyperborean” in the ACPC, are constructed quite differently compared to programs for “perfect information games” such as chess. For example, the chess program Deep Blue was based on a technique called alpha-beta search, which compares sequences of moves until one sequence is found to be superior. On Deep Blue’s turn to move, search was performed on-line from the current game state for up to several minutes before the best estimated move was taken. In contrast, during a match, our poker programs play instantaneously. All computation for decision-making is performed off-line for several days before the competition and a final strategy profile is written to disk. Then, during an actual match, actions are chosen simply through table lookups in the precomputed profile.

For the heads-up (two-player) instant-runoff events, the CPRG’s approach, as well as the approach taken by several other teams, has been to compute an approximate Nash equilibrium strategy profile. One can think of a Nash equilibrium as a perfect defensive strategy; that is, in a fair 2-player game, the best an opponent can do in the long run against an equilibrium strategy is tie. Since the most losing player is sequentially eliminated in the instant-runoff divisions of the ACPC, playing a Nash equilibrium strategy here would be optimal. However, designing a program that can play Texas Hold’ Em at an exact equilibrium is currently infeasible due to the game’s massive size. The smallest game played at the ACPC, heads-up limit (fixed bet sizes) Texas Hold ’Em, has approximately 1018 game states.

To compute approximate Nash equilibria, the CPRG uses Counterfactual Regret Minimization [PDF], a state-of-the-art iterative algorithm that closely resembles a self-play procedure. As CFR’s memory requirements are linear in the size of the game, we cannot apply CFR to full Texas Hold ’Em. Instead, we run CFR on a smaller, abstract game that we construct by merging similar card dealings into “buckets.” This makes hands that fall into the same bucket indistinguishable to the program [PDF]. For example, a pair of aces and a pair of kings are both strong hands and one might want to play them similarly, and so the two hands might be combined into a single bucket.

In no-limit games, actions must also be abstracted. Rather than being able to raise any number of chips, only a few bet sizes, such as “pot” (a raise equal to the number of chips in the pot) or “all-in” (bet all remaining chips), are allowed by our programs to keep the game size manageable. An abstract game equilibrium can then be computed with CFR and used to play the real game. Unfortunately, even in limit games, abstraction provides no guarantee [PDF] for how far the generated profile is from equilibrium in the real game. In practice, however, well-designed abstractions with many buckets tend to yield the programs closest to a real equilibrium [PDF].

We also use a modified version of CFR for the two-player games where the winner is determined by “total bankroll.” Modifications tend to change from year to year in an attempt to improve our program’s chances of taking more money from weaker players. In addition, our 3-player entries are also built using CFR, even though CFR no longer guarantees an approximate Nash equilibrium with the addition of a third player.

While there are a number of other teams that also use CFR or other equilibrium-finding algorithms, some teams have used different techniques to succeed in the competition. For example, the programs named “Sartre” from the University of Auckland use case-based reasoning [PDF] derived from hand history data of ACPC results from previous years. At each turn to act, Sartre searches for scenarios similar to its current situation and picks an action that was most successful in the previous scenarios.

For the 2012 competition, we incorporated some exciting new techniques from our recent research into our programs. While our instant-runoff entry for heads-up limit is a traditional approximate equilibrium computed with CFR, our total bankroll program for heads-up limit is quite different. Rather than using a single strategy profile, we instead dynamically generate a strategy profile on each hand by mixing a small collection or “portfolio” of profiles. Estimates of the expected reward for each of the portfolio’s profiles are used to generate the mixture.

For the no-limit and 3-player games, we are submitting just one program per game to play in both the instant-runoff and total bankroll divisions. In no-limit, we used a new algorithm to allow a different bet size [PDF] at every decision point in conjunction with the fold, call, and all-in actions. Many no-limit poker programs only properly understand a small number of traditional bet sizes, such as “pot,” but our new program frequently uses non-standard bets. Because of this, we found that the new program, at least when played against our previous no-limit entries, won a lot of chips by pushing its opponent “off-tree.”

In the last blog of this series, I will discuss the outcome of this year's computer poker competition and report on the Poker Symposium that was introduced at AAAI this year.

The Conversation (0)

Will AI Steal Submarines’ Stealth?

Better detection will make the oceans transparent—and perhaps doom mutually assured destruction

11 min read
A photo of a submarine in the water under a partly cloudy sky.

The Virginia-class fast attack submarine USS Virginia cruises through the Mediterranean in 2010. Back then, it could effectively disappear just by diving.

U.S. Navy

Submarines are valued primarily for their ability to hide. The assurance that submarines would likely survive the first missile strike in a nuclear war and thus be able to respond by launching missiles in a second strike is key to the strategy of deterrence known as mutually assured destruction. Any new technology that might render the oceans effectively transparent, making it trivial to spot lurking submarines, could thus undermine the peace of the world. For nearly a century, naval engineers have striven to develop ever-faster, ever-quieter submarines. But they have worked just as hard at advancing a wide array of radar, sonar, and other technologies designed to detect, target, and eliminate enemy submarines.

The balance seemed to turn with the emergence of nuclear-powered submarines in the early 1960s. In a 2015 study for the Center for Strategic and Budgetary Assessment, Bryan Clark, a naval specialist now at the Hudson Institute, noted that the ability of these boats to remain submerged for long periods of time made them “nearly impossible to find with radar and active sonar.” But even these stealthy submarines produce subtle, very-low-frequency noises that can be picked up from far away by networks of acoustic hydrophone arrays mounted to the seafloor.

Keep Reading ↓Show less
{"imageShortcodeIds":["30133857"]}