Poker-Playing AIs Today, Skynet Tomorrow

Making computers unbeatable at Texas Hold 'em could lead to big breakthroughs in artificial intelligence

Life is not a game. But there are similarities. That’s why it’s worthwhile to invent artificial-intelligence algorithms that can win games. One such AI has now finally solved one of the simplest versions of poker. It’s a crucial first step on a potentially long road toward beating human poker champions in more complex versions of the games. This isn’t just about bragging rights: Poker playing can train computer algorithms to tackle the complexities of real-world challenges in security and medicine, where the available information is rarely perfect or complete.

The many possibilities for deception in poker means there are a huge number of possible plays, even in a limited version with just two players. Computers have solved simpler games such as Connect Four and checkers by figuring out the perfect, unbeatable strategy for each move starting from the beginning of each game. But analyzing all the possible plays in even the simplest poker game is a far greater challenge because each player has hidden cards—information hidden from the opponent. In that sense, poker is an “imperfect-information game,” similar to real-world scenarios with various degrees of uncertainty.

“The solutions for imperfect-information games require computers to handle the additional complication of not knowing exactly what the game’s state is, such as not knowing an opponent’s hand,” says Neil Burch, a computer scientist at the University of Alberta, in Edmonton, Canada. “Such techniques require more computer memory and computing power.”

Burch and his colleagues laid out their algorithm’s solution to “heads-up limit” Texas Hold ’em in the journal Science in January. In AI parlance, it’s only a “weak” solution to a specific version of poker. The game has just two players, fixed bet amounts, and a fixed number of bet raises. But the software is still good enough to make it virtually impossible—from a statistical significance standpoint—to distinguish the algorithm’s solution from perfect play during a lifetime of poker games.

The algorithm, named CFR+ by its creators, uses an improved version of a technique called counterfactual regret minimization (CFR). Regret-minimization algorithms [pdf] are easiest to understand for a single-step game, such as rock-paper-scissors. They compare the outcome of a game, say, how much money you lost, with the outcome of the best possible choice. The difference is the regret value. The algorithm then comes up with a strategy—expressed as the probability that you’ll make a certain move—for the next time you play that minimizes the total regret for all the times you’ve ever played. Counterfactual regret minimization extends the process to work for games like poker where there are many steps between the start and finish of the game. And CFR+ boosts the efficiency of the process by taking fewer and bigger steps toward the best solution.

Past CFR algorithms never tried solving the full game of heads-up limit Texas Hold ’em because of the huge amount of memory required—roughly 262 terabytes. But CFR+, helped along by some memory compression techniques, was efficient enough to solve the full game.

The next big challenge is trying to solve heads-up no-limit Texas Hold ’em, a more complex version that allows unrestricted bet sizes. Algorithms must consider many more information sets that each represent the possible moves made by opponents at each stage of the game. The difference in the number of information sets is huge—about 147 orders of magnitude. For that reason, researchers cannot hope for an algorithm that can sort through every single possible play. Instead, they fall back on using “abstractions”—simplified representations of the full game.

A Carnegie Mellon University group led by Tuomas Sandholm has developed a computer program capable of handling an abstract version of no-limit Texas Hold ’em that is six times as large as any previous abstraction of the poker game. In fact, it may be the largest imperfect-information game ever attempted by a computer.

Carnegie Mellon’s abstraction algorithm works by breaking the abstract game into smaller pieces and spreading them across different blade servers in the Blacklight supercomputer at the Pittsburgh Supercomputing Center—a 37-teraflops machine. One “parent” part of the abstraction exchanges information with the smaller pieces on other servers. That allows the team to create a much larger abstraction than if it was all located on one server. It’s also about three times as fast as spreading the abstraction across many servers that must all communicate with one another.

Once the CFR algorithm analyzes the abstract game and develops a poker-playing strategy, a reverse-mapping algorithm is needed to apply that strategy back to the full version of no-limit Texas Hold ’em. Sandholm’s group developed such an algorithm—called pseudo-harmonic mapping—which can reduce the possibility of exploits by opponents who take an action that is not covered by the abstraction.

The Sandholm group’s AI showed its worth in 2014 by beating the best rival poker-playing programs. It’s not a solution for no-limit Texas Hold ’em, but it’s still useful. “The algorithms we developed are not for solving poker; they’re for solving imperfect-information games in general,” Sandholm says. “Poker is a benchmark where we can test progress from year to year.”

This article originally appeared in print as “AIs vs. Poker.”

Advertisement
Advertisement