A fresh Texas Hold’em-playing AI terror has emerged barely a month after a supercomputer-powered bot claimed victory over four professional poker players. But instead of relying on a supercomputer’s hardware, the DeepStack AI has shown how it too can decisively defeat human poker pros while running on a GPU chip equivalent to those found in gaming laptops.
The success of any poker-playing computer algorithm in heads-up, no-limit Texas Hold’em is no small feat. This version of two-player poker with unrestricted bet sizes has 10¹⁶⁰ possible plays at different stages of the game—more than the number of atoms in the entire universe. But the Canadian and Czech researchers who developed the new DeepStack algorithm leveraged deep learning to create the computer equivalent of intuition, reducing the possible future plays that needed to be calculated at any point in the game to just 10⁷. That enabled DeepStack’s fairly humble computer chip to figure out its best move for each play within five seconds and handily beat poker professionals from all over the world.
“To make this practical, we only look ahead a few moves deep,” says Michael Bowling, a computer scientist and head of the Computer Poker Research Group at the University of Alberta in Edmonton, Canada. “Instead of playing from there, we use intuition to decide how to play.”
This is a huge deal beyond just bragging rights for an AI’s ability to beat the best human poker pros. AI that can handle complex poker games such as heads-up, no-limit Texas Hold’em could also tackle similarly complex real-world situations by making the best decisions in the midst of uncertainty. DeepStack’s poker-playing success while running on fairly standard computer hardware could make it much more practical for AI to tackle many other “imperfect-information” situations involving business negotiations, medical diagnoses and treatments, or even guiding military robots on patrol. Full details of the research are published in the 2 March 2017 online issue of the journal Science.
Imperfect-information games have represented daunting challenges for AI until recently because of the seemingly impossible computing resources required to crunch all the possible decisions. To avoid the computing bottleneck, most poker-playing AIs have used abstraction techniques that lump together similar plays and outcomes to reduce the overall number of calculations needed. In effect, they solved a simplified version of heads-up, no-limit Texas Hold’em instead of actually running through all the possible plays.
Such an approach has enabled AI to play complex games from a practical computing standpoint, but at the cost of having huge weaknesses in their abstracted strategies that human players can exploit. An analysis showed that four of the top AI competitors in the Annual Computer Poker Competition were beatable by more than 3,000 milli-big-blinds per game (a standard measure of poker winnings). That performance is four times worse than if the AI simply folded and gave up the pot at the start of every game.
DeepStack takes a very different approach that combines both old and new techniques. The older technique is an algorithm developed by University of Alberta researchers that previously helped come up with a solution for heads-up, limit Texas Hold’em (a simpler version of poker with restricted bet sizes). This counterfactual regret minimization algorithm, called CFR+ by its creators, comes up with the best possible play in a given situation by comparing different possible outcomes using game theory.
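CFR+’s full machinery is laid out in the researchers’ papers, but the regret-matching update at the heart of all CFR-family algorithms fits in a few lines. The sketch below is a minimal, illustrative example (not DeepStack’s code): it learns a best response to a fixed rock-paper-scissors opponent by repeatedly playing the strategy proportional to accumulated positive regret.

```python
# Regret matching: the core update rule of CFR-family algorithms,
# illustrated on rock-paper-scissors against a fixed opponent mix.
# This toy example is a stand-in, not DeepStack's actual solver.

ACTIONS = 3  # rock, paper, scissors
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # PAYOFF[mine][opponent's]

def strategy_from_regrets(regrets):
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / ACTIONS] * ACTIONS  # uniform when no positive regret yet

def train(opponent, iterations=20000):
    regrets = [0.0] * ACTIONS
    strategy_sum = [0.0] * ACTIONS
    for _ in range(iterations):
        strategy = strategy_from_regrets(regrets)
        for a in range(ACTIONS):
            strategy_sum[a] += strategy[a]
        # Expected payoff of each action against the opponent's mix
        values = [sum(opponent[o] * PAYOFF[a][o] for o in range(ACTIONS))
                  for a in range(ACTIONS)]
        current = sum(strategy[a] * values[a] for a in range(ACTIONS))
        for a in range(ACTIONS):
            regrets[a] += values[a] - current  # regret for not playing a
    total = sum(strategy_sum)
    return [s / total for s in strategy_sum]  # average strategy

avg = train(opponent=[0.6, 0.2, 0.2])  # this opponent overplays rock
```

Against an opponent who overplays rock, the average strategy concentrates almost entirely on paper, the best response. CFR+ applies a refined version of this self-correcting loop across the vastly larger decision tree of poker.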
By itself, CFR+ would still run into the same problem of the computing bottleneck in trying to calculate all possible plays. But DeepStack gets around this by only having the CFR+ algorithm solve for a few moves ahead instead of all possible moves until the end of the game. For all the other possible moves, DeepStack turns to its own version of intuition that is equivalent to a “gut feeling” about the value of the hidden cards held by both poker players. To train DeepStack’s intuition, researchers turned to deep learning.
Deep learning enables AI to learn from example by filtering huge amounts of data through multiple layers of artificial neural networks. In this case, the DeepStack team trained their AI on the best solutions of the CFR+ algorithm for random poker situations. That allowed DeepStack’s intuition to become a “fast approximate estimate” of its best solution for the rest of the game without having to actually calculate all the possible moves.
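The division of labor described above (explicit search for the next few moves, a learned value estimate beyond that) can be sketched schematically. Everything in the example below is a hypothetical stand-in: the toy game tree, the chip-counting heuristic posing as the neural network, and the minimax-style recursion are for illustration only, not DeepStack’s architecture.

```python
# Schematic sketch of depth-limited lookahead: search a few moves deep,
# then let a learned value estimate stand in for the rest of the game.

def value_estimate(state):
    # In DeepStack this role is played by a deep neural network trained
    # on CFR+ solutions; here a trivial heuristic stands in for it.
    return state["chips"] * 0.5

def lookahead(state, depth, maximizing=True):
    if depth == 0 or not state["moves"]:
        return value_estimate(state)  # "intuition" replaces deeper search
    results = [lookahead(nxt, depth - 1, not maximizing)
               for nxt in state["moves"]]
    return max(results) if maximizing else min(results)

def leaf(chips):
    return {"chips": chips, "moves": []}

# A tiny made-up game tree: two choices, each with two outcomes.
root = {"chips": 0, "moves": [
    {"chips": 2, "moves": [leaf(4), leaf(-2)]},
    {"chips": 1, "moves": [leaf(3), leaf(3)]},
]}

best = lookahead(root, depth=2)     # full two-move search
shallow = lookahead(root, depth=1)  # truncated: intuition kicks in sooner
```

Searching deeper (to depth 2) reveals that the flashier first option can be punished by the opponent, while the shallower search, relying on the value estimate one move earlier, prefers it. The quality of the trained value function is what makes the truncated search trustworthy.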
“DeepStack presents the right marriage between imperfect information solvers and deep learning,” Bowling says.
But the success of the deep learning component surprised Bowling. He thought the challenge would prove too tough even for deep learning. His colleagues Martin Schmid and Matej Moravčík—both first authors on the DeepStack paper—were convinced that the deep learning approach would work. They ended up making a private bet on whether or not the approach would succeed. (“I owe them a beer,” Bowling says.)
DeepStack proved its poker-playing prowess in 44,852 games played against 33 poker pros recruited by the International Federation of Poker from 17 countries. Typically researchers would need to have their computer algorithms play a huge number of poker hands to ensure that the results are statistically significant and not simply due to chance. But the DeepStack team used a low-variance technique called AIVAT that filters out much of the chance factor and enabled them to come up with statistically significant results with as few as 3,000 games.
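AIVAT’s full construction is specific to poker, but the control-variate idea underneath it, subtracting a chance term whose expectation is known, can be shown with a toy example. The per-hand “luck” and “skill” terms below are made-up stand-ins, not the actual AIVAT estimator.

```python
# Control-variate sketch: subtract a known-mean "luck" term from each
# outcome so the skill estimate needs far fewer samples to stabilize.
# The game model here is invented purely for illustration.
import random
import statistics

random.seed(0)

def play_hand():
    luck = random.uniform(-10, 10)   # chance contribution, known mean 0
    skill = random.gauss(0.5, 1.0)   # true per-hand edge, plus play noise
    return skill + luck, luck

raw, corrected = [], []
for _ in range(3000):
    outcome, luck = play_hand()
    raw.append(outcome)
    corrected.append(outcome - luck)  # luck's expectation is known to be 0

raw_sd = statistics.stdev(raw)
corrected_sd = statistics.stdev(corrected)
```

In this toy model, the corrected sample’s standard deviation is several times smaller than the raw sample’s, so the 0.5-per-hand edge shows up clearly after 3,000 hands instead of being drowned in chance. That is the flavor of the savings that let the DeepStack team reach statistical significance with relatively few games.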
“We have a history in our group of doing variance reduction techniques,” Bowling explains. “This new technique was pioneered in our work to help separate skill and luck.”
Of all the players, 11 poker pros completed the requested 3,000 games over a period of four weeks from November 7 to December 12, 2016. DeepStack handily beat 10 of the 11 with a statistically significant victory margin, and still technically beat the 11th player. DeepStack’s victory as analyzed by AIVAT was 486 milli-big-blinds per game (mbb/g). That’s quite a showing given that 50 mbb/g is considered a sizable margin of victory among poker pros. This victory margin also amounted to over 20 standard deviations from zero in statistical terms.
News of DeepStack’s success is just the latest blow to human poker-playing egos. A Carnegie Mellon University AI called Libratus achieved its statistically significant victory against four poker pros during a marathon tournament of 120,000 games total played in January 2017. That heavily publicized event led some online poker fans to fret about the possible death of the game at the hands of unbeatable poker bots. But to achieve victory, Libratus still calculated its main poker-playing strategy ahead of time based on abstracted game solving—a computer- and time-intensive process that required 15 million processor-core hours on a new supercomputer called Bridges.
Worried poker fans may have even greater cause for concern with the success of DeepStack. Unlike Libratus, DeepStack’s remarkably effective forward-looking intuition means it does not have to do any extra computing beforehand. Instead, it always looks forward by solving for actual possible plays several moves ahead and then relies on its intuition to approximate the rest of the game.
This “continual re-solving” approach that can take place at any given point in a game is a step beyond the “endgame solver” that Libratus used only during the last betting rounds of each game. And the fact that DeepStack’s approach works on the hardware equivalent of a gaming laptop could mean the world will see the rise of many more capable AI bots tackling a wide variety of challenges beyond poker in the near future.
“It does feel like a breakthrough of the sort that changes the types of problems we can apply this to,” Bowling says. “Most of the work of applying this to other problems becomes whether we can get a neural network to apply this to other situations, and I think we have experience with using deep learning in a whole variety of tasks.”