Computer Battles Top Human Poker Players

A Carnegie Mellon computer program challenges human poker pros to 20,000 hands each over two weeks

Photo: Carnegie Mellon University

Computers have yet to fully master the popular version of poker known as no-limit Texas Hold’em. That’s why humans still stand a chance in the world’s first serious tournament between some of the best professional poker players and a computer program developed by researchers at Carnegie Mellon University. Humans have the edge so far, but the competition remains tight going into the second and final week.

Four of the world’s top poker players have been playing 1,600 hands per day against the Carnegie Mellon University program, called Claudico, in live-streamed matches held at the Rivers Casino in Pittsburgh. The poker pros are competing for more than $100,000 in prize money donated by the casino and Microsoft Research. The prize for the researchers who organized the Brains vs. Artificial Intelligence tournament is the chance to see how well Claudico can perform against humans after its predecessor trounced digital rivals in a previous competition.

“There has definitely been a clash of styles,” says Tuomas Sandholm, a computer scientist at Carnegie Mellon University. “Humans play in a much more deterministic way, while Claudico plays much more randomized strategies,” Sandholm says. “There are also other significant differences in the styles,” he adds.

The human pro players, Doug Polk, Dong Kim, Bjorn Li and Jason Les, have held the upper hand by a slight margin as they continue playing toward a tournament total of 20,000 hands per person. Polk is generally considered the world’s best player of heads-up no-limit Texas Hold’em, a two-player version of poker that allows for unrestricted bet sizes.

Such unrestricted bet sizes make no-limit Texas Hold’em challenging even for supercomputer-powered programs. Computers must consider far more information states, each representing the possible moves made by opponents at each stage of the game: the gap between no-limit Texas Hold’em and versions of the game in which bet sizes are limited is about 147 orders of magnitude. A poker-playing program developed by the University of Alberta in Canada previously came up with a “weak” solution for the heads-up limit version of Texas Hold’em, as detailed in the journal Science in January.

No computer program has come close to solving no-limit Texas Hold’em. Claudico represents a supercharged version of a Carnegie Mellon University program, called Tartanian7, that beat all other challengers in the no-limit Texas Hold’em category of the Association for the Advancement of Artificial Intelligence’s Annual Computer Poker Competition last July.

Claudico does not use preprogrammed poker-playing strategies devised by human experts. Instead, it creates a simplified version of the game that it can use to compute strategies on the fly as it plays against opponents. The algorithm that generates these strategies represents one of five software components that make up Claudico. (The other four Claudico software components run on a Carnegie Mellon University server called Steamroller.)
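One way to picture a “simplified version of the game” is to collapse the unlimited range of legal bets onto a handful of representative sizes. The sketch below is only an illustration under that assumption: the bet fractions and the function name `nearest_abstract_bet` are hypothetical and are not taken from Claudico.

```python
# Hypothetical illustration of bet-size simplification; not Claudico's actual method.
ABSTRACT_BET_FRACTIONS = (0.5, 1.0, 2.0)  # assumed bet sizes, as fractions of the pot


def nearest_abstract_bet(bet, pot, fractions=ABSTRACT_BET_FRACTIONS):
    """Map a real bet to the closest bet size kept in the simplified game."""
    if pot <= 0:
        raise ValueError("pot must be positive")
    ratio = bet / pot
    # Pick the representative fraction with the smallest distance to the real ratio.
    return min(fractions, key=lambda f: abs(f - ratio))
```

A bet of 70 chips into a pot of 100, for instance, would be treated as a half-pot bet, so the program only has to reason about three bet sizes instead of every possible chip amount.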

Calculating the possibilities, even for the simplified version, requires the capabilities of the Pittsburgh Supercomputing Center’s Blacklight supercomputer—a computing beast that has about 8,000 times as much random access memory as the most powerful tablets. Claudico uses the algorithm running on the Blacklight supercomputer to continuously develop better strategies over the course of its tournament play against the human poker pros.

Claudico takes its name from the Latin word for “limp.” Limping is a poker strategy that involves the first player calling the “big blind” bet by meeting its bet size instead of raising or folding. Poker pros tend to consider it a weak strategy, but Claudico has frequently used limping with a randomized approach that keeps opponents guessing.

Claudico’s randomized approach to playing poker represents one of its greatest strengths against its human opponents. The best human poker players usually play each hand by trying to figure out a “range” of possible cards in the pair held by their opponent and make educated guesses about the “range” of predictions the opponent is making in return. Such a strategy can still make humans too predictable compared with the more randomized betting sizes and strategies of Claudico. In response, some of the poker pros have been trying to adopt randomization to some degree during the tournament.
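A randomized strategy of this kind can be thought of as a probability table over actions, from which one action is drawn each time the same situation comes up. The following is a minimal sketch of that idea; the probabilities and the names `MIXED_STRATEGY` and `sample_action` are made up for illustration and do not come from Claudico.

```python
import random

# Hypothetical mixed strategy for one situation: action -> probability.
# The numbers are invented for illustration only.
MIXED_STRATEGY = {"fold": 0.1, "limp": 0.4, "raise": 0.5}


def sample_action(strategy, rng):
    """Draw one action at random according to the strategy's probabilities."""
    actions = list(strategy)
    weights = [strategy[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
print([sample_action(MIXED_STRATEGY, rng) for _ in range(5)])
```

Because the action is drawn fresh each time, an opponent who sees the same situation twice cannot count on seeing the same play twice.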

“The top humans will adopt some limited amount of randomization successfully, but I doubt a human could ever adopt a fully optimized strategy based on randomization,” Sandholm said.

Humans also have to deal with the issue of endurance. The tournament started out on 24 April, with each human playing two 750-hand sessions per day. In the second half of the tournament, the poker pros have been playing 800-hand sessions. Play usually lasts from 11 AM until about 9 PM, with an hour and a half break between sessions in the mid-afternoon. 

Still, the human poker players have their own advantages. They’re quicker to pounce and exploit any weaknesses that they find in Claudico’s strategy, unlike poker-playing computer rivals that usually continue to deploy game-theory strategies without adapting quickly to opponents. Such adaptability has helped keep the humans ahead thus far.

The computer program still has a chance to whittle away the human players’ lead in the second half of the tournament this week and battle humanity to a statistical draw. But Sandholm and his Carnegie Mellon team, including grad students Noam Brown and Sam Ganzfried, remain focused on the tournament as an unprecedented experiment for testing Claudico’s capabilities. They paired up the poker pros to play duplicate matches against Claudico featuring the exact same hands. The researchers put one of the human players in a public area on the casino’s main floor and the other in a secure room with security guards monitoring the action.
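The point of duplicate play is to reduce the role of luck. Assuming the common duplicate format, in which one human receives the cards that Claudico holds against that human's partner, a lucky deal in one room is an unlucky deal in the other, and adding the two results largely cancels the cards out. The sketch below shows that bookkeeping; the function name and the chip figures are hypothetical.

```python
def duplicate_margin(room_a, room_b):
    """Net human winnings across mirrored deals.

    room_a[i]: chips human A won (negative if lost) on deal i.
    room_b[i]: chips human B won on the same deal with the cards swapped.
    """
    if len(room_a) != len(room_b):
        raise ValueError("both rooms must play the same deals")
    # Summing each mirrored pair offsets the luck of the deal.
    return sum(a + b for a, b in zip(room_a, room_b))
```

With invented results of `[100, -50]` in one room and `[-80, 60]` in the other, the humans' combined margin would be 30 chips.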

Even a loss for Claudico in this tournament could still prove a win for researchers in the long run. Computer programs that become better at tackling complex games such as no-limit Texas Hold’em could also better handle the complexities of real-world challenges in security, business, and medicine, where there is rarely sufficient information to make a perfect decision.

“There are some things you cannot learn by playing computer opponents in the annual AI poker competition, because they don’t try to exploit you as much,” Sandholm said. “We’ve learned a lot about Claudico’s strengths and weaknesses by humans trying to exploit it all the time.”

