The system, called AlphaZero, began its life last year by beating a DeepMind system that had been specialized just for Go. That earlier system had itself made history by beating one of the world’s best Go players, but it needed human help to get through a months-long course of improvement. AlphaZero trained itself—in just 3 days.
AlphaZero, playing White against Stockfish, began by identifying four candidate moves. After 1,000 simulations, it rejected the moves marked in red; after another 100,000 simulations, it chose the move marked in green over the one marked in orange. AlphaZero went on to win, thanks in large part to having opened the diagonal for its bishop. Illustration: Science
The research, published today in the journal Science, was performed by a team led by DeepMind’s David Silver. The paper was accompanied by a commentary by Murray Campbell, an AI researcher at the IBM Thomas J. Watson Research Center in Yorktown Heights, N.Y.
“This work has, in effect, closed a multi-decade chapter in AI research,” writes Campbell, who was a member of the team that designed IBM’s Deep Blue, which in 1997 defeated Garry Kasparov, then the world chess champion. “AI researchers need to look to a new generation of games to provide the next set of challenges.”
AlphaZero can crack any game that provides all the information that’s relevant to decision-making; the new generation of games to which Campbell alludes does not. Poker furnishes a good example of such games of “imperfect” information: Players can hold their cards close to their chests. Other examples include many multiplayer games, such as StarCraft II, Dota, and Minecraft. But they may not pose a worthy challenge for long.
“Those multiplayer games are harder than Go, but not that much higher,” Campbell tells IEEE Spectrum. “A group has already beaten the best players at Dota 2, though it was a restricted version of the game; StarCraft may be a little harder. I think both games are within 2 to 3 years of solution.”
He calls multiplayer games a “good interim step,” adding that any game that includes language would open up still greater realms of complexity. IBM famously tackled a television trivia game with its machine Watson, which won at Jeopardy! in 2011. Watson later showed its mettle in academic debate. However, IBM is still working to adapt the system for use in healthcare.
AlphaZero is amazing in the sheer power it brings to game-playing. And this says much, given the extraordinary progress the old-fashioned methods had already made.
Deep Blue was a monster of a machine built solely to play chess, and its 1997 victory over Kasparov was not overwhelming. Today, though, even a smartphone can outplay Magnus Carlsen, the reigning world champion, and do so again and again.
But that smartphone is just a piker compared to the top conventionally programmed chess program, Stockfish. And Stockfish, in turn, is a piker next to AlphaZero, which crushed it after a mere 24 hours of self-training.
DeepMind developed the self-training method, called deep reinforcement learning, specifically to attack Go. Today’s announcement that they’ve generalized it to other games means they were able to find tricks to preserve its playing strength after giving up certain advantages peculiar to playing Go. The biggest such advantage was the symmetry of the Go board, which allowed the specialized machine to calculate more possibilities by treating many of them as mirror images.
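To see why board symmetry is such a gift, consider a minimal sketch in Python. It is an illustration of the general idea only, not DeepMind’s code: The function names and the way the board is written out as rows of characters are assumptions made for this example. A square board can be rotated and mirrored into eight orientations, and a program that maps all eight to a single representative can treat them as one position.

```python
# Illustrative sketch only: the function names and the board encoding are
# assumptions made for this example, not taken from DeepMind's systems.

def rotate(board):
    """Rotate a square board 90 degrees clockwise."""
    return tuple(zip(*board[::-1]))

def mirror(board):
    """Reflect a square board left to right."""
    return tuple(tuple(row[::-1]) for row in board)

def symmetries(board):
    """Return the 8 rotations and reflections of a square board."""
    board = tuple(tuple(row) for row in board)
    result = []
    for _ in range(4):
        result.append(board)
        result.append(mirror(board))
        board = rotate(board)
    return result

def canonical(board):
    """Pick one representative so that equivalent positions compare equal."""
    return min(symmetries(board))

# A tiny 3x3 "Go" position: "." is empty, "B" is a black stone.
corner_a = ["B..", "...", "..."]
corner_b = ["..B", "...", "..."]

# The two corners are mirror images, so they share a canonical form.
print(canonical(corner_a) == canonical(corner_b))

# Number of distinct orientations among the eight for a lone corner stone.
print(len(set(symmetries(corner_a))))
```

The same trick does not carry over to chess, where pawns move in only one direction and castling rules differ between the two sides of the board, so a rotated or mirrored position is generally not equivalent to the original.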
It was surprisingly easy to generalize the Go-playing machine. “They didn’t have to do much of anything,” marvels Campbell. “Instead of having a Go board as input and the Go rules directing the search, they said, ‘let’s have chessboard and chess rules.’ There was actually a significant debate over whether the approach would work for chess.”
The researchers have so far unleashed their creation only on Go, chess, and Shogi, a Japanese form of chess. Go and Shogi are astronomically complex, and that’s why both games long resisted the “brute-force” algorithms that the IBM team used against Kasparov two decades ago.
Chess, however, had been the preferred test bed for AI for more than a lifetime, figuring in the research of such pioneers as Alan Turing, Claude Shannon, and Herbert Simon. The game appealed because it certainly seemed to involve thinking and because it was neither too hard (like poker) nor too easy (like checkers). Even so, chess turned out to be a hard nut to crack.
In 1957 Simon famously predicted that a machine would outplay the world chess champion “within 10 years,” and later he was gently mocked for being decades off the mark. But he complained that critics of AI dismissed all new advances as mere parlor tricks.
“That’s because they define thinking as that which computers can’t yet do,” Simon told me, back in 1998. “They keep raising the bar.” He died three years later, but at least he lived to see Deep Blue’s victory over Kasparov.
Problems in life rarely come with all the information needed for their solution. That’s why an AI that can master any game of imperfect information might find application far beyond gaming, for instance in financial modeling or even warfare. A self-driving car equipped with such an AI might finally conquer the roads, producing wild success for whichever company first perfects the idea.
Maybe it’ll be Waymo, a branch of Alphabet and thus a sibling to DeepMind.
Updated 6 December 2018
Philip E. Ross is a senior editor at IEEE Spectrum. His interests include transportation, energy storage, AI, and the economic aspects of technology. He has a master’s degree in international affairs from Columbia University and another, in journalism, from the University of Michigan.