Google AI Learns Classic Arcade Games From Scratch, Would Probably Beat You at Them

Illustration: Google DeepMind

New artificial intelligence software from Google can teach itself how to play—and often master—classic 1980s Atari arcade games.

"This work is the first time that anyone has built a single general-learning system that can learn directly from experience to master a wide range of challenging tasks, in this case a set of Atari games, and perform at or better than human level at those games," says Demis Hassabis, one of the AI’s creators, who works at Google DeepMind in London. Hassabis and colleagues detailed their findings in this week’s issue of the journal Nature. (And you can download the source code from Google here.)

The researchers hope to apply the ideas behind their AI to Google products such as search, machine translation, and smartphone apps "to make those things smarter," Hassabis says.

Artificial intelligence is now experiencing a renaissance thanks to groundbreaking advances in machine learning. One important machine learning strategy is reinforcement learning, in which a program known as an agent learns through trial and error which actions maximize its future reward.
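To make the trial-and-error idea concrete, here is a minimal sketch of tabular Q-learning, a classic reinforcement learning method that the DQN builds on. This is not DeepMind's code: the five-cell corridor environment, the reward of 1 at the far end, and the parameter values (alpha, gamma, epsilon) are hypothetical choices for illustration. The agent keeps a table of estimated future rewards and nudges each estimate toward what it actually experiences.

```python
import random

N_STATES = 5        # hypothetical corridor of cells 0..4; the goal is cell 4
ACTIONS = (-1, +1)  # index 0 = move left, index 1 = move right

def step(state, action):
    """Toy environment: reward 1.0 only when the agent reaches the goal cell."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # q[s][a] estimates the discounted future reward of action a in state s
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))   # explore: random move
            else:
                best = max(q[state])              # exploit, breaking ties at random
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Nudge the estimate toward reward + discounted value of the best next action
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q

q = q_learning()
policy = ["right" if q[s][1] > q[s][0] else "left" for s in range(N_STATES - 1)]
print(policy)
```

Nobody tells the agent that "right" is the correct direction; it discovers this because moving right is eventually followed by reward, and the discount factor gamma makes nearer rewards worth more than distant ones.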

However, reinforcement learning agents often have trouble dealing with data that approach real-world complexity. To improve such agents, the researchers combined reinforcement learning with convolutional neural networks, a technique hotly pursued under the name “deep learning” by tech giants such as Google, Facebook, and Apple. (The original developer of convolutional networks, Facebook AI chief Yann LeCun, explains deep learning here.)

In an artificial neural network, components known as artificial neurons are fed data and work together to solve a problem such as reading handwriting or recognizing speech. The network can then adjust the strengths of the connections among those neurons to change the way they interact, and try solving the problem again. Over time, the network learns which patterns of connections are best at computing solutions.
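As a toy illustration of "adjust the connections and try again," the sketch below trains a single artificial neuron with the classic perceptron learning rule to compute logical OR. Real deep networks such as the DQN stack many layers of neurons and use gradient-based training; the OR task, the learning rate, and the epoch count here are hypothetical choices for illustration only.

```python
def predict(w, b, x):
    """A single artificial neuron: fire (1) if the weighted input sum exceeds 0."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

def train_perceptron(samples, lr=0.1, epochs=20):
    w, b = [0.0, 0.0], 0.0   # connection strengths and bias start at zero
    for _ in range(epochs):
        for x, target in samples:
            err = target - predict(w, b, x)   # try the problem, measure the mistake
            # Strengthen or weaken each connection in proportion to the error and its input
            w[0] += lr * err * x[0]
            w[1] += lr * err * x[1]
            b += lr * err
    return w, b

# Hypothetical training data: the truth table for logical OR
OR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w, b = train_perceptron(OR_DATA)
print([predict(w, b, x) for x, _ in OR_DATA])  # prints [0, 1, 1, 1]
```

After a few passes the weights stop changing because every example is answered correctly, which is the small-scale version of a network settling on connections that compute the solution.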

Such learning systems differ from other game-playing systems such as Deep Blue’s chess software and Watson’s Jeopardy program, explains Hassabis: 

Those systems are very impressive technical feats—obviously, they beat the human world champions in both those games. The key difference in those kinds of algorithms and systems is that they were largely preprogrammed with those abilities. Take Deep Blue—it was a team of programmers and chess grandmasters that distilled chess knowledge into the program, and then that program efficiently executed that task without adapting or learning anything.

What we've done is developed an algorithm that learns from the ground up. It takes perceptual experiences and learns how to do things directly from those perceptual experiences from first principles. The advantage of these kinds of systems is that they can learn and adapt to unexpected things, and the programmers and system designers don't have to know the solution themselves in order for the machine to master that task.

The new software agent, called a deep Q-network (DQN), was tested on 49 classic Atari 2600 games, including Space Invaders, Ms. Pac-Man, Pong, Asteroids, Centipede, Q*bert, and Breakout. The agent was fed only the game scores and the screen images, downscaled to 84 by 84 pixels. Unlike some other general game-playing AIs, the DQN did not know the rules of the games it played beforehand.
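The Atari 2600 draws 210-by-160-pixel frames, and the DQN paper describes converting them to brightness values, shrinking them to 84 by 84, and stacking the four most recent frames so the network can see motion. The plain-Python sketch below follows that outline under stated assumptions: nearest-neighbor resizing and BT.601 luma weights are stand-ins chosen for simplicity, and refinements such as flicker removal are omitted. The class name `FrameStacker` is hypothetical.

```python
from collections import deque

OUT = 84    # side length of the square image the network sees
STACK = 4   # how many recent frames are combined into one input

def to_luminance(rgb_frame):
    """Collapse an H x W grid of (r, g, b) pixels to one brightness value per pixel.
    Assumption: BT.601 luma weights, not necessarily DeepMind's exact conversion."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb_frame]

def resize_nearest(gray, out_h=OUT, out_w=OUT):
    """Nearest-neighbor downsampling (a simple stand-in for a proper resize)."""
    h, w = len(gray), len(gray[0])
    return [[gray[i * h // out_h][j * w // out_w] for j in range(out_w)]
            for i in range(out_h)]

class FrameStacker:
    """Hypothetical helper that keeps the most recent STACK preprocessed frames."""

    def __init__(self, k=STACK):
        self.frames = deque(maxlen=k)

    def push(self, rgb_frame):
        small = resize_nearest(to_luminance(rgb_frame))
        if not self.frames:
            # At the start of an episode, repeat the first frame to fill the stack
            self.frames.extend([small] * self.frames.maxlen)
        else:
            self.frames.append(small)  # the oldest frame drops off automatically
        return list(self.frames)

# Usage with synthetic 210 x 160 frames (the Atari 2600's native resolution)
black = [[(0, 0, 0)] * 160 for _ in range(210)]
white = [[(255, 255, 255)] * 160 for _ in range(210)]
stacker = FrameStacker()
stacker.push(black)
state = stacker.push(white)
print(len(state), len(state[0]), len(state[0][0]))  # prints 4 84 84
```

A single still frame cannot tell the agent which way a ball is moving; a short stack of frames can, which is why the input is a small history rather than one picture.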

The system ran on a single GPU-equipped desktop computer and trained for about two weeks per game. The DQN performed at a level comparable to that of a professional human games tester, achieving more than 75 percent of what the human tester scored on 29 games. The agent also outperformed the best existing reinforcement learning agents on 43 games.
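The Nature paper expresses the comparison with the human tester as a normalized score that also subtracts what random button presses would achieve, so 100 percent means matching the human and 0 percent means doing no better than chance. The sketch below assumes that definition; the game names and scores in the usage example are hypothetical and are not figures from the paper.

```python
def human_normalized_score(agent, human, random_play):
    """Percent of the gap between random play and the human tester that the agent closes."""
    return 100.0 * (agent - random_play) / (human - random_play)

def games_at_human_level(results, threshold=75.0):
    """Count games whose normalized score is strictly above the threshold."""
    return sum(1 for agent, human, rnd in results.values()
               if human_normalized_score(agent, human, rnd) > threshold)

# Hypothetical (agent, human, random) scores for illustration only
results = {
    "game_a": (800.0, 1000.0, 200.0),   # 75.0, not strictly above 75
    "game_b": (1200.0, 1000.0, 200.0),  # 125.0, beats the human tester
    "game_c": (300.0, 1000.0, 200.0),   # 12.5
}
print(games_at_human_level(results))  # prints 1
```

Subtracting the random baseline keeps a game with a high "free" score from making a weak agent look human-level.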

The games at which the DQN excelled were highly varied, including side-scrolling shooters, 3-D car racing, and boxing. "This system is able to generalize to any sequential decision-making problem," says Koray Kavukcuoglu at Google DeepMind.

The games where the DQN did not do well reflect the limitations of the agent. "Currently, the system learns essentially by pressing keys randomly and then figuring out when this leads to high scores," says Google DeepMind’s Vlad Mnih. However, such a button-mashing strategy often fails in games that require more sophisticated exploration or long-term planning.
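The random key-pressing Mnih describes corresponds to an exploration scheme called epsilon-greedy action selection: with probability epsilon the agent presses a random button, and otherwise it picks the action its network currently rates highest. The Nature paper anneals epsilon linearly from 1.0 to 0.1 over the first million frames; the sketch below assumes that schedule, and the Q-values passed in are hypothetical stand-ins for the network's output.

```python
import random

def epsilon_at(step, start=1.0, end=0.1, anneal_steps=1_000_000):
    """Probability of a random action: decays linearly, then stays at `end`."""
    if step >= anneal_steps:
        return end
    return start + (end - start) * step / anneal_steps

def choose_action(q_values, step, rng=random):
    """Epsilon-greedy: random button with probability epsilon, else the best-valued one."""
    if rng.random() < epsilon_at(step):
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Hypothetical Q-values for three buttons; button 1 currently looks best
rng = random.Random(42)
early = [choose_action([0.1, 0.5, 0.2], 0, rng) for _ in range(1000)]
late = [choose_action([0.1, 0.5, 0.2], 2_000_000, rng) for _ in range(1000)]
print(early.count(1) / 1000, late.count(1) / 1000)
```

Early in training every press is random; later the agent mostly exploits what it has learned. Because the exploration is undirected, a game that hides its reward behind a long, specific sequence of moves may never reveal that reward to the agent, which matches the weakness described above.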

The researchers are now moving on to games from the 1990s, which include some 3-D games “where the challenge is much greater,” Hassabis says. “StarCraft and Civilization are the ones we plan to crack at some point.”

So, will it be “Today, Ms. Pac-Man; tomorrow, the world”? No, says Hassabis, noting that the AI-concerned entrepreneur Elon Musk was an early investor in DeepMind, which was later acquired by Google. "I'm good friends with Elon," says Hassabis. "We agree with him that there are risks, but we're many, many decades away from any kind of technology we need to worry about."


Tech Talk

IEEE Spectrum’s general technology blog, featuring news, analysis, and opinions about engineering, consumer electronics, and technology and society, from the editorial staff and freelance contributors.