DeepMind Deploys Self-taught Agents To Beat Humans at Quake III

Chess and Go were originally developed to mimic warfare, but they do a poor job of it. War and most other competitions involve more than one opponent and more than one ally, and play typically unfolds not on an orderly, flat grid but across varied landscapes built up in three dimensions.

That’s why Alphabet’s DeepMind, having crushed chess and Go, has now tackled the far harder challenge posed by three-dimensional, multiplayer, first-person video games. Writing today in Science, lead author Max Jaderberg and 17 DeepMind colleagues describe how a program of self-learning, conducted entirely without human instruction, allowed software to exceed human performance in “Quake III Arena.” The experiment used a version of the game in which each of two teams tries to capture the other team’s flag as many times as possible.

The teams begin at base camps set at opposite ends of a map, which is generated at random before each round. Players roam about, interacting with buildings, trees, hallways and other features on the map, as well as with allies and opponents. They try to use their laser-like weapons to “tag” members of the opposing team; a tagged player must immediately drop any flag they are carrying and return to their team’s base.

DeepMind represents each player with a software agent that sees the same screen a human player would see. The agents have no way of knowing what other agents are seeing; again, this is a much closer approximation of real strategic contests than most board games provide. Each agent begins by making choices at random. As evidence trickles in over successive iterations of the game, a process called reinforcement learning uses it to adjust the agent’s behavior, which gradually converges on a purposeful pattern called a “policy.”
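
The general idea of starting from random choices and converging on a policy can be shown on a problem far simpler than Quake III. The sketch below is a toy, not DeepMind’s method: it assumes a hypothetical three-action “bandit” in which each action wins with a fixed probability, and it applies the textbook REINFORCE policy-gradient rule, with a running-average baseline, to shift probability toward whichever action pays off most. DeepMind’s agents learned from raw screen images with a recurrent neural network, which this sketch does not attempt.

```python
import math
import random

def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def train_policy(reward_probs, steps=5000, lr=0.1, seed=0):
    """Toy REINFORCE on a bandit; returns the final action probabilities.

    reward_probs[i] is the (hypothetical) chance that action i earns reward 1.
    """
    rng = random.Random(seed)
    prefs = [0.0] * len(reward_probs)  # equal preferences: uniform random start
    baseline = 0.0                     # running average of recent rewards
    for _ in range(steps):
        probs = softmax(prefs)
        action = rng.choices(range(len(prefs)), weights=probs)[0]
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        baseline += 0.01 * (reward - baseline)
        advantage = reward - baseline
        # Gradient of log softmax w.r.t. preference i: 1[i == action] - p_i
        for i in range(len(prefs)):
            grad = (1.0 if i == action else 0.0) - probs[i]
            prefs[i] += lr * advantage * grad
    return softmax(prefs)

start = softmax([0.0, 0.0, 0.0])
final = train_policy([0.2, 0.5, 0.8])
print("start:", [round(p, 2) for p in start])
print("final:", [round(p, 2) for p in final])
```

The policy starts out uniform, one-third per action, and over a few thousand simulated rounds concentrates on the third action, whose assumed win rate is highest.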

Each agent develops its policy on its own, which means it can specialize a bit. However, there’s a limit: After every 1,000 iterations of play, the system compares policies and estimates how well the entire team would do if it were to mimic this or that agent. If one agent’s winning chances turn out to be less than 70 percent as high as another’s, the weaker agent copies the stronger one. Meanwhile, the reinforcement-learning process is itself tuned by checking its results against other metrics. Such tweaking of the tweaker is known as meta-optimization.
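
The copying rule resembles the exploit-and-explore step of population-based training, the general technique the paper’s title refers to. The sketch below is a simplified, hypothetical version built only from the article’s description: the `Agent` fields, the dictionaries standing in for network weights, and the 0.8/1.2 hyperparameter perturbation are assumptions for illustration, not DeepMind’s code.

```python
import random
from dataclasses import dataclass

EVOLVE_EVERY = 1000  # iterations between comparisons, per the article
COPY_RATIO = 0.7     # copy if win rate < 70% of a stronger agent's

@dataclass
class Agent:
    name: str
    weights: dict      # hypothetical stand-in for neural-network parameters
    hyperparams: dict  # e.g. learning rate; tuned by the outer "meta" loop
    win_rate: float    # estimated chance of winning, from recent games

def should_evolve(games_played, interval=EVOLVE_EVERY):
    """True at the end of each comparison interval."""
    return games_played > 0 and games_played % interval == 0

def exploit_and_explore(population, ratio=COPY_RATIO, rng=None):
    """One simplified population-based-training step (hypothetical sketch).

    An agent whose win rate is below `ratio` times that of any other agent
    is also below `ratio` times the best agent's, so comparing with the best
    decides who copies. The weaker agent copies the best agent's weights,
    then randomly perturbs the copied hyperparameters ("tweaking the tweaker").
    """
    rng = rng or random.Random(0)
    best = max(population, key=lambda a: a.win_rate)
    for agent in population:
        if agent is not best and agent.win_rate < ratio * best.win_rate:
            agent.weights = dict(best.weights)
            agent.hyperparams = {
                k: v * rng.choice([0.8, 1.2]) for k, v in best.hyperparams.items()
            }
            # win_rate would be re-estimated from the next round of games
    return population

pop = [
    Agent("a", {"w": 1.0}, {"lr": 1e-3}, win_rate=0.60),
    Agent("b", {"w": 2.0}, {"lr": 2e-3}, win_rate=0.40),
    Agent("c", {"w": 3.0}, {"lr": 5e-4}, win_rate=0.45),
]
if should_evolve(games_played=1000):
    exploit_and_explore(pop)
```

Here agent b (40 percent) falls below 70 percent of agent a’s 60 percent, that is, below 42 percent, so it takes over a’s weights with slightly perturbed settings; agent c (45 percent) clears the bar and keeps its own.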

Agents start out as blank slates, but they do have one feature built into their way of evaluating things. It’s called a multi-timescale recurrent neural network with external memory, and it keeps an eye not only on the score at the end of the game but also on the score at earlier points. The researchers note that “Reward purely based on game outcome, such as win/draw/loss signal...is very sparse and delayed, resulting in no learning. Hence, we obtain more frequent rewards by considering the game points stream.”
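
The contrast the researchers draw between a sparse outcome signal and a reward taken from the game points stream can be sketched as follows. The event names, their point weights, and the +1/0/-1 outcome values are hypothetical choices for illustration; the article does not say how individual game events were weighted, and a real system might tune such weights rather than fix them by hand.

```python
# Hypothetical per-event weights; a real system might tune these.
EVENT_WEIGHTS = {
    "flag_captured": 1.0,
    "flag_picked_up": 0.3,
    "opponent_tagged": 0.1,
    "got_tagged": -0.1,
}

def sparse_rewards(num_steps, outcome):
    """Win/draw/loss only: zero at every step except the last."""
    final = {"win": 1.0, "draw": 0.0, "loss": -1.0}[outcome]
    return [0.0] * (num_steps - 1) + [final]

def dense_rewards(event_stream, weights=EVENT_WEIGHTS):
    """Reward at each step is the weighted sum of that step's game events."""
    return [sum(weights.get(e, 0.0) for e in events) for events in event_stream]

# One list of game events per time step (hypothetical episode).
stream = [[], ["opponent_tagged"], [], ["flag_picked_up"], [], ["flag_captured"]]
sparse = sparse_rewards(len(stream), "win")
dense = dense_rewards(stream)
print("sparse:", sparse)
print("dense: ", dense)
```

Over the same six steps, the sparse signal is nonzero only at the very end, while the points stream gives feedback at three separate moments; that extra feedback is the intuition behind the researchers’ remark.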

The program generally beats human players on randomly generated maps. Even after the humans had practiced for a total of 12 hours, they won only 25 percent of the games, drew 6 percent, and lost the remaining 69 percent.

However, when two professional game testers were given a particularly complex map that had not been used in training and were allowed to play games on that map against two software agents, the pros needed just 6 hours of training to come out on top. This result was not described in the Science paper but in a supplementary document made available to the press. The pros used their in-depth study of the map to identify the routes that the agents preferred and to work out how to avoid those routes.

So for the time being people can still beat software in a well-studied set-piece battle. Of course, real life rarely provides such opportunities. Robert E. Lee got to fight the Battle of Gettysburg just one time.
