DeepMind Teaches AI Teamwork

AIs that were given a “social” drive and rewarded for influence learned to cooperate

Researchers are trying to develop AIs that collaborate with humans and with one another. Such cooperative agents could someday appear in self-driving cars or household robots.

The U.S. women’s soccer team has been showing a commanding World Cup performance in France. What would it take for a group of robotic players to show such skill (besides agility and large batteries)? For one, teamwork. But coordination in even simple games has been difficult for artificial intelligence to learn without explicit programming. New research takes a step in the right direction, showing that when virtual players are rewarded for social influence, cooperation can emerge.

Humans are driven not just by extrinsic motivations—for money, food, or sex—but also by intrinsic ones—for knowledge, competence, and connection. Research shows that giving robots and machine-learning algorithms intrinsic motivations, such as a sense of curiosity, can boost their performance on various tasks. In the new work, presented last week at the International Conference on Machine Learning, AIs were given a “social” drive.

“This is a truly fascinating article with a huge potential for expansions,” says Christian Guckelsberger, a computer scientist at Queen Mary University of London who studies AI and intrinsic motivation but was not involved in the work.

The virtual creatures played two games in which they collectively navigated a two-dimensional world to gather apples. In Harvest, apples grow faster when more apples are nearby, so once an area has been picked clean, no new apples appear there. Coordinated restraint is required. (In soccer, if everyone on your team runs toward the ball, you’ll lose.) In Cleanup, apples stop growing if a nearby aquifer isn’t continuously cleaned. (A team needs both offense and defense.)
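To see why restraint matters in Harvest, here is a rough sketch of density-dependent regrowth. It is not the researchers' environment: the one-dimensional patch, the linear growth rule, and the names `regrow_probability`, `step`, `base_rate`, and `max_neighbors` are all assumptions made for illustration.

```python
import random


def regrow_probability(neighbor_apples, base_rate=0.05, max_neighbors=4):
    """Chance that an empty cell sprouts an apple on this step.

    Assumption: the chance grows linearly with the number of nearby
    apples and is zero when none remain, so a stripped patch stays bare.
    """
    return base_rate * min(neighbor_apples, max_neighbors)


def step(patch, rng):
    """Advance a 1-D patch (a list of 0s and 1s) by one regrowth step."""
    new_patch = list(patch)
    for i, cell in enumerate(patch):
        if cell == 1:
            continue  # already holds an apple
        # Count apples within two cells on either side (assumed radius).
        neighbors = sum(patch[max(0, i - 2): i + 3])
        if rng.random() < regrow_probability(neighbors):
            new_patch[i] = 1
    return new_patch


if __name__ == "__main__":
    rng = random.Random(0)
    bare = [0] * 10
    print(step(bare, rng))  # a fully harvested patch never recovers
```

Under these assumed rules, a greedy player who eats the last apple in an area removes the source of all future apples there, which is the dilemma the game is built around.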

The creatures relied on a form of AI called reinforcement learning, in which an algorithm uses trial and error, and gains rewards for better performance. In this work, each creature earned rewards not only for collecting apples, but also for altering the choices of other players—whether that helped or hurt the others.
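In code, the idea of paying a creature for both apples and sway might look like the sketch below. The function name `total_reward` and the `influence_weight` parameter are hypothetical, and the article does not say how the two terms were actually balanced.

```python
def total_reward(apples_collected, influence, influence_weight=0.5):
    """Combine the task reward with a bonus for influencing others.

    Assumption: a simple weighted sum. `influence` is a non-negative
    score for how much the creature changed its neighbors' choices,
    regardless of whether the change helped or hurt them.
    """
    return apples_collected + influence_weight * influence


if __name__ == "__main__":
    # One apple collected and a modest influence score this step.
    print(total_reward(apples_collected=1, influence=0.4))
```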

In one experiment, the creatures estimated their influence using something like humans’ “theory of mind”—the ability to understand others’ thoughts. Through observation, they learned to predict the behavior of others. They could then predict what neighbors would do in response to one action versus another, using counterfactual or “what-if” reasoning. If a particular action would change their neighbors’ behaviors more than other possible actions, it was deemed more influential and thus more desirable.
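The what-if reasoning can be sketched roughly as follows. A creature compares what it predicts a neighbor will do after the action it actually chose with what the neighbor would do on average across all the actions it might have taken. Measuring that gap with Kullback-Leibler divergence is an assumption here, as are the names `influence`, `my_policy`, and `predicted_neighbor`; the article does not spell out the measure.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def influence(chosen_action, my_policy, predicted_neighbor):
    """Score how much `chosen_action` shifts a neighbor's predicted behavior.

    my_policy: dict mapping each of my actions to its probability.
    predicted_neighbor: dict mapping each of my actions to the neighbor's
        predicted action distribution (a list) if I take that action.
    Assumes `chosen_action` has nonzero probability under `my_policy`.
    """
    n = len(predicted_neighbor[chosen_action])
    # Counterfactual baseline: the neighbor's behavior averaged over
    # everything I might have done instead.
    baseline = [
        sum(my_policy[a] * predicted_neighbor[a][k] for a in my_policy)
        for k in range(n)
    ]
    return kl_divergence(predicted_neighbor[chosen_action], baseline)


if __name__ == "__main__":
    my_policy = {"stay": 0.5, "move": 0.5}
    predicted = {"stay": [0.5, 0.5], "move": [0.9, 0.1]}
    for action in my_policy:
        print(action, influence(action, my_policy, predicted))
```

If the neighbor is predicted to behave the same way no matter what the creature does, the score is zero, so the creature earns no bonus for actions nobody responds to.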

The researchers added up the number of apples gathered by all the critters. The population performed better when individuals were rewarded for influence than when they weren’t. They even outperformed unselfish populations in which creatures received extra rewards during training if inequality within the group (measured by the number of apples each critter collected) remained low. Apparently getting an intrinsic reward for others’ wellbeing will take coordination only so far, without counterfactual reasoning to tell you if your actions are directly responsible for others’ behavior.

Critters weren’t just nudging each other toward or away from apples. The researchers found that the critters were using actions to send messages to each other, analogous to a “bee waggle dance.” In another experiment, researchers gave the creatures the additional ability to broadcast messages without moving. Again, the group scored higher when motivated to influence each other. What’s more, creatures that were easily influenced (the good listeners) collected more apples than those that weren’t.

The research is “really neat,” says Ryan Lowe, a computer scientist at McGill University, in Montreal, who studies AI and coordination but was uninvolved in the work. Adding an impetus for influence is “kind of intuitive,” he says, “but sometimes intuitive things don’t work.”

In these experiments, selfish status-seeking led to cooperation, but in other situations it could potentially lead to harmful manipulation (picture marketers, dictators, or bad friends). That’s why Natasha Jaques, a computer scientist at the Massachusetts Institute of Technology, in Cambridge, who spearheaded the work during an internship at Alphabet’s DeepMind, in London, wants to combine a drive for sway with one for largesse. Eventual applications could include autonomous vehicles, warehouse robots, or household helpers, she says: “Anything where you want a robot to coordinate with other robots or with humans.” In the meantime, she’s eager to try more complex games, including robot soccer.
