The December 2022 issue of IEEE Spectrum is here!

Close bar

DeepMind Teaches AI Teamwork

AIs that were given a “social” drive and rewarded for influence learned to cooperate

3 min read
Illustration of a group of robots surrounding a robot with a soccer ball.
Researchers are trying to develop AIs that collaborate with humans and with one another. Such cooperative agents could someday appear in self-driving cars or household robots.
Illustration: Shutterstock

The U.S. women’s soccer team has been showing a commanding World Cup performance in France. What would it take for a group of robotic players to show such skill (besides agility and large batteries)? For one, teamwork. But coordination in even simple games has been difficult for artificial intelligence to learn without explicit programming. New research takes a step in the right direction, showing that when virtual players are rewarded for social influence, cooperation can emerge.

Humans are driven not just by extrinsic motivations—for money, food, or sex—but also by intrinsic ones—for knowledge, competence, and connection. Research shows that giving robots and machine-learning algorithms intrinsic motivations, such as a sense of curiosity, can boost their performance on various tasks. In the new work, presented last week at the International Conference on Machine Learning, AIs were given a “social” drive.

“This is a truly fascinating article with a huge potential for expansions,” says Christian Guckelsberger, a computer scientist at Queen Mary University of London who studies AI and intrinsic motivation but was not involved in the work.

The virtual creatures played two games in which they collectively navigate a two-dimensional world to gather apples. In Harvest, apples grow faster when more apples are nearby, so when they’re all gone, they stop appearing. Coordinated restraint is required. (In soccer, if everyone on your team runs toward the ball, you’ll lose.) In Cleanup, apples stop growing if a nearby aquifer isn’t continuously cleaned. (A team needs both offense and defense.)

The creatures relied on a form of AI called reinforcement learning, in which an algorithm uses trial and error, and gains rewards for better performance. In this work, each creature earned rewards not only for collecting apples, but also for altering the choices of other players—whether that helped or hurt the others.

In one experiment, the creatures estimated their influence using something like humans’ “theory of mind”—the ability to understand others’ thoughts. Through observation, they learned to predict the behavior of others. They could then predict what neighbors would do in response to one action versus another, using counterfactual or “what-if” reasoning. If a particular action would change their neighbors’ behaviors more than other possible actions, it was deemed more influential and thus more desirable.

Figure of constructed box experiment where agent can learn to act selfishly. In one version of a game for AI agents, DeepMind researchers constructed a special environment where the teal agent is trapped in a box. The purple agent could choose to release the teal agent or to simply gather its own apples.Illustration: Natasha Jaques/ Proceedings of Machine Learning Research

The researchers added up the number of apples gathered by all the critters. The population performed better when individuals were rewarded for influence than when they weren’t. They even outperformed unselfish populations in which creatures received extra rewards during training if inequality within the group (measured by the number of apples each critter collected) remained low. Apparently getting an intrinsic reward for others’ wellbeing will take coordination only so far, without counterfactual reasoning to tell you if your actions are directly responsible for others’ behavior.

Critters weren’t just nudging each other to or away from apples. The researchers found that the critters were using actions to send messages to each other, analogous to a “bee waggle dance.” In another experiment, researchers gave the creatures the additional ability to broadcast messages without moving. Again, the group scored higher when motivated to influence each other. What’s more, creatures that were easily influenced—good listeners—collected more apples than those that weren’t.

The research is “really neat,” says Ryan Lowe, a computer scientist at McGill University, in Montreal, who studies AI and coordination but was uninvolved in the work. Adding an impetus for influence is “kind of intuitive,” he says, “but sometimes intuitive things don’t work.”

In these experiments, selfish status-seeking led to cooperation, but in other situations it could potentially lead to harmful manipulation (by picture marketers, dictators, or bad friends). That’s why Natasha Jaques, a computer scientist at the Massachusetts Institute of Technology, in Cambridge, who spearheaded the work during an internship at Alphabet’s DeepMind, in London, wants to combine a drive for sway with one for largesse. Eventual applications could include autonomous vehicles, warehouse robots, or household helpers, she says: “Anything where you want a robot to coordinate with other robots or with humans.” In the meantime, she’s eager to try more complex games—including robot soccer.

The Conversation (0)

Why Functional Programming Should Be the Future of Software Development

It’s hard to learn, but your code will produce fewer nasty surprises

11 min read
A plate of spaghetti made from code
Shira Inbar

You’d expectthe longest and most costly phase in the lifecycle of a software product to be the initial development of the system, when all those great features are first imagined and then created. In fact, the hardest part comes later, during the maintenance phase. That’s when programmers pay the price for the shortcuts they took during development.

So why did they take shortcuts? Maybe they didn’t realize that they were cutting any corners. Only when their code was deployed and exercised by a lot of users did its hidden flaws come to light. And maybe the developers were rushed. Time-to-market pressures would almost guarantee that their software will contain more bugs than it would otherwise.

Keep Reading ↓Show less