DeepMind Teaches AI Teamwork

AIs that were given a “social” drive and rewarded for influence learned to cooperate

The U.S. women’s soccer team has put on a commanding World Cup performance in France. What would it take for a group of robotic players to show such skill (besides agility and large batteries)? For one, teamwork. But even in simple games, coordination has been difficult for artificial intelligence to learn without explicit programming. New research takes a step in the right direction, showing that when virtual players are rewarded for social influence, cooperation can emerge.

Humans are driven not just by extrinsic motivations—for money, food, or sex—but also by intrinsic ones—for knowledge, competence, and connection. Research shows that giving robots and machine-learning algorithms intrinsic motivations, such as a sense of curiosity, can boost their performance on various tasks. In the new work, presented last week at the International Conference on Machine Learning, AIs were given a “social” drive.

“This is a truly fascinating article with a huge potential for expansions,” says Christian Guckelsberger, a computer scientist at Queen Mary University of London who studies AI and intrinsic motivation but was not involved in the work.

The virtual creatures played two games in which they collectively navigated a two-dimensional world to gather apples. In Harvest, apples regrow faster when more apples are nearby, so once they’re all gone, no new ones appear. Coordinated restraint is required. (In soccer, if everyone on your team runs toward the ball, you’ll lose.) In Cleanup, apples stop growing if a nearby aquifer isn’t continuously cleaned. (A team needs both offense and defense.)
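The Harvest dynamic can be sketched in a few lines of Python. This is a minimal illustration, not DeepMind’s environment code: the regrowth probabilities, the square neighborhood, and its radius of 2 are all assumed values, chosen only to show the key property that an empty spot with no apples nearby never regrows.

```python
import random
from typing import List

Grid = List[List[bool]]  # True means an apple is present in that cell


def regrowth_probability(nearby_apples: int) -> float:
    """Assumed per-step chance that an empty cell sprouts an apple."""
    if nearby_apples == 0:
        return 0.0  # no apples nearby: this spot never recovers
    if nearby_apples <= 2:
        return 0.01
    if nearby_apples <= 4:
        return 0.05
    return 0.1


def count_nearby(grid: Grid, row: int, col: int, radius: int = 2) -> int:
    """Count apples in a square neighborhood around (row, col), excluding the cell itself."""
    total = 0
    for r in range(max(0, row - radius), min(len(grid), row + radius + 1)):
        for c in range(max(0, col - radius), min(len(grid[0]), col + radius + 1)):
            if (r, c) != (row, col) and grid[r][c]:
                total += 1
    return total


def step(grid: Grid, rng: random.Random) -> Grid:
    """Return the next grid; each empty cell may regrow based on its neighbors."""
    new_grid = [row[:] for row in grid]
    for r in range(len(grid)):
        for c in range(len(grid[0])):
            if not grid[r][c]:
                p = regrowth_probability(count_nearby(grid, r, c))
                if rng.random() < p:
                    new_grid[r][c] = True
    return new_grid


# Example: seed a small patch and let it spread for 50 steps.
rng = random.Random(0)
orchard = [[False] * 6 for _ in range(6)]
orchard[2][2] = orchard[2][3] = orchard[3][2] = True
for _ in range(50):
    orchard = step(orchard, rng)
print(sum(cell for row in orchard for cell in row), "apples after 50 steps")
```

Because `regrowth_probability(0)` is zero, a patch that players strip bare stays bare for good, which is why restraint pays off in this kind of game.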

The creatures relied on a form of AI called reinforcement learning, in which an algorithm learns by trial and error, earning rewards for better performance. In this work, each creature earned rewards not only for collecting apples but also for altering the choices of other players, whether that helped or hurt the others.
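A minimal sketch of such a reward, assuming the apple reward and the influence bonus are combined as a weighted sum. The weight values and the helper names (`RewardWeights`, `shaped_reward`, `discounted_return`) are hypothetical; the discounted-return helper just shows how a reinforcement learner tallies rewards over time, counting later rewards a little less.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RewardWeights:
    # Both weights are illustrative assumptions, not values from the study.
    extrinsic: float = 1.0  # weight on apples collected
    influence: float = 0.1  # weight on the social-influence bonus


def shaped_reward(apples: float, influence: float, w: RewardWeights) -> float:
    """Per-step reward: weighted apples plus weighted influence on others."""
    return w.extrinsic * apples + w.influence * influence


def discounted_return(rewards: List[float], gamma: float = 0.99) -> float:
    """Sum of rewards, each discounted by gamma for every step into the future."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Example: one apple plus a small influence bonus on each of three steps.
w = RewardWeights()
episode = [shaped_reward(1.0, 0.5, w) for _ in range(3)]
print(discounted_return(episode, gamma=0.9))
```

Setting `influence` to zero recovers an ordinary, purely apple-driven agent, which is the comparison the researchers ran.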

In one experiment, the creatures estimated their influence using something like humans’ “theory of mind”—the ability to understand others’ thoughts. Through observation, they learned to predict the behavior of others. They could then predict what neighbors would do in response to one action versus another, using counterfactual or “what-if” reasoning. If a particular action would change their neighbors’ behaviors more than other possible actions, it was deemed more influential and thus more desirable.
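The counterfactual calculation can be sketched as follows, assuming each creature has a learned model that predicts a neighbor’s action distribution given the creature’s own action (here a hypothetical function supplied by the caller). Influence is scored as the KL divergence between the neighbor’s predicted behavior given the action actually taken and its behavior averaged over all the actions the creature might have taken instead. This is one standard way to measure “how much did my choice change theirs,” offered as an illustration rather than DeepMind’s code.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical learned model: maps my action to a predicted distribution
# over one neighbor's actions.
NeighborModel = Callable[[int], Sequence[float]]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def influence_reward(my_action: int, my_policy: Sequence[float],
                     neighbor_model: NeighborModel) -> float:
    """How much my actual action shifts the neighbor's predicted behavior.

    Requires my_policy[my_action] > 0 so the marginal is nonzero wherever
    the conditional is.
    """
    conditional = neighbor_model(my_action)  # "what they do given what I did"
    marginal = [0.0] * len(conditional)      # "what they'd do on average"
    for a, p_a in enumerate(my_policy):      # loop over what-if actions
        counterfactual = neighbor_model(a)
        for b in range(len(marginal)):
            marginal[b] += p_a * counterfactual[b]
    return kl_divergence(conditional, marginal)


def total_influence(my_action: int, my_policy: Sequence[float],
                    neighbor_models: List[NeighborModel]) -> float:
    """Sum the influence bonus over every neighbor."""
    return sum(influence_reward(my_action, my_policy, m) for m in neighbor_models)


# Example: one neighbor ignores me, the other mirrors my choice.
policy = [0.5, 0.5]
indifferent = lambda a: [0.3, 0.7]
responsive = lambda a: [0.9, 0.1] if a == 0 else [0.1, 0.9]
print(total_influence(0, policy, [indifferent, responsive]))
```

An action that leaves a neighbor’s predicted behavior unchanged earns nothing; an action that sharply shifts it earns more, matching the “more influential, thus more desirable” rule.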

In one version of a game for AI agents, DeepMind researchers constructed a special environment where the teal agent is trapped in a box. The purple agent could choose to release the teal agent or to simply gather its own apples. Illustration: Natasha Jaques/Proceedings of Machine Learning Research

The researchers added up the number of apples gathered by all the critters. The population performed better when individuals were rewarded for influence than when they weren’t. Influence-rewarded populations even outperformed unselfish populations in which creatures received extra rewards during training if inequality within the group (measured by the number of apples each critter collected) remained low. Apparently, an intrinsic reward for others’ well-being takes coordination only so far without counterfactual reasoning to tell an agent whether its actions are directly responsible for others’ behavior.
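The unselfish baseline can be illustrated with a Fehr-Schmidt-style inequity-aversion adjustment, a common formulation in which an agent’s reward drops when others collect more than it does and, more mildly, when it collects more than others. Treat the exact form and the `alpha` and `beta` values as assumptions; the collective score being compared is simply the total number of apples.

```python
from typing import List


def inequity_averse_rewards(apples: List[float], alpha: float = 1.0,
                            beta: float = 0.5) -> List[float]:
    """Adjust each agent's apples by assumed envy (alpha) and guilt (beta) penalties."""
    n = len(apples)
    if n < 2:
        return [float(a) for a in apples]  # no one to compare against
    shaped = []
    for i, mine in enumerate(apples):
        others = [a for j, a in enumerate(apples) if j != i]
        envy = sum(max(a - mine, 0.0) for a in others) / (n - 1)   # others got more
        guilt = sum(max(mine - a, 0.0) for a in others) / (n - 1)  # I got more
        shaped.append(mine - alpha * envy - beta * guilt)
    return shaped


def collective_return(apples: List[float]) -> float:
    """The population-level score: total apples gathered by everyone."""
    return sum(apples)


print(inequity_averse_rewards([4, 0]), collective_return([4, 0]))
```

Equal harvests leave every reward untouched, while lopsided ones are penalized, which nudges agents toward equality without telling them whether their own actions caused it.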

Critters weren’t just nudging each other toward or away from apples. The researchers found that the critters were using actions to send messages to each other, analogous to a bee’s “waggle dance.” In another experiment, researchers gave the creatures the additional ability to broadcast messages without moving. Again, the group scored higher when motivated to influence each other. What’s more, creatures that were easily influenced (good listeners) collected more apples than those that weren’t.

The research is “really neat,” says Ryan Lowe, a computer scientist at McGill University, in Montreal, who studies AI and coordination but was uninvolved in the work. Adding an impetus for influence is “kind of intuitive,” he says, “but sometimes intuitive things don’t work.”

In these experiments, selfish status-seeking led to cooperation, but in other situations it could lead to harmful manipulation (think marketers, dictators, or bad friends). That’s why Natasha Jaques, a computer scientist at the Massachusetts Institute of Technology, in Cambridge, who spearheaded the work during an internship at Alphabet’s DeepMind, in London, wants to combine a drive for sway with one for largesse. Eventual applications could include autonomous vehicles, warehouse robots, or household helpers, she says: “Anything where you want a robot to coordinate with other robots or with humans.” In the meantime, she’s eager to try more complex games, including robot soccer.
