

AI Teaches Itself Diplomacy

Another classic game deepens a skill set with broad applications: the diplomatic savvy often needed in a pinch, a crunch, or an imbroglio

The game Diplomacy, with a pixelation treatment. Photo-illustration: IEEE Spectrum; original photo: Hasbro

Now that DeepMind has taught AI to master the game of Go, and furthered its advantage in chess, the company has turned its attention to another board game: Diplomacy. Unlike Go, Diplomacy is a seven-player game that requires a combination of competition and cooperation, and because players make their moves simultaneously each turn, they must reason about what others are reasoning about them, and so on.

“It’s a qualitatively different problem from something like Go or chess,” says Andrea Tacchetti, a computer scientist at DeepMind. In December, Tacchetti and collaborators presented a paper at the NeurIPS conference on their system, which advances the state of the art, and may point the way toward AI systems with real-world diplomatic skills—in negotiating with strategic or commercial partners or simply scheduling your next team meeting. 

Diplomacy is a strategy game played on a map of Europe divided into 75 provinces. Players build and mobilize military units to occupy provinces until someone controls a majority of supply centers. Each turn, players write down their moves, which are then executed simultaneously. They can attack other players' units or defend against them, or support other players' attacks and defenses, forging alliances. In the full version, players can negotiate. DeepMind tackled the simpler No-Press Diplomacy, devoid of explicit communication.
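To make that turn structure concrete, here is a minimal sketch of how one turn's simultaneous orders might be represented in code. The order types, power names, and province abbreviations are illustrative stand-ins, not taken from either team's implementation.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional

class OrderType(Enum):
    HOLD = auto()     # stay put and defend
    MOVE = auto()     # advance into (or attack) an adjacent province
    SUPPORT = auto()  # lend strength to another unit's move or hold

@dataclass(frozen=True)
class Order:
    power: str                    # e.g. "FRANCE" (illustrative)
    unit_province: str            # province the ordered unit occupies, e.g. "PAR"
    order_type: OrderType
    target: Optional[str] = None  # destination, or the province being supported

# One turn: every power writes its orders in secret, then all orders
# are revealed and resolved simultaneously.
turn_orders = {
    "FRANCE": [Order("FRANCE", "PAR", OrderType.MOVE, "BUR")],
    "GERMANY": [Order("GERMANY", "MUN", OrderType.SUPPORT, "BUR")],
}
```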

Historically, AI has played Diplomacy using hand-crafted strategies. In 2019, the Montreal research institute Mila beat the field with a system using deep learning. They trained a neural network they called DipNet to imitate humans, based on a dataset of 150,000 human games. DeepMind started with a version of DipNet and refined it using reinforcement learning, a kind of trial-and-error. 
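As a rough sketch of what imitation learning on those games looks like, assume each board position has already been encoded as a feature vector and each human move indexed into a fixed candidate list; the tiny network below is a toy stand-in, not the DipNet architecture.

```python
import torch
import torch.nn as nn

class TinyPolicy(nn.Module):
    """Toy policy network: board features in, logits over candidate moves out."""
    def __init__(self, board_features: int, num_candidate_moves: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(board_features, 256),
            nn.ReLU(),
            nn.Linear(256, num_candidate_moves),
        )

    def forward(self, board):
        return self.net(board)

policy = TinyPolicy(board_features=128, num_candidate_moves=1000)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def imitation_step(boards, human_moves):
    """One supervised step: push the network toward the moves humans played."""
    logits = policy(boards)              # boards: (batch, board_features)
    loss = loss_fn(logits, human_moves)  # human_moves: (batch,) move indices
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```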

Exploring the space of possibility purely through trial-and-error would pose problems, though. They calculated that a 20-move game can be played nearly 10^868 ways—yes, that's a 1 followed by 868 zeroes.
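To see where a figure like that comes from: spreading 868 orders of magnitude over 20 moves implies an average of roughly 10^43 legal joint moves per turn. That per-turn figure is an assumption chosen here only to illustrate the arithmetic.

```python
import math

# Illustrative arithmetic only: assume about 10^43.4 legal joint moves per
# turn (an assumed figure chosen so the numbers match the article's total).
avg_joint_moves_per_turn = 10 ** 43.4
game_length_in_turns = 20

total_log10 = game_length_in_turns * math.log10(avg_joint_moves_per_turn)
print(f"roughly 10^{total_log10:.0f} possible 20-move games")  # -> 10^868
```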

So they tweaked their reinforcement-learning algorithm. During training, on each move, the system samples likely moves for its opponents, calculates which of its own moves works best on average across those sampled scenarios, and then trains its network to prefer that move. After training, it skips the sampling and simply acts on what it has learned. “The message of our paper is: we can make reinforcement learning work in such an environment,” Tacchetti says. One of their AI players won 30 percent of the time against six DipNets (chance would be 14 percent); one DipNet against six of theirs won only 3 percent of the time.
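In pseudocode terms, that training-time move selection might look like the sketch below. This is a hedged illustration, not DeepMind's code: `policy`, `evaluate`, and the state object are hypothetical stand-ins for the trained network, a value estimate, and the game state.

```python
import random

# `policy(state, p)` returns (moves, probabilities) for player p;
# `evaluate(state, joint, p)` scores a joint move for player p.
# Both are hypothetical stand-ins for this sketch.

def sample_move(policy, state, p):
    moves, probs = policy(state, p)
    return random.choices(moves, weights=probs, k=1)[0]

def sampled_best_response(state, player, candidate_moves, policy, evaluate,
                          num_samples=10):
    """Pick the candidate move that does best on average against
    opponent moves sampled from the current policy."""
    best_move, best_score = None, float("-inf")
    for move in candidate_moves:
        total = 0.0
        for _ in range(num_samples):
            # Sample a plausible simultaneous move for every other player.
            joint = {p: sample_move(policy, state, p)
                     for p in state.players if p != player}
            joint[player] = move
            total += evaluate(state, joint, player)
        if total / num_samples > best_score:
            best_move, best_score = move, total / num_samples
    # The policy network is then trained to prefer the move returned here,
    # and the cycle repeats.
    return best_move
```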

In April, Facebook will present a paper at the ICLR conference describing its own work on No-Press Diplomacy. The team also built on a human-imitating network similar to DipNet. But instead of adding reinforcement learning, they added search: taking extra time during play to plan ahead and reason about what every player is likely to do next. On each turn, their system, SearchBot, computes an equilibrium, a strategy for each player that no player can improve by changing only its own strategy. To do this, SearchBot evaluates each candidate strategy by playing the game out a few turns, assuming everyone chooses subsequent moves based on the net's top choice. A strategy consists not of a single best move but of a set of probabilities across 50 likely moves (suggested by the net), to avoid being too predictable to opponents.
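One standard recipe for computing an equilibrium over a small candidate set is regret matching; the sketch below follows that recipe under the assumptions noted in the comments and is not Facebook's actual implementation. Here `rollout_value(state, joint, p)` is a hypothetical stand-in for playing the game forward a few turns with the policy net's top choices and scoring the result for player p.

```python
import random

# Regret-matching sketch: iteratively adjust each player's mixed strategy
# over its candidate moves until no one gains much by deviating alone.
# Candidate moves are assumed to be hashable (e.g. strings or tuples).

def regret_matching_search(state, players, candidates, rollout_value,
                           iterations=256):
    regrets = {p: [0.0] * len(candidates[p]) for p in players}

    def strategy(p):
        pos = [max(r, 0.0) for r in regrets[p]]
        total = sum(pos)
        n = len(pos)
        return [r / total for r in pos] if total > 0 else [1.0 / n] * n

    for _ in range(iterations):
        # Everyone samples a move from their current mixed strategy.
        joint = {p: random.choices(candidates[p], weights=strategy(p), k=1)[0]
                 for p in players}
        for p in players:
            baseline = rollout_value(state, joint, p)
            for i, alt in enumerate(candidates[p]):
                # Regret: how much better switching to `alt` would have been,
                # holding everyone else's sampled move fixed.
                deviated = dict(joint)
                deviated[p] = alt
                regrets[p][i] += rollout_value(state, deviated, p) - baseline

    # For simplicity this returns the final strategies rather than the
    # time-averaged ones a fuller treatment would use.
    return {p: dict(zip(candidates[p], strategy(p))) for p in players}
```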

Conducting such exploration during a real game slows SearchBot down, but allows it to beat DipNet by an even greater margin than DeepMind’s system does. SearchBot also played anonymously against humans on a Diplomacy website and ranked in the top 2 percent of players. “This is the first bot that’s demonstrated to be competitive with humans,” says Adam Lerer, a computer scientist at Facebook and paper co-author.

“I think the most important point is that search is often underestimated,” Lerer says. One of his Facebook collaborators, Noam Brown, implemented search in a superhuman poker bot. Brown says the most surprising finding was that their method could find equilibria, a computationally difficult task.

“I was really happy when I saw their paper,” Tacchetti says, “because of just how different their ideas were to ours, which means that there’s so much stuff that we can try still.” Lerer sees a future in combining reinforcement learning and search, which worked well for DeepMind’s AlphaGo.

Both teams found that their systems were not easily exploitable. Facebook, for example, invited two top human players to each play 35 straight games against SearchBot, probing for weaknesses. The humans won only 6 percent of the time. Both groups also found that their systems didn’t just compete, but also cooperated, sometimes supporting opponents. “They get that in order to win, they have to work with others,” says Yoram Bachrach, from the DeepMind team.

That’s important, Bachrach, Lerer, and Tacchetti say, because games that combine competition and cooperation are much more realistic than purely competitive games like Go. Mixed motives occur in all realms of life: driving in traffic, negotiating contracts, and arranging times to Zoom. 

How close are we to AI that can play Diplomacy with “press,” negotiating all the while using natural language?

“For Press Diplomacy, as well as other settings that mix cooperation and competition, you need progress,” Bachrach says, “in terms of theory of mind, how they can communicate with others about their preferences or goals or plans. And, one step further, you can look at the institutions of multiple agents that human society has. All of this work is super exciting, but these are early days.”
