Now that DeepMind has taught AI to master the game of Go, and furthered its advantage in chess, they’ve turned their attention to another board game: Diplomacy. Unlike Go, it has seven players, it requires a combination of competition and cooperation, and on each turn players make moves simultaneously, so they must reason about how others are reasoning about them, and so on.
“It’s a qualitatively different problem from something like Go or chess,” says Andrea Tacchetti, a computer scientist at DeepMind. In December, Tacchetti and collaborators presented a paper at the NeurIPS conference on their system, which advances the state of the art, and may point the way toward AI systems with real-world diplomatic skills—in negotiating with strategic or commercial partners or simply scheduling your next team meeting.
Diplomacy is a strategy game played on a map of Europe divided into 75 provinces. Players build and mobilize military units to occupy provinces until someone controls a majority of supply centers. Each turn, players write down their moves, which are then executed simultaneously. They can attack or defend against opposing players’ units, or support opposing players’ attacks and defenses, building alliances. In the full version, players can negotiate. DeepMind tackled the simpler No-Press Diplomacy, devoid of explicit communication.
Historically, AI has played Diplomacy using hand-crafted strategies. In 2019, the Montreal research institute Mila beat the field with a system using deep learning. They trained a neural network they called DipNet to imitate humans, based on a dataset of 150,000 human games. DeepMind started with a version of DipNet and refined it using reinforcement learning, a kind of trial-and-error.
Exploring the space of possibility purely through trial-and-error would pose problems, though. They calculated that a 20-move game can be played in nearly 10^868 ways (yes, that’s a 1 followed by 868 zeroes).
So they tweaked their reinforcement-learning algorithm. During training, on each move, they sample likely moves of opponents, calculate the move that works best on average across these scenarios, then train their net to prefer this move. After training, it skips the sampling and just works from what its learning has taught it. “The message of our paper is: we can make reinforcement learning work in such an environment,” Tacchetti says. One of their AI players versus six DipNets won 30 percent of the time (with 14 percent being chance). One DipNet against six of theirs won only 3 percent of the time.
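To make the training step above concrete, here is a minimal sketch of the "sample opponents, then pick the move that does best on average" idea. It is a toy illustration, not DeepMind's implementation: the function name `sampled_best_response`, the two-move payoff table, and the opponent model are all assumptions made up for this example, and the later step of training the network toward the chosen move is left out.

```python
import random

def sampled_best_response(candidates, sample_opponents, value, n_samples=200, rng=None):
    """Return the candidate move with the highest average value
    across sampled opponent scenarios (hypothetical helper)."""
    rng = rng or random.Random(0)
    # Sample the scenarios once so every candidate is scored against the same set.
    scenarios = [sample_opponents(rng) for _ in range(n_samples)]

    def average_value(move):
        return sum(value(move, s) for s in scenarios) / len(scenarios)

    return max(candidates, key=average_value)

# Toy payoffs (assumed): attacking a defended province is costly,
# attacking a vacated one pays off; holding is safe but modest.
PAYOFF = {
    ("attack", "defend"): -1.0,
    ("attack", "retreat"): 2.0,
    ("hold", "defend"): 0.0,
    ("hold", "retreat"): 0.5,
}

def value(move, opponent_move):
    return PAYOFF[(move, opponent_move)]

def mostly_defends(rng):
    # Assumed opponent model: defends 90 percent of the time.
    return "defend" if rng.random() < 0.9 else "retreat"

best = sampled_best_response(["attack", "hold"], mostly_defends, value)
```

Against an opponent who usually defends, holding scores better on average; against one who always retreats, attacking would win out. In the real system, the sampled opponent moves come from the network itself rather than from a fixed rule like `mostly_defends`.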
In April, Facebook will present a paper at the ICLR conference describing their own work on No-Press Diplomacy. They also built on a human-imitating network similar to DipNet. But instead of adding reinforcement learning, they added search: the technique of taking extra time to plan ahead and reason about what every player is likely to do next. On each turn, SearchBot computes an equilibrium, a strategy for each player that the player can’t improve by switching only its own strategy. To do this, SearchBot evaluates each potential strategy for a player by playing the game out a few turns (assuming everyone chooses subsequent moves based on the net’s top choice). A strategy consists not of a single best move but of a set of probabilities across 50 likely moves (suggested by the net), to avoid being too predictable to opponents.
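The equilibrium condition described above, that no player can do better by changing only its own strategy, can be checked directly in a small game. The sketch below does so for a two-player matrix game. It illustrates the definition only and is not SearchBot's algorithm, which handles seven players and dozens of candidate moves; the function names and the matching-pennies payoffs are assumptions chosen for the example.

```python
def expected_payoff(payoff, row_probs, col_probs):
    """Expected payoff when both players mix over their moves."""
    return sum(
        row_probs[i] * col_probs[j] * payoff[i][j]
        for i in range(len(row_probs))
        for j in range(len(col_probs))
    )

def one_hot(index, size):
    """Probability vector that always plays the move at `index`."""
    return [1.0 if k == index else 0.0 for k in range(size)]

def is_equilibrium(row_payoff, col_payoff, row_probs, col_probs, tol=1e-9):
    """True if neither player gains by unilaterally switching strategy.

    Checking single-move deviations is enough, because a player's
    expected payoff is linear in its own probabilities.
    """
    row_now = expected_payoff(row_payoff, row_probs, col_probs)
    col_now = expected_payoff(col_payoff, row_probs, col_probs)
    for i in range(len(row_probs)):
        deviation = one_hot(i, len(row_probs))
        if expected_payoff(row_payoff, deviation, col_probs) > row_now + tol:
            return False
    for j in range(len(col_probs)):
        deviation = one_hot(j, len(col_probs))
        if expected_payoff(col_payoff, row_probs, deviation) > col_now + tol:
            return False
    return True

# Matching pennies: the row player wins on a match, the column player on a mismatch.
ROW = [[1, -1], [-1, 1]]
COL = [[-1, 1], [1, -1]]

mixed_ok = is_equilibrium(ROW, COL, [0.5, 0.5], [0.5, 0.5])
pure_ok = is_equilibrium(ROW, COL, [1.0, 0.0], [1.0, 0.0])
```

Here the 50/50 mix passes the check, while always playing the same side fails, because the opponent can exploit it. That is the same reason a strategy spread across many likely moves is harder to predict than a single best move.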
Conducting such exploration during a real game slows SearchBot down, but allows it to beat DipNet by an even greater margin than DeepMind’s system does. SearchBot also played anonymously against humans on a Diplomacy website and ranked in the top 2 percent of players. “This is the first bot that’s demonstrated to be competitive with humans,” says Adam Lerer, a computer scientist at Facebook and paper co-author.
“I think the most important point is that search is often underestimated,” Lerer says. One of his Facebook collaborators, Noam Brown, implemented search in a superhuman poker bot. Brown says the most surprising finding was that their method could find equilibria, a computationally difficult task.
“I was really happy when I saw their paper,” Tacchetti says, “because of just how different their ideas were to ours, which means that there’s so much stuff that we can try still.” Lerer sees a future in combining reinforcement learning and search, which worked well for DeepMind’s AlphaGo.
Both teams found that their systems were not easily exploitable. Facebook, for example, invited two top human players to each play 35 straight games against SearchBot, probing for weaknesses. The humans won only 6 percent of the time. Both groups also found that their systems didn’t just compete, but also cooperated, sometimes supporting opponents. “They get that in order to win, they have to work with others,” says Yoram Bachrach, from the DeepMind team.
That’s important, Bachrach, Lerer, and Tacchetti say, because games that combine competition and cooperation are much more realistic than purely competitive games like Go. Mixed motives occur in all realms of life: driving in traffic, negotiating contracts, and arranging times to Zoom.
How close are we to AI that can play Diplomacy with “press,” negotiating all the while using natural language?
“For Press Diplomacy, as well as other settings that mix cooperation and competition, you need progress,” Bachrach says, “in terms of theory of mind, how they can communicate with others about their preferences or goals or plans. And, one step further, you can look at the institutions of multiple agents that human society has. All of this work is super exciting, but these are early days.”