AI Agents Play "Hide the Toilet Plunger" to Learn Deep Concepts About Life

Training AI agents via gameplay could yield a more flexible and general intelligence

By playing a game where they had to hide and search for objects, AI agents learned not just how to play the game, but also basic principles about the world.
Images: Allen Institute for AI

Most papers about artificial intelligence don’t cite Jean Piaget, the social scientist known for his groundbreaking studies of children’s cognitive development in the 1950s. But there he is, in a paper from the Allen Institute for AI (AI2). The researchers state that their AI agents learned the concept of object permanence—the understanding that an object hidden from view is still there—thus making those AI agents similar to a baby who just figured out the trick behind peekaboo. 

The researchers’ AI agents learned this precept and other rudimentary rules about the world by playing many games of hide and seek with objects, which took place within a simulated, but fairly realistic, house. The AI2 team calls the game “Cache,” but I prefer to call it “Hide the Toilet Plunger.” The agents also got to hide tomatoes, loaves of bread, cups, and knives.

The AI agents, which acted as both hiders and seekers, figured out the game via reinforcement learning. Starting out, they didn’t know anything about the 3D visual environment. They began by taking random actions like pulling on the handle of a drawer or pulling on an immovable wall, and they dropped their objects in all sorts of random places. The agents got better by playing against each other and learning from outcomes—if the seeker didn’t find the tomato, the hider knew it had chosen a good hiding place. 

The paper was recently accepted for the 2021 International Conference on Learning Representations, which takes place in May.

Unlike many projects concerning AI and gameplay, the point here wasn’t to create an AI super-player that could destroy puny humans. Rather, the researchers wanted to see if an AI agent could achieve a more generalized kind of visual intelligence if it learned about the world via gameplay.

“For us, the question was: Can it learn very basic things about objects and their attributes by interacting with them?” says Aniruddha Kembhavi, a research manager with AI2’s computer vision team and a paper coauthor.

This AI2 team is working on representation learning, in which AI systems are given some input—images, audio, text, etc.—and learn to categorize the data according to its features. In computer vision, for example, an AI system might learn the features that represent a cat or a traffic light. Ideally, though, it doesn’t learn only the categories, it also learns how to categorize data, making it useful even when given images of objects it has never before seen.

Visual representation learning has evolved over the past decade, Kembhavi explains. When deep learning took off, researchers first trained AI systems on databases of labeled images, such as the famous ImageNet. Because the labels enable the AI system to check its work, that technique is called supervised learning. “Then in past few years, the buzz has gone from supervised learning to self-supervised learning,” says Kembhavi, in which AI systems have to determine the labels for themselves. “We believe that an even more general way of doing it is gameplay—we just let the agents play around, and they figure it out.” 

The AI agents were tested on concepts that hadn't been explicitly taught, such as object permanence. Image: Allen Institute for AI

Once the AI2 agents had gotten good at the game, the researchers ran them through a variety of tests designed to probe their understanding of the world. They first tested them on computer-generated images of rooms, asking them to predict traits such as depth of field and the geometry of objects. When compared to a model trained on the gold-standard ImageNet, the AI2 agents performed as well as or better. They also tested them on photographs of real rooms; while they didn’t do as well as the ImageNet-trained model there, they did better than expected, an important indication that training in simulated environments could produce AI systems that function in the real world.

The tests that really excited the researchers, though, were those inspired by developmental psychology. They wanted to determine whether the AI agents grasped certain “cognitive primitives,” or basic elements of understanding that can be built upon. They found that the agents understood the principles of containment and object permanence, and that they could rank images according to how much free space they contained. That ranking test was an attempt to get at a concept that Jean Piaget called seriation, or the ability to order objects based on a common property.

If you’re thinking, “Haven’t I read something in IEEE Spectrum before about AI agents playing hide and seek?” you are not wrong, and you are also a faithful reader. In 2019, I covered an OpenAI project in which the hiders and seekers surprised the researchers by coming up with strategies that weren’t supposed to be possible in the game environment.  

Igor Mordatch, one of the OpenAI researchers behind that project, says in an email that he’s excited to see that AI2’s research focuses not on external behaviors within the game but on the “internal representations of the world emerging in the minds of these agents.” He adds: “Representation learning is thought to be one of the key components to progress in general-purpose AI systems today, so any advances in this area would be highly impactful.”

As for transferring any advances from their research to the real world, the AI2 researchers say that the agents’ dynamic understanding of how objects act in time and space could someday be useful to robots. But they have no intention of doing robot experiments anytime soon. Training in simulation took several weeks; training in the real world would be infeasible. “Also, there’s a safety issue,” notes study coauthor Roozbeh Mottaghi, also a research manager at AI2. “These agents do random stuff.” Just think of the havoc that could be wreaked on a lab by a rogue robot carrying a toilet plunger.

