
AI2-THOR Interactive Simulation Teaches AI About Real World

AI2-THOR, an interactive simulation based on home environments, can prepare AI for real-world challenges


A virtual kitchen from the AI2-THOR simulation.
Image: Roozbeh Mottaghi, Eric Kolve / Allen Institute for Artificial Intelligence

Training a robot butler to make the perfect omelette could require breaking a lot of eggs and throwing out many imperfect attempts in a real-life kitchen.

That’s why researchers have been rolling out virtual training grounds as a more efficient alternative to putting AI agents through costly and time-consuming experiments in the real world.

Virtual environments could prove especially useful for training the most popular AI systems, which are based on machine learning algorithms that often require thousands of trial-and-error runs to learn new skills. Companies such as Waymo have already built their own internal simulators with virtual roads and traffic intersections to train their AI to safely take the wheel of self-driving cars. But a new, open-source virtual training ground called AI2-THOR enables AI agents to learn how to interact with objects in familiar home settings such as kitchens and bedrooms.

“These trials in the real world might damage the objects that the robots interact with or even the robot itself,” says Roozbeh Mottaghi, a research scientist in the computer vision team at the Allen Institute for Artificial Intelligence in Seattle. “So it is much safer and cost-effective to first train the models in the virtual environment.”

Interactivity may prove especially crucial for training the coming generations of AI that will be expected to chauffeur humans around in self-driving cars or perform chores in hotels and households. Most commercial machine learning algorithms and the more specialized deep neural networks have gained visual intelligence through training on fairly passive datasets such as images and videos. But AI that can train through interactions with virtual simulations could potentially improve its visual intelligence by leaps and bounds.

Since the summer of 2016, Mottaghi and his colleagues have been developing THOR (The House Of inteRactions) as an interactive and photorealistic 3D environment based on the real world. The more realistic the simulation, they reasoned, the more likely that the skills AI agents learn in the virtual world could carry over seamlessly to the real world.

The first version of AI2-THOR, released in December 2017, includes 120 scenes based on four room categories: kitchens, living rooms, bedrooms, and bathrooms. It also features interactive objects, such as microwaves that can open and close, and realistic physics that model how an AI agent might bump into a couch or knock over a chair. That means AI agents can practice handling virtual objects and changing their states in a way that more closely mimics real-world interactions.

A virtual living room in the AI2-THOR simulation.
Image: Roozbeh Mottaghi, Eric Kolve / Allen Institute for Artificial Intelligence

Many previous simulations developed for training AI have taken their cue from commercial video games such as the first-person shooter “Doom” or the open-world driving game “Grand Theft Auto.” For example, Google’s independent DeepMind team used the Quake III Arena game engine to build a customizable virtual environment for training AI agents. AI2-THOR itself is built on the Unity game engine.

But until the debut of AI2-THOR, almost none of these simulations had managed to deliver photorealism based on the real world along with realistic physics and actionable objects. “The tasks that are learned through games are usually very different from what we do daily, and adapting an agent that is trained in a game to do everyday tasks is very difficult,” Mottaghi says.

It was no easy task to build a realistic virtual world with interactive objects. In the case of a simple microwave, researchers had to design the logic around all the possible states of the microwave being open or closed. That problem got even more challenging for complicated object interactions, even for something as seemingly simple as making a sandwich in the kitchen.

“[C]hecking whether an object fits inside the microwave or not was not trivial,” Mottaghi explains. “Smooth animation of the door while obeying the physics laws was another issue that we had to deal with in our design.”
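To give a rough sense of the kind of logic involved, here is a minimal Python sketch of a microwave with open and closed states and a size check for objects placed inside. It is a toy illustration only, not AI2-THOR's actual code or API: the `Box` and `Microwave` classes, their method names, the one-item limit, and the axis-aligned size comparison are all assumptions made for this example, and it ignores rotation and door animation entirely.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned size in meters (width, height, depth)."""
    width: float
    height: float
    depth: float

    def fits_inside(self, other: "Box") -> bool:
        # No rotation is attempted: every dimension must fit on its own axis.
        return (self.width <= other.width
                and self.height <= other.height
                and self.depth <= other.depth)


@dataclass
class Microwave:
    """Toy model of a microwave that is either open or closed."""
    interior: Box
    is_open: bool = False
    contents: list = field(default_factory=list)

    def open(self) -> bool:
        # Opening an already open door is rejected rather than ignored.
        if self.is_open:
            return False
        self.is_open = True
        return True

    def close(self) -> bool:
        if not self.is_open:
            return False
        self.is_open = False
        return True

    def put(self, name: str, size: Box) -> bool:
        # Assumption: the door must be open, the cavity must be empty,
        # and the object must fit without being rotated.
        if not self.is_open or self.contents:
            return False
        if not size.fits_inside(self.interior):
            return False
        self.contents.append(name)
        return True
```

Even in this simplified form, every action has to be checked against the current state, which hints at why the real version, with physics and animation, was harder.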

Still, the first version of AI2-THOR includes an intriguing variety of actionable objects such as sliceable apples, empty or filled bathtubs, made (or unmade) beds, and raw or fried eggs. A separate class of receptacle objects such as microwaves, boxes, tabletops, and toilet paper hangers can hold certain objects.
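One way to picture the difference between actionable objects and receptacles is as two lookup tables: one listing which actions change an object's state, and one listing which objects a receptacle accepts. The Python sketch below is purely illustrative. The object names come from the examples above, but the table contents, the action names, and the functions `apply_action` and `can_hold` are hypothetical and do not reflect how AI2-THOR stores or exposes this information.

```python
# Hypothetical rules: (object type, action) -> (required state, resulting state).
TRANSITIONS = {
    ("Apple", "slice"): ("whole", "sliced"),
    ("Egg", "cook"): ("raw", "fried"),
    ("Bathtub", "fill"): ("empty", "filled"),
    ("Bed", "make"): ("unmade", "made"),
}

# Hypothetical compatibility table: which object types each receptacle can hold.
RECEPTACLE_ACCEPTS = {
    "Microwave": {"Egg", "Apple"},
    "TableTop": {"Egg", "Apple", "Box"},
    "ToiletPaperHanger": {"ToiletPaper"},
}


def apply_action(obj_type: str, state: str, action: str) -> tuple[str, bool]:
    """Return (new_state, succeeded); the state is unchanged on failure."""
    rule = TRANSITIONS.get((obj_type, action))
    if rule is None or rule[0] != state:
        return state, False
    return rule[1], True


def can_hold(receptacle: str, obj_type: str) -> bool:
    """True if the receptacle type accepts the given object type."""
    return obj_type in RECEPTACLE_ACCEPTS.get(receptacle, set())
```

In this toy version, calling `apply_action("Egg", "raw", "cook")` would succeed, while trying to cook an egg that is already fried, or to slice a bed, would leave the state unchanged.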

The team hopes to continue adding features such as non-rigid body physics that would enable AI agents to practice folding a sheet or manipulating a pillow. They also want to enable agents to communicate with each other for the purpose of cooperating on tasks such as moving a sofa from one side of the room to the other.

AI2-THOR has already been released as an open-source platform. Anyone with the technical savvy can download the first version and customize it by adding new objects, scenes and possible interactions to suit their research needs. Some groups have already used AI2-THOR to conduct psychology experiments in a virtual reality setting, set up dialogue between AI agents, or study the physics of objects.

Mottaghi and his colleagues even envision integrating their framework with the Amazon Mechanical Turk crowdsourcing service, which would allow them to “collect labeled data from hundreds of users that interact with our environments.” Such datasets could help the team develop and fine-tune the THOR environment’s interactions much faster.

Realistic virtual training grounds such as AI2-THOR may even help AI take the next big step by enabling a style of learning more similar to how humans learn in the real world, Mottaghi says. At the same time, the virtual setting will also provide far greater flexibility in developing generalizable AI models and be far more forgiving in terms of mistakes. After all, there’s always the reset button.
