Robots Hallucinate Humans to Aid in Object Recognition - IEEE Spectrum

Almost exactly a year ago, we posted about how Ashutosh Saxena's lab at Cornell was teaching robots to use their "imaginations" to picture how a human would want a room organized. The research was a success: algorithms that used hallucinated humans (which are the best sort of humans) to guide object placement performed significantly better than other methods. Cool stuff indeed, and now comes the next step: labeling 3D point clouds captured by RGB-D sensors, using hallucinated people as context.

A significant amount of research has investigated the relationships between objects and other objects. It's called semantic mapping, and it's very valuable in giving robots something like what we'd call "intuition" or "common sense." But we humans tend to live human-centered lives, which means most of our stuff is arranged around us too, and keeping that in mind can help put objects in context.

Take a cluttered office desk, for example. A traditional semantic mapping algorithm might look at all of the objects on it and figure out that it's a desk area, but some of the objects (like a bottle of water or a jacket) don't necessarily fit into the "desk" semantic category. Imagine a human sitting there, though, and it starts to make more sense, because clothing and water tend to show up where people spend a lot of their time.
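A toy sketch of that intuition, in Python: an ambiguous object that desk context alone would call a "speaker" gets relabeled as a "bottle" once imagined people are sampled within arm's reach of it. Everything here is a hypothetical illustration, not the method from the Cornell papers: the two candidate labels, the affinity numbers, `REACH_SIGMA`, and the helper names (`hallucinate_humans`, `proximity`, `best_label`) are all made up, and a simple additive score stands in for the researchers' learned model.

```python
import math
import random

# Hypothetical numbers for illustration only; NOT taken from the Cornell papers.
# DESK_AFFINITY: how well each candidate label fits a "desk" context on its own.
# HUMAN_AFFINITY: how strongly each label tends to appear near a person.
DESK_AFFINITY = {"speaker": 0.6, "bottle": 0.3}
HUMAN_AFFINITY = {"speaker": 0.1, "bottle": 0.9}
REACH_SIGMA = 0.4  # assumed arm's-reach scale, in meters


def hallucinate_humans(seat_xy, n=50, spread=0.05, seed=0):
    """Sample n imagined people sitting near a chair at seat_xy (x, y in meters)."""
    rng = random.Random(seed)
    return [
        (seat_xy[0] + rng.gauss(0.0, spread), seat_xy[1] + rng.gauss(0.0, spread))
        for _ in range(n)
    ]


def proximity(obj_xy, humans):
    """Mean Gaussian closeness of an object to the sampled humans (0 if none)."""
    if not humans:
        return 0.0
    total = 0.0
    for hx, hy in humans:
        d2 = (obj_xy[0] - hx) ** 2 + (obj_xy[1] - hy) ** 2
        total += math.exp(-d2 / (2 * REACH_SIGMA ** 2))
    return total / len(humans)


def best_label(obj_xy, humans):
    """Pick the label with the highest desk-context plus human-context score."""
    near = proximity(obj_xy, humans)
    scores = {
        label: DESK_AFFINITY[label] + HUMAN_AFFINITY[label] * near
        for label in DESK_AFFINITY
    }
    return max(scores, key=scores.get)


humans = hallucinate_humans(seat_xy=(0.0, 0.0))
print(best_label((0.3, 0.0), []))      # → speaker (desk context alone)
print(best_label((0.3, 0.0), humans))  # → bottle (imagined person within reach)
print(best_label((3.0, 0.0), humans))  # → speaker (far from any imagined person)
```

The additive score is just the simplest way to show the effect: with no imagined people the human term drops to zero, so the label falls back to what desk context alone would suggest.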

The other concept to deal with is that of object affordances. An affordance is some characteristic of an object that allows a human to do something with it. For example, a doorknob is an affordance that lets us open doors, and a handle on a coffee cup is an affordance that lets us pick it up and drink out of it. There's plenty to be learned about the function of an object from how a human uses it, but if you don't have a human handy to interact with the object for you, hallucinating one up out of nowhere can serve a similar purpose.

Here are a few videos of Cornell's PR2, Kodiak, demonstrating how this works:

We test our system on [a cushion]. We visualize the sampled human poses and object locations in red and blue heatmaps. We can see that most sampled humans are sitting on the couch, bean bag or chair, and the most likely location for the cushion is on the couch and the desk.

Our robot Kodiak (PR2) placing several objects in a fridge, on a table and on the ground, using hallucinated human skeletons and learned human-object relationships.
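The cushion demo above can be loosely approximated by a small Python sketch: sample imagined people from the places where someone might sit, then build a heatmap of candidate object locations by summing a Gaussian around each sampled person. The seat positions and weights, the grid size, `sigma`, the helper names, and the assumption that a cushion simply belongs wherever people sit are all hypothetical; the real system uses learned human-object relationships and full skeleton poses rather than 2D points.

```python
import math
import random

# Hypothetical room layout (meters): where an imagined person might sit, plus a
# prior weight for each seat. Illustrative numbers, not values from the papers.
SEATS = {
    "couch": ((1.0, 1.0), 0.6),
    "chair": ((3.0, 1.0), 0.3),
    "bean_bag": ((2.0, 3.0), 0.1),
}


def sample_poses(n, jitter=0.2, seed=0):
    """Hallucinate n seated people, choosing a seat by its prior weight."""
    rng = random.Random(seed)
    names = list(SEATS)
    weights = [SEATS[name][1] for name in names]
    poses = []
    for _ in range(n):
        name = rng.choices(names, weights=weights, k=1)[0]
        (x, y), _ = SEATS[name]
        poses.append((x + rng.gauss(0.0, jitter), y + rng.gauss(0.0, jitter)))
    return poses


def placement_heatmap(poses, width=4.0, height=4.0, step=0.5, sigma=0.3):
    """Score each grid cell by summed Gaussian closeness to the sampled people."""
    rows, cols = int(height / step), int(width / step)
    heat = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            cx, cy = (c + 0.5) * step, (r + 0.5) * step  # cell center
            for px, py in poses:
                d2 = (cx - px) ** 2 + (cy - py) ** 2
                heat[r][c] += math.exp(-d2 / (2 * sigma ** 2))
    return heat


def best_cell(heat, step=0.5):
    """Return the (x, y) center of the highest-scoring cell."""
    cells = [(r, c) for r in range(len(heat)) for c in range(len(heat[0]))]
    r, c = max(cells, key=lambda rc: heat[rc[0]][rc[1]])
    return ((c + 0.5) * step, (r + 0.5) * step)


poses = sample_poses(200)
heat = placement_heatmap(poses)
print(best_cell(heat))  # a cell next to the couch, the most heavily weighted seat
```

Summing over many sampled people is what turns individual imagined humans into a heatmap, loosely analogous to the red and blue ones in the video; raising `sigma` spreads each person's influence over a wider area.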

This research will be presented next week at RSS in Berlin, and also at CVPR in Portland. Because the researchers are good eggs, they've made their papers fully available ahead of time, and you can read them in full here and here.

[ Cornell ] via [ Txnologist ]
