Last week, we (and most of the rest of the internet) covered some research from MIT that uses a brain interface to help robots correct themselves when they’re about to make a mistake. This is very cool, very futuristic stuff, but it only works if you wear a very, very silly hat that can classify your brain waves in 10 milliseconds flat.
At Brown University, researchers in Stefanie Tellex’s lab are working on a more social approach to helping robots more accurately interact with humans. By enabling a robot to model its own confusion in an interactive object-fetching task, the robot can ask relevant clarifying questions when necessary to help understand exactly what humans want. No hats required.
Whether you ask a human or a robot to fetch you an object, it’s a simple task to perform if the object is unique in some way, and a more complicated task to perform if it involves several similar objects. Say you’re a mechanic, and you want an assistant to bring you a tool. You can point at a shelf of tools and say, “Bring me that tool.” Your assistant, if they’re human, will look where you point, and if there are only a handful of tools on the shelf, they’ll probably be able to infer what tool you mean. But if the shelf is mostly full, especially if it’s full of objects that are similar, your assistant might not be able to determine exactly what tool you’re talking about, so they’ll ask you to clarify somehow, perhaps by pointing at a tool and saying, “Is this the one you mean?”
To be useful in situations like these, your assistant has to have an understanding of ambiguity and uncertainty: They have to be able to tell when there is (or is not) enough information to complete a task, and then take the right action to get more information when necessary. That holds whether the uncertainty comes from the assistant not really getting something, or just from you not being specific enough about what you want. For robot assistants, it’s a much more difficult problem than it is for human assistants, because of the social components involved. Pointing, gestures, gaze, and language cues are all tricks that humans use to communicate information, and robots are generally quite terrible at interpreting them.
At Brown, they’ve created a painfully acronym’d system called “FEedback To Collaborative Handoff Partially Observable Markov Decision Process,” or FETCH-POMDP. It’s able to understand the implicit meaning of common gestures, and merge those meanings with what the person making the gestures is saying to improve its understanding of what the person wants. Assuming that the person is being cooperative (not lying about what they want), the system is able to model its own confusion and ask questions only when necessary for accuracy so as not to be unduly bothersome.
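To make the idea concrete, here is a minimal sketch in Python of the general pattern described above: keep a probability distribution (a "belief") over which object the person wants, update it with evidence from speech and gesture, and ask a question only when no single object is likely enough. This is not the researchers' implementation. FETCH-POMDP plans over a partially observable Markov decision process, whereas this sketch swaps in a simple fixed confidence threshold. The object names, likelihood values, threshold, and function names are all hypothetical, chosen purely for illustration.

```python
from typing import Dict, Tuple

Belief = Dict[str, float]


def normalize(weights: Belief) -> Belief:
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive total")
    return {obj: w / total for obj, w in weights.items()}


def update_belief(belief: Belief, likelihood: Belief) -> Belief:
    """Bayes' rule: posterior is proportional to prior times likelihood."""
    return normalize(
        {obj: p * likelihood.get(obj, 0.0) for obj, p in belief.items()}
    )


def choose_action(belief: Belief, ask_threshold: float = 0.8) -> Tuple[str, str]:
    """Pick the most likely object if confident enough, otherwise ask about it.

    The fixed threshold is an assumption of this sketch; it stands in for
    the POMDP planning that the real system uses.
    """
    best = max(belief, key=belief.get)
    if belief[best] >= ask_threshold:
        return ("pick", best)
    return ("ask", best)


if __name__ == "__main__":
    # Hypothetical scene: two similar markers and a spoon, all equally likely.
    belief = normalize({"red_marker": 1.0, "blue_marker": 1.0, "spoon": 1.0})

    # Hypothetical likelihoods: the person says "marker" and points
    # roughly toward the red one.
    speech = {"red_marker": 0.45, "blue_marker": 0.45, "spoon": 0.10}
    gesture = {"red_marker": 0.60, "blue_marker": 0.30, "spoon": 0.10}

    belief = update_belief(belief, speech)
    belief = update_belief(belief, gesture)
    print(choose_action(belief))  # still ambiguous, so the robot asks

    # A "yes" to "Is this the one you mean?" is treated as noisy evidence
    # too, since speech can be mistranscribed.
    heard_yes = {"red_marker": 0.90, "blue_marker": 0.10, "spoon": 0.10}
    belief = update_belief(belief, heard_yes)
    print(choose_action(belief))  # confident enough to pick
```

Treating the answer to a question as one more noisy observation, rather than as ground truth, is the design choice that matters here: it lets the same update rule handle speech, gesture, and feedback, and it means a misheard answer shifts the belief without locking in a wrong choice.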
To test out the FETCH-POMDP system, the Brown researchers asked people who had no idea what was going on to ask a Baxter robot to fetch things for them. A third of the time, the robot asked no questions; a third of the time, the robot always asked clarifying questions; and the final third of the time, the robot only asked questions when it decided that a question was necessary. The researchers expected that the robot would be fastest when asking no questions, and most accurate when always asking questions, but it turned out that the intelligent questioning approach managed to be both the fastest and most accurate. This is because human-robot interaction is messy: Every question the robot asked needed a spoken answer, and those answers were sometimes mistranscribed (confusing “yes” with “hand,” for example), so more questions meant more misunderstandings.
Interestingly, the participants in the trials also ascribed all kinds of capabilities to the robot which it didn’t actually have:
During trials, many users used prepositional phrases in order to describe items, such as “Hand me the spoon to the left of the bowl.” Although the language model in this work did not account for referential phrases, the agent was able to use intelligent social feedback to figure out what the human desired. This may explain why many users reported that they thought the robot did understand prepositional phrases. Methods exist to interpret referential language, but problems in understanding will still occur. Our model will help correct those mistakes, regardless of the exact method of state estimation and language understanding.
The researchers do plan to update their model to include referential phrasing like this, and they also want to add things like eye tracking to improve accuracy even more. Adding the ability to place objects (along with picking them) has the potential to make this system much more useful in the workplace, especially if you have a spaceship that needs fixing.
“Reducing Errors in Object-Fetching Interactions through Social Feedback,” by David Whitney, Eric Rosen, James MacGlashan, Lawson L.S. Wong, and Stefanie Tellex from Brown University will be presented at ICRA 2017 in Singapore.
[ Brown ]