Most robots are designed to do work. As such, not a lot of time, effort, or money is spent on making them able to communicate with humans, because they’re usually just doing their own thing. This is starting to change a bit, though, as robots become versatile enough that it’s reasonable to have humans working with them more directly, and it’s becoming more important that those humans have some idea what the robot is up to.
Some robots manage this with sounds, or lights, or screens with faces on them, but there are many systems for which hardware modifications like that aren’t a good option. In one of the best papers from HRI 2018 (seriously, they won a best paper award), roboticists from the University of Colorado Boulder explore how using augmented reality to help robots communicate with humans can make the bots feel safer, more efficient, and more like part of a collaborative team.
When watching a drone (or many other kinds of robots), it’s not at all obvious what they’re going to do next. It’s frequently not at all obvious what they’re doing in the moment, either. If they’re not moving, for example, is that because they’re planning motion? Or because they want to move but something’s in their way? Or maybe they’re just waiting around on purpose? Who knows! But if you want to walk around them, or do other tasks in the same area, you really need to understand what the robot is going to do.
For humans, this happens almost without thinking, because we’re inherently good at communicating with other humans. It’s not just that we talk to each other, but we can predict with quite good accuracy what other people are likely to do. We can’t do this naturally with robots, but we can cheat a little bit, by taking advantage of alternate interfaces to see what a robot will do next. This has been done before with screens, but researchers with CU Boulder's ATLAS Institute are testing whether augmented reality (via semitransparent projections overlaid on a user’s vision with a Microsoft HoloLens) could be both much more intuitive and more effective.
The primary goal of the AR in these contexts is conveying “motion intent” by displaying what the robot is going to do in the near future. The researchers experimented with several different ways of communicating this:
NavPoints: Displays the robot’s planned flight path as a series of x lines and navigation waypoints. The lines connect the robot’s current position to its future destinations in sequential order. Destination waypoints are visualized as spheres and indicate the robot’s precise destinations in 3D space. Each destination sphere also renders a drop shadow on the ground directly below it. Above each navigation point are two radial timers. An inner white timer indicates when the drone will arrive at that location, and the outer dark blue timer indicates when the robot will leave that location. Smaller spheres travel along the lines, moving in the same direction and velocity the robot will travel between destinations, thus providing an anticipatory cue for future robot velocity and direction.
Arrow: While the NavPoints design provides users with a great deal of information, it may be distracting or confusing to the user due to potential overdraw. The Arrow design takes a more minimal approach. The virtual imagery consists of a blue arrow head that travels through 3D space in the exact path the robot will eventually take x seconds in the future. As the arrow moves, a line is left behind that traces the arrow’s path back to the robot. Using this line, users can explicitly see the path the arrow has taken, which the robot will eventually follow.
Gaze: This design is inspired by prior research that has demonstrated the remarkable potential of gaze behaviors for conveying intent. [It] provides virtual imagery that completely alters the robot form by overlaying a x-meter diameter white sphere with a pupil directly over the aerial robot, effectively transforming the robot from a multirotor into a “flying eye.” While moving between destinations, the eye model stares at its current destination until it enters a predetermined distance threshold of y meters between itself and the current destination, at which point the eye turns and focuses on the robot’s next destination.
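To make the Arrow and Gaze descriptions a bit more concrete, here is a minimal Python sketch of the two cues. It is not the researchers' implementation. The function names, the piecewise-linear path, and the constant-speed assumption (no hovering at waypoints) are all hypothetical simplifications, and the paper's unspecified values (the "x seconds" look-ahead and the "y meters" threshold) are left as parameters.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def gaze_target_index(position: Point, waypoints: Sequence[Point],
                      current_index: int, threshold_m: float) -> int:
    """Pick which waypoint the virtual eye should look at (hypothetical helper).

    The eye keeps looking at the current destination until the robot is within
    threshold_m of it, then switches to the next destination if one exists.
    """
    close_enough = math.dist(position, waypoints[current_index]) <= threshold_m
    has_next = current_index + 1 < len(waypoints)
    return current_index + 1 if close_enough and has_next else current_index


def arrow_position(waypoints: Sequence[Point], speed_mps: float,
                   t_now: float, lookahead_s: float) -> Point:
    """Where the arrow head would be: the robot's position lookahead_s from now.

    Assumes (for illustration only) that the robot starts at waypoints[0] at
    t = 0 and flies straight segments at constant speed without stopping.
    """
    remaining = speed_mps * (t_now + lookahead_s)  # distance to cover along the path
    for start, end in zip(waypoints, waypoints[1:]):
        segment = math.dist(start, end)
        if segment > 0 and remaining <= segment:
            fraction = remaining / segment
            return tuple(s + fraction * (e - s) for s, e in zip(start, end))
        remaining -= segment
    return tuple(waypoints[-1])  # past the end of the path: stay at the final waypoint
```

Calling `arrow_position` each frame with the current time gives the moving arrow head, and the list of positions it has already returned forms the trailing line back to the robot. A real system would also need to account for the dwell time at each destination that the NavPoints timers visualize.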
Each of these augmented reality designs was tested to see how well humans could perform tasks while the robot was moving around near them. The best performing was NavPoints, and study participants subjectively liked it the most as well. Arrow worked okay but was light on detail, and at least one study participant was creeped out by the giant floating eyeball of Gaze: “I had a pretty good idea of what direction the robot was going to go, but the design made me very nervous (giant floating eyeballs bring killer robots to mind).” I’m with you, buddy.
For (lots!) more detail about how augmented reality can be applied to HRI and what’s next for this research, we spoke with Dan Szafir at the ATLAS Institute, CU Boulder.
IEEE Spectrum: The paper says that AR applied to HRI “is critically understudied and represents a fairly nascent research space.” Why do you think that there hasn’t been more focus in this area?
Dan Szafir: We don’t think the lack of past AR-HRI work is due to lack of interest (in fact there have been a few papers), but rather due to past barriers imposed by technology and hardware limitations. Even only a few years ago, a lot of AR and VR work relied on custom hardware made in the lab, meaning you had to have expertise in many different specialties (optics, mapping and localization, ergonomics, design, graphics, etc.) just to get started. The refocus on AR and VR from industry (Microsoft, Google, Facebook, and all the startups) and subsequent release of new headsets and related technologies (trackers, controllers, etc.) is making it much easier for researchers in fields not traditionally linked with AR and VR to get a foot in the door and start exploring how these technologies might be of use.
Advancements in robotics are also seeing a similar growth, where robots are leaving labs and factory floors and making their way to the public domains, businesses, and even homes. We expect to see, and in some ways are already seeing (based on the popularity of our VAM-HRI workshop), an explosion of research and commercial products that combine AR technology and robotics, as there is so much to discover and gain by mixing these two evolving fields.
Why take this augmented reality approach to communicating intent rather than something more traditional, like a map overlay on a tablet, text on a screen, or voice communication? What makes your approach more effective?
Communication through these other means is by no means ineffective (and indeed other prior research, including our own, has explored some of these ideas!); however, each of the traditional methods you mention has its own drawbacks that have left the door open for improvement through other means, such as AR technology. Map overlays and text information (on tablets or other 2D displays) require attention shifts that have proven to be quite distracting, where users must shift their gaze and attention from the robot, to the screen, back to the robot, back to the screen, etc.
For instance, in our “Improving Collocated Robot Teleoperation with AR” paper, we found such contextual shifts significantly increased the number of times users crashed an aerial robot they piloted, as the users were unable to focus their attention solely on the robot they were interacting with. AR technologies can mitigate such gaze shifts and also allow users to see information embedded within its natural context, while 2D displays require some degree of mental translation. For example, using AR technology, a user can see exactly where a robot will move just by looking at their surrounding environment; there is no need to translate a location from a map overlay to the real-world environment, which might incur an additional cognitive burden, as that work has already been done for them.
Traditional methods using 2D displays also cannot provide stereoscopic depth information, which may allow AR interfaces to provide information with better depth precision and immersion. Voice communication, on the other hand, may not work for noisy environments. Also, voice communication broadcasts information publicly to a surrounding area, whereas ARHMD [AR head-mounted display] interfaces can limit information only to those privy or interested. Of course, our approach is also not without drawbacks—the main limitation of our approach is that it does require an additional piece of hardware (i.e., the ARHMD), which may not be appropriate or feasible for all use cases.
How did you come up with these particular visualizations? Can you describe some of the iterations, particularly designs that you ended up not using, and why they didn’t work?
As you would imagine, designing our visualizations was a multistep iterative process. Our team started by identifying what information may be useful when it comes to communicating intent and planning future actions around another agent (such as destination heading, destination location(s), and velocity). We also developed a theoretical design framework for AR-HRI with three categories regarding how/what sorts of graphical augmentations might improve human-robot interaction: augmenting the environment, augmenting the robot, and augmenting the user interface.
With this in mind, we then moved from asking “what should these visualizations show?” to “how should these visualizations show it?” We actually prototyped a number of designs that didn’t make it into the final experiment or the paper—two examples that didn’t make the final cut were a “wall” design and a “floor path” design. The wall design erected virtual walls around the robot’s future path of travel indicating constraints regarding where users should not travel. With this design, the user no longer needed to decide where it might be safe to move or not as the system explicitly communicated whether they should move in a certain area or not based on whether the robot would travel there in the near future. The floor path design drew out a 2D path of the robot’s intended movement on the floor, similar to 2D GPS navigation interfaces. We thought that using such metaphors would be simple, yet conceptually intuitive.
Unfortunately, neither design was included in the final experiment, as our pilot tests found both to be ineffective. The virtual walls, which were continually constructed and destroyed as the robot moved through the environment, ended up cluttering the user’s vision and made navigating and understanding any given situation quite confusing. The floor paths provided users with good insight as to the x-y location of the robot, but as we were dealing with an aerial robot, height information was lost. However, we’d like to revisit both design ideas in the future as we still think they may hold promise in different use scenarios.
Most humans are pretty good at being able to predict what other humans are going to do, and it’s often through complex social communication. To what extent do robots need to use these kinds of social channels to be good communicators?
Robots absolutely need to be aware of these kinds of social channels and, to the extent possible, it can be useful for them to also leverage such social channels to communicate. However, these social channels come with certain requirements (e.g., using gaze cues may require some sort of recognizable eyes, gesture may require arms or even articulated hands, etc.). Many robots lack a humanoid appearance and thus may not be able to make use of such cues, yet it is still critical that such appearance-constrained robots communicate their intentions to users. I think automobiles are a good example of this where we have a technology that we commonly use to communicate to other people, but in a different manner than traditional social communication.
Communicating intent is focused on what a robot is doing. Do you think there is anything to be gained by communicating what a robot is “thinking,” if it’s involved in a decision-making process?
Absolutely! And this is actually one of the things we are working on right now. One way to interpret what a robot “thinks” is to know the data that is being input to the sensors, the robot’s policy (e.g., reward function, goal, etc.), and the potential action set. We believe that communicating such data might help user decision making. As an example, if a user knows the robot’s current policy (e.g., packing boxes), they might start another task (e.g., moving packed boxes) since they know the robot is going to take care of that task. Another possible scenario is to communicate such information to help users debug and/or fix the robot by identifying breakdowns in robot sensing or decision-making.
How do you think the results of your research can be useful in the near future? How well do your results generalize to other robots doing other tasks in other environments?
The results of our study on communicating motion intent with AR will be useful for anyone who has to work closely with or near robots. We believe the first domain to be impacted by these results would be that of manufacturing, where we are already seeing robots being integrated onto factory floors. As ground and aerial robots increasingly assist in day-to-day activities alongside human workers in warehouses and on factory floors (i.e., no longer caged off and separated from the rest of the factory floor), it becomes critical that human workers know how and when a robot will move next to remain not only efficient, but safe. As ARHMDs decrease in size and weight and improve in image fidelity and field-of-view, you will start to see this research, and other research like it, integrated with daily manufacturing operations. In addition, we also envision this type of technology being quickly integrated into space exploration operations and are already in discussions with NASA (which funded this work) about transferring our technology.
In terms of generalization, we believe our results will generalize well to other tasks as well as other robot form factors (not just aerial robots), although this will require further work to prove. While this study specifically examined displaying aerial robot intent, being able to understand your coworker’s intentions improves your own decision-making regardless of task or environment. Although we used a quadcopter for our study, none of our designs rely upon a vertical factor and could be ported to ground robots quite easily. We hope that our work inspires others to continue to explore this space and further explore how generalizable our approach is as well as find new designs that may display intent information even more effectively, especially for specific use cases.
What are you working on next?
We are looking to build on this work in a variety of ways, including scaling up the current systems using multiple aerial robots and exploring heterogeneous teams consisting of users, aerial robots, and ground robots working together. In addition, we are exploring some of the very questions you raised regarding communicating aspects of robot status in addition to intent and generalizability to different use cases (e.g., joint manipulation tasks).