What People See in 157 Robot Faces

The largest study of robot faces we've ever seen shows the right way to design an expressive robot

Image: University of Washington

In recent years, an increasing number of robots have relied on screens rather than physical mechanisms to generate expressive faces. Screens are cheap, they’re easy to work with, and they allow for nearly unlimited creativity. Consequently, there’s an enormous variety of robot faces, with a spectrum of similarities and differences both obvious and subtle. However, there hasn’t been a comprehensive study of the entire design space, possibly because of how large it is. That’s a missed opportunity, because there’s a lot to learn.

At the ACM/IEEE International Conference on Human-Robot Interaction (HRI) last month, roboticists from the University of Washington in Seattle presented a paper entitled “Characterizing the Design Space of Rendered Robot Faces.” When they say “characterizing” and “design space,” they aren’t kidding: They looked at 157 different robot faces across 76 dimensions, did a pile of statistical analyses on them, and then conducted a set of surveys to figure out how people experience robot faces differently.

The researchers determined which faces to include in their data set in the obvious way: If it’s a digital face with a good picture that can be found through an Internet search, it’s in. The 157 resulting robot faces were coded across 76 dimensions, including the presence of a particular element on the face (e.g., mouth, nose, eyebrows, cheeks/blush); the color of these elements and the face (and any additional features); and the size, shape, and placement of each element. Some properties were binary (does it have a mouth?), and others were discretized, like how close together the eyes are. The researchers also recorded a bunch of other useful stuff, including where the robot is made and what it was designed to do.
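To make the coding scheme concrete, here is a minimal Python sketch of what one coded face could look like, with a few binary properties and one discretized property. The field names, the three spacing bins, and the four sample entries are all assumptions made up for illustration; they are not the paper's actual 76 dimensions or data.

```python
from dataclasses import dataclass
from enum import Enum


class EyeSpacing(Enum):
    """Hypothetical bins for a discretized property (not the paper's bins)."""
    CLOSE = 1
    MEDIUM = 2
    WIDE = 3


@dataclass(frozen=True)
class FaceCoding:
    """A hypothetical, heavily reduced coding of one rendered face."""
    name: str
    has_mouth: bool          # binary property
    has_nose: bool           # binary property
    has_eyebrows: bool       # binary property
    has_cheeks: bool         # binary property
    face_color: str          # categorical property
    eye_shape: str           # categorical property
    eye_spacing: EyeSpacing  # discretized property


def share_with(faces, predicate):
    """Return the fraction of faces for which predicate(face) is true."""
    faces = list(faces)
    if not faces:
        return 0.0
    return sum(1 for face in faces if predicate(face)) / len(faces)


# Made-up entries for illustration only; not taken from the study's data set.
example_faces = [
    FaceCoding("face_a", True, False, False, False, "black", "circle", EyeSpacing.MEDIUM),
    FaceCoding("face_b", True, False, True, False, "white", "circle", EyeSpacing.WIDE),
    FaceCoding("face_c", False, False, False, False, "black", "oval", EyeSpacing.CLOSE),
    FaceCoding("face_d", True, True, False, True, "blue", "circle", EyeSpacing.MEDIUM),
]

mouth_share = share_with(example_faces, lambda face: face.has_mouth)
black_share = share_with(example_faces, lambda face: face.face_color == "black")
```

With every face coded this way, summary statistics such as the share of faces with a mouth reduce to simple counts over the data set.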

Looking at all 157 faces provided some interesting insights—over a third of robot faces are black, for example. Most faces have a mouth but no nose or cheeks or eyebrows. Circular eyes were by far the most popular, while only 10 percent of faces had eyes that were shaped like a human’s. The researchers also noticed a few design clusters that probably won’t surprise you: There’s a group of robots that had Baxter-like faces, as well as a group of robots that had EVE-like faces. Very simple faces (just eyes) are popular as well.

The really interesting stuff came from two surveys the researchers did to figure out how specific dimensions of robot faces impact people’s perception of that robot. In the first survey, the researchers chose 12 representative robot faces, designed to cover a continuum of points along the spectrum of realism and detail:

These are the faces the researchers used in their first survey. The robots are presented (left to right) in order of increasing detail. The number on the scale indicates the number of missing features, like mouth or eyebrows, on the face. Image: University of Washington

Each survey participant (there were 50 in total) rated the robot faces across six 5-point semantic differential scales, and was also asked to rate how much they liked each face. Lastly, the participants chose a generalized job or function that they thought each face would be most appropriate for, and gave each face a brief description. Here are some highlights from the results, as described by the researchers in the paper they presented at HRI:

Friendliness: Yumi, FURo-D, Buddy, and Datou were perceived to be the friendliest robots. The latter three robots had relatively detailed faces, but were not exceedingly realistic, thus avoiding the Uncanny Valley effect. Jibo and Gongzi were perceived to be the most unfriendly.

Intelligence: The robots rated as most intelligent were FURo-D and Gongzi, while Sawyer, Buddy, and Datou were rated as least intelligent. Although these latter three robots were considered the least intelligent of the set, their ratings hovered around the “3 (Neutral)” mark; they were not overtly rated as “Unintelligent.”

Trustworthiness: Datou and FURo-D were deemed the most trustworthy, and Gongzi the least. Gongzi was frequently named “angry robot” or something to that effect (17/50), with respondents saying that it “seems almost mean,” “looks menacing,” and “this robot is intimidating, seems like it would be used by law enforcement.” Since the robot’s eyes are large, they lend themselves to a pupil-less appearance; the respondents may have responded differently if pupils were present.

Overall preference: The robots with the highest likability were Yumi and FURo-D, while Jibo was the most disliked, alongside Gongzi, EMC, and Valerie. The four robots rated highest in likability were also the ones perceived to be the friendliest.

The next step was to use a controlled set of faces (as opposed to the faces on real robots) to try to narrow down the specific impact that different features have on people’s perceptions. The image below shows this set; the face on the top left is the “average” face, and the rest of the faces differ from it by one feature.

The researchers used this set of faces for their second survey. The face on the top left is what the researchers defined as the baseline, or “average face,” in their data set. All the other faces differ from the baseline by one feature, like blue eyes, cheeks, etc. Image: University of Washington
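The one-feature-at-a-time design is easy to express in code. The sketch below builds a set of variants from a baseline so that each variant differs from it in exactly one feature. The feature names and values are placeholders loosely based on the variations mentioned in this article; they are assumptions, not the researchers' actual face specification or tooling.

```python
# Hypothetical baseline face; names and values are illustrative assumptions.
BASELINE = {
    "mouth": True,
    "pupils": True,
    "eyelids": False,
    "eyebrows": False,
    "cheeks": False,
    "eye_color": "white",  # placeholder value
}

# One alternative value per feature to vary (also illustrative).
CHANGES = {
    "mouth": False,
    "pupils": False,
    "eyelids": True,
    "eyebrows": True,
    "cheeks": True,
    "eye_color": "blue",
}


def one_feature_variants(baseline, changes):
    """Return {feature: variant}, where each variant differs from the
    baseline in exactly that one feature."""
    variants = {}
    for feature, new_value in changes.items():
        if feature not in baseline:
            raise KeyError(f"unknown feature: {feature}")
        if baseline[feature] == new_value:
            raise ValueError(f"{feature} would not differ from the baseline")
        variant = dict(baseline)  # copy so the baseline is never mutated
        variant[feature] = new_value
        variants[feature] = variant
    return variants


def differing_features(face_a, face_b):
    """List the features on which two faces with the same keys disagree."""
    return sorted(key for key in face_a if face_a[key] != face_b[key])


variants = one_feature_variants(BASELINE, CHANGES)
```

Because only one feature changes per variant, any difference in ratings between a variant and the baseline can be attributed to that feature rather than to a combination of changes.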

Some of the results:

Friendliness: No faces were deemed significantly more friendly than the baseline face. The significantly less friendly faces were the ones lacking a mouth, lacking pupils, and possessing eyelids. The face with no mouth, rated as being most unfriendly, was frequently referred to as “creepy” by the participants, who said that it gave an air of surveillance, e.g., “[it] looks like it is watching my every move.”

Intelligence: The face deemed most intelligent featured eyebrows. The design of the eyebrows was such that they were lowered closer to the top of the eye, thus avoiding the intimation of a baby face, an effect that could induce the perception of increased naivete and therefore of lower intellect. This face was also rated as being the most mature. The faces that were rated as significantly less intelligent than the model face were the ones with no mouth, closely spaced eyes, and cheeks.

Trustworthiness: No faces were ranked significantly more trustworthy than the baseline. If respondents interpreted trustworthiness to be equivalent to honesty, then it is possible that the symmetrical structure of the face and the large, even eyes could have played a role in their ranking, as those features have been shown to promote a perception of the face being honest. The faces rated significantly less trustworthy were the same three rated as being least friendly: the face with eyelids, the face without a mouth, and the face without pupils.

Overall preference: No faces were rated as significantly more likable than the baseline face, although the robot with irises was the most liked overall, with one respondent noting that “making the eyes a little more human with the color placement makes it feel quite friendly and approachable.” Robots with no mouth, no pupils, cheeks, small eyes, white face, and eyelids were significantly less likable than the baseline face, with the no mouth, no pupil, and eyelids faces receiving the lowest ratings of the set.

This information is potentially useful because it shows how different faces correlate with the jobs that study participants guessed the robots should do. As one example, if you want people to view your robot as effective at security, consider giving it eyelids but no mouth. Service robots would benefit from eyebrows, and entertainment robots could use significantly more detail than industrial robots.

For more on how these results can help with robot design, we spoke with first author Alisa Kalegina from the University of Washington.

IEEE Spectrum: Why is it so important to understand how people react to different robot faces?
Alisa Kalegina: Ultimately, for us, it comes down to thoughtful and conscious design choices. If we study what our impressions of faces are, we can create technology that fits more harmoniously into our lives. As we mentioned in the paper, reading faces is an integral part of being human and has a significant effect within HRI, sometimes even altering how we end up perceiving other people! We felt it was important to identify what some of these perceptual trends could possibly be.

As you say in your paper, many of these robots have a variety of dynamic motions that they use to interact with humans, both on their screens and physically. How do you think your results might change if the participants were shown videos instead of still images?
My guess is that robots that were seen as being “creepy” due to the Uncanny Valley would seem even more creepy when people watch videos of them, due to the added cognitive dissonance we would experience of seeing a highly human-looking face being paired with explicitly mechanical movement that is much less robust—for the most part—than that of human beings. I also think that many robots would be rated as more likeable when compared to their static images if they had the ability to blink: Static images of staring eyes are disconcerting at a biological level, and blinking helps soften that impact.

How do you think your results might have been different if you surveyed groups of people from different cultures?

This is a fascinating question. We know from previous work that perception of faces definitely is influenced by culture and demographics, but this space hasn’t been explored when it comes to rendered faces specifically. Previous work with physical faces shows significant variability when it comes to how likable and trustworthy a robot is, based on culture. For instance, some cultures have a preference for their robots to look explicitly mechanical, like an industrial robot, and so would give higher likability ratings to robots of that type. We have another example from our survey: We looked at the divisive feature of “blush”/cheeks. Some people really enjoyed the feature, and others were quite vexed with it. We could hypothesize that cultures that have a lot of exposure to “cute” cartoons (like Japan) would be much more amenable to this feature than a different culture.

How would you suggest that people designing new interactive robots make use of your results?

I think our work helps to elucidate what kind of effect certain design choices have, and while these results are by no means definitive, I think they do help prime our thinking about design consequences. One aspect we found important is to look at robot jobs/context, so designers could get a feel for what kind of features people deem appropriate for different kinds of jobs. Including those features would help to fit the robot more seamlessly into its role. Designers could also use our exploration of the design space to inform their choices within the grander context of what other people are doing and what the predominant trends have been.

What are you working on next?

Right now we’re working on examining how age affects perception of rendered faces—we’ve already seen some interesting differences between what kids (aged 7-11) think of the rendered faces in our work as compared to adults, through our collaboration with Kids Team UW. Next we will be looking into teenage impressions of faces. We’re also in the early stages of our cross-cultural research, looking to use the Lab in the Wild platform—also developed here at University of Washington—for our rendered faces surveys, in order to reliably reach those different cultural demographics.

Pillo, a pill-dispensing robot. Image: Pillo Health

What is your own personal favorite robot face?

One of my favorite rendered faces is definitely Pillo. It has the “unconventional” design choice of very widely spaced eyes that give it a really goofy look!

“Characterizing the Design Space of Rendered Robot Faces,” by Alisa Kalegina, Grace Schroeder, Aidan Allchin, Keara Berlin, and Maya Cakmak from the University of Washington, Macalester College, and Lakeside School in Seattle, was presented at HRI 2018 in Chicago.
