New Software Knows When an Online Reviewer Is Lying
The first step to eliminating deceptive online reviews is identifying them
Steven Cherry: Hi, this is Steven Cherry for IEEE Spectrum’s “Techwise Conversations.”
How do ants tell one another where to find food? They leave little chemical droplets for one another, called pheromones, which the other ants can smell. As more and more ants use the trail, the smell becomes stronger.
How do humans find stuff online? Our version of the pheromone is the review. Our fellow travelers in the shopping world read reviews, go to the places that get good ones, and leave their own reviews, reinforcing the trail, just like the ants.
And that’s especially true for fellow travelers in the literal sense: We rely on reviews, especially when it comes to things like restaurants and hotels. After all, every Econolodge is supposed to be more or less the same, but if you’ve ever stayed in two of them, you know that’s not the case.
But here’s a problem: The reviews aren’t all the same themselves. This morning I looked at booking.com’s 134 reviews of the Silversmith, a hotel in Chicago’s Loop district that I’ve stayed at. Someone named Tamara wrote, “The two ladies at the front desk spent a good deal of time helping us daily.” And someone listed only as Anonymous wrote, “Great design, large rooms.” But another anonymous reviewer wrote, “My room was very small and dark. Too busy in front of the hotel, and overhead trains were very noisy.”
And here’s a worse problem: If you don’t trust the hotel, are you sure you can trust the hotel reviewers? As it turns out, some reviews are deceptive—written not by fellow ants but rather by people trying to trick us into thinking the rooms are large (but really they’re small) and the staff is helpful when really they’re not.
Worse, it turns out that we humans are very bad at detecting false pheromone trails. Fortunately, the problems the digital world giveth, it also taketh away—there’s the hope at least that software can do a lot better when it comes to detecting deceptive reviews.
My guest today has written some of that software. Myle Ott is in fact a bit of a software prodigy. According to his biography at the Jack Kent Cooke Foundation, from which he received a graduate scholarship in 2006, he skipped five grades to enter California State University at Los Angeles at age 12. He went on to Cornell University for his master’s of engineering in computer science in 2007 and is still there, getting his Ph.D. in computer science.
Last month at the annual meeting of the Association for Computational Linguistics, in Portland, Ore., he presented a paper, “Finding Deceptive Opinion Spam by Any Stretch of the Imagination,” which he coauthored with three other Cornell researchers.
Myle, welcome to the podcast.
Myle Ott: Hi. Thanks for having me.
Steven Cherry: First of all, how serious a problem is this? Do we know how many reviews are deceptive or how many people are being fooled by them?
Myle Ott: So we don’t actually know for sure when we’re reading a review whether it’s deceptive or not, and part of that means that we don’t know how many reviews out there are actually deceptive.
Steven Cherry: All right. Good. That leads us to the research itself. You had to start out by coming up with some reviews that you knew were deceptive. How did that work?
Myle Ott: So because we couldn’t sit down and read these reviews and figure out whether they were deceptive or not, we chose to create our own deceptive reviews. And we did that using a service through Amazon called Mechanical Turk, which let us pay 400 people to write fake reviews for hotels in the Chicago area.
Steven Cherry: So they were deceptive in what way? They were good reviews of places that you knew were bad or what?
Myle Ott: So we actually picked the 20 most popular hotels in Chicago for this study, and we asked people to write only positive reviews for this first look. We focused for this work strictly on those positive reviews, and negative reviews is actually something we want to look at in the future here.
Steven Cherry: Good. So you then tested humans to see if they could tell which reviews were deceptive.
Myle Ott: Once we gathered our known fake reviews, we also combined them with some truthful reviews from TripAdvisor, and we passed them along to three volunteer undergraduate human judges to read the reviews and decide for each one whether they thought it was truthful or deceptive. And what we ended up finding out was that the human judges actually performed pretty much at chance at detecting deception. And this is not particularly surprising; in fact, a lot of previous deception detection work has suggested that humans are actually very bad at detecting deception, and even when trained they don’t necessarily do any better. They might become more skeptical, but they won’t necessarily perform any better overall.
Steven Cherry: All right. So then you wrote some AI software and trained it with some of the known deceptive reviews?
Myle Ott: That’s right. Once we had these deceptive reviews, we could use them to train machine-learning classifiers, which learn from the fake reviews and the truthful reviews a set of indicators which can help later identify whether a new, unseen review is either truthful or deceptive.
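For readers who want to see what training a classifier on labeled reviews can look like, here is a minimal sketch. It assumes a multinomial naive Bayes model over single words with add-one smoothing, which is one common choice for text; the interview does not say which classifier or features the Cornell team used. The function names and the four toy reviews are invented for illustration and are not the study's data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a review and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(labeled_reviews):
    """Count words per label; labeled_reviews is a list of (text, label) pairs."""
    word_counts = {}        # label -> Counter of word frequencies
    doc_counts = Counter()  # label -> number of training reviews
    for text, label in labeled_reviews:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    return {"word_counts": word_counts, "doc_counts": doc_counts, "vocab": vocab}


def predict(model, text):
    """Return the label with the highest log posterior for a new review."""
    total_docs = sum(model["doc_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, counts in model["word_counts"].items():
        score = math.log(model["doc_counts"][label] / total_docs)
        total_words = sum(counts.values())
        for word in tokenize(text):
            if word not in model["vocab"]:
                continue  # words never seen in training carry no evidence
            # Add-one smoothing keeps unseen (label, word) pairs from zeroing the score.
            score += math.log((counts[word] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy data, only to make the sketch runnable.
TOY_REVIEWS = [
    ("My husband and I had the best vacation, I loved it", "deceptive"),
    ("I will definitely stay here again, the finest luxury experience", "deceptive"),
    ("Small bathroom, room on the fourth floor near the elevator", "truthful"),
    ("Two blocks from the train, breakfast at the corner cafe", "truthful"),
]
model = train(TOY_REVIEWS)
```

Calling `predict(model, "some new review text")` returns either "deceptive" or "truthful". With four training reviews the result means nothing in practice; the point is only the shape of the train-then-predict loop described above.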
Steven Cherry: And I guess we’ve seen this actually on the podcast before with artificial intelligence software, where, for example, the people who wrote the software that was a contestant on “Jeopardy!” They did the same thing—they sort of trained it with past “Jeopardy!” questions and answers. And that’s a little bit like what you did here.
Myle Ott: Do you mean, like, Watson?
Steven Cherry: Yeah, Watson.
Myle Ott: Yeah, so the Watson computer also learned from previous “Jeopardy!” games in the same way that we were able to train our algorithms to learn from these known deceptive and truthful reviews so that in the future it could accurately predict whether a review is truthful or deceptive.
Steven Cherry: And so your AI software in effect learns to distinguish some key characteristics that told the deceptive reviews from the real ones.
Myle Ott: That’s right. It starts with a sort of blank slate and tries to find, given these sets of reviews, what words and what features are sort of common to the deceptive reviews and likewise what are common to the truthful reviews. And it assigns in fact a weight to each of these features so that we can later analyze and determine what the most important features in the deceptive reviews and what the most important features in the truthful reviews are.
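The idea of a per-feature weight can be sketched in a few lines. The version below assumes a smoothed log-odds ratio per word, where a positive weight leans deceptive and a negative one leans truthful. That is an illustrative stand-in, not necessarily the weighting the study's classifiers produced, and the function name and sample sentences are invented.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a review and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def feature_weights(deceptive_texts, truthful_texts):
    """Map each word to log P(word | deceptive) - log P(word | truthful)."""
    dec, tru = Counter(), Counter()
    for text in deceptive_texts:
        dec.update(tokenize(text))
    for text in truthful_texts:
        tru.update(tokenize(text))
    vocab = set(dec) | set(tru)
    # Add-one smoothing so a word missing from one class still gets a finite weight.
    dec_total = sum(dec.values()) + len(vocab)
    tru_total = sum(tru.values()) + len(vocab)
    return {
        word: math.log((dec[word] + 1) / dec_total)
        - math.log((tru[word] + 1) / tru_total)
        for word in vocab
    }


# Invented toy data, only to make the sketch runnable.
weights = feature_weights(
    ["I loved my stay, I had the best experience",
     "I think my husband and I had the finest vacation"],
    ["The room on the fourth floor faced the street",
     "The bathroom was small and the floor was cold"],
)

# Words sorted from most deceptive-leaning to most truthful-leaning.
ranked = sorted(weights, key=weights.get, reverse=True)
```

Reading `ranked` from either end gives the kind of "most important features" list described above, here for a toy vocabulary only.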
Steven Cherry: Yeah. So this was sort of the most interesting thing about your research, I think. What did it learn about which are the deceptive reviews and which are not?
Myle Ott: So in this study we ended up learning a lot of features to indicate whether a review was deceptive or truthful. And one of the interesting things we learned was that deceptive reviews use more verbs, adverbs, and pronouns, whereas truthful reviews generally had more nouns, prepositions, and adjectives. There was one big exception, which was superlative adjectives, words like “best” and “finest,” which are actually more common in deceptive reviews. Fortunately, this actually fits in really nicely with some previous genre identification studies on imaginative versus informative writing. Imaginative writing, meaning any sort of creative or fictional work, typically contains more verbs, adverbs, and pronouns, much like our deceptive reviews did. Similarly, informative writing, meaning mostly nonfiction work, typically contained more nouns, prepositions, and adjectives, much like we found in our truthful reviews.
Steven Cherry: So that’s supported by some earlier research on the difference between fiction and nonfiction in general. Is that right?
Myle Ott: That’s right. So there’s the British National Corpus, which contains roughly 2500 texts that are broken up into either informative or imaginative genres.
Steven Cherry: Myle, you compared your research to other research that distinguished between fact and fiction. But there’s also research specifically on people lying. Did you look at that as well?
Myle Ott: Right. So there’s actually a lot of work in the psychology community about deception; there’s in fact even sort of ongoing debate in the community about whether or not there exists a sort of universal set of deception cues, some sort of surefire way to know whether someone’s lying or not. And there’s all sorts of theories on what this universal set of cues might look like. For example, one important finding called psychological distancing suggests that people try to distance themselves from their lies, and this results in a decreased usage of the first-person singular words like “I,” “me,” and “myself.” Instead, we found an increased usage of first-person singular in our deceptive reviews, which we believe to be either a subconscious or conscious attempt on the part of our deceivers to enhance their own credibility by emphasizing their own presence.
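A cue like first-person singular usage is easy to measure. The sketch below computes the share of a review's words that are first-person singular. The word list starts from the three examples given above ("I," "me," "myself") and adds "my" and "mine" as an assumption; the function name is invented, and contractions such as "I'm" are not counted in this simple version.

```python
import re

# "i", "me", "myself" come from the interview; "my" and "mine" are assumed additions.
FIRST_PERSON_SINGULAR = {"i", "me", "my", "mine", "myself"}


def first_person_rate(text):
    """Return the fraction of word tokens that are first-person singular."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0  # avoid dividing by zero on an empty review
    hits = sum(1 for token in tokens if token in FIRST_PERSON_SINGULAR)
    return hits / len(tokens)
```

Comparing the average rate across a set of known deceptive reviews with the average across truthful ones would show which way the cue points in a given data set.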
Steven Cherry: So I guess that would be something to look at further.
Myle Ott: Definitely. And it’s something that is an ongoing debate in the psychology community.
Steven Cherry: And you said that there’s some parts of your research which run contrary to previous research.
Myle Ott: Right. So for example, while adjectives in general are considered something found in informative reviews, we found that superlative adjectives were more common in our deceptive reviews.
Steven Cherry: That kind of makes sense, right? People are sort of going over the top when it comes to praising a hotel.
Myle Ott: Exactly.
Steven Cherry: So it turns out your software did really well, right? It identified the deceptive reviews 90 percent of the time, and humans actually did no better than random. Is that right?
Myle Ott: That’s right. So our final algorithm was able to pick out the deceptive reviews just over 90 percent of the time, whereas the best human judge was just over 60 percent accurate.
Steven Cherry: In your paper, you say that, strictly speaking, your research results are limited right now to hotel reviews—in fact to reviews of Chicago hotels. But the next step, I guess, would be other cities and restaurant reviews and product reviews and everything else.
Myle Ott: Absolutely, that’s right. In fact, even only positive reviews, because that’s all we’ve looked at so far. And right now we’re thinking of first moving on to negative reviews, because there seems to be a growing interest in the possibility of people writing fake negative reviews to make their competition look bad. And definitely directions for future work include looking at, like you said, these restaurant reviews and hotels in other areas as well.
Steven Cherry: Very good. Myle, you’re five years ahead of everybody else—you’re only 22 and should be done with your Ph.D. in a year or two. And you’ve already presented some pretty cool research at a cool conference in one of the coolest cities I know. So it’s probably completely superfluous for me to wish you good luck, but I will anyway.
Myle Ott: Well, thank you very much.
Steven Cherry: We’ve been speaking with Myle Ott, a Ph.D. student at Cornell University, who coauthored a study that showed computers can do a much better job than humans at detecting deceptive reviews of hotels at sites like hotels.com. For IEEE Spectrum’s “Techwise Conversations,” I’m Steven Cherry.
You can also follow us on Twitter, @spectrumpodcast.
This interview was recorded 26 July 2011.
Audio engineer: Francesco Ferorelli
NOTE: Transcripts are created for the convenience of our readers and listeners and may not perfectly match their associated interviews and narratives. The authoritative record of IEEE Spectrum's audio programming is the audio version.