Many of the things we watch, read, and buy enter our awareness through recommender systems on sites including YouTube, Twitter, and Amazon. Algorithms personalize their suggestions, aiming for ad views or clicks or buys. Sometimes their offerings frustrate us; it seems like they don’t know us at all, or know us too well, predicting what will get us to waste time or go down rabbit holes of anxiety and misinformation. But a more insidious dynamic may also be at play. Recommender systems might not only tailor to our most regrettable preferences, but actually shape what we like, making preferences even more regrettable. New research suggests a way to measure, and reduce, such manipulation.
Recommender systems often use a form of artificial intelligence called machine learning, which discovers patterns in data. They might present options based on what we’ve done in the past, guessing what we’ll do now. One form of machine learning, called reinforcement learning (RL), allows AI to play the long game, making predictions several steps ahead. It’s what the company DeepMind used to beat humans at the board games Go and chess. If what we watch affects what we like, and people who like certain things (cat videos, say) are more likely to keep watching things (more cat videos), a recommender system might suggest cat videos, knowing it will pay off down the road. With RL, “you have an incentive to change a chessboard in order to win,” says Micah Carroll, a computer scientist at the University of California, Berkeley, who presented the new work in July, at the International Conference on Machine Learning, in Baltimore. “There will be an incentive for the system to change the human’s mind to win the recommendation game.”
The researchers first showed how easily reinforcement learning can shift preferences. The first step is for the recommender to build a model of human preferences by observing human behavior. For this, they trained a neural network, an algorithm inspired by the brain’s architecture. For the purposes of the study, they had the network model a single simulated user whose actual preferences they knew so they could more easily judge the model’s accuracy. It watched the dummy human make 10 sequential choices, each among 10 options. It watched 1,000 versions of this sequence and learned from each of them. After training, it could successfully predict what a user would choose given a set of past choices.
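The data setup described above can be sketched in a few lines. This is only an illustration, not the researchers' code: it swaps their neural network for a simple frequency count, assumes the simulated user's preferences stay fixed while being observed, and uses hypothetical names (simulate_sequences, fit_choice_model) and a made-up preference vector.

```python
import random
from collections import Counter

N_OPTIONS = 10      # options per choice, as in the study
N_CHOICES = 10      # sequential choices per sequence
N_SEQUENCES = 1000  # sequences the model learns from

def simulate_sequences(true_pref, rng):
    """Simulate a dummy user whose known preferences drive every choice."""
    options = range(N_OPTIONS)
    return [rng.choices(options, weights=true_pref, k=N_CHOICES)
            for _ in range(N_SEQUENCES)]

def fit_choice_model(sequences):
    """Estimate choice probabilities by counting (a stand-in for the neural network)."""
    counts = Counter(choice for seq in sequences for choice in seq)
    total = sum(counts.values())
    return [counts.get(i, 0) / total for i in range(N_OPTIONS)]

rng = random.Random(0)
true_pref = [1, 1, 1, 1, 1, 10, 1, 1, 1, 1]  # hypothetical: option 5 is strongly favored
sequences = simulate_sequences(true_pref, rng)
model = fit_choice_model(sequences)
```

Because the true preferences are known, the estimate in model can be compared against them directly, which is the reason the study used a simulated user.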
Next, they tested whether a recommender system, having modeled a user, could shift the user’s preferences. In their simplified scenario, preferences lie along a one-dimensional spectrum. The spectrum could represent political leaning or dogs versus cats or anything else. In the study, a person’s preference was not a simple point on that line—say, always clicking on stories that are 54 percent liberal. Instead, it was a distribution indicating likelihood of choosing things in various regions of the spectrum. The researchers designated two locations on the spectrum most desirable for the recommender; perhaps people who like to click on those types of things will learn to like them even more and keep clicking.
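To make the idea of a preference distribution concrete, here is a minimal sketch. It assumes the spectrum runs from 0 to 1 and is cut into 10 bins, and that the preference is bell-shaped around a center; the bell shape, the bin count, and the function names are illustrative assumptions, not details from the study.

```python
import math
import random

def make_preference(center, width, n_bins=10):
    """Return probabilities over n_bins positions on a 0-to-1 spectrum."""
    positions = [(i + 0.5) / n_bins for i in range(n_bins)]
    # Assumed shape: a bell curve centered on `center`.
    weights = [math.exp(-((p - center) ** 2) / (2 * width ** 2)) for p in positions]
    total = sum(weights)
    return [w / total for w in weights]

def sample_choice(preference, rng):
    """Pick one bin, with likelihood given by the preference distribution."""
    return rng.choices(range(len(preference)), weights=preference, k=1)[0]

pref = make_preference(center=0.54, width=0.15)
rng = random.Random(0)
clicks = [sample_choice(pref, rng) for _ in range(10)]
```

A user modeled this way clicks most often near the 0.54 mark but sometimes strays, which is what separates a distribution from a single fixed point.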
The goal of the recommender was to maximize long-term engagement. Here, engagement for a given slate of options was measured roughly by how closely it aligned with the user’s preference distribution at that time. Long-term engagement was a sum of engagement across the 10 sequential slates. A recommender that thinks ahead would not myopically maximize engagement for each slate independently but instead maximize long-term engagement. As a potential side-effect, it might sacrifice a bit of engagement on early slates to nudge users toward being more satisfiable in later rounds. The user and algorithm would learn from each other. The researchers trained a neural network to maximize long-term engagement. At the end of 10-slate sequences, they reinforced some of its tunable parameters when it had done well. And they found that this RL-based system indeed generated more engagement than did one that was trained myopically.
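The difference between the myopic and long-term objectives can be sketched with a toy simulation. Everything here is an assumption made for illustration: engagement is taken to be the overlap (dot product) between a slate and the current preferences, preferences are assumed to drift a fixed fraction toward whatever is shown, and the names drift, myopic_policy, and always_show are hypothetical. It is not the trained RL system from the study.

```python
def engagement(slate, preference):
    """Assumed measure: overlap between what is shown and what the user likes."""
    return sum(s * p for s, p in zip(slate, preference))

def drift(preference, slate, rate=0.2):
    """Assumed dynamics: preferences move a fraction `rate` toward the slate."""
    mixed = [(1 - rate) * p + rate * s for p, s in zip(preference, slate)]
    total = sum(mixed)
    return [m / total for m in mixed]

def long_term_engagement(policy, preference, horizon=10):
    """Sum engagement over `horizon` sequential slates, updating preferences each step."""
    total = 0.0
    for _ in range(horizon):
        slate = policy(preference)
        total += engagement(slate, preference)
        preference = drift(preference, slate)
    return total

def one_hot(index, n):
    return [1.0 if i == index else 0.0 for i in range(n)]

def myopic_policy(preference):
    """Show whatever the user currently likes best, ignoring future effects."""
    best = max(range(len(preference)), key=lambda i: preference[i])
    return one_hot(best, len(preference))

def always_show(target):
    """Keep pushing one region of the spectrum, whatever the user prefers now."""
    return lambda preference: one_hot(target, len(preference))

start = [0.25, 0.25, 0.25, 0.25]
myopic_total = long_term_engagement(myopic_policy, start)
pushy_total = long_term_engagement(always_show(0), start)
```

An RL-trained recommender would search over policies to maximize the quantity returned by long_term_engagement, rather than following a fixed rule like the two shown here.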
The researchers then explicitly measured preference shifts, which we may not want, even in the service of generating engagement. Maybe we want people’s preferences to remain static, or to evolve naturally. The researchers compared the RL recommender with a baseline system that presented options randomly. As expected, the RL recommender led to users whose preferences were much more concentrated at the two incentivized locations on the spectrum. In practice, measuring the difference between two sets of concentrations in this way could provide one rough metric for evaluating a recommender system’s level of manipulation.
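One way to put a number on that difference is sketched below. The choice of total variation distance is an assumption made for illustration; the article does not say which distance the researchers used, and the two preference vectors are invented.

```python
def total_variation(p, q):
    """Half the sum of absolute differences: 0 means identical, 1 means no overlap."""
    if len(p) != len(q):
        raise ValueError("distributions must cover the same bins")
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

# Hypothetical final preferences over five regions of the spectrum.
after_random = [0.2, 0.2, 0.2, 0.2, 0.2]   # under the random baseline
after_rl = [0.05, 0.4, 0.1, 0.4, 0.05]     # concentrated at two locations

shift = total_variation(after_rl, after_random)
```

A larger shift value would indicate that the recommender moved preferences further from where random recommendations would have left them.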
Finally, the researchers sought to counter the AI recommender’s more manipulative influences. Instead of rewarding their system just for maximizing long-term engagement, they also rewarded it for minimizing the difference between user preferences resulting from that algorithm and what the preferences would be if recommendations were random. They rewarded it, in other words, for being something closer to a roll of the dice. The researchers found that this training method made the system much less manipulative than the myopic one, while only slightly reducing engagement.
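The modified training signal can be sketched as engagement minus a penalty for shifting preferences away from the random baseline. The subtraction form, the penalty_weight parameter, and the distance used are all assumptions for illustration; the article describes the idea but not the exact formula.

```python
def penalized_reward(engagement, induced_pref, baseline_pref, penalty_weight=1.0):
    """Reward engagement, but subtract a cost for moving preferences off the baseline."""
    # Assumed distance: total variation between the two preference distributions.
    shift = 0.5 * sum(abs(a - b) for a, b in zip(induced_pref, baseline_pref))
    return engagement - penalty_weight * shift

baseline = [0.25, 0.25, 0.25, 0.25]   # hypothetical preferences under random recommendations
gentle = penalized_reward(0.7, [0.3, 0.2, 0.3, 0.2], baseline)
pushy = penalized_reward(0.8, [0.7, 0.1, 0.1, 0.1], baseline)
```

In this made-up example the pushy recommender earns more raw engagement (0.8 versus 0.7) but ends up with the lower penalized reward, because it moved preferences much further from the baseline.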
According to Rebecca Gorman, the CEO of Aligned AI—a company aiming to make algorithms more ethical—RL-based recommenders can be dangerous. Posting conspiracy theories, for instance, might prod greater interest in such conspiracies. “If you’re training an algorithm to get a person to engage with it as much as possible, these conspiracy theories can look like treasure chests,” she says. She also knows of people who have seemingly been caught in traps of content on self-harm or on terminal diseases in children. “The problem is that these algorithms don’t know what they’re recommending,” she says. Other researchers have raised the specter of manipulative robo-advisors in financial services.
If an RL-based recommender system helps a company increase engagement, why would they want to use a method such as the one in this paper to detect or deter preference shifts? They might do so for ethical reasons, Carroll says. Or future legislation might require an external audit, which could potentially lead to less-manipulative recommendation algorithms being forced on the company.
RL could theoretically be put to constructive use in recommender systems, perhaps to nudge people to want to watch more news. But which news source? Any decisions made by a content provider will have opponents. “Some things might seem to be good or wholesome to one group of people,” Gorman says, “and to be an extreme violation to another group of people.”
Another constructive use might be for users to shift their own preferences. What if I tell Netflix I want to enjoy nature documentaries more? “I think this all seems like a really big slippery slope,” Carroll says. “It might be better to have a stupid system than a system that is kind of outsmarting you, or doing complex forms of reasoning that you can’t really interpret.” (Even if algorithms did explain their behavior, they could still give deceptive explanations.)
It’s not clear whether companies are actually using RL in recommender systems. Google researchers have published papers on the use of RL in “live experiments on YouTube,” leading to “greater engagement,” and Facebook researchers have published on their “applied reinforcement learning platform,” but Google (which owns YouTube), Meta (which owns Facebook), and those papers’ authors did not reply to my emails on the topic of recommender systems.
Big tech’s secrecy is no surprise, no matter how benign their intentions might be. Even though A/B testing is ubiquitous in advertising and user-experience design, some people have objections. “Experiments should not be deployed at scale on the human population without people’s consent,” Gorman says, “and that’s exactly what’s happening with these algorithms today.” She went on, “I think it could easily be the most important news story of our time.”
Matthew Hutson is a freelance writer who covers science and technology, with specialties in psychology and AI. He’s written for Science, Nature, Wired, The Atlantic, The New Yorker, and The Wall Street Journal. He’s a former editor at Psychology Today and is the author of The 7 Laws of Magical Thinking. Follow him on Twitter at @SilverJacket.