An algorithm that a major medical center used to identify patients for extra care has been shown to be racially biased.
The algorithm screened patients for enrollment in an intensive care management program, which gave them access to a dedicated hotline to a nurse practitioner, help refilling prescriptions, and so forth. The screening was meant to identify those patients who would most benefit from the program. But the white patients flagged for enrollment had fewer chronic health conditions than the black patients who were flagged.
In other words, black patients had to reach a higher threshold of illness before they were considered for enrollment. Care was not actually going to those people who needed it most.
Alarmingly, the algorithm was performing its task correctly. The problem was with how the task was defined.
The findings, described in a paper that was just published in Science, point to a system-wide problem, says coauthor Ziad Obermeyer, a physician and researcher at the UC Berkeley School of Public Health. Similar screening tools are used throughout the country; according to industry estimates, these types of algorithms are making health decisions for 200 million people per year.
Obermeyer and his coauthors won’t reveal the name of the medical center they studied, referring to it in the paper as a “large academic hospital,” nor will they name the company that makes the flawed software. “Since this is such an industry-wide problem that affects so many manufacturers and users, we think it would be counterproductive to single them out,” he says. “It would almost let everyone else off the hook.”
(However, it may be notable that Obermeyer previously worked as an emergency physician at Brigham and Women's Hospital in Boston. And according to the paper, the software is a commercial product that was evaluated in a report by the Society of Actuaries about the 10 most widely used risk-management tools.)
The data scientists who created the algorithm in question weren’t trying to be racist, and neither was the medical center that deployed it. The case serves as a glaring example of how bias can be “baked in” to the system.
The problem arose from the design of the algorithm, and specifically, what it was designed to predict. In trying to determine who would most benefit from the care management program, it predicted each patient’s medical costs over the coming year. It based its predictions on historical data.
Obermeyer explains that the algorithm was accurate in its predictions of costs. “The algorithm was doing absolutely what it was supposed to be doing,” he says, “and it was doing that just as well for blacks and whites.”
However, people with high health care costs are not necessarily the sickest people. “The relationship between health and cost is very different for black and white patients, for reasons that are related to structural biases in our society,” Obermeyer says.
Socioeconomic factors disproportionately affect black patients, he says, and poverty makes it harder to access care. “Even when you’re insured, there are so many ways in which poverty makes everything hard,” Obermeyer says. “You can have a doctor appointment, but you still need transportation to get to the doctor appointment. And maybe you have to take the day off work, but you can’t do that if you don’t have a flexible job or a boss who cares about your problems.”
What’s more, multiple studies have shown that health care professionals can have unconscious biases that affect diagnoses and treatment decisions.
Medicine has come a long way since the days of overt discrimination against black people seeking care. But implicit biases, which are harder to spot, still show up in these patients’ encounters with clinicians and other medical staff. One recent study found that unconscious bias among emergency room physicians resulted in fewer black patients receiving proper treatment for heart attacks.
Obermeyer notes that algorithmic bias can creep in despite an institution’s best intentions. This particular case demonstrates how institutions’ attempts to be “race-blind” can fall short. The algorithm deliberately did not include race as a variable when it made its predictions. “But if the outcome has built into it structural inequalities, the algorithm will still be biased,” Obermeyer says. If the algorithm had predicted health outcomes rather than health costs, it would have identified the patients who really were the sickest and most in need of care.
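The mechanism Obermeyer describes can be illustrated with a toy simulation. In this sketch (all numbers and group labels are hypothetical, not from the study), two groups have identical distributions of illness, but one group incurs lower costs at the same illness level because of barriers to accessing care. An algorithm that flags the highest-cost patients then ends up requiring that group to be sicker before being flagged:

```python
import random

random.seed(0)

def make_patients(group, cost_factor, n=10_000):
    """Simulate patients with a 'true illness' score and the health
    costs they generate. A lower cost_factor models access barriers:
    the same illness produces less spending."""
    patients = []
    for _ in range(n):
        illness = random.gauss(5.0, 2.0)  # chronic-condition burden
        cost = illness * cost_factor + random.gauss(0.0, 1.0)
        patients.append({"group": group, "illness": illness, "cost": cost})
    return patients

# Group B generates 30% less cost per unit of illness (hypothetical).
pop = make_patients("A", cost_factor=1.0) + make_patients("B", cost_factor=0.7)

# The "algorithm": flag the top 10% of predicted costs for extra care.
threshold = sorted(p["cost"] for p in pop)[int(0.9 * len(pop))]
flagged = [p for p in pop if p["cost"] >= threshold]

def mean_illness(group):
    vals = [p["illness"] for p in flagged if p["group"] == group]
    return sum(vals) / len(vals)

# Flagged group-B patients are sicker on average: they had to cross a
# higher illness threshold to generate the same costs.
print(f"A: {mean_illness('A'):.2f}  B: {mean_illness('B'):.2f}")
```

The cost predictions here are "accurate" in exactly the sense Obermeyer describes, yet flagging by cost systematically under-serves the group facing access barriers. Predicting illness directly, rather than cost, would remove the disparity in this toy model.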
Obermeyer says the medical center in question stopped using the algorithm a few years ago for other reasons. But it’s still in use elsewhere. So his team contacted the company that makes the software tool to inform them of their findings. “We essentially cold emailed them,” Obermeyer says. He says the company was surprised and concerned, and first worked to replicate his findings. “Then they started working with us to fix the problem,” he says. A future software update will likely include changes to the algorithm, he says.
Marshall Chin, a physician and researcher at the University of Chicago who has written about issues of fairness in machine learning systems, says this study’s findings are important. “It shows how easy it is for bias to creep into systems that affect literally millions of people,” says Chin, who was not involved in the study.
Chin says that most physicians and health care administrators aren’t aware that the algorithms used to make care decisions may have hidden biases. He says that now is the time to build safeguards into the system. “It’s critical to develop specific steps in the algorithm development process to check if there’s bias—and then to figure out whether something can be done to fix it,” he says.
Chin relates a cautionary tale from the University of Chicago’s hospital. Data scientists there wanted to increase efficiency and speed up the process of discharging patients. They created an algorithm that predicted who would be eligible for discharge within 48 hours so they could route case managers and social workers to those top-priority patients. But the algorithm included zip code among its variables, and zip code is often a proxy for socioeconomic status.
John Fahrenbach, the University of Chicago data scientist who was leading that project, explains that the zip code data introduced bias. The algorithm learned that people from less affluent neighborhoods often stay in the hospital longer, because doctors are trying to line up additional resources for them.
“So if you have two patients who are medically the same, and the only difference is how well they’re resourced, the algorithm will say: The person who has more resources will go home first,” Fahrenbach says. “Because historically, that’s accurate.”
Fahrenbach caught the problem before the algorithm was deployed. But the experience shook him, and he reached out to the university’s diversity, equity, and inclusion team to talk about developing audit systems. “We are now creating AI governance,” he says. The university is forming an “analytic intervention unit” to scrutinize all the predictive models used in its health system, including both those developed internally and those licensed from outside vendors.
Algorithmic audits may be a big business in coming years. That’s why Cathy O’Neil, who sounded the alarm about algorithmic bias with her book Weapons of Math Destruction, has started an algorithmic auditing company called ORCAA.
The biased algorithm that Obermeyer’s group studied “absolutely would have been picked up by an audit,” O’Neil says. “The first question I typically ask is: Whom does this fail?”
She notes that institutions are often wary of asking that question, because scrutinizing their own data takes away their “plausible deniability.” And legal liability is a growing concern. O’Neil sometimes gets calls from people in the data analytics groups of major companies, she says. “The first call is really great. The second call has the corporate lawyer on the phone, who says, ‘I’m not sure we really want to know.’”
O’Neil advocates for the creation of standard procedures for testing algorithms before deployment. She wants regulators to look at existing anti-discrimination laws and determine how these rules apply to algorithmic systems that are used in decision making. “The policy goal is to get the regulators to enforce their laws in the age of algorithms,” she says. “You can’t give up on enforcing anti-discrimination laws because you don’t understand the technology.”
In the case described in Obermeyer’s paper, it’s not clear whether the medical center or the software vendor might be liable. But federal law does prohibit discrimination based on race, at least for patients who qualify for federally funded programs such as Medicare and Medicaid.
When such blatant cases of algorithmic bias arise, Obermeyer notes that many experts argue for a moratorium. “They say: ‘Look, if algorithms are using historical data, and the data reflects socioeconomic injustice, we can’t be in the business of making algorithms—because first we have to fix the data, and that means we have to fix society.’”
But Obermeyer has a more optimistic view. He argues that the makers of algorithms need to be cognizant of how the choices they make could reinforce structural biases, and to take responsibility for guarding against that outcome.
He also says that data scientists are beginning to understand the consequences of their choices. Obermeyer remembers a conversation with an engineer from the company that made the biased algorithm. They were on a conference call about the algorithm’s problems, and they were chatting before other participants joined. “He said, ‘I took this job in health analytics instead of going into the tech industry because I wanted to make a difference—this is not what I signed up for, so I’m glad I can fix this problem.’”
An abridged version of this post appears in the January 2020 print issue as “Health Care Algorithms Show Racial Bias.”