For those of us who make a living solving problems, the current deluge of big data might seem like a wonderland. Data scientists and programmers can now draw on reams of human data—and apply them—in ways that would have been unthinkable only a decade ago.
But amid all the excitement, we’re beginning to see hints that our nice, tidy algorithms and predictive models might be prone to the same shortcomings as the humans who create them. Take, for example, the revelation that Google disproportionately served ads for high-paying jobs to men rather than women. And there’s the troubling recent discovery that a criminal risk assessment score disproportionately flagged many African Americans as higher risk, sometimes resulting in longer prison sentences.
Mathematician and data scientist Cathy O’Neil has a name for these wide-reaching and discriminatory models: Weapons of Math Destruction. In her new book by the same name, she details the ways that algorithms often perpetuate or even worsen inequality and injustice.
We spoke to O’Neil last week during a Facebook Live session to find out how programmers and data scientists can ensure that their models do more good than harm.
Here are a few key takeaways:
1. Recognize the Signs of a “WMD”
A signature of a Weapon of Math Destruction is that it’s used to determine some critical element in the lives of many people. We’re already using algorithms to sort résumés for job openings, automatically schedule shifts for service industry workers, set the price of insurance and the interest rate on a loan, and even help determine how long a person convicted of a crime will spend in jail. Because these algorithms affect crucial outcomes for millions of people, they have the potential to do widespread damage.
They’re Secret or Unaccountable
The people most affected by WMDs often don’t understand the rubric by which they’re being scored, or even that they’re being scored in the first place. The methodology behind these models is often a “trade secret,” shielding it from public scrutiny. While many companies argue that secrecy keeps people from learning the rules and gaming the system, the lack of transparency also means there’s no way to check whether the scores are actually fair. Machine learning algorithms take this one step further: while they’re powerful tools for finding correlations, they’re also often black boxes, even to the people who create them.
They Create Pernicious Feedback Loops
Weapons of Math Destruction have a way of creating their own reality and then using that reality to justify their model, says O’Neil. An algorithm that, say, targets financially vulnerable people for predatory loans creates a feedback loop, making it even harder for them to get out of debt. Similarly, a model that labels a first-time drug offender as higher-risk because he comes from a high-crime neighborhood potentially makes that problem even worse. If his high risk score results in a longer jail sentence, he’ll have fewer connections to his community and fewer job prospects once he’s released. His score becomes a self-fulfilling prophecy, actually putting him at a greater risk of reoffending.
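This kind of feedback loop is easy to see in miniature. The toy sketch below is not any real risk model, and every coefficient is invented; it only illustrates the dynamic the paragraph describes, where a score influences the very data that later confirms it.

```python
# Toy illustration (all numbers invented, not a real model): a risk score
# that feeds back into the data it is later judged against.

def update_risk_score(score, iterations=5):
    """Each round: a higher score -> a longer sentence -> weaker job and
    community ties -> a higher true reoffense rate -> more arrests in the
    data -> a higher score next round."""
    history = [score]
    for _ in range(iterations):
        sentence_years = 1 + 4 * score                 # longer sentence for a higher score
        reoffense_rate = 0.2 + 0.1 * sentence_years    # ties erode with time inside
        score = min(1.0, 0.5 * score + 0.5 * reoffense_rate)  # new data "confirms" the score
        history.append(round(score, 3))
    return history

print(update_risk_score(0.4))  # the score ratchets upward on its own
```

Nothing about the person changed between rounds; the score rises purely because its own consequences are fed back in as evidence.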
2. Realize There Is No Such Thing as an “Objective Algorithm”
One of the things that makes big data so attractive is the assumption that it’s eliminating human subjectivity and bias. After all, you’re basing everything on hard numbers from the real world, right? Wrong. Predictive models and algorithms, says O’Neil, are really just “opinions embedded in math.” Algorithms are written by human beings with an agenda. The very act of defining what a successful algorithm looks like is a value judgment, and what counts as success for the builders of the algorithm (frequently profit, savings, or efficiency) is not always good for society at large. Because of this, O’Neil says, it’s important for data scientists to look at the bigger picture. Who are the winners in my algorithm and, even more important, what happens to the losers?
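One way to make the “opinions embedded in math” point concrete: score the same model under two different objectives and watch it pick very different cutoffs. The loan example below is entirely hypothetical, with invented payoffs; it only shows that the choice of objective, not the data, decides who wins and loses.

```python
# Hypothetical sketch: the same loan-approval scores, judged by two
# different definitions of "success." All payoffs are made up.

applicants = [i / 100 for i in range(1, 100)]  # repayment probabilities 0.01..0.99

def profit(cutoff, gain=0.2, loss=1.0):
    """Lender's objective: expected profit across everyone approved."""
    return sum(p * gain - (1 - p) * loss for p in applicants if p >= cutoff)

def access(cutoff):
    """A societal objective: how many creditworthy people (p >= 0.5) get loans."""
    return sum(1 for p in applicants if p >= cutoff and p >= 0.5)

best_for_profit = max(applicants, key=profit)
print(best_for_profit)                        # a high cutoff maximizes profit...
print(access(best_for_profit), access(0.5))   # ...but shuts out many who would have repaid
```

The model never changed; only the definition of success did. That definition is the value judgment.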
3. Pay Attention to the Data You’re Using
There’s another reason that algorithms aren’t as trustworthy as we might think: The data they draw on often comes from a world that’s deeply prejudiced and unequal. Crime statistics might seem objective—that is, until you realize that, for example, the mechanisms of the U.S. criminal justice system have been applied unfairly to target minorities throughout its entire history. That bias shows up in crime data. Researchers know that black and white people use marijuana at almost identical rates, but black teenagers are much more likely to be arrested for marijuana possession. The disparity in the numbers has much more to do with systemic racial profiling and a ramped up police presence in historically black neighborhoods than it does with actual levels of criminality.
We’ve made the decision as a society to stamp out discrimination based on race, gender, sexual orientation, or disability status—and fortunately, most data scientists know to be very careful when using these attributes to categorize people or model behavior. But data from the real world is often fraught with less-obvious proxy variables that are essentially stand-ins for those characteristics. Zip codes, for example, are an easy proxy for race, thanks to decades of the discriminatory housing practice called redlining.
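The proxy problem is easy to demonstrate with synthetic data. In the sketch below, every number is invented: a hypothetical town where residential segregation makes zip code a near-perfect stand-in for group membership, so a “blind” rule that never sees the protected attribute still lands overwhelmingly on one group.

```python
# Synthetic sketch (all data invented): dropping the protected attribute
# does not remove bias when a proxy variable encodes it.

import random

random.seed(0)

# Hypothetical segregated town: zips 0-4 are 90% group A, zips 5-9 are 90% group B.
people = []
for _ in range(10_000):
    zip_code = random.randrange(10)
    majority = "A" if zip_code < 5 else "B"
    group = majority if random.random() < 0.9 else ("B" if majority == "A" else "A")
    people.append((zip_code, group))

# A "group-blind" rule that looks only at zip code...
flagged = [p for p in people if p[0] < 5]

# ...still concentrates almost entirely on one group.
share_a = sum(1 for _, g in flagged if g == "A") / len(flagged)
print(f"Share of group A among those flagged by zip alone: {share_a:.0%}")
```

This is why simply deleting the sensitive column from a dataset is not enough: the model can reconstruct it from whatever correlates with it.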
4. Get Honest About What You’re Really Modeling
Human behavior is messy, which often means that direct measurements of the attributes we’re trying to model (like criminality, trustworthiness, or fitness for a job) don’t actually exist. Because of this, data scientists often rely on other variables they believe might correlate with what they’re trying to measure.
Car insurance companies, for example, use credit scores as a way to determine how reliable a driver is. At first glance it sounds reasonable to assume that a person who regularly pays her bills on time might be more conscientious or responsible. But strangely, Consumer Reports recently discovered that people with low credit scores and clean driving records were being charged much more for car insurance than people with high credit scores and DUIs on their driving records.
This, of course, is nonsense. Having a previous DUI is a much better indicator of a driver’s likelihood of getting into an accident. But O’Neil asserts that there might be a hidden reason the insurance companies continue to incorporate credit score into their models: it’s a direct measurement of financial vulnerability. Drivers with low credit scores don’t have as much leverage to shop around for lower rates, and a person who’s desperate for insurance is often willing to pay much more to get it.
5. Examine and Systematically Test Your Assumptions
Even well-intentioned algorithms can have flawed assumptions built in. For example, the recidivism risk score mentioned earlier is an attempt to make communities safer by locking up potentially violent repeat offenders and releasing those who are deemed a lower risk. Other intended benefits would be reducing the prison population and making the justice system more fair. But once we lock people away, says O’Neil, we treat prisons as a black box and stop asking questions.
Online giants like Amazon.com take the opposite approach; learning and experimentation are built into their business model. Amazon has a dedicated data laboratory where researchers constantly reexamine every aspect of the consumer experience, finding places along the pipeline where customers get confused or frustrated, or can’t find what they need. This feedback allows Amazon to continuously learn and tweak its online environment to maximize profit.
If we truly wanted to optimize our criminal justice system for community safety, says O’Neil, we’d continuously be running controlled experiments: Does putting someone behind bars with other criminals make them more or less likely to commit a crime upon release? How beneficial are general-equivalency (alternative high school) diploma programs? What is the effect of solitary confinement? Of sexual abuse? How much does it cost to treat someone for a mental disorder, versus repeatedly locking him away?
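The kind of controlled experiment O’Neil is describing is, at its core, a randomized trial. The sketch below is a simulation with invented outcome rates (the hypothetical program and its assumed effect are not from the book); it only shows the mechanics: random assignment, then a comparison of outcome rates between the two arms.

```python
# Miniature randomized trial (outcomes simulated, all rates invented):
# randomly assign people to a program or a control group, then compare
# reoffense rates between the arms.

import random

random.seed(1)

def simulate_trial(n=5000, base_rate=0.40, program_effect=-0.10):
    treatment, control = [], []
    for _ in range(n):
        rate = base_rate
        if random.choice(("program", "control")) == "program":
            rate += program_effect          # assumed effect, for illustration only
            treatment.append(random.random() < rate)
        else:
            control.append(random.random() < rate)
    return sum(treatment) / len(treatment), sum(control) / len(control)

t_rate, c_rate = simulate_trial()
print(f"program: {t_rate:.1%}  control: {c_rate:.1%}")
```

Because assignment is random, the gap between the two rates can be read as the program’s effect rather than as a difference in who was sent to it, which is exactly what observational prison data cannot provide.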
6. Take the Modelers’ Hippocratic Oath
Eventually we’ll need laws and industry standards that can keep pace with this technology and require a level of transparency from companies about how they’re using data. We may even need mandatory fairness audits of important algorithms. But in the meantime, a disproportionate amount of the responsibility falls to programmers. Awareness of the issue is a crucial first step. A good way to start is by taking this pledge, originally written by Emanuel Derman and Paul Wilmott in the wake of the 2008 financial crisis:
∼ I will remember that I didn’t make the world, and it doesn’t satisfy my equations.
∼ Though I will use models boldly to estimate value, I will not be overly impressed by mathematics.
∼ I will never sacrifice reality for elegance without explaining why I have done so.
∼ Nor will I give the people who use my model false comfort about its accuracy. Instead, I will make explicit its assumptions and oversights.
∼ I understand that my work may have enormous effects on society and the economy, many of them beyond my comprehension.