Court Software No Better Than Mechanical Turks at Predicting Repeat Crime

Humans who looked at just two variables performed just as well as commercial software that examined 137

Illustration of multiple figures behind prison bars.
Illustration: Roy Scott/Getty Images

Software now widely used by courts to predict which criminals are likely to commit future crimes might be no more accurate than regular people with presumably little to no criminal justice expertise, a new study finds.

Predictive algorithms now regularly make recommendations regarding music, ads, health care, stock trades, auto insurance, and bank loans, among other things. In the criminal justice system, such algorithms have been used to predict where crimes will likely occur, who is likely to commit violent crimes, who is likely to fail to appear at their court hearings, and who is likely to repeat criminal behavior in the future.

One criminal risk analysis tool, Correctional Offender Management Profiling for Alternative Sanctions (COMPAS), has been used to assess more than 1 million offenders since it was developed in 1998, and to predict recidivism, or repeat criminal behavior, since 2000. Supporters of such systems argue that automated techniques are more accurate and less biased than humans. However, previous research suggested COMPAS's predictions might be racially biased, underpredicting recidivism among white defendants and overpredicting it among black defendants.

To investigate further whether algorithms can be more fair and accurate than humans at predicting recidivism, computer scientists recruited 400 workers through Amazon's online Mechanical Turk crowdsourcing marketplace, presumably none of them criminal justice experts. Each worker saw descriptions of 50 people from a pool of 1,000 defendants from Broward County, Florida, who awaited trial in 2013 and 2014. These descriptions contained seven features about each defendant, including their sex, age, and previous criminal history, but not their race.

The crowdsourced workers were then asked to rate the risk that defendants would commit a misdemeanor or felony within two years of their last arrest. Their ratings were then compared with COMPAS's predictions.

Graphics comparing the crowdsourced data to the COMPAS results. Image: Carla Schaffer/AAAS

Although the crowdsourced workers analyzed considerably fewer variables than COMPAS, their average results were accurate in 67 percent of the cases presented, about the same as COMPAS's accuracy of 65.2 percent.

"Considering that COMPAS uses 137 variables in its predictions, and that it is a commercial software presumably built on much more data than we had access to, this result was surprising," says study senior author Hany Farid, a computer scientist at Dartmouth College in Hanover, New Hampshire.

Further analysis found that a strategy that looked at only two variables, a defendant's age and total number of prior convictions, was about as accurate as COMPAS. A spokesperson for Equivant, the Ohio-based firm behind COMPAS, said the company was not giving interviews. Equivant posted a statement about the new research shortly before its release, calling it "highly misleading."
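To give a sense of how small a two-variable predictor can be, here is a minimal Python sketch. It assumes a logistic regression model, which this article does not specify, and it runs on ten made-up records rather than the Broward County data. Every name in it (TOY_DEFENDANTS, features, risk, train, accuracy) is hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy records, invented for illustration only (not study data):
# (age in years, number of prior convictions, reoffended within two years?)
TOY_DEFENDANTS = [
    (19, 4, 1), (22, 6, 1), (25, 3, 1), (30, 8, 1), (35, 5, 1),
    (28, 0, 0), (40, 1, 0), (45, 0, 0), (52, 2, 0), (60, 1, 0),
]


def features(age, priors):
    """Scale both inputs to roughly [0, 1] and prepend a constant bias term."""
    return (1.0, age / 100.0, priors / 10.0)


def risk(weights, age, priors):
    """Predicted probability of reoffending under the assumed logistic model."""
    z = sum(w * x for w, x in zip(weights, features(age, priors)))
    return 1.0 / (1.0 + math.exp(-z))


def train(records, lr=0.5, epochs=5000):
    """Fit the three weights by batch gradient descent on the log loss."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for age, priors, label in records:
            error = risk(weights, age, priors) - label
            for j, x in enumerate(features(age, priors)):
                grad[j] += error * x
        weights = [w - lr * g / len(records) for w, g in zip(weights, grad)]
    return weights


def accuracy(weights, records):
    """Share of records where thresholding the risk at 0.5 matches the label."""
    hits = sum((risk(weights, a, p) >= 0.5) == bool(y) for a, p, y in records)
    return hits / len(records)


weights = train(TOY_DEFENDANTS)
print("weights (bias, age, priors):", [round(w, 2) for w in weights])
print("accuracy on the toy records:", accuracy(weights, TOY_DEFENDANTS))
```

The accuracy printed here is measured on the same toy records the model was fitted to, so it says nothing about real defendants. The figures quoted in this article come from the researchers' own data and evaluation, not from anything like this sketch.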

"We believe that the most important implication of our work is that the courts should consider how much credibility to give these types of prediction algorithms—you can imagine that a judge would weigh a risk assessment made from a big-data machine-learning algorithm differently than a risk assessment made from people responding to an online survey," Farid says. "We also believe that there should be more transparency in the use of algorithms in making such critical, life-altering decisions."

"We are not saying in any way that big data, machine learning, artificial intelligence should be abandoned," Farid says. "We are simply saying that their use should be deployed in a careful, thoughtful, and transparent manner, particularly when the results of such algorithms can have life-altering implications."

However, the researchers found that results from both the crowdsourced workers and COMPAS were similarly unfair to black defendants. Farid did note there appear to be differences in the base rates of recidivism across race, with black defendants reoffending at a rate of 51 percent as compared with 39 percent for white defendants, but "these base rates may themselves be the result of racial biases in the criminal justice system—for example, black people are almost four times as likely as white people to be arrested for drug offenses. So what we may be seeing is a ripple effect in policing and prosecution that disproportionately impacts African-Americans."

"On a national scale, black people are more likely to have prior crimes on their record than white people are—black people in America are incarcerated in state prisons at a rate that is 5.1 times that of white Americans, for example," says study lead author Julia Dressel at Dartmouth College. "Within the data set used in our study, white defendants had an average of 2.59 prior crimes, whereas black defendants had an average of 4.95 prior crimes. The racial bias that appears in both the algorithmic and human predictions is a result of this discrepancy."

In the future, there may be ways to test the effectiveness of this kind of software before it goes on the market. "We can imagine that an organization like the National Institute of Standards and Technology (NIST) could undertake the task of creating standards and benchmarks that any software would have to meet," Farid says. "Such a system would require access to the type of data that we used in our study, but at a larger and more diverse scale."

"We think that studies similar to ours should be performed for all such algorithms," Farid says. "We would also welcome access to larger and more diverse data sets to help us understand the efficacy of these algorithms and, possibly, develop more accurate algorithms."

Dressel and Farid detailed their findings on 17 January 2018 in the journal Science Advances.
