Can Software Predict Repeat Offenders?
Criminologist Richard Berk’s algorithms are getting better and better at it
Steven Cherry: Hi, this is Steven Cherry for IEEE Spectrum’s “Techwise Conversations.”
Computer interface pioneer Alan Kay once said, “The best way to predict the future is to invent it.” Sometimes, though, we want to predict the future in order to prevent it.
In the sci-fi movie Minority Report, society in the year 2054 has come up with a way to predict murders. The police then go out and prevent the crime—for maximum dramatic effect, right before it happens.
I don’t know what Alan Kay thinks of Minority Report, but the movie is something of a cult classic among computer interface experts because of the way characters manipulate computer databases on gorgeous transparent screens, swiping and moving text around seemingly in thin air, in a way that we heard described, still very experimentally, in a recent podcast.
But Minority Report fell down in one area: When it came to predicting the future, it relied on something not very different from old-fashioned psychics, though it dressed them up with some real-time computer tomography.
What we really want are algorithms that predict crime. And researchers are on the case. Crime forecasting got its start in the 1990s, when New York City used a statistical approach to identify high-crime subway stations and then neighborhoods.
Nowadays, sophisticated software is being used in a number of cities, and the most sophisticated is being used in Baltimore, Philadelphia, and since 2010, in Washington D.C. [NOTE: The software was planned for but never implemented in Washington D.C.—Ed.]
That software was written by Richard Berk, a professor of criminology and statistics at the University of Pennsylvania. In 2009, he was the lead author of an article published in the Journal of the Royal Statistical Society with the arresting title of “Forecasting Murder Within a Population of Probationers and Parolees: A High-Stakes Application of Statistical Learning.”
He’s my guest today, from Philadelphia. Richard, welcome to the podcast.
Richard Berk: Thank you very much.
Steven Cherry: Richard, like the precogs of Minority Report, you and your colleagues focused on crimes of murderous intent, but you chose a more modest goal: looking at recidivism rates in 60 000 cases from Philadelphia’s adult probation and parole department. Was that the first research you did on predicting crime, or does it go back before then?
Richard Berk: Actually, it began quite a ways back. When I was out at UCLA, there was some interest from the department of corrections in helping them decide which inmates get assigned to which levels of security. High security levels certainly restrict risk, but they also are very expensive. So you want to use those high-cost beds on inmates who really need them. So the question is, if someone comes in the front door, can you forecast who’s going to create problems for themselves or for others while they’re in prison? And we developed some procedures using machine learning there which turned out to be quite effective and which were eventually adopted by the department of corrections.
Steven Cherry: Tell us about the Philadelphia research. You found some interesting things, such as that age is a better predictor of future murder than having done a previous murder.
Richard Berk: Well, the Philadelphia work began with a focus on homicide, to be sure. We’ve broadened it since, as you were suggesting. The basic findings, really, have to do with how accurate, I think remarkably accurate, the forecasting procedures are, because these are rare events. Perhaps 1 to 2 percent of the offender population under supervision becomes engaged in a crime that might be a homicide or an attempted homicide over a two-year period. So you’re finding needles in a haystack, basically, and we were able to correctly identify individuals who commit a homicide about 6 or 7 times out of 10, when the base rate is 2 out of 100.
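To put those two numbers side by side, here is a minimal Python sketch of the arithmetic. It assumes the “6 or 7 times out of 10” figure is the hit rate among the people the software flags; the interview does not say which accuracy measure is meant, so treat the comparison as illustrative. Flagging people at random would be right about 2 times in 100, so a 60 to 70 percent hit rate would be roughly 30 to 35 times better than chance.

```python
# Sketch: compare a forecast's hit rate with the base rate of the outcome.
# Assumption: "6 or 7 times out of 10" is read as the share of flagged people
# who go on to offend (precision); the interview does not say which measure it is.

def lift(hit_rate: float, base_rate: float) -> float:
    """How many times more often the forecast is right than random selection."""
    if not 0 < base_rate <= 1:
        raise ValueError("base_rate must be in (0, 1]")
    return hit_rate / base_rate

BASE_RATE = 2 / 100  # roughly 2 in 100 supervised offenders, per the interview

for hit_rate in (0.6, 0.7):
    print(f"hit rate {hit_rate:.0%}: about {lift(hit_rate, BASE_RATE):.0f}x the base rate")
```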
Steven Cherry: So tell us how the software works.
Richard Berk: Well, it starts out just as you would expect, and you were already mentioning some things. We use the predictors that are necessarily available in real time, when decisions have to be made. I mean, you can’t do a special study and invent your own predictors if those predictors aren’t going to be available subsequently when decisions have to be made. So we work from existing databases, and it has the standard biographical information. It has things like age and prior record and has things like the age at which an individual committed their first crime. It knows what the instant crime—that is, the crime for which they were just convicted—was, broken down by various categories. We know where they live and all kinds of other things, just like you would expect. And it’s not so much that those predictors themselves tell the story, although it starts there, but the algorithms, in effect, construct new predictors from the old predictors. So you might start out with 20 or 30; by the end of the day, you might have 300.
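The interview doesn’t describe exactly how the new predictors are built. Tree-based learners typically create such combinations implicitly as they split the data; the Python sketch below makes the idea explicit in a simpler way, by forming pairwise products of the original predictors. The predictor names are hypothetical. With 25 starting predictors, pairwise products alone give 25 × 24 / 2 = 300 derived ones, in line with the figure Berk gives.

```python
from itertools import combinations
from math import comb

def pairwise_interactions(record: dict[str, float]) -> dict[str, float]:
    """Return one derived predictor (the product) for every pair of original predictors."""
    derived = {}
    for (name_a, a), (name_b, b) in combinations(sorted(record.items()), 2):
        derived[f"{name_a}*{name_b}"] = a * b
    return derived

# 25 original predictors yield C(25, 2) = 300 pairwise products.
print(comb(25, 2))

# Hypothetical record with three illustrative predictors.
offender = {"age": 19.0, "prior_arrests": 4.0, "age_first_offense": 13.0}
print(pairwise_interactions(offender))
```

A real learning algorithm would keep only the combinations that help forecasting; generating every product, as here, just shows how quickly the predictor count grows.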
Steven Cherry: So what are some of the best ones?
Richard Berk: Well, one of the best ones right now is I think the one you mentioned, which is that if you commit an armed robbery, let’s say, at age 12 or 13, that’s bad news—makes a pretty good case that you’re going to be trouble when you’re 18. If you commit that exact same armed robbery at age 30, it doesn’t predict very much of what you’re going to do, let’s say, at age 35 or 40. So it has to do with when you commit crimes as well as what the crimes happen to be.
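One way to express “when as well as what” as a single derived predictor is a flag that combines offense type with age at the time of the offense. The Python sketch below is hypothetical: the offense categories, the age cutoff of 14, and the field names are illustrative assumptions, not the software’s actual variables, and a learning algorithm would normally find such a combination in the data rather than have it hand-coded.

```python
from dataclasses import dataclass

@dataclass
class Offense:
    kind: str            # hypothetical offense category, e.g. "armed_robbery"
    age_at_offense: int  # offender's age when the offense occurred

# Illustrative assumptions only: which offenses count as violent, and the cutoff age.
VIOLENT_KINDS = {"armed_robbery", "aggravated_assault"}
EARLY_ONSET_MAX_AGE = 14

def early_violent_onset(history: list[Offense]) -> bool:
    """True if any violent offense in the record was committed at or below the cutoff age."""
    return any(
        o.kind in VIOLENT_KINDS and o.age_at_offense <= EARLY_ONSET_MAX_AGE
        for o in history
    )

# The same offense carries a different signal depending on when it happened.
print(early_violent_onset([Offense("armed_robbery", 13)]))  # True
print(early_violent_onset([Offense("armed_robbery", 30)]))  # False
```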
Steven Cherry: How police departments use your software is up to them, not you, but how software is written can make a big difference to how it’s used. And you know, I can imagine critics saying that if police start tracking the people most likely to commit a future crime, that does some violence to the constitutional presumption of innocence, which, you know—not to be overly dramatic—is one of our important safeguards against tyranny.
Richard Berk: Sure. Well, two things. First is that we use this software, at least so far, with offender populations who have already been convicted, so these are individuals either in prison, where we’re trying to decide whether or not to release them, or released under supervision, where we’re trying to decide how best to supervise them. This is not used on the general public. These are individuals who have already been convicted of serious crimes. And there, it turns out (and I’m certainly not a lawyer) that this approach has not been challenged on the grounds that you’re describing, although intuitively you’d think the same issues arise: you’re basically treating people based on crimes they haven’t committed yet.
Steven Cherry: Right, and we do have this idea of, you know, sort of giving people a fresh start in life. They’ve paid the price for their crime, and now they’re out and among us. Now, it’s true they’re on probation or parole in the cases that you’re talking about.
Richard Berk: Well, and also it turns out that the legal requirements that are imposed on judges or a probation officer or anybody else, they’re partly what you say, but they’re also partly to protect public safety. So for example, for sex offenders, we’re allowed to do all kinds of things restricting where they can live and what they can do, and that’s perfectly legal even though we’re restricting them with regard to crimes they haven’t committed yet. So when public safety is up for grabs, there are a lot more options with respect to what you can forecast and how you can respond to those forecasts.
Steven Cherry: You know, I said that it—how a police department uses your software is up to them and not you, but I can picture, you know, a police department thinking well, why don’t we just sort of, look at everybody in this city and see who is more likely than who else to predict a crime. I mean, to commit a crime.
Richard Berk: Well, again, you’ve got to talk to a lawyer on that. I mean, I’d be suspicious, but the other problem is simply practical. That is, to develop these forecasting algorithms, which are learning algorithms, you need a large data set in which you have the predictors included, plus whatever the outcome is that you’re interested in. This is supervised learning. So you’d need, in effect, a population of individuals who you know have or have not committed a crime. I don’t know how you would collect such data on the general population. There are two things I’d add. The first is that these statistical procedures are not the operation of statistics alone; you need input from stakeholders, and in particular (and this is really important) you have to determine with stakeholders the cost of false positives versus false negatives, because that determines how much evidence you’re prepared to accept that someone is a bad guy. So if you want to bend over backwards to prevent bad things from happening, any hint that someone is a bad guy will be taken as evidence, but the price you pay is that you’re going to be mislabeling a lot of good guys as bad guys. Let’s say a probation department wants to allocate its resources effectively. It has to ask, supposing I label somebody as a prospective murderer, it’s costly for me to treat him as such. And every person I label as a prospective murderer who isn’t is money flushed down the toilet, so I don’t want to do that either. So I have to balance the different costs. If I falsely identify someone as a murderer, or correctly, what are the differences in the consequences? And so it’s really important to work that out. I don’t want to leave the impression that this is a purely statistical enterprise. It’s really vital to get stakeholders involved and get their input, because that shapes the forecast. The decisions that they have to make and their consequences actually get built into the algorithm.
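The trade-off Berk describes can be written down with standard decision theory. If a case has estimated probability p of the outcome, flagging it carries an expected false-alarm cost of (1 − p) × C_FP, and not flagging it carries an expected missed-case cost of p × C_FN, so flagging pays off when p exceeds C_FP / (C_FP + C_FN). The more weight stakeholders put on missed cases, the lower that threshold falls and the more people get flagged. The Python sketch below applies the rule; the 10-to-1 cost ratio in the usage lines is a hypothetical stakeholder judgment, and this after-the-fact threshold is a simpler stand-in for building the costs into the learning algorithm itself, which is what Berk describes.

```python
def flag_threshold(cost_false_positive: float, cost_false_negative: float) -> float:
    """Probability above which flagging has a lower expected cost than not flagging."""
    return cost_false_positive / (cost_false_positive + cost_false_negative)

def should_flag(prob_outcome: float, cost_false_positive: float, cost_false_negative: float) -> bool:
    """Compare the expected cost of each decision for a single case."""
    expected_cost_if_flagged = (1 - prob_outcome) * cost_false_positive
    expected_cost_if_not_flagged = prob_outcome * cost_false_negative
    return expected_cost_if_not_flagged > expected_cost_if_flagged

# Equal costs: flag only when the outcome is more likely than not.
print(flag_threshold(1, 1))   # 0.5
# Hypothetical judgment that a missed case is 10 times costlier than a false alarm.
print(flag_threshold(1, 10))  # 1/11, about 0.09
print(should_flag(0.10, 1, 10), should_flag(0.05, 1, 10))  # True False
```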
Steven Cherry: Fair enough. Actually, you used the phrase supervised learning, and I guess that’s a sort of form of artificial intelligence. Maybe you could just talk a little more about that.
Richard Berk: Sure, it goes under various names (data mining, machine learning, supervised learning, statistical learning) that basically come from the different disciplines in which a particular tool has been developed. But they all basically boil down to having a computer search through a very large database, looking for patterns. So it’s kind of like pattern recognition. And what happens is that the computer gets into the data many different ways and over and over again, so it makes passes through the data and with each pass learns more about where the structure is and therefore is able to forecast more accurately, or at least classify more accurately. And in that sense it’s a metaphor for learning: The computer, each time through the data, is in some sense smarter or better informed than it was before. Then of course, depending upon the particular discipline or the particular empirical problem, the details can vary enormously. Historically, when people have done this (economists, sociologists, criminologists), they have taken very seriously where you started, which is, what are the predictors. And that really isn’t nearly as important as having a good algorithm to search the space defined by the predictors. So you have this high-dimensional predictor space. In that space, in various nooks and crannies, is the structure you’re trying to find. You don’t care what the predictors mean; like longitude and latitude, they’re just coordinates. And you want to use those coordinates in a smart way to find the structure, and the real bottom line is how good your search algorithm is, not what your coordinates mean. So I don’t care if I search on sunspots or shoe size or prior record. What I care about is that I search efficiently through the space.
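The interview does not name a specific algorithm, so the Python sketch below uses AdaBoost with one-split “decision stumps,” a standard textbook learner that fits the description: each round is a weighted pass through the data that concentrates on the cases earlier rounds got wrong, and each split is found by searching every predictor and cut point as bare coordinates, without regard to what the predictor means. The toy data are made up for illustration.

```python
import math

def stump_predict(row, feature, threshold, polarity):
    """A one-split rule: polarity on one side of the threshold, -polarity on the other."""
    return polarity if row[feature] >= threshold else -polarity

def fit_stump(X, y, weights):
    """Search every predictor and every observed cut point for the lowest weighted error.
    Predictors are treated purely as coordinates; their meaning never enters the search."""
    best = None  # (error, feature, threshold, polarity)
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for polarity in (1, -1):
                error = sum(
                    w for row, label, w in zip(X, y, weights)
                    if stump_predict(row, feature, threshold, polarity) != label
                )
                if best is None or error < best[0]:
                    best = (error, feature, threshold, polarity)
    return best

def adaboost(X, y, n_rounds=10):
    """Fit an ensemble of stumps. Labels must be +1 or -1; each round is one weighted pass."""
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []  # list of (alpha, feature, threshold, polarity)
    for _ in range(n_rounds):
        error, feature, threshold, polarity = fit_stump(X, y, weights)
        error = max(error, 1e-10)  # guard against log(1/0) on a perfect split
        if error >= 0.5:
            break  # no stump beats chance on the reweighted data
        alpha = 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, feature, threshold, polarity))
        # Up-weight misclassified cases so the next pass concentrates on them.
        weights = [
            w * math.exp(-alpha * label * stump_predict(row, feature, threshold, polarity))
            for row, label, w in zip(X, y, weights)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, row):
    """Weighted vote of all stumps fitted so far."""
    score = sum(alpha * stump_predict(row, f, t, p) for alpha, f, t, p in ensemble)
    return 1 if score >= 0 else -1

# Toy data: the positive cases sit in the middle, so no single split separates them.
X = [[1], [2], [3], [4], [5], [6]]
y = [-1, -1, 1, 1, -1, -1]
for rounds in (1, 3):
    model = adaboost(X, y, n_rounds=rounds)
    correct = sum(predict(model, row) == label for row, label in zip(X, y))
    print(f"{rounds} round(s): {correct} of {len(y)} correct")
```

On this toy data, one round gets four of the six cases right and three rounds classify all six correctly: no single split can carve out the middle of the range, but a weighted combination of splits found on successive passes can.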
Steven Cherry: You know, sometimes when we bring knowledge to bear on our problem, there’s an information arms race. I’m wondering—I wonder if we’re going to see, you know, the Moriartys of the world take up informatics and, you know, I guess I’m picturing maybe recommendation software for criminals. You know, if you like hitting up grocery stores, maybe you’d like to rob the liquor store on Chestnut and 12th Street or something.
Richard Berk: That’s a real interesting idea, and maybe that’ll be the Minority Report 2, or something. It’s a great idea. I’m sure that stuff like that is already under way with hackers. But I don’t know anything more about it. It seems like a logical extension. These tools are readily available, they run even on modest-sized laptops, they don’t require—many of them are just point-and-click at this point. So yeah, something like that could happen.
Steven Cherry: And you know, on a more serious note, your statistical approaches have been used for other things, right? You’ve looked at homelessness in Los Angeles, for example.
Richard Berk: Yeah, the general idea, as I said before, is broadly applicable. I’ve used it in prisons. I’m also starting to look at OSHA violations to determine which business establishments are at high risk of putting their employees in harm’s way. So there are a variety of things. We work with police departments to help them forecast which domestic violence incidents are likely to be repeat incidents in the future and therefore require police intervention. It’s really quite general.
Steven Cherry: That’s terrific. You know, Richard, despite the obvious concerns and, you know, without speaking for everyone at Spectrum, I guess I think that data is good, and more data is better. So thanks for your work, and thanks for joining us today.
Richard Berk: Oh, it’s my pleasure.
We’ve been speaking with University of Pennsylvania professor of criminology and statistics Richard Berk about his software that predicts which violent criminals are likeliest to commit a violent crime again.
For IEEE Spectrum’s “Techwise Conversations,” I’m Steven Cherry.
Announcer: “Techwise Conversations” is sponsored by National Instruments.
This interview was recorded 13 February 2012.
Segment producer: Barbara Finkelstein; audio engineer: Francesco Ferorelli
Follow us on Twitter @techwisepodcast
NOTE: Transcripts are created for the convenience of our readers and listeners and may not perfectly match their associated interviews and narratives. The authoritative record of IEEE Spectrum’s audio programming is the audio version.