Can Big Data Win Your Next Court Case?
A lawyer’s startup is mining the court system for its data
Steven Cherry: Hi, this is Steven Cherry for IEEE Spectrum’s “Techwise Conversations.”
Big data is a bit like water. It seeps into an industry at a point of weakness, like water in a crack. As the seasons change, the water freezes and expands. Eventually, it can break open solid rock. That’s what happened in baseball, as documented in the book and movie Moneyball. Statisticians started to look at the numbers that really matter, like on-base percentage, and teams started to reevaluate players accordingly. The stats don’t guarantee victory in every game, but over time, you’ll win more of them.
The latest industry to be cracked open by big data may be lawyering.
As it turns out, there’s a fair amount of data out there and—like baseball—when you look at it with a statistician’s eye, you get all sorts of clues that will help you win games. I mean, cases. Won/lost stats for litigators are just the starting point. What if you knew which judges were most likely to grant which specific motions? Which opposing law firms offer big settlements, but only in certain types of cases? Or what impact the flu season—or the hunting season—has on juries?
My guest today is Andrew Winship. He’s the founder of Juristat, a St. Louis–based start-up that’s doing the Moneyball thing in court jurisdictions all over the country. He has a law degree from Washington University, in St. Louis, and was a practicing attorney before founding Juristat. He joins us by phone.
Drew, welcome to the podcast.
Andrew Winship: Thanks, Steven. Thanks for having me on.
Steven Cherry: Ironically, I wrote my intro before looking at your LinkedIn page, which features a picture of you in a baseball cap. Is the comparison to Moneyball apt?
Andrew Winship: Yeah, anyone from St. Louis—it’s a baseball town. We love the Cardinals. I grew up with a lot of Cardinals paraphernalia around my crib. In fact, I had a chew toy that said, “Welcome to the world,” signed by Lou Brock, that I probably chewed to death and ruined the value of, to my dad’s chagrin. But the Moneyball analogy is completely apt. If you can figure out the wins above replacement of a pitcher, or if Nate Silver can predict an election, then we can predict a lawsuit.
Steven Cherry: I gave some examples of data mining court records, but they might not be the best ones. What are lawyers already finding useful, or likely to?
Andrew Winship: Every lawyer will understand in their specific area the one metric they would love to know. So they say something like, “We would love a settlement recommendation engine that gives you kind of a second opinion about settlement numbers,” or the best day of the week to argue in front of a particular judge because it’s a venue that you don’t have a lot of experience with.
But then we go deeper and say, “Sure, you can do that. But what if you’re interviewing a number of attorneys? We can show you their win rates, their experience in front of specific judges, their experience with specific case types.” And then the lightbulb clicks on in their head, or we talk about a marketing opportunity. We could show you the most-sued insurance companies in a specific venue, then show the attorneys that represent them and those attorneys’ win rates. And then you can go into a marketing interview or a marketing meeting and show hard data and say, “I’ve got a better batting average than the guy you’re currently using. Why not use me?”
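[Illustration, not from the interview: a minimal sketch of how a per-attorney "batting average" could be computed from case outcomes. The record layout, the names, and the sample data are all hypothetical; this is not Juristat's code or data.]

```python
from collections import defaultdict

def win_rates(cases, judge=None):
    """Return {attorney: win rate}, optionally limited to one judge.

    `cases` is an iterable of (attorney, judge, won) tuples; this layout
    is an assumption made for the example.
    """
    tally = defaultdict(lambda: [0, 0])  # attorney -> [wins, total cases]
    for attorney, case_judge, won in cases:
        if judge is not None and case_judge != judge:
            continue
        tally[attorney][1] += 1
        if won:
            tally[attorney][0] += 1
    return {name: wins / total for name, (wins, total) in tally.items()}

# Hypothetical outcomes for two attorneys in front of two judges.
sample_cases = [
    ("Attorney A", "Judge X", True),
    ("Attorney A", "Judge X", False),
    ("Attorney A", "Judge Y", True),
    ("Attorney A", "Judge Y", True),
    ("Attorney B", "Judge X", True),
    ("Attorney B", "Judge Y", False),
]

overall = win_rates(sample_cases)
before_judge_x = win_rates(sample_cases, judge="Judge X")
```

Filtering by judge before counting is what turns a general win rate into the "experience in front of specific judges" comparison described above.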
Steven Cherry: Now, the data is more readily available in some jurisdictions than others. I guess there’s two issues: We just assume court records are in the public domain, but that’s not entirely true. And also, there’s not a lot of structure to them.
Andrew Winship: The court records are technically supposed to be in the public domain, but we found out recently with Aaron Swartz and with others that the courts seem to be the last ones to let go of control over data. Recently, on May 9, the president signed an executive order making open and machine-readable the new default for government information, so the executive is going toward an open and machine-readable data structure.
The Congress has been there for a long time. Thomas.gov is openly scrapeable. They’re not going to have a problem unless you’re trying to DDoS something. But the judiciary both at the state and federal levels seems really reticent to let go of their control over data, and they’re not really doing a very good job of hosting it, cleaning it, or making it available.
Steven Cherry: You had some problems pulling out the data in your home state of Missouri, but some good success elsewhere.
Andrew Winship: I guess like the founders intended, some states behave differently than others, so it’s a bit of an experiment—a 50-state experiment. And so states like New York, Indiana, Washington state are extremely sophisticated and on the cutting-edge, and they really have the data structures put together and have actually produced a pretty nice market, where you can come in and purchase that data in bulk, and they’re making a decent amount of money on it.
Other states, like Tennessee, don’t even have a full-fledged electronic search database. And then there are states like Missouri in the middle, which have a database where you can search kind of by the drink—you know, do one case search or one name search—but if you start searching in bulk or start scraping, they’re going to shut you down.
We read that as in violation of the different Sunshine Laws. Each state has a different name for it, so it’s not necessarily the Sunshine Law, but each state has kind of an open records policy, and my reading of it is that most courts that deny bulk data access are in violation of that policy.
Steven Cherry: So how deeply are you mining in these databases? And maybe you could describe the interface that you present to lawyers.
Andrew Winship: Every time a lawsuit is filed, something called a “docket” is created, and every single event that happens during that case creates a docket entry. So a motion is argued in front of a judge, a docket entry is created. The jury trial begins, a docket entry is created. The jury trial ends, another docket entry is created.
What we’re doing is collecting some general case data, such as the judge, the attorneys, and dates. And then we collect all those docket entries, which were all kept by clerks in natural language. We parse and clean those, and that’s where kind of the lion’s share of the work that we’re doing happens. What attorneys then see is not this dirty natural language that changes from clerk to clerk, venue to venue, and jurisdiction to jurisdiction, but they see graphs, charts, and data-visualization products that will basically ask them, “What do you want to know?” And so if they want to know about hiring, they enter in the names of the attorneys they just interviewed, and we’ll pop up every piece of data we have on them in line graphs, bar charts, you name it. Whatever formula they want to create for visualizing it, we can produce it.
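[Illustration, not from the interview: a minimal sketch of the kind of cleanup described above, turning free-text docket entries into consistent event labels. The patterns, labels, and sample entries are hypothetical, and a simple keyword-rule approach stands in for whatever natural-language processing Juristat actually uses.]

```python
import re
from collections import Counter

# Hypothetical rules: each maps a clerk's free-text wording to one event label.
EVENT_PATTERNS = [
    ("motion_granted", re.compile(r"\bmotion\b.*\b(granted|sustained)\b", re.I)),
    ("motion_denied", re.compile(r"\bmotion\b.*\b(denied|overruled)\b", re.I)),
    ("trial_begins", re.compile(r"\btrial\s+(begins|commenced|started)\b", re.I)),
    ("trial_ends", re.compile(r"\btrial\s+(ends|concluded|completed)\b", re.I)),
]

def classify_entry(text):
    """Return the first matching event label, or "other" if no rule applies."""
    for label, pattern in EVENT_PATTERNS:
        if pattern.search(text):
            return label
    return "other"

def summarize_docket(entries):
    """Count how many docket entries fall under each event label."""
    return Counter(classify_entry(entry) for entry in entries)

# Hypothetical entries showing how wording can vary from clerk to clerk.
sample_docket = [
    "Defendant's Motion to Dismiss DENIED.",
    "Mot. for summary judgment -- motion granted in part",
    "Jury trial commenced",
    "JURY TRIAL CONCLUDED; verdict for plaintiff",
    "Notice of appearance filed",
]

summary = summarize_docket(sample_docket)
```

Once entries carry consistent labels, they can be counted per judge, attorney, or venue and fed into charts. A real system would need far more rules or a trained model to cope with the variation between clerks and jurisdictions.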
Steven Cherry: People pay a monthly fee?
Andrew Winship: It’s monthly. It’s yearly. It all depends on whether you’re general counsel, you’re a litigator, the size of the firm, kind of what level and plane you’re at. But the idea is, you have a log-in, your firm has a log-in, you log on to the system, we know all the cases you’re already involved in.
And so we’re already able to feed you not just what you want to look at, but what you should be looking at, because we’re running background statistics, and we can say, “Hey, there’s a case you’re involved in with three standard deviations of difference in behavior between the judge you have and the judge you probably want. Here’s something that should be in front of your eyeballs that you weren’t thinking to look at.” So that’s the recommendation engine piece. Then there’s the other piece that’s more of a lawyer-led research tool, where the lawyer can get on and say, “I want to know about X,” and we kind of walk them through.
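[Illustration, not from the interview: one way a "three standard deviations" alert could be computed. The grant rates, judge names, and threshold below are hypothetical, and measuring the gap against the venue-wide spread is an assumption about what the comparison means.]

```python
from statistics import pstdev

def deviation_gap(rates, judge_a, judge_b):
    """Gap between two judges' rates, in units of the venue's standard deviation."""
    spread = pstdev(rates.values())
    if spread == 0:
        return 0.0  # every judge behaves identically, so there is no gap to report
    return (rates[judge_a] - rates[judge_b]) / spread

# Hypothetical share of a given motion type that each judge in a venue grants.
grant_rates = {
    "Judge 1": 0.2, "Judge 2": 0.5, "Judge 3": 0.5, "Judge 4": 0.5,
    "Judge 5": 0.5, "Judge 6": 0.5, "Judge 7": 0.5, "Judge 8": 0.5,
    "Judge 9": 0.5, "Judge 10": 0.6,
}

assigned = "Judge 1"
most_favorable = max(grant_rates, key=grant_rates.get)
gap = deviation_gap(grant_rates, most_favorable, assigned)

ALERT_THRESHOLD = 3.0  # hypothetical cutoff, echoing the figure in the interview
should_alert = gap >= ALERT_THRESHOLD
```

With this sample data the gap between the assigned judge and the most favorable one is a little over four standard deviations, so the case would be flagged for the lawyer's attention.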
Steven Cherry: And they can also just do their own queries. It looks like a sort of standard Boolean relational database kind of thing.
Andrew Winship: Absolutely. There’s an advanced search tab. Lawyers are not necessarily the best at math. I don’t want to denigrate my profession, but for most of us, there was a joke in law school that if we were good at math we would have gone to business school. And so we make this process very simple, but there are many lawyers who absolutely understand Boolean searching and creating those advanced queries, and that option is always available.
Steven Cherry: Now, you got your start at something called St. Louis Startup Weekend. This seems like a pretty wild event. You walk in with some ideas, and you walk out on Sunday with a team and a company. Tell us about that weekend.
Andrew Winship: One of the most fantastic things that’s going on here in the U.S.—I know it’s happening worldwide, but there’s an American entrepreneurial spirit that is really amazing. Basically, complete strangers come together in one location. A group of them—the idea guys—get up in front and pitch their ideas, and you join the team you want, you work for 54 hours, and at the end, you pitch to a team of judges who are domain experts, kind of local celebrities or local entrepreneurs who have had some exits and some success.
And basically, they give a thumbs-up or a thumbs-down to your idea, and oftentimes, depending on how good the sponsorship was for the event, you can get some pretty cool prizes out of it. We were able to get some free rental space and free legal costs, so we were able to set up our corporation essentially for free, which is a huge help for any early company. But it really is a wild experience—so many friendships and networking and connections made over that weekend. I just went to meet people and network. Never in my wildest dreams was I going to join a company and be quitting my job a few months later.
Steven Cherry: And that was about a year ago. Now, there are some existing data companies in the legal space, and they’re pretty big. Lexis is owned by Reed Elsevier; Westlaw is owned by the Thomson Corp. Are you creating one of those start-ups that lives to be acquired?
Andrew Winship: Acquisition is certainly an opportunity. There’s a lot of exit strategies we could have. We don’t operate in direct competition with Lexis or Westlaw. We can do things like look at jury verdicts cross-referenced with flu outbreaks, like you talked about in the intro. Westlaw’s not providing that data. I like to tell people that Westlaw does a great job, and Lexis does a great job of telling you what the law is, whereas we kind of tell you about the practice of law and the unwritten rules. There are some players in the space, but they’re all small start-ups like us. There’s some out in California. There’s some out in New York.
Steven Cherry: As part of Internet Week here in New York recently, I went to an event called Disrupt Law! The pitch was, “We’re tired of practicing powdered-wig law in an iPad world.” What are some other signs you’re seeing that powdered wigs are out and iPads are in?
Andrew Winship: I think it’s everywhere in the industry. There’s a narrative people like to give that the downturn in 2008 is what really started hurting the legal industry. That’s really not true. If you look at the data, the legal industry went through a paradigm shift starting in ’04, and you can see an inflection point at hiring at that point. You can see bankruptcies at big firms starting at that point. And it really is, I believe, the legal industry starting to finally understand the amount of tech that was out there that could be leveraged to make our industry more efficient.
And there’s been a massive push: eDiscovery is on every lawyer’s mind these days. There’s constantly new technology products that are coming out that are pushing billable hours away from the attorney and downstream into less-expensive tools, into automated tools, or even internationally. We’re seeing a lot of document review that used to be done by high-priced associates in New York and California move to the Midwest, and even move to India, and that’s happening all over the board.
Steven Cherry: Maybe you could just remind our listeners what eDiscovery is, and is there any overlap between that and what you’re doing?
Andrew Winship: A lawsuit has different phases. The first thing that happens is a complaint is filed, and then a phase that’s called “discovery” happens. And that really just means that each side investigates the other. So you’ll send a request for the other side to provide documentation, communication, and answers to specific questions about the lawsuit. Twenty years ago, that was a relatively small process because so much communication happened via phone and couldn’t be reproduced, or it was a few letters that went back and forth.
Today, there’s daily e-mails, there’s group texts, there’s transcription services, and so there’s so much more data about the communication that went back and forth. And so you’ll see terabytes of text data that’s turned over in this investigation phase, whether it’s for a product-liability suit about a faulty seat belt, or an asbestos lawsuit. There’s an enormous amount of data that corporations are now keeping and storing, and now lawsuits and lawyers want to look through that when trying to build their case, both defending their client and attacking the other side.
And so eDiscovery is basically a tool that outsources that parsing of all this electronic text data that now exists. There is some overlap with what we’re doing, but eDiscovery is really built to parse corporate text data, and our natural-language-processing algorithms are built to process legal text data.
Steven Cherry: Very good. Well, Drew, it remains to be seen whether the days of Jarndyce v. Jarndyce are over, but if they are, we’ll have innovators like yourself to thank, and in the meantime, thanks for joining us today.
Andrew Winship: Thanks so much.
Steven Cherry: We’ve been speaking with St. Louis attorney Andrew Winship about how his start-up, Juristat, and others are bringing the practice of law into the 21st century.
For IEEE Spectrum’s “Techwise Conversations,” I’m Steven Cherry.
Photo: Alex Slobodkin/iStockphoto
NOTE: Transcripts are created for the convenience of our readers and listeners and may not perfectly match their associated interviews and narratives. The authoritative record of IEEE Spectrum’s audio programming is the audio version.