Supercharging Patent Lawyers With AI

How Silicon Valley's Lex Machina is blending AI and data analytics to radically alter patent litigation

Illustration: Mark Allen Miller

In a low-rise building in Menlo Park, Calif., just upstairs from a Mexican restaurant and a nail salon, a Stanford University spin-off is crunching data in ways that could shake the foundations of the legal profession.

Here, a small group of patent lawyers and computer scientists is applying the latest in machine learning and natural-language processing to reams of documents related to intellectual property lawsuits. The result is a massive statistical database on IP litigation like nothing the world has seen before. Which attorney has the best track record in defending against semiconductor-related infringement claims? Has a particular judge ruled on cases involving patent trolls, and if so, what was the outcome? Which companies tend to go to trial, and which settle out of court? By offering up such information, the database provides corporate lawyers, law firms, and government agencies with hard numbers that will reduce the guesswork, as well as the enormous expense, of patent litigation. In short, the company is building a “law machine,” from which comes its name: Lex Machina.

“Law is horribly inefficient,” says Mark Lemley, a professor at Stanford Law School, director of the Stanford Program in Law, Science & Technology, and cofounder of the company. “And in some ways, it is inefficient by design.” After all, lawyers get paid by the hour, so inefficiency is rewarded, says Lemley. And some are rewarded richly: Top lawyers charge north of US $1000 per hour.

Lex Machina is in the vanguard of an emerging field known as legal analytics, according to Daniel Martin Katz, an associate professor of law at Michigan State University who writes the blog Computational Legal Studies and advocates overhauling the practice of law through technology. Practitioners of legal analytics statistically parse the practice of law in search of data that can be used to augment, or in some cases replace, the more qualitative judgment of human lawyers.

“There's been a quiet transition going on in the legal world,” Katz says. And that transition will shake up the legal profession. “Human reasoning, at least some part of it, is going to be replaced by machine-based prediction.” If Lex Machina succeeds, there will eventually be fewer frivolous lawsuits—and maybe fewer lawyers too.

“We're the moneyball of IP litigation,” says Josh Becker, Lex Machina's CEO. Bespectacled and unassuming, he looks more like a professor than a savvy Silicon Valley player. With law and MBA degrees from Stanford, he served as press secretary for a Pennsylvania congresswoman, worked at the Internet start-up EarthWeb/DICE and at Netscape, and founded a venture capital firm before turning his attention to Lex Machina.

Becker is also a huge baseball fan who's made a careful study of Michael Lewis's 2003 best-selling book, Moneyball, which tells how Oakland Athletics general manager Billy Beane used nontraditional statistics, called sabermetrics, to make judgments about players and game strategy. Looking at the numbers, for instance, Beane determined that two popular baseball plays—bunting and stealing bases—don't contribute significantly to a team's chance of winning, so he banned them. Such decisions based on sabermetrics contributed to the Athletics' making it to the playoffs in 2002 and 2003.

That approach is basically what Lex Machina is doing for law. But while baseball is known for its reliance on statistics, Becker says, law has long been a profession that is more art than science. “Some people went to law school to avoid data,” he quips.

Lex Machina aims to change that. According to the company, its database covers more than 130 000 U.S. IP and antitrust cases dating back to the year 2000, including information on more than 1400 judges, 340 000 litigants, 100 000 attorneys, and 30 000 law firms. At present, it covers only the United States, but it may eventually include international patent cases as well.

With patent wars raging in every sector of the technology industry, IP litigation is big business and getting bigger all the time. The number of patent lawsuits in the United States skyrocketed between 2010 and 2012, from around 3200 filings to more than 5000, according to the Administrative Office of the United States Courts. One recent study, by James Bessen and Michael J. Meurer of the Boston University School of Law, found that defending against “nonpracticing entities”—sometimes called patent trolls—cost companies some $29 billion in 2011. Corporations are looking for a way to cut those costs.

Traditionally, a company that's been sued for patent infringement, or is thinking of suing because its own IP has been infringed, will hire top attorneys to pursue its case. Yet the process of deciding whether, how, and even where to file such a suit is often driven by gut instinct rather than facts. Even the best patent attorney has seen maybe tens of cases that are similar to the client's. “Humans are limited. People haven't seen 10 000 cases or 100 000 cases—a human can't hold that kind of information,” Katz says.

But Lex Machina can. For an annual subscription fee of around $50 000, its customers get access to 13 years of U.S. IP litigation. Just like the sabermetrics described in Moneyball, Lex Machina's database can aid in the formulation of broad strategy as well as the selection of players, says Becker. The company's stats reveal, among other things, which attorneys do the best against a particular patent troll, how much time and money it typically takes to fight a troll versus settling out of court, and even which judge you'd want to hear your case. The data might tell a company being sued that its peers have been settling similar lawsuits early, thereby saving money. Even if a company believes it's in the right, says Becker, a prolonged legal battle and “fighting to the death” may not make good business sense.

So how does Lex Machina do what it does? It started with documents—millions of pages of legal documents that, in theory at least, are available to anyone, free of charge. In practice, though, before Lex Machina came along, there was no easy way to collectively consider that vast body of information. Figuring out how to extract relevant data from countless files and then building a comprehensive database took years of dedicated effort on the part of Lex Machina's small and eclectic team. Among its 18 employees are 6 people with law degrees, 6 with computer science degrees, and 1 who has both.

The company began as an academic research project called the Intellectual Property Litigation Clearinghouse, launched by Lemley in 2006 as a collaboration between Stanford's law school and its computer science department. As Lemley explained during an interview on the sunny terrace of Stanford Law's William H. Neukom Building, “The industry was having all these debates about how to fix the patent system, and none of them were based on actual evidence.”