The Trouble With Trusting AI to Interpret Police Body-Cam Video

Axon is promising its AI will be able to describe events recorded in body-cam video, but we’re skeptical


On 17 July 2014, a group of New York City police officers approached 43-year-old Eric Garner on a Staten Island sidewalk and attempted to arrest him—for allegedly selling cigarettes illegally. When Garner pulled free, one officer wrapped an arm around Garner’s neck, forced him to the ground, and pressed his face into the sidewalk. Garner, who had asthma and heart disease, repeatedly pleaded, “I can’t breathe,” before passing out. Unconscious, he was transported to a hospital, where he was pronounced dead an hour later. The medical examiner later ruled Garner’s death a homicide.

This tragedy drew national attention thanks to a cellphone video, which revealed in shocking detail the grossly excessive use of force that Garner was subjected to at the hands of police. Garner’s killing, and those of other unarmed black citizens by police, sparked protests throughout the United States.

Since then, reform-minded politicians in the United States and elsewhere have championed the use of police body cameras, seeing them as a tool to hold police accountable for their actions. Police officers themselves are often enthusiastic about wearing body cameras, which can sometimes be used to clear them of suspicion when false allegations are made.

The leading maker of police body cams is Axon Enterprise, which most people still know by its old name, Taser International, and its signature product, the Taser stun gun. In 2015, sensing a growing market, Taser created a division to sell its Axon body cameras to police, and last year it rebranded the whole company under the Axon banner to signal its new focus on the collection and management of police records and evidence, especially body-camera video.

Axon is now providing free body cameras to any interested police department, along with a no-cost one-year subscription to its Evidence.com data-storage service. The company’s hope, of course, is that such incentives will win it a healthy revenue stream down the road.

While the company’s shift from stun guns to body-cam video will no doubt interest investors, what caught our attention was not the new business model. The two of us are technologists—Greene studies the influence of technology on society as a professor at the University of Maryland’s College of Information Studies, and Patterson is a specialist in computer vision and chief scientist at a New York City startup called Trash. We’ve been following Axon’s stated aspirations of building artificial intelligence (AI) systems to process, label, and interpret the anticipated deluge of body-cam video, and we have serious misgivings about the reasonableness and wisdom of that strategy.

Axon claims that it will train its AI system using its existing trove of body-camera data—currently standing at 30 petabytes of video, collected by 200,000 officers. The system will then be able to redact the video to protect people’s privacy, interpret and describe in written form the recorded events, and eventually help generate police reports from those descriptions. Such automated tools would free police officers to perform more valuable tasks, and they would create a searchable database of police interactions with the public. Axon has also filed a patent for real-time face recognition, which a number of its competitors are also actively developing for police body cameras.

Almost two years ago, Axon founder and CEO Rick Smith predicted in an investor earnings call that the company would roll out its AI-assisted video systems in 2018, but so far the capabilities of Axon’s Evidence.com product are more mundane. It can automatically blur or black out faces or otherwise redact personally identifiable information (after someone marks who or what is to be so masked) and tag videos and other evidence with information recorded in a police dispatch system, such as the case number, location, and type of encounter. This is far short of what’s been described in Axon’s sales pitches. At the 2018 Axon Accelerate Conference in June, for example, Smith said, “One day we will be able to have AI work on [body-camera] video and in-car video to create a first draft of a report that an officer can go into and edit.”

Eliminating tedious paperwork by automatically classifying who is doing what, where, and with whom in body-cam footage is certainly an attractive idea. But we, along with other outside observers, caution that dangers lurk. Many of the AI capabilities that Axon proposes to deploy aren’t mature enough. And even if they were, there would be no way to tell if the technology is free from bias and other worrisome issues, because the company’s software is proprietary and unavailable for independent review.

For these reasons, the application of machine learning for the classification of police body-camera video has become a major flash point in the broader debate over whether proprietary software should be incorporated into the criminal-justice system. Here we will lay out our concerns and also offer suggestions for how to manage a technology that, like it or not, is on the horizon.

At least one capability of Axon’s video-management system is both doable and probably very helpful—obscuring the faces of people in body-cam video, so that they can’t be identified if that video is made public. Mojtaba Solgi, director of AI and machine learning at Axon, says that he and his colleagues are also working on the automated transcription of voices recorded in these videos. We have no issue with the general desire to automate such mundane and often straightforward tasks.

But having a machine review and classify the goings-on depicted in such footage is a much more difficult undertaking. Sure, simple images are easy to classify. A computer has no problem these days distinguishing a dog from an airplane or a tree. But even the best AI is unable to analyze a complex image or scene with anything like the sophistication and nuance that a person brings to the job. While researchers have made huge strides in recent years, the most advanced AI systems are still relatively primitive when it comes to interpreting what’s going on in a complicated image or video segment.

Jitendra Malik of the University of California, Berkeley, and 11 other machine-learning experts recently published results from a study that uses a new, meticulously annotated video data set to test the performance of a bleeding-edge deep-learning system. Deep learning involves the application of artificial neural networks—computer programs that are constructed in a way that is crudely analogous to the networks of neurons in a human brain.

Malik and his colleagues wanted to train a network to identify where actions were most likely taking place in video footage, and to do so they combined features of two popular neural networks: I3D, a descendant of the widely used Inception Network, and Faster-RCNN. The actions their network had to classify were simple: standing, sitting, walking, jumping, eating, sleeping, reading, talking, smoking, riding a bike, or carrying an object, to name just a few of the 80 activities it considered. There was no attempt to interpret more complicated actions (say, shoplifting or applying a choke hold). And the video segments came from films and television shows, so they presumably show the action clearly, with good lighting and appropriate camera angles.
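
The two-stage idea can be sketched as follows. This toy illustration is not Malik's actual system: one component proposes where a person appears in a frame, and another assigns one of a fixed set of action labels to that region. Real systems use Faster-RCNN for detection and I3D for action classification; here both stages are hypothetical stand-ins, and the labels and scores are invented.

```python
# Hypothetical sketch of a detect-then-classify action pipeline.
# Neither function is a real model; the scores are made up.

ACTIONS = ["standing", "sitting", "walking", "smoking", "talking"]

def detect_people(frame):
    # Stand-in for a detector such as Faster-RCNN: returns bounding
    # boxes for each person found in the frame.
    return frame["boxes"]

def classify_action(scores):
    # Stand-in for a clip classifier such as I3D: pick the
    # highest-scoring action label for a detected person.
    best = max(range(len(scores)), key=lambda i: scores[i])
    return ACTIONS[best]

frame = {"boxes": [{"xyxy": (10, 10, 50, 120),
                    "scores": [0.1, 0.05, 0.2, 0.6, 0.05]}]}

for box in detect_people(frame):
    print(classify_action(box["scores"]))  # prints "smoking"
```

The final label is only as good as the scores the classifier produces, which is exactly where the fine-grained confusions described below arise: a person holding a phone to their ear can produce a score profile very close to "smoking."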

Even so, the deep-learning system tested on the new data set often faltered—identifying, for example, people as smoking when they were, in fact, just holding a phone to their ear. “The model often struggles to discriminate fine-grained details,” is how the authors characterized such basic failures. By that measure, whether a person is reaching for a wallet or a gun would also be considered a “fine-grained detail,” one that today’s most advanced video-analysis AI could easily misinterpret.

Even Axon’s more modest proposals for its system will, in our estimation, struggle to make the leap from laboratory to actual policing. For example, Axon says that once somebody in a video is tagged, its system can maintain that tag as the subject moves through multiple frames of the video.

The most advanced computer-vision systems are able to track figures in this way, but the examples of successful tracking described in the research literature rely on high-quality video footage of people moving in relatively sterile environments. Nobody has yet demonstrated such tracking using video that was captured in poor lighting by moving cameras with erratic fields of view.

Automated video interpretation is a tricky problem in any domain. But in policing, the demands are positively enormous, and the sorts of errors that AI systems tend to make could have dire consequences. What’s more, these errors could occur frequently and be difficult to detect.

Problems can arise, for example, when an automated image-classification system learns its function from messy, incomplete, or biased data. A famous example of this danger was exposed in 2015 by programmer Jacky Alciné, who is black. Alciné found that Google’s Photos app, which uses machine learning to categorize content, labeled a picture of him and a black friend as “Gorillas.”

AI experts can fix these sorts of problems by vigilantly looking for them and iteratively improving their systems. This is what AI researchers at Caltech and the University of California, San Diego, did when they built the Merlin Bird ID app. They collected a huge set of bird images and then crowdsourced the labeling of species. Using those results, they trained their first species-classification AI. But the classifier’s performance was just lousy.

The computer scientists then contacted some real bird experts—from the Cornell Lab of Ornithology—to figure out what was going on. The ornithologists realized straightaway that many birds in the training data set had been misclassified because of the crowdsourced workers’ limited ornithological knowledge.

After painstakingly fixing the many errors in the training set, writing new and improved instructions for the crowd workers, and repeating the entire process several times, the designers of Merlin were finally able to release an app that worked reasonably well. And they continue to improve their AI system by drawing on user data and following up on corrections from experts in ornithology.

Dextro, the New York City–based computer vision startup that was acquired by Axon last year, described a similar approach used with its video-recognition system. The company debugged its AI creations by continuously identifying false positives and false negatives, retraining its neural-network models, and evaluating how the system changed in response. We can hope that these researchers continue this practice as part of Axon. But since Dextro’s acquisition, Axon’s AI group has published little about the technology it is applying to police body-cam video.
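
The debugging loop Dextro described can be sketched in a few lines: score a held-out validation set, collect the false positives and false negatives, and feed the corrected examples back into the next round of training. Everything here is a hypothetical placeholder; a real system would use an actual model rather than a stored score.

```python
# Toy sketch of the error-hunting loop: find false positives and
# false negatives on a labeled validation set. The "model" is a
# stand-in that reads a precomputed score off each example.

def classify(example, threshold=0.5):
    # Hypothetical stand-in for a trained classifier.
    return example["score"] >= threshold

def find_errors(validation_set):
    false_positives, false_negatives = [], []
    for ex in validation_set:
        predicted = classify(ex)
        if predicted and not ex["label"]:
            false_positives.append(ex)   # flagged, but shouldn't be
        elif not predicted and ex["label"]:
            false_negatives.append(ex)   # missed a real instance
    return false_positives, false_negatives

validation = [
    {"score": 0.9, "label": True},    # correct positive
    {"score": 0.8, "label": False},   # false positive
    {"score": 0.2, "label": True},    # false negative
    {"score": 0.1, "label": False},   # correct negative
]

fps, fns = find_errors(validation)
print(len(fps), len(fns))  # prints "1 1"
```

Each pass through this loop surfaces a batch of mistakes to correct before retraining; the point is that the process only works when someone with access to the system and the data keeps running it.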

At the European Conference on Computer Vision this past September, in Munich, AI experts from Axon did describe how their technology fared in an open video-understanding competition, where it was highly ranked. That competition analyzed YouTube videos, though, so the relevance of these results to police body-cam video remains unclear. And Axon has shared much less about its AI capabilities than has Chinese computer-vision startup Megvii, which regularly submits its image-analysis system to public competitions—and routinely wins.

AI developers often identify where and how their systems break down by evaluating their performance using certain well-established criteria. This is why AI research, particularly in computer vision, leans heavily on domain experts (as happened with the Merlin app). A shared set of benchmarks and a set of open contests and workshops where any interested party can participate also foster an environment where problems with an AI system can readily surface.

But as Elizabeth Joh, a legal scholar at the University of California, Davis, argues, this process is short-circuited when private surveillance-technology companies assert trade secrecy privileges over their software. Obviously, police departments have to procure equipment and services from the private sector. But AI of the sort that Axon is developing is fundamentally different from copier paper or cleaning services or even ordinary computer software. The technology itself threatens to change police judgments and actions.

Imagine, to invent a hypothetical example, that a video-interpretation AI categorized women wearing burqas as people wearing masks. Prompted by this classification, police might then unconsciously start to treat such women with greater suspicion—perhaps even to the point of provoking those women to be less cooperative. And that change, which would be recorded by body cams, could then influence the training sets Axon uses to develop future AI tools, cementing in a prejudice that arose initially just from a spurious artifact of the software.

Without independent experts in the loop to scrutinize these automated interpretations, this circular system can rapidly degenerate into an AI that produces biased or otherwise unreliable results.

It’s too soon to know whether this will happen to Axon’s video-management system. But ProPublica uncovered just such problems in 2016 with another classification tool deployed in the criminal-justice system—Correctional Offender Management Profiling for Alternative Sanctions, or COMPAS. Judges use this pretrial risk-assessment algorithm to make decisions about an arrestee’s eligibility for probation or other alternatives to incarceration.

ProPublica recorded more than 7,000 risk scores produced by COMPAS about arrestees in Broward County, Fla., and compared these predictions with arrest records for the subsequent two years. It found a racially biased pattern of erroneous predictions that persisted even after controlling for criminal history and the type of crime committed. Northpointe, the company selling COMPAS, disputes ProPublica's analysis but has not released technical details about its software for all to inspect and scrutinize. A Wisconsin man named Eric Loomis challenged COMPAS's black-box calculation in court, claiming that its use in sentencing him to six years in prison violated his right to due process. But the state Supreme Court sided against him and with Northpointe, blocking scrutiny of COMPAS on the basis that its precise operation is a trade secret.
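
The core of such an audit is simple to express, even though the statistics behind it are not. The sketch below, with invented records, shows the kind of per-group comparison ProPublica made: among people who did not reoffend, how often were members of each group wrongly flagged as high risk? A real audit would also control for criminal history and offense type, which this toy example omits.

```python
# Minimal sketch of a per-group false-positive-rate audit for a
# binary risk classifier. All records here are invented.

from collections import defaultdict

def false_positive_rate_by_group(records):
    # FPR = (wrongly flagged as high risk) / (all who did not reoffend),
    # computed separately for each group.
    counts = defaultdict(lambda: {"fp": 0, "negatives": 0})
    for r in records:
        if not r["reoffended"]:
            counts[r["group"]]["negatives"] += 1
            if r["predicted_high_risk"]:
                counts[r["group"]]["fp"] += 1
    return {g: c["fp"] / c["negatives"]
            for g, c in counts.items() if c["negatives"]}

records = [
    {"group": "A", "predicted_high_risk": True,  "reoffended": False},
    {"group": "A", "predicted_high_risk": False, "reoffended": False},
    {"group": "B", "predicted_high_risk": False, "reoffended": False},
    {"group": "B", "predicted_high_risk": False, "reoffended": False},
]

print(false_positive_rate_by_group(records))  # {'A': 0.5, 'B': 0.0}
```

A large gap between groups in this rate is exactly the kind of pattern ProPublica reported. The catch, as the surrounding discussion makes clear, is that running such an audit requires access to predictions and outcomes that proprietary vendors rarely provide.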

Statistician Kristian Lum and political scientist William Isaac found similar problems with a predictive-policing system called PredPol. They showed that this system produces outcomes that are often biased against black people because of biased training data.

Given what we know about machine learning’s vulnerability to biased training data and the performance of cutting-edge tools in the lab, we suspect that any automated video-classification service Axon develops in the next few years will have serious weaknesses. But outside analysts probably won’t be able to tell for sure because the software is proprietary.

Police reports are public records, though, so you might imagine that any obvious errors in Axon's automated records-management system would come to light quickly. But that may not happen. A patchwork of data-protection laws means there is no clear national standard for who can access body-camera footage, when, or how—to the point that the Reporters Committee for Freedom of the Press calls body-camera footage "the Wild West of open records requests." Under these conditions, a technical problem, even one that affects multiple police departments, will be hard to spot.

Wouldn’t police officers themselves notice any problems with Axon’s AI in the normal course of their work? Perhaps. Then again, they might suffer from “automation bias,” a tendency for people to accept a computer’s judgments over their own because of the perceived objectivity of machines.

So even if Axon’s AI can automate the description of activities recorded in body-cam video and the generation of police reports from it—and we believe that’s currently impossible—issues of fairness, accountability, and transparency would remain.

This is hardly the first time that new technologies have come along that demand, in the name of effectiveness and safety, to be independently tested and monitored throughout their development and even after they reach the market. Precedents for handling these situations exist.

The U.S. Food and Drug Administration could be one model. The FDA was established in the early 20th century in response to toxic or mislabeled food products and pharmaceuticals. Part of the agency's mandate is to prevent drug companies from profiting by selling the equivalent of snake oil. In that same vein, AI vendors that sell products and services that affect people's health, safety, and liberty could be required to share their code with a new regulatory agency. The agency's experts could then test new AI systems against established benchmarks, as in attorney Matthew Scherer's proposal for the Artificial Intelligence Development Act. This requirement might reduce the company's profitability in the short term, but it would help ensure that the public is protected and help legitimize these new technologies. And it would probably improve the performance of these systems by institutionalizing the sort of collaborative testing and refinement that has powered the recent AI renaissance.

State and local governments need not wait, though, for some new federal regulatory regime, which is unlikely to emerge anytime soon. Seattle has provided a blueprint with its recent Surveillance Technology Acquisition legislation, which requires city departments to conduct community outreach and seek city council approval prior to procuring new surveillance software. Some local governments might adopt the position of the Movement for Black Lives, which seeks to end the use of police surveillance technologies entirely as part of a broad policy platform. Other communities, where people see the benefit of body cameras, might enact laws that require independent quality-assurance testing for any AI system used in policing.

To its credit, Axon has assembled the AI and Policing Technology Ethics Board to grapple with some of these thorny issues. The board includes Miles Brundage of the OpenAI initiative, an expert in AI’s societal implications, and Walter McNeil, sheriff of Leon County, Fla., an expert in community policing. Also on the board are University of Washington computer-vision researcher Ali Farhadi and Electronic Frontier Foundation roboticist Jeremy Gillula.

Four board members shared their insights with us, with Brundage aptly noting that “given the often limited IT expertise of law-enforcement agencies, what Axon does or doesn’t make available will have an impact on what happens, so we need to think through what safeguards are possible to impose from Axon’s position.” We agree. Yet according to the board members who responded to us, Axon’s ethics board has met only once to discuss ground rules and has no special insight into the company’s research and development efforts.

This past April, 42 civil rights and technology organizations—such as the NAACP and the AI Now Institute—signed an open letter urging Axon’s AI ethics board to get the company to adopt stringent ethical reviews for all of its products and to refrain from deploying what these organizations deemed dangerous and untested technologies. Chief among them was real-time facial recognition. The company claims that it is not currently developing facial recognition technology for its body cameras, but The Wall Street Journal has reported that Axon has, in fact, been seeking just such technology.

Other companies—like Megvii—are absolutely building out this capability and are enabling police departments in China and other Asian countries to add real-time facial-recognition capability to police body cams. And Motorola Solutions hopes soon to bring such capabilities to the United States.

Whether the West embraces real-time facial recognition on police body cameras is yet to be seen. In our view, that decision should be the subject of open public debate. But there can be no reasonable discussion if the capabilities and pitfalls of the technology are not well known to outsiders.

Whether the domain is faces or birds, AI currently benefits from a culture of openness, which drives exchanges among domain experts, computer scientists, and users. The regulatory requirements we are proposing would help ensure that the AI used by the criminal-justice system would be similarly open to scrutiny, so that it can be demonstrated to be reliable and free from bias. These rules may introduce some friction into the process of commercialization. But such is the nature of due process—and of solid engineering.

This article appears in the December 2018 print issue as “Can We Trust Computers With Body-Cam Video?”

About the Authors

Daniel Greene is a professor at the University of Maryland’s College of Information Studies. Genevieve Patterson is chief scientist for the computer-vision startup Trash, in New York City.
