A New Way to Find Bugs in Self-Driving AI Could Save Lives

A debugging method for deep learning AI pits neural networks against each other to find errors


Most software bugs won’t kill you. A possibly lethal exception could be the error that leads a self-driving car’s AI to make the wrong decision at the wrong time. That is why researchers developed a bug-hunting method that can systematically expose bad decision-making by the deep learning algorithms deployed in online services and autonomous vehicles.

The new DeepXplore method uses at least three neural networks, the basic architecture of deep learning algorithms, to act as “cross-referencing oracles” in checking each other’s accuracy. Researchers at Columbia University and Lehigh University designed DeepXplore to solve an optimization problem in which they looked to strike the best balance between two objectives: maximizing the number of neurons activated within neural networks, and triggering as many conflicting decisions as possible among different neural networks. By assuming that the majority of neural networks will generally make the right decision, DeepXplore automatically retrains the neural network that made the lone dissenting decision to follow the example of the majority in a given scenario.
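The cross-referencing idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not DeepXplore's own code: the function name `find_dissenters` and the steering labels are hypothetical. It shows only how a majority vote among several models can flag a lone dissenter without any human-labeled answer.

```python
from collections import Counter

def find_dissenters(predictions):
    """Compare the outputs of several models on the same input.

    predictions: one label per model (hypothetical example labels).
    Returns (majority_label, dissenting_model_indices). If no label wins a
    strict majority, returns (None, []) because there is no trusted answer.
    """
    label, votes = Counter(predictions).most_common(1)[0]
    if votes * 2 <= len(predictions):
        return None, []
    dissenters = [i for i, p in enumerate(predictions) if p != label]
    return label, dissenters

# Three models look at the same road image; the third one disagrees.
majority, dissenters = find_dissenters(["left", "left", "right"])
print(majority, dissenters)
```

In this toy setup, the model at index 2 would be the one selected for retraining toward the majority answer.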

“This is a differential testing framework that can find thousands of errors in self-driving systems and in similar neural network systems,” says Yinzhi Cao, assistant professor of computer science at Lehigh University in Bethlehem, Pa.

Cao and his colleagues on the DeepXplore team recently won best paper after presenting their research at the 2017 Symposium on Operating Systems Principles (SOSP), held in Shanghai, China, from 28 to 31 Oct. Their win may signal a growing recognition of the need for debugging tools such as DeepXplore in deep learning AI.

Typically, deep learning algorithms become better at certain tasks by filtering huge amounts of training data that humans have labeled with the correct answers. That has enabled such algorithms to achieve accuracies of well over 90 percent on certain test datasets that involve tasks such as identifying the correct human faces in Facebook photos or choosing the correct phrase in a Google translation between, say, Chinese and English. In these cases, it’s not the end of the world if a friend occasionally gets misidentified or if a certain esoteric phrase gets translated incorrectly.

Example photos of errors in self-driving car software. The left photo shows an arrow correctly turning left. The right photo, a slightly darker version, has an arrow indicating the car turns right, into the guardrail. This example from DeepXplore shows an error found in Nvidia's DAVE-2 self-driving car software, which would cause the car to crash into a guardrail due to a darker version of the image. Images: Columbia University/Lehigh University/ACM

But the consequences of mistakes rise sharply once tech companies begin using deep learning algorithms in applications where a two-ton machine is moving at highway speeds. A wrong decision by a self-driving AI could lead to the car crashing into a guardrail, colliding with another vehicle, or running down pedestrians and cyclists. Government regulators will want to know for sure that self-driving cars can meet a certain safety level, and random test datasets may not uncover all the rare “corner cases” that could lead an algorithm to make a catastrophic mistake.

“I think this push toward secure and reliable AI kind of fits in nicely with explainable AI,” says Suman Jana, an assistant professor of computer science at Columbia University in New York City. “Transparency, explanation and robustness all have to be improved a lot in machine learning systems before these systems can start working together with human beings or start running on roads.”

Jana and Cao come from a group of researchers who share backgrounds in software security and debugging. In their world, even software that is 99 percent error-free could still be vulnerable if malicious hackers can exploit that one lone bug in the system. That has made them far less tolerant of errors than many deep learning researchers, who see mistakes as a natural part of the training process. It also made them well suited to devising a new and more comprehensive approach to debugging deep learning.

Until now, debugging of the neural networks in self-driving cars has involved fairly tedious or random methods. One random testing approach involves human researchers manually creating test images and feeding those into the networks until they trigger a wrong decision. A second approach, called adversarial testing, can automatically create a sequence of test images by slightly tweaking one particular image until it trips up the neural network.

DeepXplore took a different approach by automatically creating test images most likely to cause three or more neural networks to make conflicting decisions. For example, DeepXplore might look for just the right amount of lighting in a given image that could lead two neural networks to identify a vehicle as a car while a third neural network identifies it as a face.
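A toy version of that search might look like the following. It is only a sketch: DeepXplore solves an optimization problem to find such images, whereas this example swaps in a plain brute-force sweep over brightness. The three "models" are hypothetical stand-ins that steer based on mean pixel brightness, and every name and cutoff value here is an assumption made for illustration.

```python
def darken(pixels, amount):
    """Subtract a constant brightness offset from each pixel, clamping at 0."""
    return [max(0, p - amount) for p in pixels]

def make_toy_model(cutoff):
    """Hypothetical classifier: steers 'left' if mean brightness exceeds cutoff."""
    def model(pixels):
        mean = sum(pixels) / len(pixels)
        return "left" if mean > cutoff else "right"
    return model

def find_disagreement(models, pixels, step=10, max_steps=20):
    """Darken the image step by step until the models stop agreeing.

    Returns (darkening_amount, model_outputs) for the first disagreement,
    or None if the models agree at every brightness level tried.
    """
    for k in range(max_steps + 1):
        candidate = darken(pixels, k * step)
        outputs = [m(candidate) for m in models]
        if len(set(outputs)) > 1:
            return k * step, outputs
    return None

# Two models tolerate dark images; the third flips its decision earlier.
models = [make_toy_model(50), make_toy_model(50), make_toy_model(125)]
image = [200, 200, 200, 200]  # a tiny grayscale "image", values 0-255
print(find_disagreement(models, image))
```

Here the sweep stops once the image has been darkened by 80 brightness levels, the point at which the third toy model breaks from the other two.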

At the same time, DeepXplore also aimed to maximize neuron coverage in its testing by activating the maximum number of neurons and different neural network pathways. Such neuron coverage is based on a similar concept in traditional software testing called code coverage, Cao explains. This process was able to activate 100 percent of network neurons, or about 30 percent more on average than either the random or adversarial testing methods previously used in deep learning algorithms. 
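As a rough illustration of the metric, neuron coverage can be computed as the share of neurons that fire above some threshold for at least one test input, much as code coverage counts the lines executed by at least one test. The sketch below assumes activations are already available as plain lists of numbers; the function name and the threshold value are assumptions made for this example, not details taken from the researchers' implementation.

```python
def neuron_coverage(activations_per_input, threshold=0.0):
    """Fraction of neurons activated above `threshold` by at least one input.

    activations_per_input: a list with one activation vector per test input,
    where each vector holds one number per neuron (assumed layout).
    """
    if not activations_per_input:
        return 0.0
    num_neurons = len(activations_per_input[0])
    covered = [False] * num_neurons
    for activations in activations_per_input:
        for j, value in enumerate(activations):
            if value > threshold:
                covered[j] = True
    return sum(covered) / num_neurons

# Two test inputs over a four-neuron network: each input fires one neuron.
tests = [
    [0.5, 0.0, 0.0, 0.0],
    [0.0, 0.9, 0.0, 0.0],
]
print(neuron_coverage(tests))  # 0.5
```

Adding a test input that fires the remaining two neurons would raise the coverage to 1.0, which is the kind of gap a coverage-guided test generator tries to close.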

Testing with 15 state-of-the-art neural networks looking at five different public datasets showed how DeepXplore could find thousands of previously undiscovered errors in a wide variety of deep learning applications. The test datasets included scenarios for self-driving car AI, automatic object recognition in online images, and automatic detection of malware masquerading as ordinary software.

DeepXplore cannot yet guarantee that it has found every single possible bug in a system, but it’s far more comprehensive in testing large-scale neural networks than previous methods, Jana says. By comparison, a Stanford University team has taken an almost opposite approach by showing how to guarantee a small cluster of neurons is free of errors. Neither is a complete solution, but both represent promising and crucial steps toward debugging the future of deep learning.

