Why AI Makes It Hard to Prove That Self-Driving Cars Are Safe

Engineers weigh in on the pitfalls of machine learning and autonomous driving

A member of the media test drives a Tesla Motors Model S equipped with Autopilot in Palo Alto, Calif., last fall.
Photo: David Paul Morris/Bloomberg/Getty Images

Car manufacturers will have difficulty demonstrating just how safe self-driving vehicles are because of what’s at the core of their smarts: machine learning. 

“You can’t just assume this stuff is going to work,” says Philip Koopman, a computer scientist at Carnegie Mellon University who works in the automotive industry.

In 2014, a market research firm projected that the self-driving car market would be worth $87 billion by 2030. Several companies, including Google, Tesla, and Uber, are experimenting with computer-assisted or fully autonomous driving—with varying degrees of success, given the myriad technical obstacles that must be overcome.

Koopman is one of several researchers who believe that the nature of machine learning makes verifying that these autonomous vehicles will operate safely very challenging.

Traditionally, he says, engineers write computer code to meet requirements and then run tests to check that the code meets them.

But with machine learning—which lets a computer grasp complexity, such as processing images taken at different hours of the day while still identifying important objects in a scene, like crosswalks and stop signs—the process is not so straightforward. According to Koopman, “The [difficult thing about] machine learning is that you don’t know how to write the requirements.”

Years ago, engineers realized that analyzing images from cameras is a problem that can’t be solved by traditional software. They turned to machine learning algorithms, which process examples to create mathematical models for solving specific tasks.

Engineers provide many human-annotated examples—say, what a stop sign is, and what isn’t a stop sign. An algorithm strips down the images, picking unique features and building a model. When a computer is subsequently presented with new images, it can run them through the trained model and get its predictions regarding which images contain a stop sign and which ones don’t.
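
To make that workflow concrete, here is a minimal sketch of the train-then-predict loop in Python with scikit-learn. The “images” are synthetic stand-in feature vectors rather than real photographs, and the labeling rule is invented purely for illustration.

```python
# Minimal sketch of supervised learning: fit a model to human-labeled
# examples, then ask it to predict labels for examples it has never seen.
# The "images" here are synthetic feature vectors, not real photographs.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# 1,000 hypothetical "images", each reduced to 64 numeric features, plus a
# human-provided label (1 = stop sign present, 0 = no stop sign).
X = rng.normal(size=(1000, 64))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # invented labeling rule

X_train, X_new, y_train, y_new = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)  # build a statistical model from the examples

# New "images" are run through the trained model to get predictions.
print("predictions for 5 unseen images:", model.predict(X_new[:5]))
print("accuracy on unseen images:", model.score(X_new, y_new))
```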

“This is an inherent risk and failure mode of inductive learning,” Koopman says. If you look inside the model to see what it does, all you get are statistical numbers. It’s a black box. You don’t know exactly what it’s learning, he says.
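
What Koopman means by a black box shows up even in a toy setting: peek inside a trained model and all you find are arrays of coefficients. The tiny network and data below are made up purely for illustration.

```python
# Sketch of what "looking inside" a trained model yields: weight matrices
# full of numbers, not rules an engineer can read and audit.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))          # made-up inputs
y = (X.sum(axis=1) > 0).astype(int)     # made-up labels

net = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000).fit(X, y)

# The model's entire "explanation" of what it learned is these numbers.
for layer, weights in enumerate(net.coefs_):
    print(f"layer {layer} weights, shape {weights.shape}:")
    print(np.round(weights, 2))
```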

To make things more concrete, imagine you’re training your self-driving car to avoid pedestrians. You have people in orange safety shirts stand around, and you let the car loose. It might be learning to recognize hands, arms, and legs—or it might just be learning to recognize an orange shirt.

Or, more subtly, imagine that you’ve conducted the training during the summer, and nobody wore a hat. And the first hat the self-driving car sees on the streets freaks it out.

“There’s an infinite number of things” that the algorithm might be training on, he says.

Google researchers once tried identifying dumbbells with an artificial neural network, a common machine learning model that mimics the neurons in the brain and their connections. Surprisingly, the trained model could identify dumbbells in images only when an arm was attached.

Other problems with safety verification, Koopman says, include training and testing the algorithm on data that is too similar—it’s like memorizing flash cards and regurgitating the information on an exam.
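
The flash-card failure is easy to demonstrate in miniature. In the sketch below—using purely random, made-up data with no real signal—an unconstrained decision tree memorizes its training set, so it scores perfectly on data it has already seen and drops to roughly coin-flip accuracy on data it hasn’t.

```python
# Sketch of the "flash card" failure mode: a model that memorizes its
# training data looks perfect on that data and falls apart on new data.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 20))
y = rng.integers(0, 2, size=500)        # random labels: nothing real to learn

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=2)

tree = DecisionTreeClassifier()         # unconstrained tree can memorize everything
tree.fit(X_train, y_train)

print("score on data it trained on:    ", tree.score(X_train, y_train))  # ~1.0
print("score on data it has never seen:", tree.score(X_test, y_test))    # ~0.5
```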

If Uber dropped its self-driving cars into a random city where it hasn’t exhaustively honed its computer maps, he says, they might not work as well as expected. There’s a straightforward workaround: if you train only on downtown Pittsburgh (which Uber has mapped) and operate only there, that could be fine—but it’s a limitation to be aware of.

There’s also the challenge of ensuring that small changes in what the system perceives—perhaps because of fog, dust, or mist—don’t change what the algorithms identify. Research conducted in 2013 found that altering an image’s pixels in ways invisible to the unaided eye can trick a machine learning algorithm into thinking a school bus is not a school bus.
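
Here is a rough sketch in the spirit of that 2013 result, scaled down to a simple linear classifier on made-up data: every “pixel” is nudged by an amount that is only a small fraction of its natural variation, yet the predicted label flips.

```python
# Fast-gradient-sign-style perturbation of a linear classifier: tiny,
# coordinated changes to every "pixel" flip the predicted label.
# Data, labels, and the classifier are all stand-ins for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(1000, 1024))                 # synthetic 32x32 "images"
y = (X @ rng.normal(size=1024) > 0).astype(int)   # hidden labeling rule

clf = LogisticRegression(max_iter=2000).fit(X, y)

x = X[0]
w = clf.coef_[0]
before = clf.predict([x])[0]

# Nudge each "pixel" in the direction that most moves the decision score
# against the current prediction, just far enough to cross the boundary.
direction = -np.sign(w) if before == 1 else np.sign(w)
eps = (abs(clf.decision_function([x])[0]) + 1e-3) / np.abs(w).sum()
x_adv = x + eps * direction

print("per-pixel change:", round(eps, 4))   # small fraction of the pixel spread
print("label before:", before, "-> after:", clf.predict([x_adv])[0])
```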

“You would never put such [a machine learning] algorithm into a plane because then you cannot prove the system is correct,” says Matthieu Roy, a software dependability engineer at the National Center for Scientific Research in Toulouse, France, who has worked in both the automotive and avionics industries. If an airplane does not pass independent safety tests, it isn’t allowed to take off or land, he says.

Roy says it would be too difficult to test autonomous cars for all the scenarios they could experience (think of an explosion, or a plane crashing right in front of the car). “But you have to cope with all the risks that may arrive,” he says.

Alessia Knauss, a software engineering postdoc at the Chalmers University of Technology in Göteborg, Sweden, is working on a study to determine the best tests for autonomous vehicles. “It’s all so costly,” she says.

She’s currently interviewing auto companies to get their perspectives. She says that even when multiple sensors act as backups for one another—as in Google’s cars—each component has to be tested based on what it does, and so do all of the systems that rely on it.

“We’ll see how much we can contribute,” Knauss says.

Koopman wants automakers to demonstrate to an independent agency why they believe their systems are safe. “I’m not so keen to take their word for it,” he says.

In particular, he wants car companies to explain the features of their algorithms, how representative the training and testing data are of different scenarios, and, ultimately, why their testing shows the vehicles are safe for the environments they are supposed to operate in. If an engineering team simulated driving a car 10 billion miles without any hiccups, for example, the company could argue that even though the car hasn’t seen everything, the scenarios it missed wouldn’t happen very often.

“Every other industry that does mission critical software has independent checks and balances,” he says.

Last month, the U.S. National Highway Traffic Safety Administration unveiled guidelines for autonomous cars, but they make independent safety testing optional.

Koopman says that under company deadlines and cost targets, safety corners can sometimes get cut, as in the 1986 Challenger disaster, in which ignored risks led to the NASA space shuttle breaking apart 73 seconds after liftoff, killing its seven crew members.

It’s possible to have independent safety checks without publicly disclosing how the algorithms work, he says. The aviation industry has engineering representatives who work inside aviation companies; it’s standard practice to have them sign nondisclosure agreements.

“I’m not telling them how to do it, but there should be some transparency,” says Koopman.
