
Tesla’s Autopilot Depends on a Deluge of Data

But can a fire-hose approach solve self-driving’s biggest problems?


In 2019, Elon Musk stood up at a Tesla day devoted to automated driving and said, “Essentially everyone’s training the network all the time, is what it amounts to. Whether Autopilot’s on or off, the network is being trained.”

Tesla’s suite of assistive and semi-autonomous technologies, collectively known as Autopilot, is among the most widely deployed—and undeniably the most controversial—driver-assistance systems on the road today. While many drivers love it, using it for a combined total of more than 5 billion kilometers, the technology has been involved in hundreds of crashes, some of them fatal, and is currently the subject of a comprehensive investigation by the National Highway Traffic Safety Administration.

This second story—in IEEE Spectrum’s series of three on Tesla’s empire of data (story 1; story 3)—focuses on how Autopilot rests on a foundation of data harvested from the company’s own customers. Although the company’s approach has unparalleled scope and includes impressive technological innovations, it also faces particular challenges—not least of which is Musk’s decision to widely deploy the misleadingly named Full Self-Driving feature as a largely untested beta.


Most companies working on automated driving rely on a small fleet of highly instrumented test vehicles, festooned with high-resolution cameras, radars, and laser-ranging lidar devices. Some of these have been estimated to generate 750 megabytes of sensor data every second, providing a rich seam of training data for neural networks and other machine-learning systems to improve their driving skills.
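To put that estimate in perspective, the short calculation below converts 750 megabytes per second into hourly and daily volumes. It is a back-of-the-envelope sketch only: it assumes continuous recording at the quoted rate, decimal units (1 MB = 10^6 bytes), and a hypothetical eight-hour test shift that does not come from the article.

```python
# Back-of-the-envelope data volume for one instrumented test vehicle.
# Assumptions (not from the article): continuous recording at the quoted
# rate, decimal units, and an eight-hour test shift.
BYTES_PER_MB = 10**6
BYTES_PER_TB = 10**12

rate_bytes_per_s = 750 * BYTES_PER_MB   # the 750 MB/s estimate
seconds_per_hour = 3600
shift_hours = 8                         # hypothetical shift length

bytes_per_hour = rate_bytes_per_s * seconds_per_hour
tb_per_hour = bytes_per_hour / BYTES_PER_TB
tb_per_shift = tb_per_hour * shift_hours

print(f"{tb_per_hour:.1f} TB per hour")        # 2.7 TB per hour
print(f"{tb_per_shift:.1f} TB per 8-hour shift")  # 21.6 TB per 8-hour shift
```

At that rate, a single car would fill terabytes of storage every hour, which helps explain why such fleets tend to stay small.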

Such systems have now effectively solved the task of everyday driving, including for a multitude of road users, different weather conditions, and road types, says Henry Liu, director of Mcity, a public-private mobility research partnership at the University of Michigan.

“But right now, automated vehicles are one to two magnitudes below human drivers in terms of safety performance,” says Liu. “And that’s because current automated vehicles can’t handle the curse of rarity: low-frequency, long-tail, safety-critical events that they just don’t see enough to know how to handle.” Think of a deer suddenly springing into the road, or a slick of spilled fuel.

Tesla’s bold bet is that its own customers can provide the long tail of data needed to boost self-driving cars to superhuman levels of safety. Above and beyond their contractual obligations, many are happy to do so—seeing themselves as willing participants in the development of technology that they have been told will one day soon allow them to simply sit back and enjoy being driven by the car itself.

For a start, the routing information for every trip undertaken in a recent-model, Autopilot-equipped Tesla is shared with the company (see the previous installment in this series). But Tesla’s data effort goes far beyond navigation.

In autonomy presentations over the past few years, Musk and Tesla’s then-head of AI, Andrej Karpathy, detailed the company’s approach, including its so-called Shadow Mode.


In Shadow Mode, operating on Tesla vehicles since 2016, if the car’s Autopilot computer is not controlling the car, it is simulating the driving process in parallel with the human driver. When its own predictions do not match the driver’s behavior, this might trigger the recording of a short “snapshot” of the car’s cameras, speed, acceleration, and other parameters for later uploading to Tesla. Snapshots are also triggered when a Tesla crashes.
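The trigger logic described above can be illustrated with a small sketch. This is not Tesla's code: the `Control` fields, the tolerance values, and the rolling buffer length are all hypothetical, chosen only to show the idea of comparing a simulated decision with the human's and saving recent context when the two diverge.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Control:
    """One control decision (hypothetical fields, for illustration only)."""
    steering_deg: float   # steering wheel angle in degrees
    accel_mps2: float     # longitudinal acceleration; negative means braking


# Hypothetical tolerances; real trigger criteria are not public.
STEERING_TOL_DEG = 5.0
ACCEL_TOL_MPS2 = 1.5


def disagrees(predicted: Control, human: Control) -> bool:
    """True when the simulated decision differs noticeably from the driver's."""
    return (
        abs(predicted.steering_deg - human.steering_deg) > STEERING_TOL_DEG
        or abs(predicted.accel_mps2 - human.accel_mps2) > ACCEL_TOL_MPS2
    )


def collect_snapshots(frames, buffer_len=3):
    """Scan (predicted, human) pairs and save recent context on disagreement.

    A rolling buffer keeps the last `buffer_len` pairs so that each snapshot
    includes the moments leading up to the mismatch.
    """
    recent = deque(maxlen=buffer_len)
    snapshots = []
    for predicted, human in frames:
        recent.append((predicted, human))
        if disagrees(predicted, human):
            snapshots.append(list(recent))
    return snapshots


# Usage: the driver swerves on the third frame while the model predicts straight.
frames = [
    (Control(0.0, 0.0), Control(0.0, 0.0)),
    (Control(0.0, 0.0), Control(1.0, 0.2)),
    (Control(0.0, 0.0), Control(12.0, 0.0)),
]
snapshots = collect_snapshots(frames)
print(len(snapshots))  # 1
```

In this toy version, only the swerve produces a snapshot, and the snapshot carries the two preceding frames along with it. That selectivity is what keeps upload volumes manageable across a large fleet.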

After the snapshots are uploaded, a team may review them to identify human actions that the system should try to imitate, and input them as training data for its neural networks. Or they may notice that the system is failing, for instance, to properly identify road signs obscured by trees.

In that case, engineers can train a detector designed specifically for this scenario and download it to some or all Tesla vehicles. “We can beam it down to the fleet, and we can ask the fleet to please apply this detector on top of everything else you’re doing,” said Karpathy in 2020. If that detector thinks it spots such a road sign, it will capture images from the car’s cameras for later uploading.

His team would quickly receive thousands of images, which they would use to iterate the detector, and eventually roll it out to all production vehicles. “I’m not exactly sure how you build out a data set like this without the fleet,” said Karpathy. An amateur Tesla hacker who tweets using the pseudonym Green told Spectrum that he identified over 900 Autopilot test campaigns, before the company stopped numbering them in 2019.
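A data-collection campaign of this kind might look roughly like the sketch below. Everything here is an assumption made for illustration: the frame dictionaries, the scoring function that stands in for a trained neural network, the threshold, and the upload cap are hypothetical and do not describe Tesla's actual software.

```python
from typing import Callable, Dict, Iterable, List

Frame = Dict[str, float]  # hypothetical stand-in for a camera frame plus metadata


def occluded_sign_detector(frame: Frame) -> float:
    """Toy scoring function standing in for a trained detector.

    Returns a high score only when a sign is likely present AND heavily occluded.
    """
    return frame.get("sign_score", 0.0) * frame.get("occlusion", 0.0)


def run_campaign(
    detector: Callable[[Frame], float],
    frames: Iterable[Frame],
    threshold: float = 0.5,
    max_uploads: int = 100,
) -> List[Frame]:
    """Apply a campaign detector on one car and queue matching frames for upload."""
    queued: List[Frame] = []
    for frame in frames:
        if len(queued) >= max_uploads:
            break  # cap what a single car sends back
        if detector(frame) >= threshold:
            queued.append(frame)
    return queued


# Usage: four frames seen by one car; two look like tree-obscured signs.
frames = [
    {"id": 1, "sign_score": 0.9, "occlusion": 0.8},
    {"id": 2, "sign_score": 0.9, "occlusion": 0.1},
    {"id": 3, "sign_score": 0.2, "occlusion": 0.9},
    {"id": 4, "sign_score": 1.0, "occlusion": 0.6},
]
uploads = run_campaign(occluded_sign_detector, frames)
print([f["id"] for f in uploads])  # [1, 4]
```

Multiplied across a large fleet, even a narrow detector like this one could return thousands of examples of a rare scenario, which is the advantage Karpathy describes.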


Liu is bullish on Tesla’s approach to leveraging its ever-growing consumer base. “I don’t think a small…fleet will ever be able to handle these [rare] situations,” he says. “But even with these shadow drivers—and if you deploy millions of these fleet vehicles, that’s a very, very large data collection—I don’t know whether Tesla is fully utilizing them because there’s no public information really available.”

One obstacle is the sheer cost. Karpathy admitted that having a large team to assess and label images and video was expensive and said that Tesla was working on detectors that can train themselves on video clips captured in Autopilot snapshots. In June, the company duly laid off 195 people working on data annotation at a Bay Area office.

While Autopilot does seem to have improved over the years, with Tesla allowing its operation on more roads and in more situations, serious and fatal accidents are still occurring. These may or may not have purely technical causes. Certainly, some drivers seem to be overestimating the system’s capabilities or are either accidentally or deliberately failing to supervise it sufficiently.

Other experts are worried that Tesla’s approach has more fundamental flaws. “The vast majority of the world generally believes that you’re never going to get the same level of safety with a camera-only system that you will based on a system that includes lidar,” says Dr. Matthew Weed, senior director of product management at Luminar, a company that manufactures advanced lidar systems.

He points out that Tesla’s Shadow Mode only captures a small fraction of each car’s driving time. “When it comes to safety, the whole thing is about…your unknown unknowns,” he says. “What are the things that I don’t even know about that will cause my system to fail? Those are really difficult to ascertain in a bulk fleet” that is down-selecting data.

For all the promise of Tesla’s fleet learning and the enthusiastic support of many of its customers, Autopilot has yet to prove that it can drive as safely as a human, let alone be trusted to operate a vehicle without supervision. And there are other difficulties looming. Andrej Karpathy left Tesla in mid-July, while the company continues to face the damaging possibility of NHTSA issuing a recall for Autopilot in the United States. This would be a terrible PR (and possibly economic) blow for the company but would likely not halt its harvesting of customer data to improve the system, nor prevent its continued deployment overseas.

Tesla’s use of fleet vehicle data to develop Autopilot echoes the user-fueled rise of Internet giants like Google, YouTube, and Facebook. The more its customers drive, so Musk’s story goes, the better the system performs.

But just as tech companies have had to come to terms with their complicated relationships with data, so Tesla is beginning to see a backlash. Why does the company charge US $12,000 for a so-called “full self-driving” capability that is utterly reliant on its customers’ data? How much control do drivers have over data extracted from their daily journeys? And what happens when other entities, from companies to the government, seek access to it? These are the themes for our third story.

This article appears in the October 2022 print issue as “The Radical Scope of Tesla’s Data Hoard.”

The Conversation (2)
Vaibhav Sunder, 06 Sep, 2022

A learning module should be allowed to be on sale by a neutral company like Google Maps so that it can learn more data. I would buy a small such device, or on my phone when on a drive, even on a Uber or Ola. If it can learn by South American or my native Indian roads, it would be a big thing, due to congestion and use of maneuvering on such roads.

FB TS, 05 Aug, 2022

IMHO automated driving problems need to be handled/solved using Game Theory & then it can be really/actually superior to any human driving!

Imagine that the car is a player trying to reach its goal/destination & all other moving vehicles/pedestrians/animals are opponent players!

The car needs to consider all its possible actions & make a choice for each game move!

& of course, all opponent players will make their own moves in response!

The game goal of the car will be reaching its target location w/o any collision/accident!
