Waymo Offers a Peek Into the Huge Trove of Data Collected by Its Self-Driving Cars

Academic researchers can build models on labeled data gathered by Waymo’s fleet of self-driving cars

Waymo today lifted the edge of the veil on the tremendous hoard of data amassed by its fleet of self-driving cars, giving academic researchers the chance to play with a bit of it.

The move is a way to “give back to the community,” says Drago Anguelov, Waymo’s head of research, “not an admission in any way that we have problems” that the company can’t solve on its own. Deadlines for self-driving cars have come and gone, and though Waymo still seems clearly in the lead, the advent of fully driverless cars still appears to be many years away.

Waymo’s competitors are free to register and look at the data set, so long as they don’t use it to build commercial self-driving cars. 

It took Waymo months to select, annotate, and polish the files, which consist of 1,000 driving sequences, each one lasting 20 seconds, equivalent to 200,000 frames. The sequences were assembled from lidar, radar, and camera sensors onboard Waymo vehicles in 25 places, including busy cities, such as San Francisco, and suburban areas, such as Chandler, outside of Phoenix, Ariz.
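
Those figures imply a sampling rate that the text does not state outright. A quick back-of-the-envelope check, assuming the 200,000 frames are the total across all 1,000 sequences (the variable names are purely illustrative):

```python
# Figures quoted in the article.
num_sequences = 1_000
seconds_per_sequence = 20
total_frames = 200_000

# Derived values. The sampling rate is inferred from the figures above,
# not stated in the article.
total_seconds = num_sequences * seconds_per_sequence
frames_per_second = total_frames / total_seconds
total_hours = total_seconds / 3600

print(f"{total_seconds} s of driving, about {total_hours:.1f} hours")
print(f"implied rate: {frames_per_second:.0f} frames per second")
```

On that reading, the release covers roughly five and a half hours of driving sampled at about 10 frames per second.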

Interleaved with all of that imagery are 12 million 3D labels, each one applying to a point cloud, the data set that’s generated by a lidar sensor. Waymo’s cars each carry five lidars—a main one that sweeps the entire 360 degrees and four short-range devices. There are also 1.2 million 2D labels for camera-generated images, which show only the visible parts of a scene and which mesh tightly with the 3D lidar clouds.

That close connection between lidar and camera data, including connections to other devices, such as radar, is known as sensor fusion. Waymo proudly asserts that it’s best at the job in large part because it alone has designed the entire package of hardware and software in-house.

Among the research problems that such a tight-knit data set may help to solve is “re-identification,” in which a continuously tracked object is recognized again after having been briefly obscured. If, say, a pedestrian walks past a tree and re-emerges on the other side, a system ought to be quick to recognize that it’s the same pedestrian.

Such continuity of attention links to the wider problem of moving from the mere identification of objects to predicting what those objects will do. A savvy human driver will easily tell that a pedestrian leaning against a fence post is probably not about to dart into the street, and that it’s the other way around for a pedestrian hovering at the edge of a curb. Computer algorithms aren’t as good at making the distinction.

For some years, a lot of academic work on self-driving vehicles centered on limited databases, such as the KITTI Vision Benchmark, assembled from driving records in and around Karlsruhe, Germany. Anguelov says that KITTI is too small and too time-constrained, and that Waymo’s better-labeled, fuller narrative is better suited to crafting predictive models: “We present the world not in snippets but in coherent sequences.”

“This is just the first cut,” he says. “We want to publish benchmarks, organize competitions around them, and further extend the dataset.”
