Waymo today lifted the edge of the veil from the tremendous hoard of data amassed by its fleet of self-driving cars, allowing academic researchers the chance to play with a bit of it.
The move is a way to “give back to the community,” says Drago Anguelov, Waymo’s head of research, “not an admission in any way that we have problems” that the company can’t solve on its own. Deadlines for self-driving cars have come and gone, and though Waymo still seems clearly in the lead, the advent of absolutely driverless cars still appears many years away.
Waymo’s competitors are free to register and look at the data set, so long as they don’t use it to build commercial self-driving cars.
It took Waymo months to select, annotate, and polish the files, which consist of 1,000 driving sequences, each one lasting 20 seconds, equivalent to 200,000 frames. The sequences were assembled from lidar, radar, and camera sensors onboard Waymo vehicles in 25 places, including busy cities, such as San Francisco, and suburban areas, such as Chandler, outside of Phoenix, Ariz.
Interleaved with all of that imagery are 12 million 3D labels, each one applied to a point cloud, the constellation of points generated by a lidar sensor. Waymo’s cars each carry five lidars: a main one that sweeps the entire 360 degrees and four short-range devices. There are also 1.2 million 2D labels for camera-generated images, which show only the visible parts of a scene and which mesh tightly with the 3D lidar clouds.
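The totals imply a capture rate the article doesn’t state outright: 200,000 frames spread over 1,000 twenty-second sequences works out to 200 frames per sequence, or 10 frames per second. A quick back-of-the-envelope check:

```python
# Sanity-check the dataset arithmetic from the figures quoted above.
sequences = 1_000
seconds_per_sequence = 20
total_frames = 200_000

frames_per_sequence = total_frames // sequences              # 200 frames each
capture_rate_hz = frames_per_sequence / seconds_per_sequence # 10.0 Hz, implied
print(frames_per_sequence, capture_rate_hz)
```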
That close alignment of lidar and camera data, together with data from other sensors, such as radar, is known as sensor fusion. Waymo proudly asserts that it’s best at the job in large part because it alone has designed the entire package of hardware and software in-house.
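The mechanics behind that lidar-camera alignment are standard projective geometry, though the article doesn’t spell them out: each 3D lidar point is transformed into the camera’s coordinate frame and then projected through the camera’s intrinsic matrix onto the image plane. A minimal sketch, with made-up calibration matrices (`T_cam_from_lidar` and `K` are illustrative names, not Waymo’s):

```python
import numpy as np

def project_lidar_to_image(points_lidar, T_cam_from_lidar, K):
    """Project 3D lidar points (N, 3) into 2D pixel coordinates.

    T_cam_from_lidar: 4x4 extrinsic transform, lidar frame -> camera frame.
    K: 3x3 camera intrinsic matrix.
    Returns (pixels, mask) where mask flags points in front of the camera.
    """
    n = points_lidar.shape[0]
    homo = np.hstack([points_lidar, np.ones((n, 1))])   # homogeneous coords (N, 4)
    cam = (T_cam_from_lidar @ homo.T).T[:, :3]          # points in camera frame
    mask = cam[:, 2] > 0                                # keep points ahead of the lens
    pix = (K @ cam[mask].T).T
    pix = pix[:, :2] / pix[:, 2:3]                      # perspective divide
    return pix, mask
```

With calibration like this, a 3D box label in the lidar cloud can be projected into the camera image and compared against the corresponding 2D label.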
Among the research problems that such a tight-knit data set may help to solve is “re-identification,” in which a continuously tracked object is recognized again after having been briefly obscured. If, say, a pedestrian walks past a tree and re-emerges on the other side, a system ought to be quick to recognize that it’s the same pedestrian.
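One common way to attack re-identification (the article doesn’t say which method Waymo favors) is to compute an appearance embedding for each detected object and match a re-emerging detection against the embedding of the lost track. A hedged sketch, with a hypothetical similarity threshold:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def reidentify(track_embedding, candidate_embeddings, threshold=0.8):
    """Return the index of the best-matching new detection, or None.

    threshold is an illustrative cutoff; real systems tune it and
    typically combine appearance with motion-model predictions.
    """
    sims = [cosine_similarity(track_embedding, c) for c in candidate_embeddings]
    best = int(np.argmax(sims))
    return best if sims[best] >= threshold else None
```

In the pedestrian-behind-a-tree case, the track’s last embedding would be compared against every detection that appears near where the pedestrian vanished, and a confident match resumes the old track identity.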
Such continuity of attention links to the wider problem of moving from the mere identification of objects to predicting what those objects will do. A savvy human driver will easily tell that a pedestrian leaning against a fence post is probably not about to dart into the street, and that it’s the other way around for a pedestrian hovering at the edge of a curb. Computer algorithms aren’t as good at making the distinction.
For some years, much academic work on self-driving vehicles has centered on limited databases, such as the KITTI Vision Benchmark, assembled from driving records in and around Karlsruhe, Germany. Anguelov says that KITTI is too small and too time-constrained, and that Waymo’s more richly labeled, more complete record is better suited to the crafting of predictive models: “We present the world not in snippets but in coherent sequences.”
“This is just the first cut,” he says. “We want to publish benchmarks, organize competitions around them, and further extend the dataset.”
Philip E. Ross is a senior editor at IEEE Spectrum. His interests include transportation, energy storage, AI, and the economic aspects of technology. He has a master's degree in international affairs from Columbia University and another, in journalism, from the University of Michigan.