In January, Waymo posted a tweet breaking down what “autonomy” means for the Waymo Driver, which is how the company refers to its autonomous driving system. The video in the tweet points out that Level 1, 2, and 3 autonomy are not “fully autonomous” because a human driver might be needed. Sounds good. The Waymo Driver operates at Level 4 autonomy, meaning, Waymo says, that “no human driver is needed in our defined operational conditions.” This, Waymo continues, represents “fully autonomous driving technology,” with the Waymo Driver being “fully independent from a human driver.”
Using the term “full autonomy” in the context of autonomous vehicles can be tricky. Depending on your perspective, a vehicle with Level 4 autonomy fundamentally cannot be called “fully” autonomous, because it’s autonomous in some situations and not others, which is where the defined operational conditions bit comes in. The folks behind these levels of autonomy, SAE International, are comfortable calling vehicles with both Level 4 and Level 5 autonomy “fully autonomous,” but from a robotics perspective, the autonomy of the Waymo Driver is a little more nuanced.
While humans may not be directly in the loop with Waymo’s vehicles, there’s a team of them on remote standby to provide high-level guidance if a vehicle finds itself in a novel or ambiguous situation that it isn’t confident about handling on its own. These situations won’t require a human to take over the operation of the vehicle, but they can include things like construction zones, unexpected road closures, or a police officer directing traffic with hand signals. These are situations a human might be able to interpret at a glance, but that autonomous systems notoriously find difficult.
There’s nothing wrong with the approach of having humans available like this, except that it raises the question of whether a Level 4 autonomous system should really be called fully autonomous and fully independent from a human driver if it sometimes finds itself in situations where it may decide to ask a remote human for guidance. It may seem pedantic, but having a clear understanding of what autonomous systems can and cannot do is very important, especially when such topics are becoming more and more relevant to people who may not have much of a background in robotics or autonomy. This is what prompted Waymo’s tweet, and Waymo now has a whole public education initiative called Let’s Talk Autonomous Driving that’s intended to clearly communicate what autonomous driving is and how it works.
In this same spirit, I spoke with Nathaniel Fairfield, who leads the behavior team at Waymo, to get a more detailed understanding of what Waymo actually means when it calls the Waymo Driver fully autonomous.
IEEE Spectrum: Can you tell us a little bit about your background, and what your current role is at Waymo?
Nathaniel Fairfield: I’m currently a Distinguished Software Engineer at Waymo looking after what we call “behavior,” which is the decision-making part of the onboard software, including behavior prediction, planning, routing, fleet response, and control. I’ve been with the team since we were founded as the Google self-driving car project back in 2009, and my background is in robotics. Before Waymo I was at the Carnegie Mellon University Robotics Institute (where I received my Ph.D. and master’s) working on robots that could map complex 3D environments (e.g., flooded cenotes in Mexico), and before that I worked at a company called Bluefin Robotics building robots to map the ocean floor.
How does Waymo define full autonomy?
When we think about defining full autonomy at Waymo, the question is whether the system is designed to independently perform the entire dynamic driving task in all conditions in our operational design domain (ODD) without the need to rely on human intervention, or whether it requires a human to intervene and take control in such situations to keep things safe. The former would be full autonomy, and the latter would not be. The delta between the two is the difference between the L4 system we’re developing at Waymo (the Waymo Driver) which is responsible for executing the entire dynamic driving task, and L2 or L3 systems.
What are the specific operational conditions under which Waymo’s vehicles cannot operate autonomously?
Our current ODD in Phoenix, where we have our fully autonomous service Waymo One, is around 130 km² (larger than the city of San Francisco). This area is broad enough to cover everyday driving, which includes different roadway types, various maneuvers, speed ranges, all times of day, and so on. Our ODD is always evolving as our technology continues to advance.
Just like a competent human driver, the Waymo Driver is designed so that it will not operate outside of its approved ODD. The Waymo Driver is designed to automatically detect weather or road conditions that would affect safe driving within our ODD and return to base or come to a safe stop (i.e. achieve a “minimal risk condition”) until conditions improve.
If Waymo’s vehicles encounter a novel situation, they ask a remote human supervisor for assistance with decision making. Can you explain how that process works?
Imagine you’re out driving and you come up to a “road closed” sign ahead. You may pause for a bit as you look for a “Detour” sign to show you how to get around it or, if you don’t see one, start preparing to turn around on that road and create your own detour or new route. The Waymo Driver does the same thing as it evaluates how to plot the best path forward. In a case like this where the road is fully blocked, it can call on our Fleet Response specialists to provide advice on what route might be better or more efficient and then take that input, combine it with the information it has from the onboard map and what it’s seeing in real time via the sensors, and choose the best way to proceed.
This example shows a few basic properties of all our fleet response interactions:
- The remote humans are not teleoperating the cars
- The Waymo Driver is not asking for help to perceive the surrounding environment; it can already do that. It’s asking for advice on more strategic planning questions based on what it’s already perceived.
- The Waymo Driver is always responsible for being safe
- Human responses can be very helpful, but are not essential for safe driving
What are some examples of situations or decision points where the Waymo Driver may not be able to proceed without input from a human?
In addition to construction, another example would be interpreting hand gestures. While that’s something we’ve improved a lot on over the last few years, it’s a common scenario the Waymo Driver likes to call on Fleet Response for at times. The Waymo Driver can perceive that someone may be using hand signals, such as another road user waving their hands, and then it will call on Fleet Response to confirm what the gesture appears to be signaling and use that input to make a decision about when and how to proceed.
This is completely dynamic and depends on the specific scenario; the Waymo Driver will not engage Fleet Response for "all construction zones" or "all novel situations." There are some dead ends or construction zones, for example, where the Waymo Driver may not need to call on Fleet Response at all. Those are just examples of some common scenarios we see Fleet Response utilized for: cases where the Waymo Driver may call on Fleet Response, but does not have to.
So the "driving task" is sometimes separate from "strategic planning," which may include navigating through situations where a human is giving directions through hand signals, busy construction zones, full road closures, and things like that. And remote humans may at times be required to assist the Waymo Driver with strategic planning decisions. Am I understanding this correctly?
Zooming out a bit (this may get a little philosophical): are tasks made up of layers of behaviors of increasing levels of sophistication (so as to be able to cover every eventuality), or is it possible to carve off certain domains where only certain behaviors are necessary, and call that the task? A simplistic example would be tying your shoelaces. Does it include what I do most every day: putting on the shoe, tying the knot? Or does it also include dealing with a nasty knot that my son put in the laces? Or include patching the laces if they break? Or replacing if the break is in a bad place? Or finding new laces if I have to replace the lace? Or going to the store if I need to buy a new lace?
If it's the first case, even humans aren't really individually autonomous, because we rely on other individuals for assistance (changing a tire), public works (installing a traffic light), and social decision-making (traffic management of a small-town July-4th parade). If it's the second case, then there is an endless discussion to be had about exactly where to draw the lines. So in some sense, it's arbitrary, and we can agree to disagree, but what is the fun of that? I would argue that there are certain "useful" distinctions to draw—where there are certain sets of capabilities that allow an agent to do something meaningful.
And to clarify—this isn’t just Waymo’s perspective, it’s actually how SAE makes these distinctions. SAE essentially defines the dynamic driving task (DDT, or what the Waymo Driver is responsible for) as involving the tactical and operational functions required to operate a vehicle, which are separate from strategic functions.
EDITOR’S NOTE: According to SAE, the dynamic driving task includes the operational (steering, braking, accelerating, monitoring the vehicle and roadway) and tactical (responding to events, determining when to change lanes, turn, use signals, etc.) aspects of the driving task, but not the strategic (determining destinations and waypoints) aspect of the driving task. SAE’s definition of Level 4 autonomy involves the driving mode-specific performance by an automated driving system of all aspects of the dynamic driving task, even if a human driver does not respond appropriately to a request to intervene.
What is the disengagement rate for Waymo’s vehicles when they are operating with passengers? What are the most frequent causes of disengagements?
“Disengagement” usually refers to when a vehicle operator in the car switches the mode from autonomous to manual. With our fully autonomous service, we don’t have vehicle operators in the car, so there are technically no “disengagements” the way the term is generally understood. We do have a roadside assistance team who can assist the vehicle (and switch it over to manual control, if appropriate) while on the road, but we don’t have metrics to share on those interactions.
But the idea that these disengagement rates should be deducted from autonomy and that anything that ever gets stuck isn't "fully autonomous" is flawed. Under that definition, human drivers aren't "fully" autonomous! It would be sort of silly to say that "Nathaniel is 99.999% autonomous because he had to call a tow truck that one time."
I agree that it would be silly to say that Nathaniel is only 99.999% autonomous because he had to call a tow truck, but that's because most people don’t consider that to be part of the driving task—I think it might be less silly to say that Nathaniel is only 99.999% autonomous if he sometimes can't drive through construction zones or can't reroute himself if he encounters a road closure.
When you get into a taxi, you don't ask yourself whether the driver has a particular license to drive on a particular road, or whether you'll have to jump into the front seat to grab the steering wheel. You just assume that they can get you to your destination without any intervention. When you get in a vehicle driven by the Waymo Driver, you can safely make that assumption! It doesn't mean that your human taxi driver can't look for advice in some situations, nor does it mean that the Waymo Driver can't do the same.
Additionally, and as noted above, we think the SAE distinctions are helpful in determining what constitutes the dynamic driving task that an L4 automated driving system like the Waymo Driver must be able to perform, including that the DDT does not involve strategic functions. The examples you reference here are either functions the Waymo Driver can perform (such as driving through a clearly marked construction zone) or are examples of where the Waymo Driver receives information or clarification of some facts to facilitate its performance of the DDT. Human drivers (like the taxi driver!) receive information to inform their driving from radio reports, navigation devices, or even from asking adjacent motorists in stopped traffic what they see ahead, and in a confusing situation might ask a traffic officer how to get around a crash area.
So your perspective is that a system can be accurately described as “fully autonomous” if it sometimes relies on a human for strategic decision making?
Yes. The Waymo Driver is a fully autonomous driver in the Phoenix service area, and I think most roboticists would agree with me there! This is because for the purpose of driving our riders to their destinations, the Waymo Driver makes all the decisions related to the dynamic driving task.
What robotics research (besides your own, of course!) are you most excited about right now?
I'll be honest, our research at Waymo into high-capability decision-making systems that promote safety and interact naturally with humans is about as cool (and challenging) as it gets! It involves reasoning about uncertainty (and our own limitations in sensing and interpretation), reasoning about the intentions of other agents, and how the actions of other agents will change depending on our actions, and using millions of miles of real world driving experience and cutting-edge machine learning to accelerate progress in behavior.
I’m also very impressed by both the mechanical engineering and sophisticated footstep planning shown by Boston Dynamics; they are doing some really elegant robotics. And a part of my heart belongs to exploration robotics too, be it under water, under ice, or on other planets (or in the case of Europa, all three). It's the combination of rock-solid mechanisms, robust autonomous capability, and ground-breaking scientific discovery.
The need to have a human somewhere in the loop for strategic edge cases is a very robot-y thing, and perhaps that’s why it’s incorporated into the SAE’s autonomy levels. And technically, Waymo is absolutely correct to call its vehicle fully autonomous based on that definition. I think the risk, though, is that people may not intuitively understand that “full autonomy” only applies to the dynamic driving task, and not the strategic planning task, which (for humans) is an integral part of what we tend to think of as “driving.”
What I’d really like to know is what happens, some number of years from now, after Waymo has solved the strategic planning part of driving (which I’m sure they will). Because at that point, the Waymo Driver will be more autonomous than it was before, and they’ll have to communicate that somehow. Even fuller autonomy? Full autonomy plus? Overfull autonomy? I can’t wait to find out.