Steven Cherry Hi, this is Steven Cherry for Radio Spectrum.
In the winter of 2006 I was in Utah reporting on a high-speed broadband network, fiber-optic all the way to the home. Initial speeds were 100 megabits per second, to rise tenfold within a year.
I remember asking one of the engineers, “That’s a billion bits per second. Who needs that?” He told me that some studies had been done in southern California, showing that for an orchestra to rehearse remotely, it would need at least 500 megabits per second to avoid any latency that would throw off the synchronicity of a concert performance. This was fourteen years before the coronavirus would make remote rehearsals a necessity.
You know who else needs ultra-low latency? Autonomous vehicles. Factory robots. Multiplayer games. And that’s today. What about virtual reality, piloting drones, or robotic surgery?
What’s interesting in hindsight about my Utah experience is what I didn’t ask and should have, which was, “So what you really need is low latency, and you’re using high bandwidth as a proxy for that?” We’re so used to adding bandwidth when what we really need is to reduce latency that we don’t even notice we’re doing it. But what if enterprising engineers got to work on latency itself? That’s what today’s episode is all about.
It turns out to be surprisingly hard. In fact, if we want to engineer our networks for low latency, we have to reengineer them entirely, developing new methods for encoding, transmitting, and routing. So says the author of an article in November’s IEEE Spectrum magazine, “Breaking the Latency Barrier.”
Shivendra Panwar is a Professor in the Electrical and Computer Engineering Department at New York University’s Tandon School of Engineering. He is also the Director of the New York State Center for Advanced Technology in Telecommunications and the Faculty Director of its New York City Media Lab. He is also an IEEE Fellow, quote, “For contributions to design and analysis of communication networks.” He joins us by a communications network, specifically Skype.
Steven Cherry Shiv. Welcome to the podcast.
Shivendra Panwar Thank you. Great to be here.
Steven Cherry Shiv, in the interests of disclosure, let me first rather grandly claim to be your colleague, in that I’m an adjunct professor at NYU Tandon, and let me quickly add that I teach journalism and creative writing, not engineering.
You have a striking graph in your article. It suggests that VoIP, FaceTime, Zoom, they all can tolerate up to 150 milliseconds of latency, while for virtual reality it’s about 10 milliseconds and for autonomous vehicles it’s just two. What makes some applications so much more demanding of low latency than others?
Shivendra Panwar So, and this was actually news to me, we tend to think that the human being can react on the order of 100 or 150 milliseconds.
We hear about fighter pilots in the Air Force who react within 100 ms or the enemy gets ahead of them in a dogfight. But it turns out human beings can actually react at even a lower threshold when they are doing other actions, like trying to touch or feel or balance something. And that can get you down to tens of milliseconds. What has happened is in the 1980s, for example, people were concerned about applications like the ones you mentioned, which required 100, 150 ms, like a phone call or a teleconference. And we gradually figured out how to do that over a packet-switched network like the Internet. But it is only recently that we became aware of these other sets of applications, which require an even lower threshold in terms of delay or latency. And this is not even considering machines. So there are certain mechanical operations which require feedback loops of the order of milliseconds or tens of milliseconds.
Steven Cherry Am I right in thinking that we keep throwing bandwidth at the latency problem? And if so, what’s wrong with that strategy?
Shivendra Panwar So that’s a very interesting question. Think of bandwidth in terms of a pipe. Okay, so this is going back to George W. Bush. You may remember this famous interview or debate he had where he likened the Internet to a set of pipes and everyone made fun of him. Actually, he was not far off. You can make the analogy that the Internet is a set of pipes. But coming back to your question, if you view the Internet as a pipe, there are two dimensions to a pipe: the diameter of the pipe, how wide it is, how fat it is, and then the length of the pipe. So if you’re trying to pour ... if you think of bits as a liquid and you’re trying to pour something through that pipe, the rate at which you’d be able to get it out at the other end, or how fast you get it out of the other end, depends on two things: the width of the pipe and the length of the pipe. So if you have a very wide pipe, you’ll drain the liquid really fast. So that’s the bandwidth question. And if you shorten the length of the pipe, then it’ll come out faster because it has less length of pipe to traverse. So both are important. So bandwidth certainly helps in reducing latency if you’re trying to download a file, for example, because the pipe width will essentially make sure you can download the file faster.
But it also matters how long the pipe is. What are the fixed delays? What are the variable delays going through the Internet? So both are important.
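The pipe analogy above can be put into a small calculation. The following is a minimal sketch, not something from the interview: it assumes the time to deliver a message is a fixed one-way delay (the pipe's length) plus the message size divided by the bandwidth (the pipe's width). The function name and all of the numbers are hypothetical, chosen only for illustration.

```python
def transfer_time_s(size_bits, bandwidth_bps, fixed_delay_s):
    """Seconds until the last bit arrives: fixed delay plus time to push the bits in."""
    return fixed_delay_s + size_bits / bandwidth_bps


# Hypothetical values, for illustration only.
SMALL_PACKET_BITS = 1500 * 8           # one 1,500-byte packet
LARGE_FILE_BITS = 1_000_000_000 * 8    # a 1 GB file
FIXED_DELAY_S = 0.030                  # 30 ms of fixed one-way delay

for bandwidth_bps in (100e6, 1e9):     # 100 Mb/s, then ten times wider
    packet_s = transfer_time_s(SMALL_PACKET_BITS, bandwidth_bps, FIXED_DELAY_S)
    file_s = transfer_time_s(LARGE_FILE_BITS, bandwidth_bps, FIXED_DELAY_S)
    print(f"{bandwidth_bps / 1e6:.0f} Mb/s: packet {packet_s * 1000:.3f} ms, file {file_s:.2f} s")
```

Under these assumptions, widening the pipe tenfold cuts the file download roughly tenfold, while the single small packet still takes about 30 ms either way, because its delay is dominated by the length of the pipe.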
Steven Cherry You say one big creator of latency is congestion delays. To use the specific metaphor of the article, you describe pouring water into a bucket that has a hole in it. If the flow is too strong, water rises in the bucket and that’s congestion delay: water droplets (the packets, in effect) waiting to get out of the hole. And if the water overflows the bucket, if I understand the metaphor, those packets are just plain lost. So how do we keep the water flowing at one millisecond of latency or less?
Shivendra Panwar So that’s a great question. So if you’re pouring water into this bucket with a hole, first of all, you want to keep it from overflowing. So that was: Don’t put in too much water, because even the bucket with the hole will gradually fill up and overflow from the top. But the other and equally important issue is you want to fill the bucket, maybe just the bottom of the bucket. You know, just maybe a little bit over that hole so that the time it takes for water that you are pouring to get out is minimized.
And that’s minimizing the queuing delay, minimizing the congestion and minimizing ultimately the delay through the network. And so if you want it to be less than a millisecond, you want to be very careful pouring water into that bucket so that it just fills or uses the capacity of that hole but does not start filling up the bucket.
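The bucket metaphor can be sketched as a toy simulation. This is an illustrative model under stated assumptions, not the method from the article: time advances in fixed ticks, the hole drains a fixed number of packets per tick, anything above the bucket's capacity is lost, and the queuing delay is estimated as the backlog divided by the drain rate. The function and parameter names are hypothetical.

```python
def simulate_bucket(arrivals, drain_per_tick, capacity):
    """Return (queuing delay in ticks after each tick, total packets dropped)."""
    backlog = 0
    dropped = 0
    delays = []
    for incoming in arrivals:
        backlog += incoming
        if backlog > capacity:
            # The bucket overflows: the excess packets are simply lost.
            dropped += backlog - capacity
            backlog = capacity
        # The hole lets out at most drain_per_tick packets per tick.
        backlog = max(0, backlog - drain_per_tick)
        # Water still in the bucket must wait this many ticks to get out.
        delays.append(backlog / drain_per_tick)
    return delays, dropped


# Pouring exactly at the hole's capacity: no backlog, no loss.
print(simulate_bucket([10] * 5, drain_per_tick=10, capacity=20))
# Pouring 50 percent too fast: delay builds, then packets are dropped.
print(simulate_bucket([15] * 5, drain_per_tick=10, capacity=20))
```

In this sketch, matching the pour rate to the hole keeps the delay at zero, while pouring faster first raises the delay and then causes loss once the bucket is full.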
Steven Cherry Phone calls used to run on a dedicated circuit between the caller and the receiver. Everything runs on TCP now. I guess, in hindsight, it’s remarkable that we can even have phone calls and Zoom sessions at all with our voices and video chopped up into packets and sent from hop to hop and reassembled at a destination. At a certain point, the retuning that you’re doing of TCP starts to look more and more like a dedicated circuit, doesn’t it? And how do you balance that against the fundamental point of TCP, which is to keep routers and other midpoints available to packets from other transmissions as well as your own?
Shivendra Panwar So that is the key point that you have mentioned here. And this was, in fact, a hugely controversial point back in the ’80s and ’90s when the first experiments to switch voice from circuit-switched networks to packet-switched networks were first considered. And there were many diehards who said you cannot equal the latency and reliability of a circuit-switched network. And to some extent, actually, that’s still right. The quality on a circuit-switched line, by the time of the 1970s and 1980s, when it had reached its peak of development, was excellent.
And sometimes we struggle to get to that quality today. However, the cost issue overrode it. And the fact that you are able to share the network infrastructure with millions and now billions of other people made the change inevitable. Now, having said that, this seems to be a tradeoff between quality and cost. And to some extent it is. But there is, of course, a ceaseless effort to try and improve the quality without giving up anything on the cost. And that’s where the engineering comes in. And that’s where monitoring what’s happening to your connection on a continuous basis comes in, so that whenever you sense that congestion is building up, what TCP does in particular is to back off or reduce the rate so that it does not contribute to the congestion. And its vital bits get through in time.
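The back-off behavior described above can be sketched in a few lines. The following is a simplified illustration of additive-increase/multiplicative-decrease, the classic rule behind TCP congestion avoidance; the interview does not name a specific algorithm, and real TCP variants add slow start, timeouts, and other refinements. The function name and default parameters are assumptions made for this example.

```python
def aimd_step(window, congested, increase=1.0, decrease=0.5, floor=1.0):
    """One update of a sender's window (in packets) under a simplified AIMD rule."""
    if congested:
        # Congestion sensed: back off sharply, but never below the floor.
        return max(floor, window * decrease)
    # No congestion sensed: probe gently for more capacity.
    return window + increase


# Hypothetical congestion signals, one per round trip.
window = 10.0
for congested in (False, False, True, False):
    window = aimd_step(window, congested)
    print(window)
```

The asymmetry is the design choice worth noticing: the sender grows its rate slowly and shrinks it quickly, which keeps the shared queue from filling while still using the available capacity.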
Steven Cherry You make a comparison to shoppers at a grocery store picking which checkout lane has the shorter line or is moving faster. Maybe as a network engineer, you always get it right, but I often pick the wrong lane.
Shivendra Panwar That is true in terms of networking as well, because there are certain things that you cannot predict. You might go to two lines in a router, for example, and one may look a lot shorter than yours. But there is some holdup, right? A packet may need extra processing or some other issue may crop up. And so you may end up spending more time waiting on a line which initially appeared short to you. So there is actually a lot of randomness in networking. In fact, a knowledge of probability theory, queuing theory, and all of this probabilistic math is the basis of engineering networks.
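As one small taste of the queuing theory mentioned above, here is a sketch of the textbook single-server queue with random (Poisson) arrivals and exponentially distributed service times, usually called M/M/1. The interview does not single out this model; it is used here only because its mean delay has a simple closed form, 1 / (service rate - arrival rate). The function name and rates are hypothetical.

```python
def mm1_mean_delay_s(arrival_rate, service_rate):
    """Mean time a packet spends waiting plus being served in an M/M/1 queue."""
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable: arrivals must be slower than service")
    return 1.0 / (service_rate - arrival_rate)


# Hypothetical link that can serve 100 packets per second.
for arrival_rate in (50, 90, 99):
    delay_ms = mm1_mean_delay_s(arrival_rate, 100) * 1000
    print(f"load {arrival_rate}%: mean delay {delay_ms:.0f} ms")
```

Under this model, delay grows slowly at moderate load and then climbs steeply as the link approaches full utilization, which is one reason a line that looks short can still turn out to be slow.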
Steven Cherry Let’s talk about cellular for a minute. The move to 5G will apparently help us reduce latency by reducing frame durations, but apparently it also potentially opens us up to more latency because of its use of millimeter waves?
Shivendra Panwar That is indeed a tradeoff. The engineers who work at the physical layer have been working very hard to increase the bandwidth, to get us into gigabits per second at this point in 5G, and to reduce the frame lengths, so that the time you spend waiting to put your bits onto the channel is reduced. But in this quest for more bandwidth, they moved up the electromagnetic spectrum to millimeter waves, which have a lot more capacity but have poorer propagation characteristics. With millimeter waves, what happens is the signal can no longer go through the wall of a building, for example, or even the human body or a tree. If you can imagine yourself, let’s say you’re in Times Square before COVID and you’re walking with your 5G phone, every passerby, or every truck rolling by, would potentially block the connection between your cell phone and the cell tower. Those interruptions are driven by the physical world. In fact, I joke this is the case of Sir Isaac Newton meeting Maxwell, the electromagnetic guru. Because what happens is those interruptions, since they are uncontrollable essentially, you can get typical interruptions of the order of half a second or a second before you switch to another base station, which is the current technology, and find another way to get your bits through. So those blockages, unfortunately, can easily add a couple of hundred milliseconds of delay because you may not have an alternate way to get your bits through to the cell tower.
Steven Cherry I guess that’s especially important, not so much for phone conversations and Web use or whatever we’re using our phones for, where, as we said before, a certain amount of latency is not a big problem. But 5G is going to be used for the Internet of Things. And there, there will be applications that require very low latency.
Shivendra Panwar Okay, so there are some relatively straightforward solutions. If your application needs relatively low bandwidth, so, many of the IoT applications need kilobits per second, which is a very low rate. What you could do is you could assign those applications to what is called sub-six gigahertz. That is the frequency that we currently use. Those are more reliable in the sense that they penetrate buildings, they penetrate the human body.
And as long as your station has decent coverage, you can have more predictable performance. It is only as we move up the frequency spectrum and we try and send both broadband applications—applications that use a gigabit per second or more—and we want the reliability and the low latency that we start running into problems.
Steven Cherry I noticed that, as you alluded to earlier, there are all sorts of applications where we would benefit from very low latency or maybe can’t even tolerate anything but very low latency. So to take just one example, another of our colleagues, a young robotics professor at NYU Tandon, is working on exoskeletons and rehabilitation robots for Parkinson’s patients to help them control hand tremors. And he and his fellow researchers say, and I’m quoting here, “a lag of merely 10 or 20 milliseconds can thwart effective compensation by the machine and in some cases may even jeopardize safety.”
So are there latency issues even within the bus of an exoskeleton or a prosthetic device that they need to get down to single-digit millisecond latency?
Shivendra Panwar That sounds about right in terms of the 10 to 20 milliseconds or perhaps even less. There is one solution to that, of course, which is to make sure that all of the computational power, all of the data that you need to transmit, stays on that human subject (the person who’s using the exoskeleton) and then you do not depend on the networking infrastructure. So that will work. The problem with that is the compute power and communications will, first of all, be heavy, even if we can keep reducing that thanks to Moore’s Law, and will also drain a lot of battery power. One approach, if we can get the latency and reliability right, is to offload all of that computation to, let’s say, the nearest base station or a Wi-Fi access point. This will reduce the amount of weight that you’re carrying around in your exoskeleton and reduce the amount of battery power that you need to be able to do this for long periods of time.
Steven Cherry Yeah, something I hadn’t appreciated until your article was, you say that ordinary robots as well could be lighter and have greater uptime and might even be cheaper with ultra low latency.
Shivendra Panwar That’s right. Especially if you think of flying robots. Right? You have UAVs. And there, weight is paramount to keep them up in the air.
Steven Cherry As I understand it, Shiv, there’s a final obstacle, or limit at least, to reducing latency. And that’s the speed of light.
Shivendra Panwar That’s correct. So most of us are aware of the magic number, which is like 300 000 km/s. But that’s in a vacuum or through free space. If you use a fiber-optic cable, which is very common these days, that goes down to 200 000 km/s. So you used to always take that into account but it was not a big issue.
But now, if you think about it, if you are trying to aim for a millisecond delay, let’s say, that is a distance of, quote unquote, only 300 km that light travels in free space, or even less on a fiber-optic cable: down to 200 kilometers. That means you cannot do some of the things we’ve been talking about sitting in New York, if it happens to be something that we are controlling in Seoul, South Korea, right? The speed of light takes a perceptible amount of time to get there. Similarly, what has been happening is the service providers who want to host all these new applications now have to be physically close to you to meet those delay requirements. Earlier, we didn’t consider them very seriously because there were so many other sources of delays, and delays were of the order of a hundred milliseconds (up to a second, even, if you think further back), so a few extra milliseconds didn’t matter. And so you could have a server farm in Utah dealing with the entire continental U.S., and that would be sufficient. But that is no longer possible.
And so a new field has come up, edge computing, which takes applications closer to the edge in order to support more of these applications. The other reason to consider edge computing is you can keep the traffic off the Internet core, if you push it closer to the edge. For both those reasons, computation may be coming closer and closer to you in order to keep the latency down and to reduce costs.
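The distance limits quoted in the answer above follow directly from the propagation speeds. Here is a minimal sketch of that arithmetic, using the 300 000 km/s and 200 000 km/s figures from the interview. The helper names are hypothetical, and the New York to Seoul distance of roughly 11,000 km is an approximate great-circle figure assumed for illustration; a real fiber route would be longer.

```python
VACUUM_KM_PER_S = 300_000   # speed of light in free space, as quoted above
FIBER_KM_PER_S = 200_000    # approximate speed in fiber, as quoted above


def max_one_way_km(budget_ms, speed_km_per_s):
    """Farthest a signal can travel one way within the latency budget."""
    return speed_km_per_s * budget_ms / 1000


def one_way_delay_ms(distance_km, speed_km_per_s):
    """Propagation delay alone, ignoring queuing and processing."""
    return distance_km / speed_km_per_s * 1000


print(max_one_way_km(1, VACUUM_KM_PER_S))   # reach of a 1 ms budget in free space
print(max_one_way_km(1, FIBER_KM_PER_S))    # reach of a 1 ms budget in fiber

# Assumed, approximate New York to Seoul distance.
NEW_YORK_TO_SEOUL_KM = 11_000
print(one_way_delay_ms(NEW_YORK_TO_SEOUL_KM, FIBER_KM_PER_S))
```

With these assumptions, propagation alone between New York and Seoul costs about 55 ms each way over fiber, far above a one-millisecond target, which is the arithmetic behind moving servers toward the edge.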
Steven Cherry Well, Shiv, it seems we have a never-ending ebb and flow between putting more of the computing at the endpoint, and then more of the computing at the center, from the mainframes and dumb terminals of the 1950s, to the networked workstations of the ’80s, to cloud computing today, to putting AI inside of IoT nodes tomorrow, but all through it, we always need the network itself to be faster and more reliable. Thanks for the thankless task of worrying about the network in the middle of it all, and for being my guest today.
Shivendra Panwar Thank you. It’s been a pleasure talking to you, Steve.
Steven Cherry We’ve been speaking with IEEE Fellow Shivendra Panwar about his research into ultra-low-latency networking at NYU’s Tandon School of Engineering.
Radio Spectrum is brought to you by IEEE Spectrum, the member magazine of the Institute of Electrical and Electronics Engineers, a professional organization dedicated to advancing technology for the benefit of humanity.
For Radio Spectrum, I’m Steven Cherry.