Downloading a Million Lions in a Day
A Techwise Conversation with Akamai Technologies architect Bobby Blumofe
Hi, this is Steven Cherry for IEEE Spectrum’s “Techwise Conversations.”
Last Wednesday, Apple released OS X Lion. It’s the seventh major upgrade of OS X, but this time around, the process was distinctly different—there are no disks. Instead, you have to download—all three and a half gigabytes of it. That’s a lot of data for the user to pull in, but if you think about it, that’s an enormous amount of data for Apple to push out: Within a day, more than a million people had downloaded the software. Yet, Apple delivered that tsunami of data without too many hitches. Sometimes customers had to wait to start downloading, but once they got in, it went smoothly and sometimes even quickly—in some cases, as little as 10 minutes.
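To put those figures in perspective, here is a back-of-the-envelope sketch of the arithmetic. It assumes decimal gigabytes (1 GB = 10^9 bytes) and, unrealistically, that the million downloads were spread evenly over 24 hours; real demand would peak well above this average. The function and variable names are illustrative only.

```python
# Rough arithmetic for "a million 3.5 GB downloads in a day".
# Assumptions (not from the interview): decimal gigabytes, evenly spread load.
FILE_BYTES = 3.5e9
DOWNLOADS = 1_000_000
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate_bps(total_bytes, seconds):
    """Average bit rate needed to move total_bytes in the given time."""
    return total_bytes * 8 / seconds


total_bytes = FILE_BYTES * DOWNLOADS
aggregate_gbps = average_rate_bps(total_bytes, SECONDS_PER_DAY) / 1e9
ten_minute_mbps = average_rate_bps(FILE_BYTES, 10 * 60) / 1e6

print(f"total data moved: {total_bytes / 1e15:.1f} PB")
print(f"average aggregate rate over one day: {aggregate_gbps:.0f} Gb/s")
print(f"rate for one 10-minute download: {ten_minute_mbps:.1f} Mb/s")
```

Under these assumptions the total is about 3.5 petabytes, the day-long average is roughly 324 Gb/s, and a single 10-minute download needs about 47 Mb/s sustained.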
So what’s Apple’s secret? Like most companies these days, it hires a content-delivery network, or CDN, to help manage big download events. Reportedly, the CDN behind the Lion downloads is Massachusetts-based Akamai Technologies. Although Akamai can’t say outright which events it is or isn’t hosting, we do know that the company counts among its clients Apple, Adobe, American Idol, Blockbuster, and ESPN, to name just a few.
My guest today is Bobby Blumofe, Akamai’s senior vice president of networks and operations. Bobby, welcome to the podcast.
Bobby Blumofe: Hi. Thank you.
Steven Cherry: Bobby, I know it’s hard for you to talk about actual events, so let’s do a little hypothetical. Apple makes a point of saying that three and a half gigabytes is about the size of an HD movie, so let’s say a few months from now, just in time for Christmas vacation, Harry Potter and the Deathly Hallows: Part 2 is released and its distributor (I think it’s Warner Bros.), let’s say, decides there will be no Blu-ray disc, no discs at all; it’ll all be downloads, from them, or maybe they’ll make it an iTunes exclusive or Netflix or Blockbuster. And whoever it is goes to Akamai and says, we’re expecting a million downloads the first day, and oh, by the way, it’s three and a half gigabytes. What happens at Akamai’s end?
Bobby Blumofe: Well, assuming this is already a customer, they’re probably already configured to use our network, and at our end, in terms of people, nothing happens; the system handles the whole thing from here. The customer will post their file or files wherever it is they would normally post them, for example on their origin Web server, or they might upload the files to our storage in the cloud, and basically at that point they simply need to provide a link to the content for their customers, and the system at that point just does what it does. The system has been designed to respond to download events and other kinds of events (media events, for example, or news events) without requiring human intervention. The system detects the load, sends the load to the right places, and just works.
Steven Cherry: So in the case of a download, I guess you can kind of preload that software on—you have servers in every part of the world, almost every country I guess by now. How does that work for a stream, though? How would that work for a big event? In fact, actually, the BBC webcast the royal wedding on YouTube and according to Google 100 million watched it. How would that work at Akamai?
Bobby Blumofe: Yeah, so now you’re talking about live streaming events, and we do indeed host many very large live streaming events, from online concerts to news updates and many other sorts of things. And those can indeed generate very large amounts of load, because you’ve got large numbers of people watching the same thing at the same time, so it’s very similar to what you describe with the download of a new software release, but in this case we’re talking about streaming media. And in that case our network (and as you mentioned, it’s a very large distributed network today: over 95 000 servers distributed in about 1800 locations, on 1000 different networks in 74 countries, so it’s ubiquitous all around the world) acts, in the case of a live streaming event, in effect like an overlay multicast network. So, for example, one copy of that live stream might be sent to a server in a particular country, from there it might fan out to a few other servers in that country, and from there it might fan out to all those end users in that country. So you can think of it as sort of forming a tree, a fan-out tree that spans those servers all over the world and creates a very efficient overlay multicast network.
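The fan-out tree described here can be sketched in a few lines of Python. This is only a toy model under assumed numbers (3 countries, 4 relay servers per country, 100 viewers per relay); the node names and the three-level shape are hypothetical and are not a description of Akamai's actual topology. It shows that the origin sends one copy per country, and no node has to send more copies than it has direct children.

```python
from collections import deque


def build_fanout_tree(countries, relays_per_country, viewers_per_relay):
    """Return a dict mapping each forwarding node to the nodes it sends to."""
    tree = {"origin": []}
    for c in range(countries):
        entry = f"entry-{c}"  # the one server per country fed by the origin
        tree["origin"].append(entry)
        tree[entry] = []
        for r in range(relays_per_country):
            relay = f"relay-{c}-{r}"
            tree[entry].append(relay)
            tree[relay] = [
                f"viewer-{c}-{r}-{v}" for v in range(viewers_per_relay)
            ]
    return tree


def max_fanout_and_viewers(tree):
    """Walk the tree breadth-first; return (largest fan-out, viewer count)."""
    queue = deque(["origin"])
    max_fanout = 0
    viewers = 0
    while queue:
        node = queue.popleft()
        if node not in tree:  # nodes that forward to nobody are viewers
            viewers += 1
            continue
        children = tree[node]
        max_fanout = max(max_fanout, len(children))
        queue.extend(children)
    return max_fanout, viewers


tree = build_fanout_tree(countries=3, relays_per_country=4, viewers_per_relay=100)
max_fanout, viewers = max_fanout_and_viewers(tree)
print(f"viewers served: {viewers}")
print(f"copies sent by the origin: {len(tree['origin'])}")
print(f"most copies sent by any single node: {max_fanout}")
```

With these assumed numbers, 1200 viewers are served while the origin sends only 3 copies of the stream; with plain unicast from the origin it would have to send 1200.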
Steven Cherry: Are there any delays involved in all of that fanning out?
Bobby Blumofe: Well, it doesn’t generate any delays beyond the normal delays that would simply be in the network, and in fact many of our customers have fairly stringent delay requirements, in that they require that the image the end user is seeing is delayed by no more than some fixed amount from when the event actually occurs. So there is of course, you know, delay simply propagating through the network (after all, light only travels so fast), but we’re really not introducing any additional delays. The actual compute and propagation that happens on the servers is extremely fast. And in fact, because we’re doing this step-by-step multicast, what you’re doing is you’re breaking up the path into multiple hops, and those in your audience who are familiar with TCP will probably recognize that by doing that you actually are going to make the whole transmission much more efficient, with less packet loss and much faster retransmits when packet loss does occur. So in fact you actually can speed things up by introducing those additional layers of compute and fan-out.
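One way to see why splitting a path into hops can help TCP is the widely used Mathis approximation for steady-state TCP throughput, rate ≈ (MSS / RTT) × sqrt(3/2) / sqrt(p), where p is the packet-loss rate. The sketch below is a simplification, not Akamai's method: it assumes each hop runs its own TCP connection, that round-trip time and loss are divided evenly among hops, and that relay processing time is negligible. The 120 ms and 1 percent figures are made-up inputs.

```python
import math

MSS_BYTES = 1460  # typical Ethernet-sized TCP segment payload (assumption)


def mathis_throughput_bps(rtt_s, loss_rate, mss_bytes=MSS_BYTES):
    """Mathis et al. approximation of steady-state TCP throughput."""
    return (mss_bytes * 8 / rtt_s) * math.sqrt(1.5) / math.sqrt(loss_rate)


def split_path_throughput_bps(rtt_s, loss_rate, hops):
    """Throughput of one hop when the path is cut into equal TCP hops.

    Assumes RTT divides evenly and per-hop loss q satisfies
    1 - (1 - q)**hops == loss_rate. With identical hops, the pipeline
    runs at the rate of any single hop.
    """
    hop_rtt = rtt_s / hops
    hop_loss = 1 - (1 - loss_rate) ** (1 / hops)
    return mathis_throughput_bps(hop_rtt, hop_loss)


end_to_end = mathis_throughput_bps(rtt_s=0.120, loss_rate=0.01)
three_hops = split_path_throughput_bps(rtt_s=0.120, loss_rate=0.01, hops=3)

print(f"single end-to-end connection: {end_to_end / 1e6:.2f} Mb/s")
print(f"three relayed hops:           {three_hops / 1e6:.2f} Mb/s")
print(f"speedup: {three_hops / end_to_end:.1f}x")
```

In this model each hop sees both a shorter round trip and a lower loss rate, so the relayed path comes out several times faster than the single long connection, which matches the qualitative point made above.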
Steven Cherry: So actually the download case is in some ways more complicated, because if I start a download and then somebody in the next cubicle over starts the same download a minute later it’s two completely different downloads right?
Bobby Blumofe: That’s right, and in that case it’s basically a collection of unicasts, but the key is that those unicast streams are all happening very near the end users, so as this content becomes popular, as the event has more and more people downloading, the file and the chunks of the file get replicated to more and more servers closer and closer to the end users, so that as the end users are downloading they’re downloading from an Akamai server that is probably right in the very same city, probably right in the very same ISP, and therefore even though it’s unicast, it’s traveling a very short distance, which means less packet loss, means less latency, and again, for your audience who are familiar with TCP, you know that that translates into higher throughput. And when you’re doing download, throughput is important. You want the downloads to complete quickly and reliably, because an end user that fails to get that download may go elsewhere to purchase that object.
Steven Cherry: Now, you know, it’s been pretty widely reported that Akamai was the primary content-delivery network for Netflix for a while and then the business went to Level 3 Communications. Without getting into those details, Netflix streams by some measures about 20 percent of all Internet packets. In this case it’s not the burst of activity like a million big downloads in one day but an ongoing steady flood of data. What’s it like managing that?
Bobby Blumofe: Yeah, you know so far we’ve talked about downloads, and we’ve talked about live streaming. Now you’re talking about on-demand streaming. So in this case you’re not trying to simply copy the object from one place to another, but rather stream the bits from the server to the end user. And in this case the key measure here is that you have to maintain throughput at a very high level for a very long, long period of time. An end user who is watching a movie, possibly a two- or three-hour movie, does not want to see rebuffering occur in the middle of the movie, you know. They want a TV quality or Blu-ray quality experience without rebuffering, without delays, without pauses, and to do that at that level of quality, you’re talking about a very high bit-rate generally measured in the megabits per second, several megabits per second, and that, as I said, has to be sustained for a long, long period of time so that you don’t get those glitches that people—that your audience isn’t going to tolerate.
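The link between sustained throughput and rebuffering can be illustrated with a toy playback-buffer simulation. Everything here is an assumption made for illustration: a 4 Mb/s video, one throughput sample per second, and a player that starts (or resumes) once 5 seconds of video are buffered. Real players use more elaborate logic, including adaptive bit rates.

```python
def count_rebuffers(throughput_mbps, bitrate_mbps, startup_buffer_s=5.0):
    """Count playback stalls given one throughput sample per second.

    Each second, rate / bitrate seconds of video arrive. While playing,
    one second of video is consumed; if less than that is buffered,
    playback stalls until the buffer refills to startup_buffer_s.
    """
    buffered_s = 0.0
    playing = False
    rebuffers = 0
    for rate in throughput_mbps:
        buffered_s += rate / bitrate_mbps
        if playing:
            if buffered_s >= 1.0:
                buffered_s -= 1.0
            else:
                playing = False  # buffer ran dry: the viewer sees a pause
                rebuffers += 1
        elif buffered_s >= startup_buffer_s:
            playing = True
    return rebuffers


steady = [5.0] * 60                             # always above the bit rate
with_dip = [5.0] * 10 + [1.0] * 30 + [5.0] * 20  # 30-second slowdown

print("steady link rebuffers:", count_rebuffers(steady, bitrate_mbps=4.0))
print("link with a dip rebuffers:", count_rebuffers(with_dip, bitrate_mbps=4.0))
```

In this sketch the steady link never stalls, while the link whose throughput drops below the video bit rate for 30 seconds drains its buffer and stalls once, even though its average rate over the whole minute is not far below the video's bit rate.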
Steven Cherry: Yeah, you mentioned the high quality of the movie image. You know the royal wedding was, I guess, a small YouTube window directly on your computer, but a Netflix movie can be a high-def stream directly to a 50-inch television so I guess it seems like content-delivery growth involves two very different growth trends combined, right? You have more and more customers, more and more people getting more and more stuff, but also wanting it at higher and higher definition or higher bandwidth experiences so, I mean, can content-delivery networks actually keep up with all of that growth?
Bobby Blumofe: Well, they can if they’re architected right. A content-delivery network that’s going to scale to that level needs to be highly distributed. You need to have the servers near the end users, at what we call the “edge” of the Internet, where all the capacity is. And it has to be able to efficiently map the Internet, identify where the hot spots are and route around them, and use that server platform very efficiently. And that’s an important factor as well. All this has to be cost effective, and that means all those resources have to be used very cost effectively, and that means copying popular content to the servers in the areas where it’s popular, and making fewer copies of the less popular content. Doing all this automatically is important. You don’t always know what’s popular. But just to reinforce your point (I think you’re on the right point), indeed many of our customers are targeting many different screens, from, you know, iPads and other kinds of tablets to cellphones, but also the large screen in your living room, and we have many media customers today running very, very high bit rates, because if it’s going to look good on the living room screen you need a very high bit rate for that level of quality, and therefore you need that distributed network to sustain that high throughput.
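The idea of making more copies of popular content and fewer of unpopular content can be sketched as a simple proportional allocation. This is a hypothetical policy invented for illustration, not Akamai's algorithm: each object gets a number of copies in a region proportional to its share of recent requests, with at least one copy and at most one per server. The object names and request counts are made up.

```python
def plan_replicas(request_counts, servers_in_region, min_copies=1):
    """Assign each object a replica count proportional to its request share."""
    total = sum(request_counts.values())
    plan = {}
    for name, count in request_counts.items():
        share = count / total if total else 0.0
        copies = round(share * servers_in_region)
        # Clamp: never fewer than min_copies, never more than one per server.
        plan[name] = max(min_copies, min(servers_in_region, copies))
    return plan


# Hypothetical request counts observed in one region.
requests = {"new-os-installer": 9000, "hit-movie": 900, "old-driver": 100}
plan = plan_replicas(requests, servers_in_region=20)

for name, copies in plan.items():
    print(f"{name}: {copies} copies")
```

Because the plan is computed from observed request counts, it can be re-run as demand shifts, which is one simple way to do this automatically without knowing in advance what will be popular.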
Steven Cherry: So it seems the biggest needs for content-delivery networks right now are—we’ve talked about the live news and sports and other live events—and also this business that we started with the end of CDs and DVD discs with movies and software coming by download. Is there anything else really big on the horizon here?
Bobby Blumofe: Well I think cloud is an important trend. More and more end users are not only interacting with applications that are hosted not locally, that are hosted out in the cloud, but also storing information in the cloud. These days you can keep your media files, you can keep your documents, you can keep all kinds of information that you in the past might have kept on a local disk, which creates, of course, challenges. You’ve got to maintain that disk and it’s therefore not available on your other computers. Now instead of doing that, what you can do is simply store all that information in the cloud. And that means that these objects that you interact with and these applications that you interact with are no longer hosted right on your local machine or potentially even in the same building. They might be hosted hundreds or even thousands of miles away.
Steven Cherry: And just to bring it full circle, I guess a lot of those objects are actually people’s music libraries and video libraries.
Bobby Blumofe: Absolutely. It makes a lot more sense to store those things in the cloud where it’s going to be available to you anywhere you go on any machine.
Steven Cherry: Very good. Well Bobby, when it comes to something like downloading Lion, we mainly think of our own broadband connection to our cable or DSL provider. It’s really kind of a testament to the content-delivery networks that they blend into the background so much that we don’t even think about them being able to keep up. So I guess on behalf of all downloaders and streamers, thanks for that, and thanks for being my guest today.
Bobby Blumofe: My pleasure. Thank you very much.
Steven Cherry: We’ve been speaking with Bobby Blumofe, senior vice president of networks and operations for Akamai Technologies, about how the company handles the deluge of download requests during a major software release. For IEEE Spectrum’s “Techwise Conversations,” I’m Steven Cherry.
This interview was recorded 27 July 2011.
Segment producer: Ariel Bleicher; audio engineer: Francesco Ferorelli
NOTE: Transcripts are created for the convenience of our readers and listeners and may not perfectly match their associated interviews and narratives. The authoritative record of IEEE Spectrum's audio programming is the audio version.