While I have been en vacances the past few weeks, there have been several potential IT Hiccups of the Week stories of interest, including the 200-to-500-year-old Indian women getting free sewing machines and Philippine fast food giant Jollibee Food having to temporarily close 72 of its restaurants in the Manila region because of problems the company experienced migrating to a new IT system, much to the disappointment of its Chickenjoy fans. However, the one hiccup that stands above the rest was the Internet difficulties reportedly experienced last week by the likes of eBay, Amazon, and LinkedIn, among many others.
One of the first inklings that something was amiss was a story Tuesday in the UK’s Inquirer reporting that eBay was experiencing spotty service, and in some cases complete outages, in parts of the UK and Europe for several hours. eBay said without elaboration that the performance issue had to do with “third party internet service provider access issues” and that it was “sorry for the inconvenience” being caused.
UK eBay sellers were furious at what was reportedly the tenth disruption of the year and were demanding compensation, the London Telegraph reported on Wednesday. However, the Telegraph also reported in an accompanying story that the Internet difficulties not only affected the auction company, but hit the newspaper’s online operations as well. A story at SiliconANGLE reported that Amazon and LinkedIn were also affected by service disruptions, while ZDNet indicated that U.S. Internet service providers Level 3, AT&T, Cogent, Comcast, Sprint, Time-Warner and Verizon all experienced sporadic performance problems across the United States and parts of Canada. The Register also reported that Canadian ISP Shaw Communications had suffered fairly severe network disruptions as well.
The reported culprit behind last week’s disruptions appears to be a well-known networking risk that turned into an annoying, if not unexpected, problem: the global Internet routing table apparently exceeded 512,000 routes. As a result, many older routers that cannot hold more than that number of routes because of memory and other limitations are at risk of sporadically causing some level of local Internet service instability until they are upgraded or replaced to handle the ever-increasing number of Internet routes. Speculation was that Tuesday’s disruptions were caused in part, or at least exacerbated, by the network activity of Verizon, which pushed the routing table past the 512K threshold for a short time. The 512K mark is expected to be crossed permanently any time now, however.
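To make the failure mode concrete, here is a minimal Python sketch of a forwarding table with a hard capacity. The class name FixedCapacityTable, the tiny capacity of 3, and the example prefixes are all hypothetical and chosen only for illustration; what a real router does once its table fills up is vendor-specific and is not modeled here.

```python
class FixedCapacityTable:
    """Toy model (hypothetical) of a route table with a hard size limit."""

    def __init__(self, capacity):
        self.capacity = capacity   # maximum number of routes the table can hold
        self.routes = set()        # routes successfully installed
        self.rejected = 0          # routes that did not fit

    def install(self, prefix):
        """Try to install a route; return True if it fits, False otherwise."""
        if prefix in self.routes:
            return True            # already present, nothing to do
        if len(self.routes) >= self.capacity:
            self.rejected += 1     # table is full: this route is left out
            return False
        self.routes.add(prefix)
        return True


# Illustration with a deliberately tiny limit instead of 512K.
table = FixedCapacityTable(capacity=3)
announcements = [
    "192.0.2.0/24",
    "198.51.100.0/24",
    "203.0.113.0/24",
    "10.0.0.0/8",
    "172.16.0.0/12",
]
results = [table.install(p) for p in announcements]
print(len(table.routes), "installed,", table.rejected, "rejected")
```

The point of the sketch is only that the limit is a property of the device, not of the network: the same announcements would all fit in a table with a larger capacity.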
Router supplier Cisco warned on its blog back in May, when the global routing table passed 500,000 routes, about the need to upgrade routers. It also laid out what its customers could do to upgrade their Cisco kit or perform workarounds. Last week, with the 512K milestone seemingly reached, Cisco posted another blog entry saying that it was really time to take action to “avoid any performance degradation, routing instability, or impact to availability.”
In highly simplistic terms, the root issue is that the vast majority of routers in operation use Internet protocol version 4 (IPv4), and many older ones set aside, often by default, only enough memory for 512K IPv4 routes, with an unknown number of them unable (even with workarounds) to handle more than that number. The newer IPv6 protocol allows for 340 undecillion unique addresses, but using that protocol requires investing in new equipment. If you want to know how big 340 undecillion is, and what some of the concerns are in moving from IPv4 to IPv6, there is a great interview from 2011 done by former IEEE Spectrum editor Steven Cherry with IPv6 evangelist Owen DeLong of Hurricane Electric, a company which claims to be “the largest IPv6 backbone in the world as measured by number of networks connected.” I also suggest reading a Spectrum story from 2006 on some of the reasons why migration to IPv6 has been a slow process, as well as this piece at the Register from last week.
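For a sense of scale, the 340 undecillion figure corresponds to the size of IPv6’s 128-bit address space, compared with 32 bits for IPv4. A short Python calculation shows the arithmetic; the variable names are illustrative only, and an undecillion is taken here as 10 to the 36th power (short scale).

```python
# IPv4 addresses are 32 bits wide; IPv6 addresses are 128 bits wide.
ipv4_addresses = 2 ** 32
ipv6_addresses = 2 ** 128

# One undecillion on the short scale is 10**36.
UNDECILLION = 10 ** 36

print(f"IPv4: {ipv4_addresses:,} addresses")
print(f"IPv6: {ipv6_addresses:,} addresses")
print(f"IPv6 is about {ipv6_addresses // UNDECILLION} undecillion addresses")
```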
Whether passing the 512K routing milestone will cause many Internet disruptions remains to be seen, although the betting seems to be that it won’t. For instance, a story at the Guardian quotes James Blessing, chair of the UK Internet Service Providers Association, as saying, “In the grand scheme of things, it’s tiny. It’s a glitch, glitches happen… If someone at an ISP hasn’t noticed it by now, it’s too late as the default table is over 512,000, so nothing that had this problem is now connected to the internet and working… We’ve had the glitch and nothing further will happen now concerning the 512,000 bug.”
Others, however, are not nearly as sanguine, and warn that further service disruptions shouldn’t be discounted until routers are upgraded or replaced, as happened when the 128K and 256K table limits were reached.
It won’t take long to find out whether last week’s Internet hiccup was a one-time event or the beginning of a few weeks of Internet service burps.
Contributing Editor Robert N. Charette is an acknowledged international authority on information technology and systems risk management. A self-described “risk ecologist,” he is interested in the intersections of business, political, technological, and societal risks. Along with being editor for IEEE Spectrum’s Risk Factor blog, Charette is an award-winning author of multiple books and numerous articles on the subjects of risk management, project and program management, innovation, and entrepreneurship. A Life Senior Member of the IEEE, Charette was a recipient of the IEEE Computer Society’s Golden Core Award in 2008.