How and why we spend trillions to keep old software going
“Fix the damn unemployment system!"
This past spring, tens of millions of Americans lost their jobs due to lockdowns aimed at slowing the spread of the SARS-CoV-2 virus. And untold numbers of the newly jobless waited weeks for their unemployment benefit claims to be processed, while others anxiously watched their bank accounts for an extra US $600 weekly payment from the federal government.
Delays in processing unemployment claims in 19 states—Alaska, Arizona, Colorado, Connecticut, Hawaii, Iowa, Kansas, Kentucky, New Jersey, New York, Ohio, Oklahoma, Oregon, Pennsylvania, Rhode Island, Texas, Vermont, Virginia, and Wisconsin—are attributed to problems with antiquated and incompatible state and federal unemployment IT systems. Most of those systems date from the 1980s, and some go back even further.
Things were so bad in New Jersey that Governor Phil Murphy pleaded in a press conference for volunteer COBOL programmers to step up to fix the state's Disability Automated Benefits System. A clearly exasperated Murphy said that when the pandemic passed, there would be a post mortem focused on the question of “how the heck did we get here when we literally needed cobalt [sic] programmers?"
Similar problems have emerged at the federal level. As part of the federal government's pandemic relief plan, eligible U.S. taxpayers were to receive $1,200 payments from the Internal Revenue Service. However, it took up to 20 weeks to send out all the payments because the IRS computer systems are even older than the states' unemployment systems, some dating back almost 60 years.
As the legendary investor Warren Buffett once said, “It's only when the tide goes out that you learn who's been swimming naked." The pandemic has acted as a powerful outgoing tide that has exposed government's dependence on aging legacy IT systems.
But governments aren't the only ones struggling under the weight of antiquated IT. It is equally easy to find airlines, banks, insurance companies, and other commercial entities that continue to rely on old IT, contending with software or hardware that is no longer supported by the supplier or has defects that are too costly to repair. These systems are prone to outages and errors, vulnerable to cyberintrusions, and progressively more expensive and difficult to maintain.
Since 2010, corporations and governments worldwide have spent an estimated $35 trillion on IT products and services. Of this amount, about three-quarters went toward operating and maintaining existing IT systems. And at least $2.5 trillion was spent on trying to replace legacy IT systems, of which some $720 billion was wasted on failed replacement efforts.
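To put those estimates in proportion, the short calculation below works through the arithmetic. It is only a sketch: the inputs are the rough figures quoted above, "about three-quarters" is taken as exactly 0.75, and the variable names are illustrative.

```python
# Rough estimates quoted in the article, in trillions of US dollars.
TOTAL_IT_SPEND = 35.0           # worldwide IT spending since 2010
OPERATE_MAINTAIN_SHARE = 0.75   # "about three-quarters" (assumed to be exactly 0.75)
REPLACEMENT_SPEND = 2.5         # spent trying to replace legacy systems
FAILED_REPLACEMENT = 0.72       # wasted on failed replacement efforts

# Spending that went to keeping existing systems running.
operate_maintain = TOTAL_IT_SPEND * OPERATE_MAINTAIN_SHARE

# Share of replacement spending that was lost to failed efforts.
failed_share = FAILED_REPLACEMENT / REPLACEMENT_SPEND

# Replacement spending as a share of all IT spending.
replacement_share = REPLACEMENT_SPEND / TOTAL_IT_SPEND

print(f"Operating and maintaining: about ${operate_maintain:.2f} trillion")
print(f"Replacement money wasted:  about {failed_share:.0%}")
print(f"Replacement share of total: about {replacement_share:.0%}")
```

On these numbers, roughly $26 trillion went to keeping existing systems running, and more than a quarter of every dollar spent on replacement was lost to failed efforts.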
But it's astonishing how seldom people notice these IT systems, even with companies and public institutions spending hundreds of billions of dollars every year on them. From the time we get up until we go to bed, we interact, often unknowingly, with dozens of IT systems. Our voice-activated digital assistants read the headlines to us before we hop into our cars loaded with embedded processors, some of which help us drive, others of which entertain us as we guzzle coffee brewed by our own robotic baristas. Infrastructure like wastewater treatment plants, power grids, air traffic control, telecommunications services, and government administration depends on hundreds of thousands of unseen IT systems that form another, hidden infrastructure. Commercial organizations rely on IT systems to manage payroll, order supplies, and approve cashless sales, to name but three of thousands of automated tasks necessary to the smooth functioning of a modern economy. Though these systems run practically every aspect of our lives, we don't give them a second thought because, for the most part, they function. It doesn't even occur to us that IT is something that needs constant attention to be kept in working order.
In his landmark study The Shock of the Old: Technology and Global History Since 1900 (Oxford University Press, 2007), British historian David Edgerton claims that although maintenance and repair are central to our relationship with technology, they are “matters we would rather not think about." As a result, technology maintenance “has lived in a twilight world, hardly visible in the formal accounts societies make of themselves."
Indeed, the very invisibility of legacy IT is a kind of testament to how successful these systems are. Except, of course, when they're not.
There's no formal definition of “legacy system," but it's commonly understood to mean a critical system that is out of date in some way. It may be unable to support future business operations; the vendors that supplied the application, operating system, or hardware may no longer be in business or support their products; the system architecture may be fragile or complex and therefore unsuitable for upgrades or fixes; or the finer details of how the system works are no longer understood.
To modernize a computing system or not is a question that bedevils nearly every organization. Given the many problems caused by legacy IT systems, you'd think that modernization would be a no-brainer. But that decision isn't nearly as straightforward as it appears. Some legacy IT systems end up that way because they work just fine over a long period. Others stagger along because the organization either doesn't want to or can't afford to take on the cost and risk associated with modernization.
Worldwide IT Spending since 2010
This data is derived from commercial sources and is a rough estimate. Other sources indicate that US $35 trillion is a significant underestimate because labor costs are not totally accounted for.
Obviously, a legacy system that's critical to day-to-day operations cannot be replaced or enhanced without major disruption. And so even though that system contributes mightily to the organization's operations, management tends to ignore it and defer modernization. On most days, nothing goes catastrophically wrong, and so the legacy system remains in place.
This “kick the can" approach is understandable. Most IT systems, whether new or modernized, are expensive affairs that go live late and over budget, assuming they don't fail partially or completely. These situations are not career-enhancing experiences, as many former chief information officers and program managers can attest. Therefore, once an IT system is finally operating reliably, there's little motivation to plan for its eventual retirement.
What management does demand, however, is for any new IT system to provide a return on investment and to cost as little as possible for as long as possible. Such demands often lead to years of underinvestment in routine maintenance. Of course, those same executives who approved the investment in the new system probably won't be with the organization a decade later, when that system has legacy status.
Similarly, the developers of the system, who understand in detail how it operates and what its limitations are, may well have moved on to other projects or organizations. For especially long-lived IT systems, most of the developers have likely retired. Over time, the system becomes part of the routine of its users' daily life, like the office elevator. So long as it works, no one pays much attention to it, and eventually it recedes into the organization's operational shadows.
Thus does an IT system quietly age into legacy status.
Millions of people every month experience the frustrations and inconveniences of decrepit legacy IT.
U.K. bank customers know this frustration only too well. According to the U.K. Financial Conduct Authority, the nation's banks reported nearly 600 IT operational and security incidents between October 2017 and September 2018, an increase of 187 percent from a year earlier. Government regulators point to the banks' reliance on decades-old IT systems as a recurring cause for the incidents.
Airline passengers are equally exasperated. Over the past several years, U.S. air carriers have experienced on average nearly one IT-related outage per month, many of them attributable to legacy IT. Some have lasted days and caused the delay or cancellation of thousands of flights.
Poorly maintained legacy IT systems are also prone to cybersecurity breaches. At the credit reporting agency Equifax, the complexity of its legacy systems contributed to a failure to patch a critical vulnerability in the company's Automated Consumer Interview System, a custom-built portal developed in the 1970s to handle consumer disputes. This failure led, in 2017, to the loss of 146 million individuals' sensitive personal information.
Aging IT systems also open the door to crippling ransomware attacks. In this type of attack, a cyberintruder hacks into an IT system and encrypts all of the system data until a ransom is paid. In the past two years, ransomware attacks have been launched against the cities of Atlanta and Baltimore as well as the Florida municipalities of Riviera Beach and Lake City. The latter two agreed to pay their attackers $600,000 and $500,000, respectively. Dozens of state and local governments, as well as school systems and hospitals, have experienced ransomware attacks.
Even if they don't suffer an embarrassing and costly failure, organizations still have to contend with the steadily climbing operational and maintenance costs of legacy IT. For instance, a recent U.S. Government Accountability Office report found that of the $90 billion the U.S. government spent on IT in fiscal year 2019, nearly 80 percent went toward operation and maintenance of existing systems. Furthermore, of the 7,000 federal IT investments the GAO examined in detail, it found that 5,233 allocated all their funding to operation and maintenance, leaving no monies to modernize. From fiscal year 2010 to 2017, the amount spent on IT modernization dropped by $7.3 billion, while operation and maintenance spending rose by 9 percent. Tony Salvaggio, founder and CEO of CAI, an international firm that specializes in supporting IT systems for government and commercial firms, notes that ever-growing IT legacy costs will continue to eat government's IT modernization “seed corn."
While not all operational and maintenance costs can be attributed to legacy IT, the GAO noted that the rise in spending is likely due to supporting obsolete computing hardware—for example, two-thirds of the Internal Revenue Service's hardware is beyond its useful life—as well as “maintaining applications and systems that use older programming languages, since programmers knowledgeable in these older languages are becoming increasingly rare and thus more expensive."
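The GAO figures cited above can be turned into rough dollar amounts and shares. The calculation below is a sketch: "nearly 80 percent" is treated as an upper bound of 0.80, and the variable names are illustrative rather than taken from the GAO report.

```python
# Figures quoted from the GAO report, fiscal year 2019.
FEDERAL_IT_SPEND_BILLION = 90.0   # total U.S. government IT spending
OM_SHARE_UPPER_BOUND = 0.80       # "nearly 80 percent" (assumed upper bound)
INVESTMENTS_EXAMINED = 7000       # federal IT investments examined in detail
ALL_FUNDS_TO_OM = 5233            # investments with no money left to modernize

# Approximate ceiling on operation-and-maintenance spending.
om_spend_billion = FEDERAL_IT_SPEND_BILLION * OM_SHARE_UPPER_BOUND

# Share of examined investments that spent everything on operation and maintenance.
om_only_share = ALL_FUNDS_TO_OM / INVESTMENTS_EXAMINED

# Investments that had at least some funding for anything else.
with_other_funding = INVESTMENTS_EXAMINED - ALL_FUNDS_TO_OM

print(f"Operation and maintenance: up to about ${om_spend_billion:.0f} billion")
print(f"Investments with nothing left to modernize: {om_only_share:.1%}")
print(f"Investments with any other funding: {with_other_funding}")
```

Put another way, roughly three out of every four investments the GAO examined had nothing set aside for modernization.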
U.S. Federal government spending on IT products and services, 2019
The U.S. Office of Management and Budget breaks IT investment costs into two categories: (1) operation and maintenance and (2) development, modernization, and enhancement.
Take COBOL, a programming language that dates to 1959. Computer science departments stopped teaching COBOL some decades ago. And yet the U.S. Social Security Administration reportedly still runs some 60 million lines of COBOL. The IRS has nearly as much COBOL programming, along with 20 million lines of assembly code. And, according to a 2016 GAO report, the departments of Commerce, Defense, Treasury, Health and Human Services, and Veterans Affairs are still “using 1980s and 1990s Microsoft operating systems that stopped being supported by the vendor more than a decade ago."
Given the vast amount of outdated software that's still in use, the cost of maintaining it will likely keep climbing not only for government, but for commercial organizations, too.
The first step in fixing a massive problem is to admit you have one. At least some governments and companies are finally starting to do just that. In December 2017, for example, President Trump signed the Modernizing Government Technology Act into law. It allows federal agencies and departments to apply for funds from a $150 million Technology Modernization Fund to accelerate the modernization of their IT systems. The Congressional Budget Office originally indicated the need was closer to $1.8 billion per year, but politicians' concerns over whether the money would be well spent resulted in a significant reduction in funding.
Part of the modernization push by governments in the United States and abroad has been to provide more effective administrative controls, increase the reliability and speed of delivering benefits, and improve customer service. In the commercial sector, by contrast, IT modernization is being driven more by competitive pressures and the availability of newer computing technologies like cloud computing and machine learning.
“Everyone understands now that IT drives organization innovation," Salvaggio told IEEE Spectrum. He believes that the capabilities these new technologies will create over the next few years are “going to blow up 30 to 40 percent of [existing] business models." Companies saddled with legacy IT systems won't be able to compete on the expected rapid delivery of improved features or customer service, and therefore “are going to find themselves forced into a box canyon, unable to get out," Salvaggio says.
This is already happening in the banking industry. Existing firms are having a difficult time competing with new businesses that are spending most of their IT budgets on creating new offerings instead of supporting legacy systems. For example, Starling Bank in the United Kingdom, which began operations in 2014, offers only mobile banking. It uses Amazon Web Services to host its services and spent a mere £18 million ($24 million) to create its infrastructure. In comparison, the U.K.'s TSB bank, a traditional full-service bank founded in 1810, spent £417 million ($546 million) moving to a new banking platform in 2018.
Starling maintains all its own code and does an average of one software release per day. It can do this because it doesn't have the intricate connections to myriad legacy IT systems, where every new software release carries a measurable risk of operational failure, according to the U.K.'s bank regulators. Simpler systems mean fewer and shorter IT-related outages. Starling has had only one major outage since it opened, whereas each of the three largest U.K. banks has had at least a dozen over the same period.
The Multilayers of the U.S. Navy's Legacy Pay and Personnel System
This hodgepodge of IT systems goes back to the early 1980s, when the Navy followed a failed 1960s-era IT-modernization effort with another not entirely successful project to consolidate its multiple pay and personnel systems. Yet a third modernization effort began in the late 1990s. It too was only partially successful, leading to the latest modernization effort.
Modernization creates its own problems. Take the migration of legacy data to a new system. When TSB moved to its new IT platform in 2018, some 1.9 million online and mobile customers discovered they were locked out of their accounts for nearly two weeks. And modernizing one legacy system often means having to upgrade other interconnecting systems, which may also be legacy. At the IRS, for instance, the original master tax file systems installed in the 1960s have become buried under layers of more modern, interconnected systems, each of which made it harder to replace the preceding system. The agency has been trying to modernize its interconnected legacy tax systems since 1968 at a cumulative cost of at least $20 billion in today's money, so far with very little success. It plans to spend up to another $2.7 billion on modernization over the next five years.
Another common issue is that legacy systems have duplicate functions. The U.S. Navy is in the process of installing its $167 million Navy Pay and Personnel system, which aims to consolidate 223 applications residing in 55 separate IT systems, including 10 that are more than 30 years old and a few that are more than 50 years old. The disparate systems used 21 programming languages, executing on nine operating systems ranging across 73 data centers and networks.
Such massive duplication and data silos sound ridiculous, but they are shockingly common. Here's one way it often happens: The government issues a new mandate that includes a requirement for some type of automation, and the policy comes with fresh funding to implement it. Rather than upgrade an existing system, which would be disruptive, the department or agency finds it easier to just create a new IT system, even if some or most of the new system duplicates what the existing system is doing. The result is that different units within the same organization end up deploying IT systems with overlapping functions.
“The shortage of thinking about systems engineering" along with the lack of coordinating IT developments to avoid duplication have long plagued government and corporations alike, Salvaggio says.
Top 10 U.S. Federal Systems Most In Need of Modernization
The U.S. Government Accountability Office analyzed 65 federal legacy systems in need of modernization that 24 agencies had identified. Of these, the GAO identified 10 that most require modernization, based on attributes such as age, criticality, and risk.
The best way to deal with legacy IT is to never let IT become legacy. Growing recognition of legacy IT systems' many costs has sparked a rethinking of the role of software maintenance. One new approach was recently articulated in Software Is Never Done, a May 2019 report from the U.S. Defense Innovation Board. It argues that software should be viewed “as an enduring capability that must be supported and continuously improved throughout its life cycle." This includes being able to test, integrate, and deliver improvements to software systems within short periods of time and on an ongoing basis.
Here's what that means in practice. Currently, software development, operations, and support are considered separate activities. But if you fuse those activities into a single integrated activity—employing what is called DevOps—the operational system is then always “under development," continuously and incrementally being improved, tested, and deployed, sometimes many times a day.
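As a rough illustration of that loop, the toy pipeline below tests every proposed change against a copy of the live system and deploys it only if all checks pass. Everything here is hypothetical: the `Change` and `Pipeline` classes, the benefit figures, and the single check are invented for the sketch and do not represent any particular DevOps tool.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Change:
    """A small, incremental modification to the running system (hypothetical)."""
    description: str
    apply: Callable[[Dict[str, int]], Dict[str, int]]  # returns the new state


@dataclass
class Pipeline:
    """Tests each change and deploys it only if every check passes."""
    checks: List[Callable[[Dict[str, int]], bool]]
    live: Dict[str, int] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def submit(self, change: Change) -> bool:
        # Apply the change to a copy so the live system is untouched until tests pass.
        candidate = change.apply(dict(self.live))
        if all(check(candidate) for check in self.checks):
            self.live = candidate
            self.log.append(f"deployed: {change.description}")
            return True
        self.log.append(f"rejected: {change.description}")
        return False


def payments_are_non_negative(state: Dict[str, int]) -> bool:
    """An example automated test: no payment amount may be negative."""
    return all(amount >= 0 for amount in state.values())


pipeline = Pipeline(checks=[payments_are_non_negative], live={"weekly_benefit": 600})

# A good change passes the check and goes live immediately.
ok = pipeline.submit(Change("raise weekly benefit", lambda s: {**s, "weekly_benefit": 650}))

# A faulty change is caught by the check and never reaches the live system.
bad = pipeline.submit(Change("sign error in benefit", lambda s: {**s, "weekly_benefit": -650}))

print(pipeline.live)
print(pipeline.log)
```

The point of the design is that testing and deployment are one step rather than separate phases, so small changes can reach production often while faulty ones are stopped before they do.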
DevOps is just one way to keep core IT systems from turning into legacy systems. The U.S. Defense Advanced Research Projects Agency has been exploring another, potentially more effective way, one that takes into account how long IT systems stay in service once they are implemented.
Since 2015, DARPA has funded research aimed at making software that will be viable for more than 100 years. The Building Resource Adaptive Software Systems (BRASS) program is trying to figure out how to build “long-lived software systems that can dynamically adapt to changes in the resources they depend upon and environments in which they operate," according to program manager Sandeep Neema.
Creating such timeless systems will require a “start from scratch" approach to software design that doesn't make assumptions about how an IT system should be designed, coded, or maintained. That will entail identifying the logical (libraries, data formats, structures) and physical resources (processing, storage, energy) a software program needs for execution. Such analyses could use advanced AI techniques that discover and make visible an application's operations and interactions with other applications and systems. By doing so, changes to resources or interactions with other systems, which account for many system failures or inefficient operations, can be actively managed before problems occur. Developers will also need to create a capability, again possibly using AI, to monitor and repair all elements of the execution environment in which the application resides.
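One way to picture the idea of identifying resources and spotting changes before they cause failures is a program that declares what it depends on and compares that declaration with the environment it finds. The sketch below is an assumption-laden illustration, not the BRASS program's method: the `Manifest` class, the `find_mismatches` function, the library name `reportlib`, and the environment dictionaries are all hypothetical, and no AI techniques are involved.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Manifest:
    """Resources a program declares it needs (all names are hypothetical)."""
    libraries: Dict[str, int]      # logical: library name -> minimum major version
    data_formats: FrozenSet[str]   # logical: formats the program can read
    min_memory_mb: int             # physical: memory needed to run
    min_storage_mb: int            # physical: storage needed to run


def find_mismatches(manifest: Manifest, env: dict) -> List[str]:
    """Compare declared needs with the observed environment and list the gaps."""
    problems: List[str] = []
    installed = env.get("libraries", {})
    for name, min_version in manifest.libraries.items():
        found = installed.get(name)
        if found is None:
            problems.append(f"library {name} is missing")
        elif found < min_version:
            problems.append(f"library {name} is at version {found}, need {min_version}+")
    # The program needs at least one data format it understands.
    if not (manifest.data_formats & set(env.get("data_formats", []))):
        problems.append("no supported data format is available")
    if env.get("memory_mb", 0) < manifest.min_memory_mb:
        problems.append("not enough memory")
    if env.get("storage_mb", 0) < manifest.min_storage_mb:
        problems.append("not enough storage")
    return problems


manifest = Manifest(
    libraries={"reportlib": 3},
    data_formats=frozenset({"csv", "json"}),
    min_memory_mb=512,
    min_storage_mb=1024,
)

env_today = {"libraries": {"reportlib": 3}, "data_formats": ["csv"],
             "memory_mb": 2048, "storage_mb": 4096}
env_after_change = {"libraries": {"reportlib": 2}, "data_formats": ["xml"],
                    "memory_mb": 2048, "storage_mb": 512}

print(find_mismatches(manifest, env_today))
print(find_mismatches(manifest, env_after_change))
```

A check like this only reports mismatches. The adaptive systems described above would also have to respond to them, for example by switching data formats or repairing the execution environment, which is the much harder part.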
The goal is to be able to update or upgrade applications without the need for extensive intervention by a human programmer, Neema told Spectrum, thereby “buying down the cost of maintenance."
The BRASS program has funded nine projects, each of which represents different aspects of what a resource-adaptive software system will need to do. Some of the projects involve UAVs, mobile robots, and high-performance computing. The final results of the effort are expected later this year, when the technologies will be released to open-source repositories, industry, and the Defense Department.
Some Major Legacy System Debacles of The Last Decade
Here is a small sampling of the most notable modernization failures, outages, and cybersecurity breaches involving legacy systems beginning in 2010, as well as instances where legacy systems have been replaced but whose subsequent operations have been problematic.
|Legacy System Modernization Failures|
|Legacy System Replaced, But Subsequent Operational Difficulties|
|Outage blamed on legacy IT system|
|Legacy System Contributes to Cybersecurity Breach|
|2010||U.S. Department of Defense||The DoD's Defense Integrated Military Human Resources System, an effort to consolidate the 90 different IT systems being used for payroll and personnel into one, is canceled after 12 years and $1 billion spent.|
|2010||National Australia Bank||An erroneously loaded file corrupts the National Australia Bank's routine nightly transaction run, leading to an outage that leaves 11 million customers without access to their accounts, as well as affecting other Australian banks. It takes nearly two weeks to sort out the problems.|
|2011–2013||California Public Employees' Retirement System (CalPERS)||CalPERS launches its $514 million modernized integrated employee pension system, which integrates 49 legacy systems into a single system, two years late and $235 million over budget. However, defects and missing critical functionality require another $72 million to fix.|
|2012||California Judiciary||California's Court Case Management System (CCMS) modernization effort to replace 70 IT systems with one is canceled after 11 years and $500 million spent.|
|2012||U.S. Air Force||The U.S. Air Force's Expeditionary Combat Support System (ECSS) logistics modernization effort to replace at least 240 IT systems with one system is canceled after seven years and $1.03 billion spent.|
|2012||U.K. Royal Bank of Scotland Group||An error in updating the batch software that controls the RBS Bank Group's payments-processing system keeps at least 6.5 million customers from accessing their accounts for over a week; the total cost of recovery is estimated to be at least £230 million.|
|2013||Pennsylvania Department of Labor and Industry||Pennsylvania's unemployment system modernization effort is canceled after seven years and $170 million spent.|
|2014–2016||Canada's Ontario Ministry of Community and Social Services||Ontario's Community and Social Services launches its CA $242 million Social Assistance Management System (SAMS) 18 months late and CA $40 million over budget. Technical problems cause major user problems and numerous errors in benefits payments. The cost to fix the defects is CA $52 million.|
|2015||Deutsche Post DHL||Shipper Deutsche Post DHL scraps its New Forwarding Environment (NFE) transformation project and reverts to its 30-year-old legacy Logis IT system after four years and a cost of €345 million.|
|2015||U.S. Office of Personnel Management||OPM blames its legacy IT systems for allowing hackers to access sensitive personal information of 25.7 million individuals in two separate cyberattacks; the total recovery cost is estimated to be up to $1 billion.|
|2016||Southwest Airlines||The failure of a network router and backup system for 12 hours leads to IT system outages and causes 2,300 flights to be canceled over five days; costs relating to the outage are estimated at $82 million.|
|2016–2018||Rhode Island Department of Human Services||Rhode Island's DHS rolls out its new $364 million Unified Health Infrastructure Project (UHIP) public-assistance program over a year late and $250 million over budget. Errors with the system keep beneficiaries from receiving benefits, which causes the federal courts to appoint a special master. It takes three years and over $250 million to stabilize the system, although errors are still cropping up.|
|2016–present||Public Services and Procurement Canada||Canada's Public Services and Procurement launches its CA $309 million Phoenix payroll system four months late but on budget. Major defects cause over 250,000 government employees to be either paid too much, too little, or not at all. Pay problems are expected to continue until 2023, with the cost to stabilize the system estimated to be at least CA $1.8 billion.|
|2017||U.K. National Health Service||Underfunded IT legacy systems leave 40 National Health Service hospitals and 24 trusts vulnerable to a cybersecurity attack, which locks medical records for up to 26 million patients and leads to the cancellation of 19,000 appointments; the total recovery cost is estimated to be at least £92 million.|
|2018||Lidl||International supermarket chain Lidl decides to revert to its homegrown legacy merchandise-management system after three years of trying to make its €500 million modernized system work properly.|
|2018||U.S. Coast Guard||The Coast Guard's electronic health record modernization effort is canceled after almost six years and $67 million spent.|
|2018||City of Atlanta, Georgia||Atlanta's underfunded legacy IT systems leave it vulnerable to a ransomware attack that infects the city's computer networks and encrypts over one-third of its computer applications; the total recovery cost is estimated to be at least $17 million.|
|2018–2019||U.K. TSB Bank||TSB's modernization of its banking IT systems causes 1.9 million customers to be locked out of their accounts, some for weeks; cost of the outage is estimated to be at least £330 million.|
Neema says no one should expect BRASS to deliver “a general-purpose software repair capability." A more realistic outcome is an approach that can work within specific data, software, and system parameters to help the maintainers who oversee those systems to become more efficient and effective. He of course hopes that private companies and other government organizations will build on the BRASS program's results.
The COVID-19 pandemic has exposed the debilitating consequences of relying on antiquated IT systems for essential services. Unfortunately, that dependence, along with legacy IT's enormous and increasing costs, will still be with us long after the pandemic has ended. For the U.S. government alone, even a concerted and well-executed effort would take decades to replace the thousands of existing legacy systems. Over that time, current IT systems will also become legacy and themselves require replacement. Given the budgetary impacts of the pandemic, even less money for legacy system modernization may be available in the future across all government sectors.
The problems associated with legacy systems will only worsen as the Internet of Things, with its billions of interconnected computing devices, matures. These devices are already being connected to legacy IT, which will make it even more difficult to replace and modernize those systems. And eventually the IoT devices will become legacy. Just as with legacy systems today, those devices likely won't be replaced as long as they continue to work, even if they are no longer supported. The potential cybersecurity risk of vast numbers of obsolete but still operating IoT devices is a huge unknown. Already, many IoT devices have been deployed without basic cybersecurity built into them, and this shortsightedness is taking a toll. Cybersecurity concerns compelled the U.S. Food and Drug Administration to recall implantable pacemakers and insulin pumps and the National Security Agency to warn about IoT-enabled smart furniture, among other things of the Internet.
Now imagine a not-too-distant future where hundreds of millions or even billions of legacy IoT devices are deeply embedded into government and commercial offices, schools, hospitals, factories, homes, and even people. Further imagine that their cybersecurity or technical flaws are not being fixed and that the devices remain connected to legacy IT systems that themselves are barely supported. In such a world, the pervasive dependence upon increasing numbers of interconnected, obsolete systems will have created something far grimmer and murkier than Edgerton's twilight world.
This article appears in the September 2020 print issue as “The Hidden World of Legacy IT."
About the Author
As a risk consultant for businesses and a slew of three-lettered U.S. government agencies, Contributing Editor Robert N. Charette has seen more than his share of languishing legacy IT systems. As a civilian, he's also been a casualty of a legacy system gone berserk. A few years ago, his bank's IT system, which he later found out was being upgraded, made an error that was most definitely not in his favor.
He'd gone to an ATM to withdraw some weekend cash. The machine told him that his account was overdrawn. Puzzled, because he knew he had sufficient funds in his account to cover the withdrawal, he had to wait until Monday to contact the bank for an explanation. When he called, the customer service representative insisted that he was indeed overdrawn. This was an understatement, considering that the size of the alleged overdraft might have caused a person less versed in software debacles to have a stroke.
“ 'You know, you're overdrawn by [US] $1,229,200,' " Charette recalls being told. “I was like, well, that's interesting, because I don't have that much money in my bank account."
The customer service rep then acknowledged it could be an error caused by a computer glitch during a recent systems upgrade. Two days later he received the letter pictured above from his bank, apparently triggered by a check he had written for $55.80. Charette notes that it wasn't the million-dollar-plus overdraft that triggered the letter, just that last double-nickel drop in the bucket.
The bank never did send a letter apologizing for the inconvenience or explaining the problem, which he believes likely affected other customers. And like so many of the failed legacy-system upgrades—some costing billions, which Charette describes here—it never made the news, either.