The pandemic year just passed once again demonstrates that IT-related failures are universally unprejudiced. Companies large and small, sectors private and public, reputations stellar and scorned: none are exempt. Herewith, the failures, interruptions, crimes and other IT-related setbacks that made the news in 2020.
Over the past several years, airline flight delays and cancellations [PDF] related to IT issues have averaged about one per month. The year 2020 kicked off with “technical issues” affecting British Airways’ computerized check-in at London’s Heathrow Airport, which caused more than 100 flight cancellations and delayed numerous others. The outage impacted at least 10,000 passengers’ travel plans over two days in February. Then in March, as Covid-19 related government travel bans started to take hold, Delta Air Lines reported “intermittent technical difficulties” for bookings and ticket changes.
Once the travel bans firmly took hold and flying was trimmed back to a minimum, however, no major IT outage has been reported since Delta’s. I suspect this hiatus will not last long, as airline flight schedules start returning closer to some semblance of “normal,” perhaps (here’s hoping!) later this year.
Probably the biggest airline IT-related news of the year is the U.S. Federal Aviation Administration’s announcement that the Boeing 737 Max 8 aircraft can resume passenger service once a number of changes [PDF] are made. This may take up to a year to complete for all 450 aircraft that were grounded. The FAA Airworthiness Directive requires “installing new flight control computer (FCC) software, revising the existing [Airplane Flight Manual] to incorporate new and revised flight crew procedures, installing new MAX display system (MDS) software, changing the horizontal stabilizer trim wire routing installations, completing an angle of attack (AOA) sensor system test, and performing an operational readiness flight.” While the civil aviation authorities of both Brazil and the European Union have given their approval for the 737 Max to return to flight, some others, like Canada’s, may mandate that additional requirements be met.
Boeing’s myriad problems with the Max 8’s software (itself implicated in the crashes of both Lion Air JT610 and Ethiopian Airlines Flight 302) can be reviewed in both the June FAA Inspector General’s report and the final report of the U.S. House Committee on Transportation and Infrastructure investigation. (See also software executive and airplane enthusiast Gregory Travis’s comprehensive 2019 analysis of the 737 Max fiasco for Spectrum.) Whether Boeing or the airlines can convince the public to board the Max remains to be seen, even with American Airlines beginning flights with the aircraft in late December.
Software and electronic-related recalls show no signs of slowing from their 2019 record levels. The year started off with GM issuing a second software recall to remedy problems caused by its first software recall issued in December 2019. The original recall and its software fix were aimed at correcting an error that could disable 463,995 2019 Chevrolet Silverado, GMC Sierra and Cadillac CT6 vehicles’ electronic stability control or antilock brake systems without warnings appearing on the dashboard. Unfortunately, the update was flawed. If an owner remotely started their vehicle using GM’s OnStar app, the brakes were disabled—although warnings were shown on the dash. About 162,000 vehicles received the original fix. The new software update seems to have done the trick.
Both Hyundai and Kia Motors, in which Hyundai owns a 34% stake, issued a number of recalls in 2020 for moisture problems in electronic circuits that could cause vehicle fires. Owners of many of the vehicles involved were warned to park their vehicles outside until the repairs were made. Hyundai also had to issue a recall to update its Remote Smart Parking Assistant software for the 2020 Sonata and Nexo models. A software error could allow a vehicle to continue moving after a system malfunction.
Other auto manufacturers had their share of recalls as well. Fiat Chrysler Automobiles recalled 318,537 2019 and 2020 cars and trucks because a software error could allow the backup camera to stay on when a vehicle is moving forward. Toyota recalled 700,000 Prius and Prius V models for a software problem that would prevent the cars from entering a failsafe driving mode as intended, while 735,000 Honda Motors 2018-2020 Accord and 2019-2020 Insight vehicles were recalled for software updates to their Body Control Module to prevent the malfunction of one or more electronic components including the rear-view camera display, turn signals and windshield wipers. Volkswagen had to delay the rollout of its new all-electric ID.3 models by several months due to software issues.
Given the increasing amount and importance of vehicle software, Toyota launched two new software companies in July under an umbrella company called Woven Planet Holdings to increase the capability and reliability of its vehicles’ automation. Volkswagen created its own software business unit in 2019. Meanwhile, GM announced in November that it would hire another 3,000 workers before the end of the first quarter of 2021 to increase its engineering and software development capabilities.
While cloud computing is generally reliable, when it is not, the impacts can be widespread and consequential, especially when so many people are working or schooling from home. This truism was highlighted by several cloud computing outages this year. In March, Microsoft Azure experienced a six-hour outage attributed to a cooling system failure and another caused by VM capacity constraints. The same month, Google Cloud went down for about 90 minutes, which was ascribed to issues with infrastructure components. In April, GitHub (owned by Microsoft) experienced several disruptions related to multiple different system misconfiguration issues. In June, the IBM Cloud went down for over three hours due to problems linked to an external network provider, and once more later in the month, this time with little explanation. In November, Amazon’s East Region U.S. AWS center suffered disruptions lasting over six hours that affected a large number of clients, from Adobe to Roku to The Wall Street Journal, and were caused by an operating system configuration issue. Multiple Google Cloud services suffered back-to-back service disruptions in December, the first of which lasted for about an hour and affected Gmail, Google Classroom, Nest, and YouTube, among others. The outage was blamed on storage issues with Google’s authentication system. The second unscheduled downtime affected Gmail for nearly seven hours. An email configuration update issue was the culprit this time.
Recurrent communication problems continued throughout 2020. Several emergency service systems went offline, including Arizona’s 911 system in June, which left 1 million people without service. The new £39m 999 system in Hampshire, England, collapsed in July. Meanwhile, September saw 911 outages across 14 states for about an hour.
T-Mobile wireless services, the second largest in the U.S., were unavailable to many of its customers for nearly 12 hours after the introduction of a new network router in June, causing 250 million nation-wide calls and 23,621 emergency calls to 911 in several states not to connect. Vodafone in Germany experienced equipment failure that kept 100,000 mobile phone users from making calls for three hours in November.
Disruptions also hit users of the Internet. In May, users of the videoconferencing platform Zoom across the globe experienced trouble logging into their meetings for about two hours, messaging platform Slack suffered an outage for nearly three hours, while Adobe Creative Cloud users were locked out for most of a day. A configuration error in Internet service company Cloudflare’s backbone network disrupted world-wide online services for about an hour in July. Then in August, Internet service provider CenturyLink went down, taking dozens of online services and a big chunk of world-wide Internet traffic down with it, while in Australia, a DNS issue affected Telstra’s Internet service for a few hours. In September, a problem with Microsoft’s Azure Active Directory kept users in North America from their Microsoft Office 365 accounts and other services for five hours, while in October, a network infrastructure update issue again caused difficulties for North American Microsoft Office 365 and other service users for over four hours. And in December, Google suffered outages on consecutive days. The first was caused by an internal administrative system storage issue and affected more than a dozen Google services, including Docs, Gmail, Nest, YouTube and its cloud services for about an hour. The next day, Gmail services were down for up to four hours because of an email configuration issue.
Social media companies suffered their own outages, like Spotify and Tinder (caused by a Facebook issue) in July, Twitter in February and again in October, as well as Facebook across Europe in December.
The number of records exposed by data breaches and especially unsecured databases continues to skyrocket, with at least 36 billion records exposed as of the end of September 2020. While the number of data breaches seems to have gone down, the number of large unsecured databases discovered seems to be climbing. StealthLabs has a comprehensive compilation of 25 major data breaches by month.
Ransomware attacks increased significantly in 2020, especially targeting governmental, educational and hospital systems. Typical were the attacks against the City of Pensacola, Florida, the University of Utah, and the University of Vermont Medical Center. Businesses have not been immune either, with ransomware woes plaguing the likes of electronics company Foxconn, hospital and healthcare services company Universal Health Services, and cybersecurity company Cygilant.
The U.S. Treasury Department’s Office of Foreign Assets Control issued a five-page advisory [PDF] in October warning against paying ransomware demands, stating that it not only encourages more attacks, but it also may run afoul of OFAC regulations and result in civil penalties. Whether the advisory has any impact remains to be seen. Delaware County, Pennsylvania agreed to pay a $500,000 ransom in December, for example.
Nation-state sponsored intrusions have also been prevalent in 2020, such as those against Israel and the UAE. The Russia-attributed “SolarWinds” attack against the U.S. that was initially disclosed in December and then developed into a bigger story has especially caused alarm, with the amount of damage still being unraveled.
In light of how often ransomware attacks are initiated by phishing emails, government agencies and corporations have increased their employee phishing training, including tests that use mock phishing emails and websites. These tests frequently use the same information contained in real phishing emails as a template in order to see how employees respond. Unfortunately, some of these tests have backfired, causing undue panic or rage among employees. Both Tribune Publishing Co. and GoDaddy recently found out about the latter when their tests were less than well thought out.
The year saw the continuation of bank outages in the UK beginning on New Year’s Day with millions of customers of Lloyds Banking Group unable to access online and mobile banking services. A few days later, computer problems at Clydesdale and Yorkshire banks kept wages and other payments from reaching customer accounts. Lloyds had another online problem in June, and other UK banks like Santander, NatWest, and Barclays experienced their own IT problems in late summer.
Other notable bank IT problems involved U.S. Chase Bank, where “technical issues” created incorrect customer balances in June, and Nigerian First City Monument Bank, where up to 5.1 million customers had trouble accessing their online accounts for four days in July. Also in July, Australian Commonwealth Bank customers suffered a nine-hour online and banking outage, while National Australia Bank customers experienced a similar situation in October. A power outage at a data center took out India’s HDFC Bank, interrupting its services for two days in November. HDFC’s November outage, along with previous incidents, caused the Reserve Bank of India in December to require HDFC to slow down its modernization efforts to ensure that its banking infrastructure was sufficiently reliable and resilient.
IT problems at stock exchanges and trading platforms have been especially abundant this past year. In February, a hardware error halted trading at the Toronto Stock Exchange for two hours, while a software issue caused the Moscow Exchange to suspend trading for 42 minutes in May. Then in July, stock exchanges in Frankfurt, Vienna, Ljubljana, Prague, Budapest, Zagreb, Malta and Sofia were offline for three hours because of a “technical issue” with the German electronic trading platform Xetra T7 system that each exchange used. In October, a technical issue in third-party middleware software was blamed for the trading halt on Euronext exchanges in Amsterdam, Brussels, Dublin, Lisbon and Paris. The same month, a hardware failure and subsequent failure of the back-up system took down the Tokyo Stock Exchange for a whole day, the worst electronic outage it has ever experienced. The problems led to the resignation of TSE Chief Executive Officer Koichiro Miyahara. In November, a software issue caused trading to be suspended on the Australian Stock Exchange for nearly the entire day, its worst outage in more than a decade.
Trading platforms also experienced numerous IT problems. In March, the trading platform Robinhood faced, according to the company’s founders, “stress on our infrastructure.” That stress resulted in three outages in the space of one week, alongside others in June, August, November and December. J.P. Morgan endured a trading platform problem in March, while Charles Schwab, E-Trade, Fidelity, Merrill Lynch, TD Ameritrade, and Vanguard all had trading system technical issues of their own in November. Charles Schwab, Fidelity, TD Ameritrade, and Interactive Brokers Group joined Robinhood with more outages in December as well.
The pandemic highlighted the dependence of governments everywhere on legacy IT systems, particularly in regard to state unemployment systems. The rapid increase in demand for unemployment benefits and the changes in the amount of benefits paid, coupled with the inability to quickly reprogram the benefit systems, hit unemployment systems in California, Oregon and Washington State especially hard. That said, nearly every state experienced technical problems, as well as rampant fraud. Computer issues also affected the Internal Revenue Service’s ability to send out Congressionally approved stimulus checks in April.
Legacy IT system worries did not just affect the United States. In February, Canadian Prime Minister Justin Trudeau received a report warning that many mission-critical systems were “rusting out and at risk of failure.” Japan’s government pledged in June to modernize its administrative systems, which were criticized for being “behind the world by at least 20 years.” South Korea’s government also promised in June to accelerate its transition to a digital economy.
While unemployment system problems dominated government system woes, there were others in the news as well. Pittsburgh’s new state-of-the-art employee payroll system had an inauspicious start at the beginning of the year. Meanwhile, Ohio’s Cuyahoga County is still awaiting its new $35 million computer system, which is $10 million over budget and already two years late. It may be ready by 2022.
Pay issues involving the infamous Canadian Phoenix government payroll system that went live in 2016 continue to be resolved, with its replacement likely moving to early testing next year. Unfortunately, there still is no resolution for the tens of thousands of innocent unemployed Michigan workers falsely accused of unemployment fraud by Michigan’s Integrated Data Automated System (MiDAS) between October 2013 and September 2015. The state has been forcefully fighting without success to quash a class-action lawsuit for compensation; the case is now with Michigan’s Supreme Court again for, hopefully, a final resolution in 2021. Finally, a review found that Ohio’s $1.2 billion benefits system, which went live in 2013, was still riddled with 1,100 defects and was partially responsible for up to $455 million in benefit overpayments and 24,000 backlog cases in the past year.
Medicine’s shift to electronic health records continues to be a bumpy one. In January, the UK government pledged to provide £40 million to streamline logging into National Health Service IT systems. Some staff reportedly must log into as many as 15 different systems each shift. Also in January, it was reported that half of the 23 million records in Australia’s controversial national My Health Record system contain no information, showing that its perceived benefits are still not convincing to most Australian patients or practitioners. In March, a research paper [PDF] published in the Mayo Clinic Proceedings indicated that U.S. physicians rated the usability of their EHRs an “F”, and that poorly implemented EHRs were contributing to physician burnout.
In May, the U.K.’s National Audit Office reported that the now £8.1 billion IT modernization program being undertaken at the National Health Service is still a jumbled mess that hasn’t learned the lessons from its previous failure. Originally a £4.2 billion program in 2016 that promised a “paperless” NHS by 2020, it has seen its target date repeatedly pushed back, with a final cost likely to be much higher than currently projected. Additionally in May, a study published in JAMA Network Open indicated hospital EHRs were failing to catch 33 percent of potentially harmful drug interactions and other medication errors, while in June, a study published in JAMA indicated more than 20 percent of patients were finding errors in their EHR notes.
In September, the U.S. Coast Guard began piloting its new EHR system, which is based on the $4.4 billion Department of Defense Military Health System EHR effort called GENESIS that is planned to be fully deployed across DoD by 2024. The Coast Guard terminated its mismanaged $67 million EHR effort in 2018. In October, after a six-month delay, the Department of Veterans Affairs finally rolled out its initial go-live EHR system at the Mann-Grandstaff Medical Center in Spokane, Washington. The troubled $16.4 billion EHR modernization project is scheduled to complete in 2028, although delays and increased costs are likely over the next seven-plus years.
Finally, in December, a briefing note written in October to Saskatchewan’s Minister of Health was made public by the Canadian Broadcasting Corporation warning that the province’s healthcare IT system was at growing risk of failure because of chronic underfunding. “A major equipment failure which may disrupt service and risk lives appears inevitable with the current funding model,” the note warned. When asked to comment about the note, Health Minister Paul Merriman said he “will be asking Ministry of Health officials to look into this matter and to find ways to improve the systems supported by eHealth.” Why the Minister did not ask in October when he received the note was not explained.
Issues with automated facial recognition (AFR) continue to dog law enforcement. In the wake of social unrest in the U.S. and ongoing worries over AFR bias, Microsoft and Amazon announced in June that they would suspend selling face recognition software to police departments. IBM went one step further and announced in June that it would no longer work on the technology at all. In August, the use of AFR by British police was ruled unlawful by the Court of Appeal until the government officially approves its use.
Along with the push against the use of AFR, there has been a backlash against the use of predictive policing software. For example, New Orleans, Louisiana and Los Angeles, Oakland and Santa Cruz, California have all moved to prohibit the use of predictive policing systems.
In December, the Massachusetts State Police announced that it would suspend using its automatic license plate readers until a time and date glitch found to be affecting five years’ worth of data was corrected.
There were other police IT issues in the U.K. as well. In January, it was revealed that an error in the City of London Police’s new crime reporting service that was launched in 2018 kept information on over 300,000 fraud crime reports from being shared by the National Fraud Database with the London Police for 15 months. The database is used by major banks, financial institutions, law enforcement and government organizations to share information about fraud and to help the police with their criminal investigations. Then in October, the U.K.’s Police National Computer experienced a 10-hour outage blamed on a “human error,” with one senior police official saying the outage had caused “absolute chaos” across the country’s police forces.
Troubles with iOPS—the late, costly, and controversial new computer system installed by the Greater Manchester Police in the U.K. in late 2019—also persisted unabated throughout the year. The latest crash occurred just last month. Operational difficulties with the system have been linked to a staggering inability by the GMP to record accurate crime data as well.
Few train or subway IT-related problems were reported this past year. In July, computer and other issues were reported to continue to plague Ottawa’s light rail transit system, while in September, service on the San Francisco area BART system was shut down for about four hours because of a failure of “one of a dozen field network devices.”
The biggest news was that after 12 years, 41 U.S. freight and passenger railroads have met (with two days to spare) the federal mandate for deploying positive train control to prevent train accidents, such as train-to-train collisions, derailments caused by excessive train speed, train movements through misaligned track switches, and unauthorized train entry into work zones. Vehicle-train collisions and track or equipment failures can still cause train accidents, however. The original deadline was the end of 2015, but that date was shifted back five years as it became clear most railroads would not be able to meet the mandate.
Finally, a note on what did not seem to happen. Typically, every year there are several memorable IT project failures or cancellations or other major dumpster fires. However, for all its legendary failures and disappointments, 2020 is marked by a dearth of this particular breed of IT catastrophe. There was the Australian government’s Visa Processing Platform outsourcing plan failure that cost AU$92 million, the decision by Nacogdoches (Texas) Memorial Hospital to terminate its $20 million EHR contract for Cerner’s Community Works platform, and the Cyberpunk 2077 launch fiasco, but there have been few others that made the news. Whether there indeed were fewer IT project failures (and maybe more successes?), or just fewer reported, should be clearer a year from now at our next review.
Robert N. Charette is a Contributing Editor to IEEE Spectrum and an acknowledged international authority on information technology and systems risk management. A self-described “risk ecologist,” he is interested in the intersections of business, political, technological, and societal risks. Charette is an award-winning author of multiple books and numerous articles on the subjects of risk management, project and program management, innovation, and entrepreneurship. A Life Senior Member of the IEEE, Charette was a recipient of the IEEE Computer Society’s Golden Core Award in 2008.