Risk Factor

Remembering the Technology Glitches and Failures of Tax Years Past

We're pleased to note that on 1 April, IEEE Spectrum won the Jesse H. Neal Best Infographics Award for our series "Lessons from a Decade of Failures." To celebrate, it seemed like a good time to once again take a dive into the Risk Factor archives and search for additional historical lessons. Because we're nearing the end of tax season here in the United States, I decided to examine the often volatile combination of tax policy and IT systems. Tax-related problems are some of the most painful IT failures, because they tend to hit citizens right where it hurts most: their bank accounts.

Below you'll find some of the most noteworthy operational glitches of the past decade. As with previous timelines, the incidents listed here are merely the tip of the iceberg and should be viewed as representative of tax-related IT problems rather than comprehensive. The list doesn't even include incidents of tech-assisted fraud, data breaches, or failed modernization projects (like the cancellation of the My IRS Account Project). It's not always easy to identify the exact impact of tax-related glitches: in some cases it's easier to measure the number of people affected, while in others, the monetary cost is more straightforward. Use the dropdown menus to navigate to other incidents that might be hidden in the default view.

In reviewing this list of failures, a few lessons jumped out at me:

  • For ongoing, excruciating, cringe-worthy tax-tech pain, no one beats Her Majesty's Revenue and Customs. As my colleague Bob Charette has chronicled, the multiyear rollout of the Pay-As-You-Earn computerized tax system is a textbook case of technological and bureaucratic hubris in the face of a challenging IT problem. You can see from the timeline the number of people affected by calculation errors, which grew over time.
  • Data validation, verification, and sanity checks remain poor. Increasing computerization has meant an increase in mistakes that should have been caught by common sense. Tax systems need better safety checks and governments need to be more skeptical of sudden, unexpected windfalls.
  • Don't automatically trust automatically generated notices. It seems like the software in tax systems that generates letters and notices is subject to even less scrutiny and oversight than the rest of the systems' components.
  • There's danger in waiting to file your taxes until the last minute, but doing them early can also cause problems. There are many examples of tax services simply being unprepared to process early returns, whether because of last-minute changes to the tax code or because of data that had not yet been updated.
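The data-validation lesson above lends itself to a concrete sketch. Below is a minimal, hypothetical refund sanity check; the thresholds and field names are invented for illustration and aren't drawn from any real tax authority, but they show the kind of "common sense" guard that would catch an implausible windfall before any money goes out the door:

```python
def flag_implausible_refund(refund, tax_withheld, max_ratio=1.5, hard_cap=1_000_000):
    """Return a list of reasons this refund deserves human review.

    Thresholds are illustrative only: a refund can't reasonably exceed
    what was withheld by much, and very large refunds always merit a look.
    """
    reasons = []
    if refund < 0:
        reasons.append("negative refund amount")
    if refund > hard_cap:
        reasons.append(f"refund exceeds hard cap of {hard_cap}")
    if tax_withheld >= 0 and refund > max_ratio * tax_withheld:
        reasons.append("refund exceeds what was plausibly overpaid")
    return reasons

# A refund far larger than everything the filer paid in should never
# sail through automatically.
print(flag_implausible_refund(refund=50_000, tax_withheld=4_000))
```

The point is not the specific thresholds but that the check exists at all: many of the incidents in the timeline would have been caught by exactly this sort of skepticism about sudden, unexpected windfalls.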

Clearly there are lots of advantages to digitizing tax calculation and collection, including efficiency and accuracy. But it's worth keeping in mind that in all likelihood, our IT systems are bound to fail occasionally, so we need to make sure our laws and systems are better prepared for those contingencies. In the past decade, our ability to cause harm with tax systems has often outpaced our ability to make things right.

If there's a notable tax-related glitch you'd like to see represented on the timeline, let me know in the comments, and I'll try to add it.

We Need Better IT Project Failure Post-Mortems

In pulling together this special interactive report on a decade’s worth of IT development projects and operational failures, the most vexing aspect of our efforts was finding trustworthy data on the failures themselves.

We initially started with a much larger set than the 200 or so projects depicted in this report, but the project failure pool quickly shrank as we tried to get reliably documented, quantifiable information explaining what had occurred, when and why, who was affected, and most importantly, what the various economic and social impacts were.

This was true not only for commercial IT project failures—which one would expect, given that corporations are extremely reticent to advertise their misfortunes in detail if at all—but also for government IT project failures. Numerous times, we reviewed government audit reports and found that a single agency had inexplicably used different data for a project’s initial and subsequent costs, as well as for its schedule and functional objectives. This project information volatility made getting an accurate, complete, and consistent picture of what truly happened on a project problematic to say the least.

Our favorite poster child for a lack of transparency regarding a project’s failure is the ill-fated $1 billion U.S. Air Force Expeditionary Combat Support System (ECSS) program (although the botched rollout of HealthCare.gov is a strong second). Even after multiple government audits, including a six-month, bipartisan Senate Armed Services Committee investigation into the high-profile fiasco, the full extent of what this seven-year misadventure in project management was trying to accomplish could not be uncovered. Nor could the final cost to the taxpayer be ascertained.

With that in mind, we make our plea to project assessors and auditors, asking that they apply a couple of lessons learned the hard way over the past decade of IT development and operational failures:

In future assessments or audit reports of IT development projects, would you please publish with each one a very simple chart or timeline? It should show, at a glance: an IT project’s start date (i.e., the time money is first spent on the project); a list of the top three to five functional objectives the project is trying to accomplish; and the predicted versus actual cost, date of completion, and delivered functionality at the critical milestones where the project is reviewed, delivered, or canceled.

Further, if the project has been extended, re-scoped or reset, please make the details of such a change absolutely clear. Don’t forget to indicate how this deviation affects any of the aforementioned statistics. Finally, if the project has been canceled, account for the opportunity costs in the final cost accounting. For example, the failure of ECSS is currently costing the Air Force billions of dollars annually because of the continuing need to maintain legacy systems that should have been retired by now. You’d think that this type of project status information would be routinely available. But unfortunately, it is rarely published in its totality; when it is, it’s even less likely to be found all in one place.

Similarly, for records related to IT system operational failures, would you please include all of the consequences being felt—not only the financial ones, but also those borne by the users of the system, both internal and external? Too often an operational failure is dismissed as just a “teething problem,” when it feels more like a “root canal” to the people dependent upon the system working properly.

A good illustration is Ontario’s C$242 million Social Assistance Management System (SAMS), which was released more than a year ago and is still not working properly. The provincial government remains upbeat and positive about the system’s operation while callously downplaying the impact of a malfunctioning system on the poor in the province.

More than 100 years ago, U.S. Supreme Court Justice Louis Brandeis argued that, “Publicity is justly commended as a remedy for social and industrial diseases. Sunlight is said to be the best of disinfectants; electric light the most efficient policeman.” Hopefully, the little bit of publicity we have tried to bring to this past decade of IT project failures will help to reduce their number in the future.

Wishful Thinking Plagues IT Project Plans

“An extraordinary failure in leadership,” a “masterclass in sloppy project management,” and a “test case in maladministration” were a few of the more colorful descriptions of UK government IT failures offered by Edward Leigh, MP for Gainsborough, England, when he was chairman of the Public Accounts Committee.

Leigh repeatedly pointed out that government departments were not only wholly unrealistic about their IT projects’ costs, schedules, and technical feasibility, but also took no responsibility for the consequences of those unrealistic assumptions.


Sorry for the Inconvenience

After looking back at the project failures chronicled in the Risk Factor for our recent set of interactive features “Lessons From a Decade of IT Failures,” I became intrigued by the formulaic apologies that organizations put out in their press releases when something untoward happens involving their IT systems.  For instance, below are three typical IT mea culpas:

“We would like to apologize for the inconvenience this has caused.”

“We regret any inconvenience this may have caused patients.”

“We apologize for the inconvenience to those affected.”

The first apology came about as a result of Nationwide, the UK’s largest building society, charging 704,426 customers’ debit cards twice for the same transaction, which it blamed on a batch file that was processed twice. The second apology was in response to Northern California’s Sutter Health $1 billion electronic health record system crashing for more than a day across seven of its major medical facilities because of a faulty system upgrade. The last apology was prompted by the shambolic mess that marked the rollout of California’s EDD unemployment system, which affected at least 185,000 claimants, many of them for weeks.
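The Nationwide failure, a batch file applied twice, is a classic duplicate-processing bug, and the standard guard against it is an idempotency check keyed on a batch identifier. The sketch below is purely illustrative (the state is in-memory; a real payments system would record processed-batch IDs in a durable store and within the same transaction as the debits):

```python
processed_batches = set()

def apply_batch(batch_id, transactions, ledger):
    """Apply a batch of (account, amount) debits exactly once per batch_id."""
    if batch_id in processed_batches:
        return False  # duplicate submission: skip (and alert, in real life)
    for account, amount in transactions:
        ledger[account] = ledger.get(account, 0) - amount
    processed_batches.add(batch_id)
    return True

ledger = {"alice": 100}
batch = [("alice", 30)]
apply_batch("2012-07-15-debits", batch, ledger)
apply_batch("2012-07-15-debits", batch, ledger)  # replayed file is ignored
print(ledger)  # alice is debited once, not twice
```

Had the batch pipeline refused replays this way, 704,426 customers would have been charged exactly once.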

Apparently, regardless of the impact or duration of an IT failure, the offending organization views it as merely an inconvenience to those experiencing it. Ontario’s Social Services Minister Helena Jaczek went so far as to say that the months of havoc resulting from the botched rollout of the province’s new C$242 million welfare and disability system reminded her “of when I have my BlackBerry telling me that there are issues and I need an update. . . it is often very inconvenient.”

Hmm, not receiving a desperately needed disability or subsistence check equates to a BlackBerry software update. Who knew?

Most apologetic public statements by corporate or government officials at least attempt to provide the pretense that executive management feels bad for the consequences of their organization’s IT oofta. However, there have been other times where perhaps not apologizing would have been a better strategy than the apology given. Below are a few of my favorite “best of the worst apologies” from the Risk Factor blog files which clearly indicate that the organization would have been a lot happier if it didn’t have to deal with those pesky customers or taxpayers.

We start off with two of the largest organizations in Ireland that worked overtime to antagonize their respective customers in the wake of billing system errors. The first is Irish Rail, which discovered that a software upgrade to its vending machines had caused tickets to be issued to some 9,000 passengers over several days without the fare amounts actually being deducted from their debit cards. On the Friday a week after the discrepancy was discovered, Irish Rail decided to notify the affected customers by way of a press release on its website, which mentioned that it would begin deducting the fare amounts due it on the following Monday.

Irish Rail’s press release also stated, “We apologies [sic] for any inconvenience this fault causes customers,” which for many could include incurring hefty penalty charges for unauthorized overdrafts on their debit cards. When asked why it couldn’t wait a week so its customers could ensure that their accounts had the funds to cover the charges, Irish Rail responded it had every right to collect the money immediately and was going to do so. Unsurprisingly, the affected Irish Rail customers didn’t think much of the company’s apology.

Another “show me the money” demand disguised as an apology came from Eircom, the largest telecom provider in the Republic of Ireland. It, like Irish Rail, had a billing “system error” that failed to correctly debit some 30,000 customers’ bank accounts, even though those customers’ bills indicated otherwise. Eircom deemed the incident “regrettable” and further stated that “it’s embarrassing and we're very sorry that it's happened.” However, Eircom was neither too embarrassed nor too sorry to insist that, although it planned to reimburse the €18.45 failed direct-debit fee, customers would still have to pay all monies owed the telecom in their next billing cycle. Ireland’s telecom regulator was as unhappy with Eircom’s payment demand as its customers were, even more so because the utility had also failed to inform it of the billing error.

“Teething issues” also featured prominently in several apologies for IT foul-ups. Take, for instance, EnergyAustralia, which claimed that ongoing “teething problems” with its newly introduced accounting system were why 145,000 of its customers had not been billed for their electricity or gas usage on time, including 21,000 who had never received a bill from the utility. In its apology, the company tried to downplay the extent of the foul-up by saying, “We are sorry to the small number of customers who haven't had the best experience and we're working round the clock to improve our service.” However, for some 12,500 EnergyAustralia customers, the clock spun around for more than a year before their billing issues were finally corrected.

Automotive manufacturer McLaren also apologized that its $229,000 MP4-12C supercar was suffering from “teething problems.” Ron Dennis, the executive chairman of McLaren Automotive and McLaren Group, sent out a letter to customers that stated in part, “As you will have already heard from my staff, we are experiencing some early software bugs resulting in unnecessarily sensitive warning lights, battery drainage in certain conditions and IRIS [infotainment] performance issues. My team and the McLaren retailers are working with the pace and intensity that the McLaren brand demands to fully resolve these bugs rapidly and effectively to ensure that any inconvenience to you is kept to a minimum.” Dennis tried to make up for the inconvenience by promising customers “a pre-release copy of the new McLaren: The Wins coffee-table book.” I wonder how many software bugs it would take to get a personally signed copy of the book.

Additionally, there were a couple of organizations that had to make so many apologies that customers just stopped listening to them, deciding to head for the exits instead. Take, for example, the UK’s RBS Group, which had a major system meltdown in the summer of 2012, caused by a routine software update gone bad, that kept millions of its bank customers from accessing their accounts for days, and some even for months. At the time, then-Chairman Stephen Hester apologized, saying, “Our customers rely on us day in and day out to get things right. On this occasion we have let them down… Once again I am very sorry for the inconvenience.”

Various RBS Group spokespersons had to apologize several more times that summer as they promised everything would soon be made right, which quickly turned out not to be true.  At the time, RBS promised to invest hundreds of millions of pounds into upgrading its IT systems to keep major disruptions from happening again.

However, RBS has suffered significant glitches since, including in December 2013 on Cyber Monday and once more in June of this year. Although after each incident RBS management stated that it was “sorry for the inconvenience caused” and that the incident was “unacceptable,” tens of thousands of its discomforted customers have decided to do their banking elsewhere.

While RBS may have seen droves of customers desert it over its IT failures, that is nothing compared to Australia’s telecom company Vodafone, which has seen millions of its customers leave because of the company’s persistent IT ooftas. The root of the problem can be traced to 2009, when Vodafone merged its network with rival Hutchison’s “3” network. Not surprisingly, the merger’s objective of creating a high-quality, unified network across Australia wasn’t as easy or seamless as envisioned. Customer complaints about poor Vodafone service grew throughout 2010, and really came to a head when a network software upgrade in late 2010 didn’t work as expected. Instead of speeding up network traffic, the upgrade slowed it down. That problem took weeks to fix, angering legions of Vodafone customers.

Then a different, concurrent software issue caused additional problems across the network. Vodafone, which by now was being referred to in the press as “Vodafail,” had to apologize multiple times to its angry customers, saying the company was “truly sorry” for the continued “dropped calls, delayed SMS and voicemails, slow data speeds, inconsistent coverage, and long waits when you called us.” For the more than 2 million fed-up customers who left the company between 2010 and 2013, Vodafone didn’t improve fast enough. Finally, after spending AU$3 billion to upgrade its networks and customer support, Vodafone Australia announced earlier this year that it had started adding customers again.

There was also an interesting apologetic non-apology in my home state of Virginia. In the summer of 2010, a server problem at the Virginia Information Technologies Agency (VITA) knocked out the IT systems used by 27 of Virginia’s 89 state agencies for several days, and a number of agencies were affected for over a week. At the time, the state’s IT infrastructure was in the midst of a $2.3 billion upgrade that was a constant source of contention between Virginia and its contractor, Northrop Grumman.

When the server problem was finally fixed, Northrop Grumman vice president Linda Mills put out the expected pabulum and said the company “deeply regrets the disruption and inconvenience this has caused state agencies and Virginia citizens.” However, Grumman’s “regrets” were immediately undercut by a company spokesperson who, when asked by a Richmond Times-Dispatch newspaper reporter whether Mills’ statement was an apology, declined to comment. Whatever little goodwill that was left in state government for Northrop Grumman quickly vanished.

In May of 2011, after an investigation into the outage, Northrop Grumman agreed to pay a $4.7 million fine, which is an apology of the best kind, in my opinion.

Our final apology was given through firmly clenched teeth. For years, Her Majesty's Revenue and Customs (HMRC) in the UK worked to upgrade (pdf) its troubled computer systems. HMRC promised that when complete, the new PAYE (pay as you earn) system would significantly reduce both over- and underpayments of taxes. When it was fully introduced in 2010, HMRC announced that some 4.3 million UK taxpayers would receive letters stating that they had paid on average £400 too much in taxes between 2008 and April 2010. Additionally, another 1.5 million would receive letters just before Christmas stating that they had paid on average £1,428 too little over the same period, and HMRC wanted its money now. Furthermore, HMRC indicated that another 6 million taxpayers were likely owed money for taxes paid prior to 2008, while another 1.7 million possibly owed more; this group would be receiving letters soon, too.

The underlying reason for the millions of over- and underpayments was that taxpayers had been placed in incorrect tax brackets for years because of errors in the new HMRC PAYE system’s database.

Needless to say, the UK public was not a happy bunch at the news. Fueling their unhappiness was the attitude of HMRC Permanent Secretary Dave Hartnett, who stated in a radio interview that there was no need for him or his department to apologize for the errors or the demand for quick payment of owed taxes because, at least to him, the PAYE system was working as designed: “I'm not sure I see a need to apologise... We didn’t get it wrong.”

Politicians of both parties were appalled by that statement, calling Hartnett out of touch and arrogant, especially in light of all the reported PAYE system foul-ups. Hartnett was forced by senior Conservative party leaders to retract his statement the following day, saying that he was “deeply sorry that people are facing an unexpected bill.”

A few days later, however, HMRC CEO Dame Lesley Strathie made it very clear that Hartnett’s apology was really a non-apology when she insisted that HMRC staff had made “no mistakes,” and that any and all errors were due to taxpayer mistakes. Dame Strathie also said the critiques of HMRC's performance were unfair, since it couldn't pick and choose which customers to serve; it had to deal with everyone, whether her government department liked it or not. That bit of insight didn’t go over well, either.

HMRC’s PAYE system continues to cause grief to UK taxpayers, and the HMRC is taking yet another crack at updating its computer systems. Unfortunately, UK taxpayers don’t have a choice of which tax department to use, like customers of RBS or Vodafone do with banks or telecom companies. 

If you have some “worst IT foul-up apologies of all time” stories to add, please let me know.

The Making of "Lessons From a Decade of IT Failures"

In the fall of 2005, IEEE Spectrum published a special report on software that explored the question of why software fails and possible ways to prevent such failures. Not long afterwards, we started this blog, which documents IT project and operational failures, ooftas, and other technical hitches from around the world.

The tenth anniversary of that report seems an appropriate time to step back, look through 1,750 blog posts, and give an overall impression of what has and hasn’t changed vis-à-vis software crashes and glitches. Obviously, the rise in both the frequency and cost of cybercrime is one major change, but we decided, at least for now, to concentrate our limited resources on past unintended IT system development and operational failures.


Fuzzy Math Obscures Pentagon's Cybersecurity Spending

U.S. military spending has increasingly focused on cybersecurity in recent years. But some fuzzy math and the fact that funding is spread out among many military services makes it tough to figure out exactly how much money is going toward cybersecurity. That in turn makes it difficult to understand whether each dollar spent really improves the U.S. military’s cyber capabilities.


Is the Lenovo/Superfish Debacle a Call to Arms for Hacktivists?

As Lenovo has come under fire for pre-installing the intrusive Superfish adware on its computers — and as lawsuits are now being filed against the laptop-maker for compromising its users’ security — one solution to the problem may have been given short shrift. Maybe it’s time, in other words, to release the hackers.

To be clear, nothing here should be read as an inducement to any sort of malicious hacking or other nefarious cyber-activities. The call to arms is instead to hacking in the old Homebrew Computer Club, touch-of-code-and-dab-of-solder sense. After all, when pop-up ads became a scourge of the late 1990s Internet, coders behind the smaller Opera and Mozilla browsers rolled out their pop-up blockers to restore a touch of sanity. Major commercial web browsers like Internet Explorer and Safari only rushed in after the nimbler first responders proved the consumer demand.


Should Data Sharing Be More Like Gambling?

When you install a new app on your phone, you might find yourself facing a laundry list of things the software says it needs to access: your photos folder, for example, along with your camera, address book, phone log, and GPS location.

In many cases, it’s an all-or-nothing deal.

Eric Horvitz of Microsoft Research says companies could do better. Instead of asking users to provide wholesale access to their data, they could instead ask users to accept a certain level of risk that any given piece of data might be taken and used to, say, improve a product or better target ads.

“Certainly user data is the hot commodity of our time,” Horvitz said earlier this week at the American Association for the Advancement of Science, or AAAS, meeting in San Jose. But there is no reason, he says, that services “should be sucking up data willy-nilly.”

Instead, he says, companies could borrow a page from the medical realm and look for a minimally invasive option. Horvitz and his colleagues call their approach “stochastic privacy.” Instead of choosing to share or not to share certain information, a user would instead sign on to accept a certain amount of privacy risk: a 1 in 30,000 chance, for example, that their GPS data might be fed into real-time traffic analysis on any given day. Or a 1 in 100,000 chance that any given Internet search query might be logged and used.
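A toy version of "stochastic privacy" as described above can be written in a few lines: rather than a blanket yes/no grant, the user accepts a risk level, and the service samples against it each time it wants a data point. The code below is my own illustrative sketch, not anything from Horvitz's paper; the risk figures mirror the examples in the text:

```python
import random

def maybe_collect(risk):
    """Collect this data point only with the user's accepted probability."""
    return random.random() < risk

accepted_risk = 1 / 30_000  # e.g., daily chance GPS data feeds traffic analysis

# Simulate one year of daily collection decisions for one user.
days_collected = sum(maybe_collect(accepted_risk) for _ in range(365))
# On average a data point is taken roughly once every 1/risk days;
# at 1 in 30,000 per day, that's about once in 82 years for this user,
# yet across millions of users the service still sees fresh data daily.
print(days_collected)
```

The aggregate is what makes the scheme work: the service trades certainty about any one user for a statistically predictable stream across its whole user base.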

Horvitz and colleagues outlined the approach in a paper for an Association for the Advancement of Artificial Intelligence conference last year.

If companies were to implement stochastic privacy, they’d likely need to engage in some cost-benefit calculations. What are the benefits of knowing certain information? And how willing would a user be to share that information? 

This sort of exercise can turn up surprising results. In an earlier study, Horvitz and Andreas Krause (then at Caltech, but now at ETH Zurich) surveyed Internet search users to gauge their sensitivity to sharing different kinds of information. More sensitive than marital status, occupation, or whether you have children? Whether the search was conducted during work hours.

Of course, even if a company works out what seem to be reasonable risks for sharing different kinds of data, what it might look like on the user end is still an open question. How do you communicate the difference between a 1/30,000 and a 1/100,000 probability? 

Horvitz said that would be a good problem to have. “Would you want to live in a world where the challenge is to explain these things better,” he asked, “or where companies scarf up everything?”

Rooting Out Malware With a Side-Channel Chip Defense System

The world of malware has been turned on its head this week, as a company in Virginia has introduced a new cybersecurity technology that at first glance looks more like a classic cyberattack. 

The idea hatched by PFP Cybersecurity of Vienna, Va., is taken from the playbook of a famous cryptography-breaking scheme called the side channel attack. All malware, no matter the details of its code, authorship, or execution, must consume power. And, as PFP has found, the signature of malware’s power usage looks very different from the baseline power draw of a chip’s standard operations.

So this week, PFP is announcing a two-pronged technology (called P2Scan and eMonitor) that physically sits outside the CPU and sniffs the chip’s electromagnetic leakage for telltale signatures of power consumption patterns indicating abnormal behavior.

The result, they say, is a practically undetectable, all-purpose malware discovery protocol, especially for low-level systems that follow a predictable and standard routine. (Computers with users regularly attached to them, like laptops and smartphones, often have no baseline routine from which abnormal behavior can be inferred. So, PFP officials say, their technology is at the moment better suited to things like routers, networks, power grids, critical infrastructure, and other more automated systems.)

“On average, malware exists on a system for 229 days before anyone ever notices anything is there,” Thurston Brooks, PFP’s vice president of engineering and product marketing, told IEEE Spectrum. “What’s really cool about our system is we tell you within milliseconds that something has happened.”

PFP—an acronym for “power fingerprinting”—requires that its users establish a firm baseline of normal operations for the chips the company will be monitoring. So they begin with P2Scan, a credit-card-size physical sensor that monitors a given chip, board, device, embedded system, or network router for its electromagnetic fingerprints when running normally.

Unlike most malware strategies in the marketplace today, PFP takes a strikingly software-agnostic tack to besting malware, hardware Trojans, and other cyberattacks.

“We’re not trying to actually understand what’s going on inside the machine, like the hackers are,” says Brooks. “We’re trying to define what normal behavior looks like. Then, knowing [that], we can detect abnormal behavior.”

The view of malware as seen from outside the chip, in other words, can be a refreshing one. Hackers can’t detect this type of surveillance, because the scanning tools never actually interact with the chip’s operations. And hackers can be as clever as the most sophisticated programmers in the world. Yet, their code will still very likely be detected because, simply by virtue of performing different tasks than the chip normally performs, it will have a different power profile.
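In signal-processing terms, what PFP describes is baseline anomaly detection. The toy sketch below is my own illustration of that idea, not PFP's actual algorithm: it learns "normal" from power traces of a clean system using a single crude feature (average power per trace) and flags traces that deviate too far, where the real product presumably uses far richer spectral signatures:

```python
import statistics

def fit_baseline(traces):
    """Learn normal behavior: mean and spread of average power per trace."""
    powers = [sum(t) / len(t) for t in traces]
    return statistics.mean(powers), statistics.stdev(powers)

def is_anomalous(trace, baseline, k=4.0):
    """Flag a trace whose average power sits more than k sigmas from normal."""
    mean, std = baseline
    power = sum(trace) / len(trace)
    return abs(power - mean) > k * std

# Clean traces hover around an average power of ~1.0 (arbitrary units).
clean = [[1.0, 1.1, 0.9, 1.0], [1.1, 1.0, 1.1, 1.0], [0.9, 1.0, 0.9, 1.0]]
baseline = fit_baseline(clean)

print(is_anomalous([1.0, 1.0, 1.0, 1.05], baseline))  # normal-looking workload
print(is_anomalous([2.0, 2.2, 1.9, 2.1], baseline))   # malware-like extra load
```

This is also why the approach favors routers and embedded systems over laptops: the tighter the baseline distribution, the smaller the deviation needed to trip the detector.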

“I am a signal processing guy,” says PFP president Jeff Reed, who is also a professor in the ECE department at Virginia Tech. “Our approach is a very different approach than a person who’s normally schooled in security…We’re trying to understand a disturbance in the signal due to the inclusion of malware.”

Reed and Brooks also point out that counterfeit chips are a vast problem in IT, as Spectrum has documented in recent years. By the FBI’s estimates, for instance, chip counterfeiting costs U.S. businesses some $200 billion to $250 billion annually.

The problem is just as daunting for the U.S. military, as Spectrum has also chronicled. For example, an investigation by the U.S. Senate Committee on Armed Services uncovered counterfeit components in the supply chains for the CH-46 Sea Knight helicopter, C-17 military transport aircraft, P-8A Poseidon sub hunter and F-16 fighter jet.

The problems were expensive but ultimately rooted out. Yet other dangers remain—especially in such high-security realms, where substandard components could endanger troops and missions, or compromised chips could be used to carry out malicious plots.

But any compromised chip—whether hardware-Trojan-laden or part of a single lot of subpar chips coming from the foundry—can be discovered using their system, PFP says.

The trick, says Brooks, is to grab a sample chip from a lot and perform a (typically expensive) decapping, x-ray analysis, and reverse-engineering of the chip’s code. Then, once it’s been confirmed that the chip works as designed and is within spec, it is run through a standard operation, providing an electromagnetic baseline for P2Scan and eMonitor.

Every other chip in the lot can then be rapidly and cheaply tested against the “gold standard” chip by running the same standard operation and comparing the resulting electromagnetic signature to that of the first chip.

“You determine whether you have a good chip or not,” Brooks says. “You only spend the money to do that on one chip…So you amortize the cost of the forensics across all the chips. So if you have a few million chips, you’re talking about a few pennies [per chip] to do the whole thing—and know that you have a million chips that are all good.”
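The lot-screening step Brooks describes reduces to a simple comparison: one chip is verified the expensive way, its signature becomes the reference, and every other chip is accepted or rejected by how far its signature sits from that reference. In this sketch the signatures are plain lists of numbers and the Euclidean distance and threshold are illustrative assumptions, not PFP's actual method:

```python
import math

def distance(sig_a, sig_b):
    """Euclidean distance between two electromagnetic signatures."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(sig_a, sig_b)))

def screen_lot(gold_signature, lot_signatures, threshold=0.5):
    """Return indices of chips whose signature deviates from the gold chip."""
    return [i for i, sig in enumerate(lot_signatures)
            if distance(gold_signature, sig) > threshold]

gold = [1.0, 2.0, 1.5, 1.0]   # signature of the one expensively verified chip
lot = [
    [1.02, 1.98, 1.51, 1.0],  # within tolerance
    [1.0, 2.0, 1.49, 1.03],   # within tolerance
    [1.6, 2.9, 0.8, 1.4],     # suspect: very different fingerprint
]
print(screen_lot(gold, lot))  # indices of suspect chips
```

The economics Brooks cites fall out directly: the forensics are paid for once, and every subsequent chip costs only one signature capture and one comparison.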


Risk Factor

IEEE Spectrum's risk analysis blog, featuring daily news, updates and analysis on computing and IT projects, software and systems failures, successes and innovations, security threats, and more.

Robert Charette
Spotsylvania, Va.
Willie D. Jones
New York City