Risk Factor

Blob Front-End Bug Bursts Microsoft Azure Cloud

IT Hiccups of the Week

It being the Thanksgiving holiday week in the United States, I was tempted to write once more about the LA Unified School District’s MiSiS turkey of a project, which the LAUSD Inspector General fully addressed in a report [pdf] released last week. If you like your IT turkey burnt to a crisp, over-stuffed with project management arrogance, served with heapings of senior management incompetence, and topped off with a ladleful of lumpy gravy of technical ineptitude, you’ll feast mightily on the IG report. However, if you are a parent of one of the more than 1,000 LAUSD students who still have not received a class schedule nearly 40 percent of the way into the academic year—or a Los Angeles taxpayer, for that matter—you may get extreme indigestion from reading it.

However, the winner of the latest IT Hiccup of the Week award goes to Microsoft for the intermittent outages that hit its Azure cloud platform last Wednesday, disrupting an untold number of customer websites along with Microsoft Office 365, Xbox Live, and other services across the United States, Europe, Japan, and Asia. The outages occurred over an 11-hour (and in some cases longer) period.

According to a detailed post by Microsoft Azure corporate vice president Jason Zander, the outage was caused by “a bug that got triggered when a configuration change in the Azure Storage Front End component was made, resulting in the inability of the Blob [Binary Large Object] Front-Ends to take traffic.”

The configuration change was made as part of a “performance update” to Azure Storage that, when made, exposed the bug and “resulted in reduced capacity across services utilizing Azure Storage, including Virtual Machines, Visual Studio Online, Websites, Search and other Microsoft services.” The bug, which had escaped detection during “several weeks of testing,” caused the storage Blob Front-Ends to go into an infinite loop, Zander stated. “The net result,” he wrote, “was an inability for the front ends to take on further traffic, which in turn caused other services built on top to experience issues.”

Once the error was detected, the configuration change was rolled back immediately. However, the Blob Front-Ends needed a restart to halt their infinite looping, which slowed the recovery time, Zander wrote.
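
Zander’s description suggests a familiar failure pattern: a worker latches onto a bad setting and spins. Here is a minimal sketch of that pattern in Python—not Microsoft’s code, and the flag name is invented—illustrating why rolling back the configuration alone wasn’t enough and a restart was required.

```python
import time

config = {"perf_update_enabled": True}   # hypothetical stand-in for the bad setting

def front_end_worker():
    # Many services snapshot their configuration once at startup; a later
    # rollback of the shared setting is therefore never seen by this worker.
    enabled = config["perf_update_enabled"]
    while enabled:          # the exit condition can never become false again
        time.sleep(0.01)    # spinning here instead of accepting new requests

# Rolling the flag back helps only workers started *after* the rollback;
# workers already stuck in the loop must be restarted to serve traffic again.
config["perf_update_enabled"] = False
```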

The effects of the bug could have been contained, but Zander indicated that someone apparently didn’t follow standard procedure when rolling out the performance update.

“Unfortunately the issue was wide spread, since the update was made across most regions in a short period of time due to operational error, instead of following the standard protocol of applying production changes in incremental batches.”
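
For readers unfamiliar with that protocol, the sketch below—with invented region names and a toy health check—shows roughly what applying a production change in incremental batches looks like, and why it limits the blast radius of a latent bug to a single slice of the fleet.

```python
REGIONS = ["canary", "us-west", "us-east", "europe", "asia"]  # hypothetical batches

def apply_change(region):
    print(f"performance update applied to {region}")  # stand-in for the deployment step

def front_ends_healthy(region):
    return True  # stand-in for real telemetry: are the front ends still taking traffic?

def staged_rollout():
    for region in REGIONS:                  # one batch at a time, never all at once
        apply_change(region)
        if not front_ends_healthy(region):  # bake time and a health gate between batches
            print(f"halting rollout and rolling back {region}")
            return                          # the remaining regions are never touched

staged_rollout()
```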

Zander apologized for the “inconvenience” and said that Microsoft is going to “closely examine what went wrong and ensure it never happens again.”

In Other News…

Polish President Says Voting Glitch Doesn’t Warrant Vote Rerun

RBS Hit With £56 Million Fine for “Unacceptable” 2012 IT Meltdown

Wal-Mart Ad Match Scammed for $90 PS4s

Computer Problems Close South Australian Government Customer Service Centers

British Columbia Slot Machines’ Software Fixed After Mistaken $100K Payout

Washington State Temporarily Closes Health Exchange Due to Computer Issues

Software Bug in Washington State Department of Licensing Fails to Alert Drivers to Renew Licenses

RBS Group Facing Huge Fine over Massive 2012 IT System Meltdown

IT Hiccups of the Week

We turn our attention in this week’s IT Hiccups to one of the truly major IT ooftas of the past decade—one that was back in the news this week: the meltdown of the IT systems supporting the RBS banking group. (That group includes NatWest, Northern Ireland’s Ulster Bank, and the Royal Bank of Scotland.) The meltdown began in June 2012 but wasn’t fully resolved until nearly two months later. The collapse kept 17 million of the Group’s customers from accessing their accounts for a week, while thousands of customers at Ulster Bank reported access issues for more than six weeks.

Last week, Sky News reported that the UK’s Financial Conduct Authority (FCA) informed RBS that it was facing record-breaking fines in the “tens of millions of pounds” for the malfunction, which was blamed on a faulty software upgrade. In addition, the Sky News story states that the Central Bank of Ireland is looking at imposing fines on Ulster Bank over the same issue. The meltdown has already cost RBS some £175 million in compensation and other corrective costs.


FCC Chairman Calls April's Seven State Sunny Day 911 Outage "Terrifying"

IT Hiccups of the Week

This edition of IT Hiccups of the Week revisits the 911 emergency call system outages that affected all of Washington State and parts of Oregon just before midnight, 9 April 2014. As I wrote at the time, CenturyLink—a telecom provider from Louisiana that is contracted by Washington State and the three affected counties in Oregon to provide 911 communication services—blamed the outages, which lasted several hours each, on a “technical error by a third party vendor.”

CenturyLink gave few details in the aftermath of the outages other than to say that the Washington State and Oregon outages were merely an “uncanny” coincidence, and to send out the standard “sorry for the inconvenience” press release apology. The company estimated that approximately 4,500 emergency calls to 911 call centers went unanswered during the course of the Washington State outage. No details were available regarding the number of failed 911 calls during the two-hour Oregon outage, which affected some 16,000 phone customers.

Well, 10 days ago, the U.S. Federal Communications Commission released its investigative report into the emergency system outages. It cast a much different light on the Washington State “sunny day” outage (i.e., not caused by bad weather or a natural disaster) that CenturyLink initially tried to play down. FCC Chairman Tom Wheeler even went so far as to call the report’s findings “terrifying.”

As it turns out, while the 911 system outages that hit Oregon and Washington State were indeed coincidental, they were also connected in a strange sort of way that caused a lot of confusion at the time, as we will shortly see. More importantly, the 911 outage that affected Washington State on that April night didn’t just affect that state, but also emergency calls being made in California, Florida, Minnesota, North Carolina, Pennsylvania, and South Carolina. In total, some 6,600 emergency calls made over the course of six hours across the seven states went unanswered.

As the FCC report notes, because of the multi-state emergency system outage, “Over 11 million Americans … or about three and a half percent of the population of the United States, were at risk of not being able to reach emergency help through 911.” Since the outage happened very late at night into the early morning and there was no severe weather in the affected regions, the emergency call volume was very low; luckily, no one died because of their inability to reach 911.

The cause of the outage, the FCC says, was a preventable “software coding error” in a 911 Emergency Call Management Center (ECMC) automated system in Englewood, Colorado, operated by Intrado, a subsidiary of West Corporation. Intrado, the FCC report states, “is a provider of 911 and emergency communications infrastructure, systems, and services to communications service providers and to state and local public safety agencies throughout the United States… Intrado provides some level of 911 function for over 3,000 of the nation’s approximately 6,000 PSAPs.”

As succinctly explained in an article in the Washington Post, “Intrado owns and operates a routing service, taking in 911 calls and directing them to the most appropriate public safety answering point, or PSAP, in industry parlance. Ordinarily, Intrado's automated system assigns a unique identifying code to each incoming call before passing it on—a method of keeping track of phone calls as they move through the system.”

“But on April 9, the software responsible for assigning the codes maxed out at a pre-set limit [at 11:54 p.m. PDT]; the counter literally stopped counting at 40 million calls. As a result, the routing system stopped accepting new calls, leading to a bottleneck and a series of cascading failures elsewhere in the 911 infrastructure,” the Post article went on to state.
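
The failure mode the FCC and the Post describe is easy to picture in code. Below is a toy illustration—not Intrado’s software, and the function names are invented—of a call router whose identifying codes come from a counter with a hard pre-set ceiling.

```python
MAX_CALL_ID = 40_000_000      # the pre-set limit cited in the FCC report
next_call_id = 0

def dispatch_to_psap(call):
    return f"call {call['id']} delivered to PSAP"   # stand-in for the real routing step

def route_call(call):
    global next_call_id
    if next_call_id >= MAX_CALL_ID:
        # When the counter tops out, new calls are simply refused; on 9 April
        # callers heard a fast-busy signal instead of reaching a dispatcher.
        raise RuntimeError("call ID pool exhausted; call rejected")
    next_call_id += 1
    call["id"] = next_call_id
    return dispatch_to_psap(call)

print(route_call({"from": "+1-360-555-0100"}))
```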

All told, 81 PSAPs across the seven states were unable to receive calls; dialers to 911 heard only “fast busy” signals.

When the software hit its 40 million call limit, the FCC report says, the emergency call-routing system did not send out an operator alarm for over an hour. When it finally did, the system monitoring software classified the problem as “low level”; surprisingly, it did not immediately alert anyone that emergency calls were no longer being processed.

As a result, Intrado’s emergency call management center personnel did not realize the severity of the outage, nor did they get any insight into its cause, the FCC report goes on to state. In addition, the ECMC personnel were already distracted by alarms they were receiving about the Oregon outage, which also involved CenturyLink.

Worse still, says the FCC, the low-level alarm designation not only failed to get ECMC personnel’s attention, but it also prevented an automatic rerouting of 911 emergency calls to Intrado’s ECMC facility in Miami.
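
The report’s point is that severity classification drives everything downstream: who gets paged and whether the automatic failover fires. The sketch below, with hypothetical severity levels and function names, shows how a mislabeled “low level” alarm can quietly suppress both.

```python
SEVERITY = {"low": 1, "major": 2, "critical": 3}

def page_on_call_engineer(alarm):
    print("paging engineer:", alarm["msg"])

def fail_over_to_backup_site(site):
    print("rerouting 911 traffic to", site)

def log_for_later_review(alarm):
    print("logged for later review:", alarm["msg"])

def handle_alarm(alarm):
    level = SEVERITY[alarm["severity"]]
    if level >= SEVERITY["critical"]:
        fail_over_to_backup_site("Miami ECMC")   # never reached for a "low" alarm
    if level >= SEVERITY["major"]:
        page_on_call_engineer(alarm)             # also never reached
    else:
        log_for_later_review(alarm)              # effectively what happened on 9 April

handle_alarm({"severity": "low", "msg": "call ID pool exhausted"})
```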

It wasn’t until 2:00 a.m. PDT on 10 April that ECMC personnel became aware of the outage. That, it seems, happened only because CenturyLink called to alert them that its PSAPs in Washington State were complaining of an outage. After the emergency call management center personnel received the CenturyLink call, both they and CenturyLink thought the Washington State and Oregon outages were somehow closely interconnected. It took several hours for them to realize that they were entirely separate and unrelated events, the FCC report states. Apparently, it wasn’t until several other states’ PSAPs and 911 emergency system call providers started complaining of outages that call management center personnel and CenturyLink realized the true scope of the 911 call outage, and were finally able to zero in on the cause.

Once the root cause was discovered, the Colorado-based ECMC personnel initiated a manual failover of 911 call traffic to Intrado’s ECMC Miami site at 6:00 a.m. PDT. When problems plaguing the Colorado site were fixed later that morning, traffic was rerouted back.

The FCC report states that, “What is most troubling is that this is not an isolated incident or an act of nature. So-called ‘sunny day’ outages are on the rise. That’s because, as 911 has evolved into a system that is more technologically advanced, the interaction of new [Next Generation 911 (NG911)] and old [traditional circuit-switched time division multiplexing (TDM)] systems is introducing fragility into the communications system that is more important in times of dire need.”

IEEE Spectrum published an article in March of this year that explains the evolution of 911 in the U.S. (and Europe) and provides good insights into some of the difficulties of transitioning to NG911. The FCC’s report also goes into some detail on how the transition from traditional 911 service to NG911 can create subtle problems that are difficult to unravel when a problem does occur.

According to a story at Telecompetitor.com, Rear Admiral David Simpson, chief of the FCC’s Public Safety and Homeland Security Bureau, told the FCC during a hearing into the outage that there were three additional major “sunny day” outages in 2014, though none had previously been reported publicly. All three—which I believe involved outages in Hawaii, Vermont and Indiana—involved NG911 implementations or time division multiplexing–to-IP transitions, Simpson said.

The FCC report indicates that Intrado has made changes to its call routing software and monitoring systems to prevent this situation from happening again, but it also said that 911 emergency service providers need to examine their system architecture designs. The hope is that they’ll better understand how and why their systems may fail, and what can be done to keep the agencies operating when they do. In addition, the communication of outages among all the emergency service providers and PSAPs needs to be improved; the April incident highlighted how miscommunications hampered finding the extent and cause of the outage.

Finally, the five FCC Commissioners unanimously agreed that such an outage was “simply unacceptable” and that future “lapses cannot be permitted.” While no one died this time, they note that next time everyone may not be so lucky.

In Other News…

Sarasota Florida Schools Plagued by Computer Problems

Weather Forecasts Affected as National Weather Satellite Goes Dark?

Bad Software Update Hits Aspen Colorado Area Buses

Bank of England Suffers Embarrassing Payments Crash

Google Drive for Work Goes Down

Google Gmail Experiences Global Outage

Cut Fiber Optic Cables Knock-out Air Surveillance in East India for 13 Hours

Bank of America Customers Using Apple Pay Double Charged

iPhone Owners Complain of Troubles with iOS 8.1

UK Bank Nationwide Apologizes Once More for Mobile and Online Outages

Vehicle Owners Seeking Info on Takata Airbag Recall Crash NHTSA Website

West Virginia Delays Next Phase of WVOASIS Due to Testing Issues

UK’s Universal Credit Program Slips At Least Four Years

Heathrow Airport Suffers Yet Another Baggage System Meltdown

LA School District Superintendent Resigns in Wake of Continuing MiSiS Woes

We turn our IT Hiccups of the Week attention once again to the Los Angeles Unified School District’s shambolic rollout of its integrated student educational tracking system called My Integrated Student Information Systems (MiSiS). I first wrote about MiSiS a few months ago, and it has proved nothing but trouble, to the point that it became a major contributing factor in “encouraging” John Deasy to resign his position last week as superintendent of the second largest school system in the United States. He’d been on the job three and a half years.

Deasy claimed in interviews after his resignation that the MiSiS debacle “played no role” in his resignation, and instead blamed it on district teachers and their unions opposing his crusading efforts to modernize the LAUSD school system. That is putting a positive spin on the situation, to put it mildly.

Why? You may recall from my previous post that LAUSD has been under a 2003 federal district court approved consent decree to implement an automated student tracking system so that disabled and special-needs students’ educational progress can be assessed and tracked from kindergarten to the end of high school. Headway toward complying with the obligations agreed to under the consent decree is assessed by a court-appointed independent monitor who publishes periodic progress reports. Deasy repeatedly failed to deliver on the school district’s promises made to the independent monitor over the course of his tenure.

What really helped seal Deasy’s fate was the latest progress report [pdf] from the independent monitor released last week. The report essentially said that despite numerous “trust me” promises by LAUSD officials (including Deasy), MiSiS was still out of compliance. The officials had promised that MiSiS would be completely operationally tested and ready at the beginning of this school year. But, said the report, the system’s incomplete functionality, the ongoing poor reliability due to inadequate testing, and the misunderstood and pernicious data integrity issues were causing unacceptable educational hardships to way too many LAUSD students—especially to those with special educational needs.

An LA Times story, for one, stated that the monitor found that MiSiS, instead of helping special needs students, made it difficult to place them in their required programs. A survey conducted by the independent monitor of 201 LAUSD schools trying to use MiSiS found that “more than 80% had trouble identifying students with special needs and more than two-thirds had difficulty placing students in the right programs,” the Times article stated.

Deasy’s fate had been hanging by a thread for a while. For instance, at several LAUSD schools—especially at Thomas Jefferson High School in south Los Angeles—hundreds of students were still without correct class schedules nearly two months after the school year had started. 

Another story in the LA Times reported that continuing operational issues with MiSiS meant that some Jefferson students were being “sent to overbooked classrooms or were given the same course multiple times a day. Others were assigned to ‘service’ periods where they did nothing at all. Still others were sent home.”

The problems at Jefferson made Deasy’s insistence that issues with MiSiS were merely a matter of “fine tuning” look disingenuous at best.

The MiSiS-fueled difficulties at Jefferson, which extended to several other LAUSD schools, caused a California Superior Court judge to intervene about two weeks ago and order the state education department to work with LAUSD officials to rectify the situation immediately. In issuing the order, the judge damningly wrote that “there is no evidence of any organized effort to help those students” at Jefferson by LAUSD senior officials.

As a result of the judge’s order, the LAUSD school board last week quickly approved a $1.1 million plan to try to eliminate the disarray at Jefferson High. Additionally, the school board is now undertaking an audit of other district high schools to see how many other students are being impacted by the MiSiS mess and what additional financial resources may be needed to eliminate it.

Fraying Deasy’s already thin thread further was his admission that MiSiS is in need of some 600 enhancements and bug fixes (up from a reported 150 or so when the system was rolled out in August), which would likely cost millions of dollars on top of the $130 million already spent. Further, he also acknowledged that one of the core functions solemnly promised to the independent monitor to be available this school year—the proper recording of student grades—could take yet another year to debug, the LA Times reported.

According to the LA Daily News, LAUSD teachers complain that they not only have a hard time accessing the grade book function, but when they finally do, they find that student grades or even their courses have disappeared from MiSiS. Hundreds if not thousands of student transcripts could be in complete shambles, which is causing major concern for seniors applying to colleges. Their parents are also unamused, to say the least.

Probably the last fiber of Deasy’s thread was pulled away last week when it turned out that even if MiSiS had been working properly, a majority of LAUSD schools likely wouldn’t have been able to access all of its functionality anyway. According to a story in the Contra Costa Times, LAUSD technology director Ron Chandler informed the district’s school board last week that most of the LAUSD schools’ administrative desktop computers were incapable of completely accessing MiSiS because of known compatibility problems.

A clearly frustrated school board wanted to know why this situation was only being disclosed now; Chandler told the board that the initial plan was for the schools to use the Apple iPads previously purchased by the school board to access MiSiS. But questions over Deasy's role in that $1 billion contract put that approach on hold. The school board was more than a bit incredulous about that explanation, since it had not approved the purchase of iPads with the intent that they be used by teachers and school administrators as the primary means to access MiSiS.

Reluctantly, the school board approved $3.6 million in additional funding to purchase 3,340 new desktop computers for 784 LAUSD schools to allow them unfettered access to MiSiS.

While Deasy’s resignation will alleviate some of the immediate political pressure on LAUSD officials caused by the MiSiS fiasco, the technical issues will undoubtedly last throughout this academic year and possibly well into the next. However, for many unlucky LAUSD students, the impacts may last for many years beyond that.

In Other News…

Baltimore County Maryland Teachers Tackling Student Tracking System Glitches

Tallahassee’s New Emergency Dispatch System Offline Again

Washington State’s Computer Network Suffers Major Outage

Software Glitch Hits Telecommunications Services of Trinidad and Tobago

New Mexico Utility Company Incorrectly Bills Customers

Software Issue Means Oklahoma Utility Company Overbills Customers

Computer Error Allows Pink Panther Gang Member Early Out of Austrian Jail

Dropbox Bug Wipes Out Some Users’ Files

Generic Medicines Might Have Been Approved on Software Error

Australia’s iiNet Apologizes to Hundreds of Thousands of Customers for Three-day Email Outage

Spreadsheet Error Costs Tibco Investors $100 Million

Duke Energy Falsely Reports 500,000 Customers as Delinquent Bill Payers Since 2010

IT Hiccups of the Week

There were several IT Hiccups to choose from last week. Among them were problems with the Los Angeles Unified School District’s fouled-up new student information and management system that are so egregious that a judge ordered the district to address them immediately, and the UK Revenue and Customs department’s embarrassing admission that its trouble-plagued modernized tax system has again made multiple errors in computing thousands of tax bills. However, the winner of this week’s title as the worst of the worst was an oofta by Duke Energy, the largest electric power company in the U.S. Duke officials apologized in a press release to over 500,000 of the utility’s 800,000-plus current and former customers (including 5,000 non-residential customers) across Indiana, Kentucky, and Ohio for erroneously reporting them as being delinquent in paying their utility bills since 2010.

Duke Energy admitted that the root cause of the problem was a coding error that occurred when customers opted to pay their monthly utility bills via the utility’s Budget Billing or Percentage of Income Payment Plan Plus (in Ohio only).  A company spokesperson told Bloomberg BusinessWeek that while customers were sent the correct invoices and their on-time payments were properly credited, the billing system indicated that the customers’ bills were paid late.
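
Duke hasn’t said what the coding error actually was, so the snippet below is a purely hypothetical illustration of the general pattern the company describes: the payment is credited to the ledger correctly, but a separate delinquency flag is computed from the wrong date field, so an on-time payment gets reported as late.

```python
from datetime import date

def post_payment(account, paid_on, due_on, plan):
    account["balance_due"] = 0          # the payment itself is credited correctly
    if plan == "budget_billing":
        # Hypothetical bug: for budget-billing customers the late check uses the
        # plan's enrollment anniversary rather than the invoice due date.
        reference_date = account["plan_enrolled_on"]
    else:
        reference_date = due_on
    account["reported_late"] = paid_on > reference_date

acct = {"plan_enrolled_on": date(2010, 1, 15)}
post_payment(acct, paid_on=date(2014, 9, 1), due_on=date(2014, 9, 10), plan="budget_billing")
print(acct["reported_late"])   # True, even though the bill was paid nine days early
```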

 As a result, that late payment information for residential customers was sent by formal agreement to the National Consumer Telecom & Utilities Exchange (NCTUE). The NCTUE is a consortium of over 70 member companies from the telecommunications, utilities and pay TV industries that serves as a credit data exchange service for its members. Holding over 325 million consumer records, NCTUE provides information to its members regarding the credit risk of their current and potential customers. For non-residential customers, the “late payment” snafu had worse consequences: the delinquency reports were sent to the business credit rating agencies Dun & Bradstreet and Equifax Commercial Services.

Duke Energy’s press release said that the company “deeply regretted” the error that has effectively trashed the credit scores of hundreds of thousands of its residential and business customers for years. The utility says the erroneous information has now been “blocked” for use by the NCTUE, Dun & Bradstreet and Equifax, and it has dropped its membership in all three.

The press release mentioned that the company is still investigating whether additional customers who had “unique” billing circumstances were affected by the coding error.

But what the written statement failed to mention is that the utility found the error only after a former customer discovered that she was having trouble setting up service at another NCTUE utility member because of a supposedly poor payment history at Duke Energy. After contacting Duke Energy and asking why she was being shown as a delinquent bill payer when she was not, the utility realized that the woman’s erroneous credit information was only the tip of a very large IT oofta iceberg.

While Duke Energy claims that “we take responsibility” for the error, it is being rather quiet about explaining what exactly “taking responsibility” means for the hundreds of thousands of customers who may have been unjustly financially affected by the erroneous information sent to the three credit agencies over the past four years. It wouldn’t surprise me to see a class action lawsuit filed against Duke Energy in the near future to help the company gain greater clarity on what its responsibility is.

In Other News…

Judge Orders California to Help LAUSD Fix School Computer Fiasco

UK’s Tax Agency Admits it Can’t Compute Taxes Properly

Tahoe Ski Resort Withdraws Erroneous $1 Season Pass

UK NHS Hospital Patients Offered Harry Potter Names

Florida Utility Insists New Billing System is Right: Empty House Used 614,000 Gallons of Water in 18 Days

Audit Explains How Kansas Botched Its $40 Million DMV Modernization Effort

Indiana BMV Finally Sending Out Overbilling Refund Checks

Nielsen Says Software Error Skews Television Viewer Stats for Months

Japan Trader's $617 Billion “Fat Finger” Near-Miss Rattles Tokyo Market

IT Hiccups of the Week

This week’s IT Hiccup of the Week concerns yet another so-called “fat finger” trade embroiling the Tokyo Stock Exchange (TSE). This time it involved an unidentified trader who last week mistakenly placed orders for shares in 42 major Japanese corporations.

According to a story at Bloomberg News, the trader placed over-the-counter (OTC) orders adding up to a total value of 67.78 trillion yen ($617 billion) in companies such as Canon, Honda, Toyota and Sony, among others. The share order for Toyota alone was for 1.96 billion shares—or 57 percent of the car company—amounting to about $116 billion.

Bloomberg reported that its analysis “shows that someone traded 306,700 Toyota shares at 6,399 yen apiece at 9.25 a.m. ... The total value of the transaction was 1.96 billion yen. The false report was for an order of 1.96 billion shares. [The Japan Securities Dealers Association] said the broker accidentally put the value of the transaction in the field intended for the number of shares.”
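
In other words, the yen value of the trade landed in the share-count field. A generic pre-trade sanity check—this is a sketch of the idea, not the TSE’s or the JSDA’s actual validation, and the notional ceiling is an invented figure—would flag an order like that before it ever reached the market.

```python
def validate_order(symbol, shares, price, shares_outstanding, max_notional=1e11):
    """Return a list of reasons this order looks implausible (empty list = pass)."""
    problems = []
    if shares > shares_outstanding:
        problems.append("order exceeds the company's total shares outstanding")
    if shares * price > max_notional:
        problems.append("order notional exceeds the per-order ceiling")
    return problems

# The erroneous Toyota order: 1.96 billion "shares" (really the yen value of a
# 306,700-share trade) at 6,399 yen; the article's own figures imply roughly
# 3.4 billion Toyota shares outstanding at the time.
print(validate_order("7203.T", shares=1_960_000_000, price=6_399,
                     shares_outstanding=3_400_000_000))
```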

The $617 billion order, which Bloomberg said was “greater than the size of Sweden’s economy and 16 times the Japanese over-the-counter market’s traded value for the entire month of August,” was quickly canceled before the orders could be completed. Given the out-sized orders and that OTC orders can be canceled anytime during market hours, it is unlikely that the blunder would have gone unfixed for very long, but the fact that it happened resurrected bad memories for the Tokyo Stock Exchange.

Back in 2005, Mizuho Financial Group made a fat finger trade on the TSE that could not be canceled out. A Financial Times of London story states that, “Mizuho Securities mistakenly tried to sell 610,000 shares in recruitment company J-Com at ¥1 apiece instead of one share at ¥610,000. The brokerage house said it had tried, but failed, to cancel the J-Com order four times.” The mistaken $345 million trade cost the president of the TSE along with two other exchange directors their jobs.

Then in 2009, a Japanese trader for UBS ordered $31 billion worth of bonds instead of buying the $310,000 he had intended, the London Telegraph reported.  Luckily, the order was sent after hours, so it was quickly discovered and corrected.

A little disconcerting, however, was a related Bloomberg News story from last week that quoted Larry Tabb, founder of research firm Tabb Group LLC. According to Tabb, despite all the recent efforts by US regulators and the exchanges themselves to keep rogue trades from occurring (e.g., the Knight Capital implosion), fat finger trades still “could absolutely happen here.”

“While we do have circuit breakers and pre-trade checks for items executed on exchange,” Tabb told Bloomberg, “I do not believe that there are any such checks on block trades negotiated bi-laterally and are just displayed to the market.”

Don’t insights like that from a Wall Street insider just give you a warm and fuzzy feeling about the reliability of financial markets?

In Other News…

Computer Glitch Affects 60,000 Would-be Organ Donors in Canada

Korean Air New Reservations System Irritates Customers

Ford Recalls 850,000 Vehicles to Fix Electronics

Mitsubishi i-MiEV Recalled to Fix Software Brake Issue

Doctors’ “Open Payments” Website Still Needs Many More Government Fixes

Apple iOS 8 Hit by Bluetooth Problems

Electronic Health Record System Blamed for Missing Ebola at Dallas Hospital

JP Morgan Chase: Contacts for 76 Million Households and 7 Million Small Businesses Compromised

Banking giant JP Morgan Chase filed an official notice yesterday with the U.S. Securities and Exchange Commission (SEC) updating the material information concerning the cyberattack the bank uncovered during the summer. According to the bank’s Form 8-K, for customers using its Chase.com and JPMorganOnline websites as well as the Chase and J.P. Morgan mobile applications:


FBI’s Sentinel System Still Not In Total Shape to Surveil

IT Hiccups of the Week

Other than the rather entertaining kerfuffle involving Apple’s new iPhone OS and its initial (non)corrective update (along with the suspicious “bendy phone” accusations), the IT Hiccups front was rather quiet this past week. Luckily, an “old friend” came by to rescue us from writing a post on some rather mundane IT snarl, snag or snafu.

Just in the nick of time, the U.S. Department of Justice's Inspector General released his latest in an ongoing series of reports [pdf] about Sentinel, the FBI’s electronic information and case management system. In this report, the IG focused on how Sentinel users felt about working with the system. Sadly yet unsurprisingly, the IG found that Sentinel is still suffering from some serious operational deficiencies two years after it went live.


Home Depot: Everything is Secure Now, Except Maybe in Canada

This past Thursday, after weeks of speculation, Home Depot, which calls itself the world’s largest home improvement retailer, finally announced [pdf] the total damage from a breach of its payment system: At its 1,157 stores in the U.S. and Canada, 56 million unique credit and debit cards were compromised. This is said to be among the three largest IT security breaches of a retail store, and ranks with some of the largest security breaches of all time.

According to Home Depot’s press release, the company confirmed that the criminal cyber intrusion began in April and ran into September, and “used unique, custom-built malware to evade detection. The malware had not been seen previously in other attacks, according to Home Depot’s security partners.”

The company says that it has now removed all the malware that infected its payment terminals, and that it “has rolled out enhanced encryption of payment data to all U.S. stores.” The enhanced encryption approach, Home Depot states, “takes raw payment card information and scrambles it to make it unreadable and virtually useless to hackers.” It is a bit curious that the company says “virtually useless” and not “completely useless,” though.
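
Home Depot hasn’t published the technical details, but the general idea behind encrypting payment data at the point of capture is straightforward: the card number is scrambled in the reader with a key the store’s systems don’t hold, so memory-scraping malware on the register sees only ciphertext. A minimal sketch, using the third-party Python `cryptography` package purely for illustration and not Home Depot’s actual scheme:

```python
from cryptography.fernet import Fernet

# In a real deployment this key lives with the payment processor or in the
# reader's secure hardware, never on the store's point-of-sale systems.
processor_key = Fernet.generate_key()

def capture_card(pan: str) -> bytes:
    """Runs inside the card reader: encrypt before anything else sees the number."""
    return Fernet(processor_key).encrypt(pan.encode())

def pos_register(ciphertext: bytes) -> None:
    # The register only ever forwards an opaque blob, so malware scraping its
    # memory gets nothing it can resell.
    print("forwarding to processor:", ciphertext[:20], b"...")

pos_register(capture_card("4111111111111111"))
```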

Canadian stores, on the other hand, will have to wait a bit longer. While Home Depot’s Canadian stores have point-of-sale EMV chip and PIN card terminals, “the rollout of enhanced encryption to Canadian stores will be completed by early 2015,” the company says. Canadian Home Depot stores were at first thought to be less vulnerable because of the chip-and-pin terminals being in place, but that apparently hasn't been the case. For some reason, the company is refusing to disclose the number of Canadian payment cards compromised, the Globe and Mail says. The Globe and Mail estimates the total number of cards compromised to be around 4 million.

Home Depot goes on to say in its press release that it has no evidence “that debit PIN numbers were compromised or that the breach has impacted stores in Mexico or customers who shopped online at HomeDepot.com or HomeDepot.ca.”

As usual in these situations, Home Depot “is offering free identity protection services [for one year], including credit monitoring, to any customer who used a payment card at a Home Depot store in 2014, from April on.” The company also apologized to its customers “for the inconvenience and anxiety this has caused.”

Home Depot’s data breach was first made public on 2 September by Brian Krebs, the former longtime Washington Post reporter with amazing IT security contacts, who now publishes a must-read security website called Krebs on Security. Several banking sources told Krebs of “a massive new batch of stolen credit and debit cards that went on sale [that] morning in the cybercrime underground,” with Home Depot looking like the source. Krebs went on to write that:

There are signs that the perpetrators of this apparent breach may be the same group of Russian and Ukrainian hackers responsible for the data breaches at Target, Sally Beauty and P.F. Chang’s, among others. The banks contacted by this reporter all purchased their customers’ cards from the same underground store — rescator[dot]cc — which on Sept. 2 moved two massive new batches of stolen cards onto the market.

In fact, it wasn’t until 8 September that Home Depot confirmed that it had indeed suffered a breach. Krebs, who has since written about the breach several times, recently wrote that the breach may not be as severe as indicated (nor as severe as it could have been). Sources have indicated that the malware used — which looks like a variant of what smacked Target late last year — was “installed mainly on payment systems in the self-checkout lanes at retail stores.” The reasoning is that if the malware had penetrated Home Depot’s payment system to the extent that Target’s systems were breached, many more than 56 million payment cards would have been compromised.

Sellers of compromised Home Depot card data are targeting specific states and ZIP codes in the hopes that buyers of the stolen cards will raise fewer red flags in the credit card and banking fraud algorithms. For instance, some 52,000 cards from Maine Home Depot stores, 282,000 from stores in Wisconsin, and 12,000 from stores in Minnesota have been offered for sale. Card prices seem to be ranging mostly from $9 to $52 apiece, although for $8.16 million, one could purchase all of the stolen payment card numbers from Wisconsin, the Milwaukee-Wisconsin Journal Sentinel reported. The Journal Sentinel noted that its investigation found that:

Prices start at $2.26 for a Visa debit card with an expiration date of September 2014. The most valuable cards are MasterCard platinum debit cards and business credit cards. The most expensive card compromised in Wisconsin, a MasterCard valid through December 2015, was advertised at $127.50.

Interestingly, while Home Depot’s 56 million payment card breach is larger than Target’s 40 million payment card breach, the severity of the blowback so far is much more muted on the part of customers and investors. Part of the reason seems to be that the discovery of the breach happened at the end of summer, a slow shopping time for Home Depot, while Target’s was announced during the prime holiday buying period, which spooked its customers.

Further, investors have figured that Target’s breach cost the company some $150 million, excluding the $90 million in insurance reimbursements—a sum the company could ill afford given its ongoing retail difficulties. A similar sum may dent Home Depot’s bottom line, but the company is better placed financially to absorb the damage. The company stated in its press release that it has spent at least $62 million in dealing with the breach so far, with some $27 million of it covered by insurance. Home Depot says it doesn’t know how much more it will need to spend, but I suspect it could be an additional couple of hundred million dollars before all is said and done.

A third reason for the muted response may be that customers are becoming inured to such incidents in the wake of so many point-of-sale data breaches. For example, last May, the Ponemon Institute was cited in a CBS News report as stating that some 47 percent of adult Americans have had their personal information compromised in the past year. Given the Home Depot breach, as well as many others since, the number is probably even higher now. How many people had their personal information compromised multiple times is unknown, but I suspect it isn’t an insignificant number.

Home Depot’s financial and reputational pain might increase significantly, however, if the joint Connecticut, Illinois, and California state attorneys general investigation into the breach decides there is sufficient cause to sue Home Depot. As expected, at least one class action lawsuit each has been filed in both the United States and Canada, and more can be expected. Banks may also decide to sue Home Depot to cover the cost of any credit or debit cards they have to replace and for other financial damages, like some did against Target and earlier against TJX.

As reported by both The New York Times and Bloomberg BusinessWeek, Home Depot had been repeatedly warned by its own IT security personnel about its poor and outdated IT security since 2008. Corporate management reportedly decided not to immediately increase the company’s security capabilities using readily available systems, even in the aftermath of the Target breach and a couple of Home Depot stores being hacked last year, incidents that were not publicly disclosed until now. While the company did eventually decide to upgrade its payment security systems, the implementation effort didn’t get started until April, the same month as the breach. In addition, the papers report, Home Depot seemed to have weak security monitoring of its payment system, even though company management knew it was highly vulnerable to attack.

That Home Depot’s payment system was left vulnerable is interesting, because the company spent hundreds of millions of dollars improving its IT infrastructure over the past decade. Perhaps with revenues of $79 billion in 2013, the company felt it could easily afford the costs of an attack, and therefore there was no urgent rush to improve its security posture. Brian Krebs notes this apparent lack of urgency as well. He says that even though banks had alerted the company to something being massively amiss, “thieves were stealing card data from Home Depot’s cash registers up until Sept. 7, 2014, a full five days after news of the breach first broke.”

That alone speaks of an arrogance that belies Home Depot's public statements about how it takes the privacy and security of its customers’ personal information “very seriously.” Local Home Depot store personnel I have spoken with seem very ill-informed concerning the breach and what customers should do about it, which also seems to me a sign of something less than Home Depot’s advertised customer-caring attitude.

Home Depot’s seemingly cavalier IT security attitude isn’t unique, of course. Target didn’t bother to investigate alerts from its advanced warning system showing that it was being hacked until it was JTL — just too late. Just last week, eBay was being slammed again for its “lackadaisical attitude” toward IT security after multiple instances of malicious cross-site scripting that have been unabated since February were found on its UK website. Only after the BBC started asking eBay questions about the scripting issue did it decide that perhaps it should take them seriously. You may remember, it was only last March when eBay, which also proclaims to take customer security “very seriously,” asked all of its users to change their passwords after a cyberattack compromised its database of 233 million usernames, contact information, dates of birth, and encrypted passwords.

To tell you the truth, every time I read or hear a company or government agency claim in a press release that, “We take your security seriously,” in the wake of some security breach, I shake my head in disbelief.  Why not just state honestly, “We promised to take your security seriously and we obviously failed to take it seriously enough. We’re sorry and we will be better prepared from now on.” Alas, that level of candor is probably much too much to ask.

Indiana’s Bureau of Motor Vehicles Overcharged 180,000 Customers for 10 Years

IT Hiccups of the Week

Put aside, for a moment, the record theft of credit card accounts from Home Depot. I'll tell you all about that in a later post. Instead, let me pick another interesting IT Hiccup from last week's hodgepodge of IT problems, snarls, and screw-ups: Indiana’s Bureau of Motor Vehicles (BMV) plans to refund some US $29 million plus interest to 180,000 customers for charging them an incorrectly calculated excise tax when they registered their vehicles. The BMV claimed the problem began during the initial changeover in 2004 to its then-new $32 million System Tracking and Record Support (STARS) computer system.
