Risk Factor

Report Claims Spy Plane to Blame for Air-Traffic Outage on U.S. West Coast

IT Hiccups of the Week

Unlike the past month, last week saw a veritable cornucopia of IT-related malfunctions, errors and complications. We start off this week’s edition of IT Hiccups with a strange story that emerged in the wake of a major air traffic control outage in the U.S.—one of several incidents that aggravated air travelers around the world.

The story starts off simply enough: On Wednesday afternoon, a little after 3:00 p.m. Pacific time, a computer problem occurred with the En Route Automation Modernization (ERAM) system at the Los Angeles Air Route Traffic Control Center. The snafu, a USA Today article reported, left “controllers temporarily unable to track planes across Southern California and parts of Nevada, Arizona and Utah.” The FAA issued a ground stop on flights bound for Los Angeles for about an hour until the problem could be cleared up. Even so, that action caused the cancellation of some 50 flights arriving at and departing Los Angeles International Airport and delayed another 455 flights across the country.

ERAM is part of a $2.2 billion Federal Aviation Administration (FAA) modernization effort, which, as I have written previously, has had its problems. So, while the computer problem was a major annoyance, nothing there seemed out of the ordinary. The FAA issued its usual bland statement indicating that it was investigating the issue: “The FAA will fully analyze the event to resolve any underlying issues that contributed to the incident and prevent a recurrence.”

A spokesperson for the Professional Aviation Safety Specialists, the union that represents many FAA employees, hinted at the source of the problem by telling USA Today that, “There was so much information coming into the system that it overloaded.”

That seemed to be the end of the incident—well, that is until the weekend, when NBC News ran a story online citing “sources familiar with the incident” who claimed that a U-2 spy plane flying in the area triggered the problems with the ERAM computers. (The article's title said the U-2 had "fried" the system.) The NBC story reported that:

“The computers at the L.A. Center are programmed to keep commercial airliners and other aircraft from colliding with each other. The U-2 was flying at 60,000 feet, but the computers were attempting to keep it from colliding with planes that were actually miles beneath it.”

“Though the exact technical causes are not known, the spy plane’s altitude and route apparently overloaded a computer system called ERAM, which generates display data for air-traffic controllers. Back-up computer systems also failed.”

NBC News contacted the FAA, which basically reissued its previous statement, but also would neither confirm nor deny that the U-2 was the cause of the outage. The U.S. Air Force also declined to comment directly on the story's details, and the Pentagon was not responsive to an inquiry from Reuters. The Wall Street Journal ran a story about the incident today, again, with government officials all deciding to remain mum.

When I saw the U-2 story at the NBC News website, I was more than a bit skeptical, especially over the "frying" statement. It is hard to believe that this is the first time military aircraft have passed through the LA ERAM control space at altitudes above normal commercial airline altitudes. On the other hand, I wouldn’t totally discount the possibility that some unique set of circumstances involving a military aircraft that just happened to be a U-2 triggered an unknown problem within the ERAM software. Still, my instinct is to attribute the problem to a more prosaic explanation. If some official explanation comes out, I will update the post. However, feel free to speculate on what happened in the meantime.

Update 6 May 2014: U-2 did indeed cause the LA-area ATC problem

The FAA admitted late yesterday that a U-2 did, in fact, trigger the problems last week with the LA-area ERAM system. Piecing together the FAA explanation from various news sources, e.g., NBC News, Reuters and CNN (no one seems to have published the FAA statement yet in its entirety):

“On April 30, 2014, an FAA air traffic system that processes flight plan information experienced problems while processing a flight plan filed for a U-2 aircraft that operates at very high altitudes under visual flight rules.”

“The computer system interpreted the flight as a more typical low altitude operation, and began processing it for a route below 10,000 feet.”

“The extensive number of routings that would have been required to de-conflict the aircraft with lower-altitude flights used a large amount of available memory and interrupted the computer’s other flight-processing functions.”

“The FAA resolved the issue within an hour, and then immediately adjusted the system to now require specific altitude information for each flight plan.”

“The FAA is confident these steps will prevent a reoccurrence of this specific problem and other potential similar issues going forward.”

The CNN story, which provides (so far) more detail than anyone else, indicates that the ERAM system was overtaxed because of the many waypoints the U-2 flight plan had filed. In addition, however, CNN reports that, “Simultaneously, there was an outage of the Federal Telecommunications Infrastructure, a primary conduit of information among FAA facilities.”  CNN did not indicate what caused the FTI to go out, however.

Altogether, the complex U-2 flight plan and the FTI outage added up to what one government official said on background was a “perfect storm” that took down the ERAM system. Well, sometimes truth is stranger than fiction.
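The failure mode the FAA describes can be reduced to a toy model. The sketch below is purely illustrative—the function, numbers, and logic are my assumptions, not actual ERAM code—but it captures the essential point: a flight plan filed without explicit altitude gets treated as a low-altitude operation, and the system then tries to de-conflict it against all lower traffic along every leg of a route with many waypoints, so the work (and memory) grows with waypoints times traffic.

```python
# Illustrative toy model (not actual ERAM code): a flight plan filed
# without an explicit altitude is assumed to be a low-altitude operation,
# so the processor de-conflicts it against every low-altitude flight
# along every leg of its route.

def conflict_checks(flight_plan, low_altitude_traffic):
    """Count the pairwise de-confliction checks a naive processor performs."""
    altitude = flight_plan.get("altitude_ft")
    if altitude is None:
        # Hypothetical default: treat the flight as operating below 10,000 ft.
        altitude = 9_999
    if altitude >= 10_000:
        # High-altitude flight: no conflicts with low-altitude traffic.
        return 0
    # Low-altitude flight: check every route leg against every other flight.
    legs = len(flight_plan["waypoints"]) - 1
    return legs * len(low_altitude_traffic)

traffic = [f"flight-{i}" for i in range(200)]           # 200 flights below 10,000 ft
u2_plan = {"waypoints": [f"wp{i}" for i in range(50)]}  # many waypoints, no altitude filed
airliner = {"waypoints": ["LAX", "LAS"], "altitude_ft": 35_000}

print(conflict_checks(u2_plan, traffic))   # 49 legs x 200 flights = 9800 checks
print(conflict_checks(airliner, traffic))  # 0
```

This also shows why the FAA's fix—requiring specific altitude information in every flight plan—closes the hole: with an altitude on file, the high-altitude branch is taken and the explosion of routings never happens.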

U.K. Airline Travelers Unhappy over Passport Control Failure

Los Angeles-bound passengers weren’t the only ones unhappy last Wednesday. International airports and ports across the U.K. experienced unbelievably long queues at passport control after the U.K. Border Force computers went down at around 2:30 p.m. London time; the outage lasted for some 12 hours. Some passengers at Gatwick reported waiting in line for over four hours, and fights were said to have broken out among passengers waiting in line at both Gatwick and Luton Airports. While non-EU passengers waited the longest, U.K. citizens also had to wait for up to two hours, newspapers reported.

The Home Office put out a statement that read in part, “We apologise for the delays that some passengers experienced at passport controls yesterday, but security must remain our priority at all times.” The Home Office also said that its technical staff had been asked “to look into the incident to ensure lessons are learnt.”

However, whether those lessons, once learnt, will ever be publicly disclosed remains to be seen. The Home Office is refusing to disclose what caused the massive computer meltdown, which seems to be its standard operating policy.

Completing the trifecta of airport problems that occurred last Wednesday was a report that a construction crew cut a fiber optic cable at Florida’s Fort Lauderdale-Hollywood International Airport at about 4:00 p.m. Eastern time. The loss of the cable resulted in dozens of flights being canceled, delayed, or rerouted. No doubt passengers trying to travel to the U.S. West Coast from the airport thought they were snake-bit.

More General Motors Recalls for Software Issues

General Motors announced two more vehicle recalls to correct software issues in its vehicles. First, it is recalling some 56 400 Cadillac SRX crossover vehicles from the 2013 model year because of a software problem in the SRX transmission control module. According to the National Highway Traffic Safety Administration, “In certain driving situations, there may be a three to four second lag in acceleration due to the transmission control module programming.” No crashes have been attributed to the issue, GM reports.

In addition, General Motors said it is issuing recalls for 51 640 Buick Enclave, Chevrolet Traverse and GMC Acadia SUVs from the 2014 model year that were built between 26 March and 15 August 2013. In these vehicles, GM says, the engine control software may cause the fuel gauge to inaccurately report the amount of fuel remaining, which could lead to the vehicle unexpectedly running out of fuel. GM says it doesn’t know of any crashes linked to this problem, either.

In Other News of Interest….

New York City Accidentally Sends $298 475 644 in Duplicate Pension Payments to Retired Police and Firemen

U.S. Selective Service Sends Erroneous Military Draft Registration Letters to Marylanders

New Mexico’s Albuquerque Water Utility Authority Admits $9 Million Accounting Error

Washington State’s 911 Calls Routed to Colorado in Recent Outage

Virgin Mobile Australia Outage Hits 350 000 Customers

Virginia Tunnel E-Z Pass Toll Problems Continue Unabated

What’s the Wait for the Next London Tube Train? How About 939 Minutes?

Taipei Metro Suffers Fresh Problems

Some Indiana Schools Opt to Avoid Online ISTEP Tests

Michigan Hospital Still Struggling with New EHR System Installed in March

Is Australian Splendour Concert Ticketing Problem Glitch, Hack or Prank?

Computer Error Forces NY Stock Exchange to Cancel 20 000 Trades

Northern California Kaiser Permanente Says Insurance Billing Error to be Corrected Soon

Indiana’s Lebanon Utilities Trying to Correct Delayed Billing

Computer Issue Delays Paychecks for Flint Michigan Schools

Nova Scotia Power Overcharges Thousands of Customers

Computer Issue Causes Bank of Tokyo-Mitsubishi UFJ to Delay 23 000 Scheduled Wire Transfers

U.K.’s Norfolk County Council Staff Email Out for Over a Week

Computer Problems Delay Visitor Entry to Kentucky Derby

Erroneous Delinquent Tax Notices Sent to Michigan Residents

Kansas Farm Machinery Manufacturer Forced Shut by Computer Problems 

UK Farmers Still Struggling with Single Payment Scheme Online System

National Australia Bank Forced to Make More Compensation in Wake of 2012 Computer Problems

British Columbia’s Pharmanet Computer System Operating Once More

Online Testing Problems Strike Again

IT Hiccups of the Week

Last week was another relatively quiet week, with only a smattering of IT-related errors, malfunctions or problems being reported. But those that were reported gave one a definite sense of déjà vu. Yet again, Oklahoma, Florida, and Indiana reported problems last week with their end-of-school-year online standardized testing.

On Monday, the AP reported that Oklahoma state education officials had to suspend online testing across the state for middle school and high school students; the tests either responded very slowly or quit working altogether. This comes after a similar occurrence last year, and promises from the testing vendor, CTB/McGraw-Hill, that the company would take steps to ensure that it wouldn’t happen again. CTB/McGraw-Hill apologized for what it called a “network service interruption,” which it said lasted only three hours.

Testing resumed on Tuesday and proceeded without further incident for the remainder of the week.

Last year's testing problems forced CTB/McGraw-Hill to forgo $1.2 million in order to settle damage claims filed by Oklahoma. However, this latest incident is likely to cost CTB/McGraw-Hill its $13.5 million per year online testing contract. State Superintendent of Schools Janet Barresi, who came under fire for last year’s testing fiasco and staked her reputation on ensuring that this year’s testing would be carried off without a hitch, said she would be recommending that Oklahoma not renew CTB/McGraw-Hill’s contract. If any other testing problems erupt this year, expect calls for Barresi’s contract not to be renewed as well.

Then on Tuesday, connectivity problems derailed online testing in many of Florida’s 67 school districts trying to administer the Florida Comprehensive Assessment Test (FCAT). The problems led state education officials to suspend online testing across Florida. Pearson, the company that holds Florida’s five-year, $254 million online testing contract, apologized for the disruptions, which it blamed on its hosting service subcontractor.

Florida Gov. Rick Scott called the testing problems “unacceptable” and said the state would “pursue all liquidated damages and other remedies that may be available as a result of Pearson’s failure to fulfill its duty under the contract with the department.” This is the second time in four years that there have been problems with the FCAT. In 2010, Florida fined Pearson $14 million for delivering the FCAT results late.  

FCAT testing resumed across the state on Wednesday, but some schools in the Miami-Dade area reported they were still having problems. Those, however, were attributed to a Microsoft security update issue.  No testing concerns were reported in Florida for the remainder of the week. 

Indiana, which, like Oklahoma, contracts with CTB/McGraw-Hill for its online standardized testing, also experienced massive problems last year. And it is reportedly very nervous about the start of its ISTEP tests this week. CTB/McGraw-Hill says that it is “confident about everything we have control over” and that it “stands behind all the measures” it took to ensure that nothing happens this year.  However, Indiana school officials are not as confident as CTB/McGraw-Hill given that some Indiana school districts were reporting trouble during their online ISTEP practice tests.

The four-year, $95 million ISTEP testing contract between the state and CTB/McGraw-Hill ends in four months, and the two still have not reached a final settlement over last year’s testing debacle.  

Ulster Bank Has another IT System Issue

In another case of déjà vu, Ulster Bank apologized to its ATM customers after they were debited twice on each transaction over a 24-hour period spanning Monday and Tuesday of last week, the Belfast Telegraph reported. Ulster Bank has suffered numerous banking issues over the past few years, including an outage in 2012 that lasted for weeks. The bank is still trying to repair its reputation in the wake of that failure.

The bank promises to refund all of its customers for the ATM error and says no customer will lose money as a result.

Oregon Surrenders and Calls in the Feds to Provide Healthcare Insurance Portal

Oregon announced, as expected, that after spending over $248 million, it is going to close down its Cover Oregon healthcare insurance website and use the federal Affordable Care Act website for the next enrollment period that starts in November. Continuing to fix Cover Oregon was judged not fiscally prudent: the move to the federal website was estimated to cost $5 million, while repairing the site was estimated at some $78 million—a cost the federal government probably would not underwrite.

Also as expected, Oregon state health officials refused to call the Cover Oregon debacle a waste of money. They did concede, however, that the IT failure was a “disappointment.” State officials still insist that Oracle was mostly, if not entirely, at fault for the failure, which Oracle vigorously denies—reinforcing the old adage that success has many parents, but failure is an orphan.

In Other News…

Homeland Security Experiences Immigration System Computer Meltdown

Some Galaxy S5s Shipped with Non-working Camera

New Zealand Exchange Experiences Trading Issue

Georgia Ports' Garden City Terminal Traffic Glitch Reappears

Vodafone Ghana Users Unhappy Over Outage

Astronauts Fix Faulty ISS Computer

Apple Offers to Replace iPhone 5 Faulty On-Off Button

Software Issue with Bluefin AUV Searching for Missing Malaysian Aircraft Fixed

Oracle Assails State’s “False Narrative” Explaining Cover Oregon Debacle

IT Hiccups of the Week

Last week was a very, very quiet week with only a few IT-related errors, malfunctions or problems being reported. So, for this week’s edition of IT Hiccups, we decided to revisit the ongoing and increasingly nasty public dispute surrounding Oregon’s health insurance exchange. Things took an interesting turn last week, courtesy of Oracle’s irate letter telling state government officials to quit lying about the company's role in the debacle.

As I wrote recently, Oregon’s attempt to implement its own health insurance exchange, called Cover Oregon, by 1 October 2013 to support the requirements of the Affordable Care Act (ACA) has not been exactly stellar. The state, after spending $200 million, is still trying to decide whether it will fix its website (no member of the public has ever directly enrolled for state health insurance coverage using it), use another state’s exchange software (like Maryland has decided to do), or default to using the Federal exchange. As I noted last week, news reports state that the number of “most serious programming bugs” discovered in Oregon's healthcare website implementation has reportedly grown from 13 in January to over 300 currently. State officials are supposed to decide any day now which alternative is most appealing.

Oregon state officials have not been shy about blaming its major software supplier, Oracle, for its problems. As a story at MSN last November documented, in a state hearing into the causes behind the Cover Oregon’s woes, Cover Oregon board member Ken Allen said:

“This is their [Oracle’s] failure. . . Their dates have shifted and shifted and shifted. . . The Cover Oregon staff are tremendously dedicated folks who have worked really hard. . . All of that good will and support from the business community is being frittered away because Oracle didn’t get it online. It’s 98% Oracle’s screw up.”

Democratic Oregon Senator Jeff Merkley also pointed the finger in November at Oracle on NBC’s Nightly News television program, saying that:

“Oracle was contracted to write the exchange. They promised it would be fully delivered on time, it would be beautiful and do more than any other exchange in the country, and it's in complete dysfunction.”

Merkley has continued placing the blame on Oracle in subsequent public appearances, as did Cover Oregon’s previous CIO Aaron Karjala, and Governor John Kitzhaber, who also claimed he was in the dark about how bad the problems really were. Kitzhaber did concede that the state may have been a bit at fault, but only in terms of having had an “unrealistically high sense of optimism that Oracle could deliver.”

Early in March, Oregon and Oracle reached an agreement to basically end the contract. The new deal calls for the state to pay Oracle $43.9 million of the $69.5 million the company claimed it was owed for its Cover Oregon work from November last year through February of 2014. Oracle had already been paid some $90 million for its previous project efforts. According to the agreement, at the end of sixty days, both sides could seek legal recourse to recover monies each thought due it.

The rapidly approaching end of the sixty-day period and the recent appointment of a new Cover Oregon executive director, Clyde Hamstreet, may well explain why Oracle President and Chief Financial Officer Safra Catz decided to end the company's silence about what it believed went wrong. Last week, Catz sent what can only be termed a provocative letter (pdf) to Hamstreet demanding that the state quit spreading a “false narrative” that seeks to blame the company for everything wrong with Cover Oregon.

Catz says in her letter that “contrary” to what Oregon officials have been telling the press, Oracle was never the lead in charge of the Oregon Health Exchange effort. In fact, Oregon “by choice and by contract, were squarely in charge of the project,” she writes. And in spite of repeatedly being told by outside experts to hire a system integrator and promises that it would do so, Catz said, Oregon declined to hire one.

Oracle’s only role was to “assist” the state with various tasks, Catz states. In addition, Oracle “provided clear and repeated warnings” for months that the effort was in trouble. A person “with even minimal IT expertise would have known that the system would not, and could not, go live on October 1.” Furthermore, some “critical specifications” were not given to Oracle until November 2013.

Catz goes on to say that the Cover Oregon system is actually working and has been for several weeks, despite the state saying it is not. System availability now usually is “exceeding 99%” and “the current error rate has dropped to 0.7%.” Catz says individuals could indeed sign up using the Cover Oregon website, but for some unexplained reason, the state refuses to let that happen.

Catz finishes her letter with a parting shot telling the state to do itself a favor and hire a systems integrator, as it promised it would do “at the beginning of the project.”

Cover Oregon’s Hamstreet wrote back (pdf) to Catz saying he was still getting up to speed, and couldn’t yet comment directly on the “factual assertions” contained in her letter.  However, he was forwarding her letter to Cover Oregon’s lawyers.

Hamstreet apparently has a great deal of experience in troubled projects, which might be one reason Catz decided to write to him as soon as he was hired. He probably knows that rarely is one party fully to blame for an IT debacle. In addition, Catz probably wanted to try to encourage Hamstreet to look deeper into the history of Cover Oregon, as well as put the state on warning that if a lawsuit ensues, Oracle will be looking to embarrass Cover Oregon and other state officials as much as possible.

In Oracle’s favor is that even a simple Internet search immediately turns up warnings of deep trouble (pdf) with the project dating back to 2012. Also, the Centers for Medicare & Medicaid Services released an independent report (pdf) about Cover Oregon in February of this year that described Oregon’s management of the effort, in not so many words, as abysmally amateurish.

This is not to absolve Oracle in any way, however. The company didn’t seem to have any problems taking the state’s money and running, as it were. If things were really so bad, and Oregon reneged on its promise to hire a systems integrator as Catz claimed, why didn’t Oracle immediately ask to be let out of the contract? In addition, for Catz to claim that Oracle was there to merely “assist” the state is clearly disingenuous. To say that Catz’s letter shows more than a little hubris and hypocrisy is putting it mildly.

It will be interesting to see whether Oregon or Oracle decides to take legal action against the other in the next few weeks. Although it would be fun to watch, my guess is no, since it looks like a real lose-lose situation. Oracle risks more damage to its reputation, as does Gov. Kitzhaber, who is running for reelection.

Amid this contretemps is even more drama: the former CIO for the Oregon Health Authority is contemplating suing Oregon for wrongful discharge and defamation—among other charges—based on what she claims is retribution for refusing to go along with a “cover-up” of the problems with Cover Oregon. Other state officials are probably not anxious to get hauled into court to testify about what they knew and when they knew it.  

So after a bit more public swordplay among all the parties to leave the impression that they really care about wasted taxpayer money, I expect this debacle will eventually be buried and forgotten, like so many other government IT projects before it.

In Other News…

Virgin Media Says Sorry for All the E-mails

Virginia E-Z Pass Suffers More Tunnel Software Problems

German Shepherd Gets Jury Duty Call

GLONASS System Has New Problem

Australia's Commonwealth Bank Systems Go on Fritz Once More

Manila’s Ninoy Aquino International Airport Immigration Computer System Goes Down

Pennsylvanian Surprised by $43 000 Water Bill

Heartbleed Bug Bit Before Patches Were Put in Place

It’s been a little less than a month since the Heartbleed bug was discovered and less than two weeks since the public was informed about it. The bug is a “trivial” programming error made in early 2012 and discovered by Google in March, but it non-trivially affects the OpenSSL (Secure Sockets Layer) cryptographic software library.

As described at Codenomicon's Heartbleed.com website, the error “allows anyone on the Internet to read the memory of the systems protected by the vulnerable versions of the OpenSSL software. This compromises the secret keys used to identify the service providers and to encrypt the traffic, the names and passwords of the users and the actual content. This allows attackers to eavesdrop on communications, steal data directly from the services and users and to impersonate services and users.” What’s more, an attacker exploiting a system that hasn’t fixed the error doesn’t leave an overt trace of their activity.
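The mechanism is easy to sketch. Below is a toy simulation in Python—purely illustrative, with an invented buffer and function, not OpenSSL's actual C code—of the core flaw: the server echoes back as many bytes as the heartbeat request *claims* its payload contains, without checking the payload's real length, so a short payload with an inflated length field leaks adjacent memory.

```python
# Toy simulation of the Heartbleed flaw (not the actual OpenSSL code).
# The "server memory" holds the heartbeat buffer plus adjacent secrets.

SERVER_MEMORY = bytearray(b"HEARTBEAT-BUFFER|secret-key=hunter2|session=abc123")

def heartbeat_response(payload: bytes, claimed_length: int, *, patched: bool) -> bytes:
    """Echo a heartbeat payload back, trusting (or not) the claimed length."""
    buf = bytearray(SERVER_MEMORY)
    buf[:len(payload)] = payload  # copy the request payload into memory
    if patched and claimed_length > len(payload):
        return b""  # fixed versions silently discard malformed heartbeats
    # Vulnerable path: trust the attacker-supplied length field and read
    # past the payload into whatever happens to sit next to it in memory.
    return bytes(buf[:claimed_length])

# A benign heartbeat: the claimed length matches the 4-byte payload.
print(heartbeat_response(b"ping", 4, patched=False))

# The attack: a 4-byte payload with a claimed length of 40 leaks
# 36 extra bytes of server memory, including the adjacent "secret key".
print(heartbeat_response(b"ping", 40, patched=False))
```

The real exploit works the same way at a larger scale, requesting up to 64 kilobytes per heartbeat, which is why repeated requests can sweep through a server's process memory without leaving a trace.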

While last week saw a lot of speculation about the ultimate severity of Heartbleed, this week some of the consequences were starting to be felt. First, on Sunday, ComputerWorld reported that Akamai Technologies, whose network handles 30 percent of Internet traffic, announced that a researcher had found a “bug” in its Heartbleed patch. As a result, ComputerWorld stated, “Akamai is now reissuing all SSL (Secure Sockets Layer) certificates and security keys used to create encrypted connections between its customer's websites and visitors to those sites.” The article notes that Akamai runs 147 000 servers in 92 countries.

Then, the Canada Revenue Agency (CRA) on Monday announced that the Social Insurance Numbers of some 900 people had been compromised over a six-hour period before the agency's systems could be taken offline and patched. The CRA has delayed the tax filing deadline from 30 April to 5 May because of the bug. On Wednesday, the Royal Canadian Mounted Police—which was investigating the intrusion and had convinced the CRA to delay its announcement of being hacked in order to help with its inquiry—announced the arrest of a 19-year-old Ontario computer science student in connection with the theft.

British parenting website Mumsnet also reported on Monday that it had been hacked, with the records of possibly all of its 1.5 million user accounts compromised last Friday before a fix could be applied, the Daily Mail reported. Mumsnet founder Justine Roberts said she only “realized the extent of the problem when her own account was attacked,” and urged users to change their passwords.

The Sydney Morning Herald then reported on Tuesday that GE Money Australia was warning customers of the financial websites it operated, “including the Myer Visa Card and Myer Card portals, as well as Coles Mastercard” along with a “number of other GE partner websites, including 28degrees Mastercard” that they were vulnerable to the Heartbleed bug. GE Money was recommending that those customers change their passwords. However, GE Capital, the parent company of GE Money, tried to tamp down customer worries by saying that it had “no reason to believe any customer data has been compromised.”

Also on Tuesday, the Guardian newspaper reported that some 50 million devices, if not tens of millions more, running Android 4.1.1 might be vulnerable because of the Heartbleed bug. Those running Android version 4.1.2 are not vulnerable, the article stated. Google says that, “We have also already pushed a fix to manufacturers and operators,” but it is unclear how many of the devices will actually end up having the fix installed. An interesting article in MIT Technology Review discusses in more depth the likely long-term legacy of Heartbleed, given the sheer number and types of devices that may never receive a bug fix.

Last week, American Banker reported that the U.S. Federal Financial Institutions Examination Council had issued a warning to U.S. financial institutions to bolster their security in light of the Heartbleed bug, including asking them to “strongly consider” recommending that their customers change their passwords. However, American Banker noted, many major U.S.  banks, including Bank of America, Capital One Financial, JPMorgan Chase, Citigroup, TD Bank, U.S. Bancorp, Wells Fargo and PNC Financial Services Group, have publicly stated that they were not affected by the bug. That said, on Wednesday, American Funds, the third-largest mutual fund family, recommended that its 825 000 customers change their passwords, Reuters reported, because there had been “a very narrow window of risk.”

Also on Wednesday, ArsTechnica reported that security researchers announced that OpenSSL-powered VPN networks could be compromised. Last week, researchers were not sure whether the threat was real or only theoretical in nature; now they know. Administrators of these networks are being urged to patch them as soon as possible.

Yesterday, the New York Times did report a bit of good news. Security researchers at the Lawrence Berkeley National Laboratory and the National Energy Research Scientific Computing Center, the Times stated, “have been examining Internet traffic they recorded going in and out of their networks since the end of January, looking for responses that would indicate a possible Heartbleed attack.” So far, they have not been able to find any. This doesn’t mean that there weren’t any such attacks before January, however.

The findings do lend just a bit of support to the NSA’s claim that it didn’t exploit the bug, or that it didn’t know about it until its public disclosure. Bloomberg News stirred up a hornet’s nest of outrage when it reported last week that not only did the NSA have knowledge of the bug, but it had been exploiting it since it was accidentally created in 2012. Bloomberg based its story on two anonymous sources who claim to be “familiar with the matter.”

The Bloomberg story raised the interesting issue of how and when to disclose such a major security problem. Apparently, once the programming error was discovered by Google, neither that information nor the fix was shared with the U.S. or other governments, nor with a whole host of vulnerable organizations, before Google made its public announcement or fixed its own systems. Now Google is being accused of “being selfish, putting its corporate interests before global internet users' security, playing favourites, and waiting too long to report the serious Heartbleed security bug to the open-source project whose software contained the critical error.” Expect this issue of when and how to make a bug disclosure of this magnitude to be hotly debated into the foreseeable future.

The New York Times article also reported that University of Michigan computer scientists have been monitoring their Internet honeypots of fake data since the disclosure of the Heartbleed bug to see whether intruders would try to use it to access them. So far, “they’ve witnessed 41 unique groups scanning for and trying to exploit the Heartbleed bug on three honeypots they are maintaining. Of the 41, the majority of those groups—59 percent—were in China.”

While the damage reported so far doesn’t look severe, over the next few weeks, months, and possibly years, there will no doubt be more announcements of Heartbleed bug vulnerabilities and related intrusions. As security company Symantec notes, while the focus has been on vulnerable websites, the bug “equally affects client software such as Web clients, email clients, chat clients, FTP clients, mobile applications, VPN clients and software updaters, to name a few. In short, any client that communicates over SSL/TLS using the vulnerable version of OpenSSL is open to attacks.” 

Even trying to determine whether the website you are visiting, let alone a connected device you are knowingly or unknowingly using, is Heartbleed-free is not the easiest thing in the world to accomplish. According to a story at the Guardian, 95% of the most popular tools for detecting whether a web service has the flaw are not reliable. However, if the websites you use have indicated they are patched, as the IEEE announced earlier this week, it would be a good idea to change your password now.

One useful thing that the Heartbleed bug has done is to expose just how potentially fragile Internet security really is, and how much of it depends on the kindness of a group of 11 volunteers who work on the OpenSSL Project. While the publicity has sparked debate over whether this arrangement is ideal or needs to be revisited, exactly how to improve the situation will, I suspect, remain open to discussion for quite some time.

[Update: The domain Heartbleed.com is owned by Codenomicon, not Google, as originally stated.]

911 Outages Hit Washington State, Parts of Oregon

IT Hiccups of the Week

Last week, with the exception of the massively annoying Heartbleed programming oofta, was a relatively quiet one with the typical garden variety of IT-related snafus, errors and problems being reported. We start this edition of IT Hiccups with 911 emergency call system outages that hit several states last week.

Early last Thursday morning, beginning around 1:00 a.m. local time, three counties in Oregon as well as the entirety of Washington State discovered that their emergency 911 call dispatch systems were no longer working. Oregon’s outage was cleared up in about three hours, but it took nearly seven hours before 911 service was restored across Washington State.

Read More

Heartbleed Bug Patch Underway, But Was It Really the Problem?

More clarity about the vulnerability of banking and credit card data and other sensitive information such as website logins and passwords came this week, when a Google researcher and a team from the Finnish security firm Codenomicon separately reported the existence of an Internet security flaw that is being called the Heartbleed Bug. What makes Heartbleed so insidious is the fact that it can allow hackers to snatch data from a server’s memory 64 kilobytes at a time—even if the information is supposedly encrypted—without leaving a trace. While the end user takes comfort in the ability of SSL/TLS encryption to keep his or her data from prying eyes, the “https” in the URL and the closed padlock icon are a cruel trick.
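The over-read at the heart of the bug is easy to sketch. In the toy Python simulation below (this is not OpenSSL's code; the "memory" contents and function names are invented for illustration), the flawed handler echoes back however many bytes the client *claims* its heartbeat payload contains, without checking that claim against the payload's actual length:

```python
# Toy simulation of the Heartbleed over-read. NOT OpenSSL code; the
# adjacent-"memory" contents and names here are invented for illustration.

SERVER_MEMORY = b"HEARTBEAT:ok|secret_key=hunter2|session=abc123"

def buggy_heartbeat(payload: bytes, claimed_len: int) -> bytes:
    """Flawed handler: trusts the length the client claims its payload has."""
    buf = payload + SERVER_MEMORY  # adjacent process memory, in miniature
    return buf[:claimed_len]       # echoes claimed_len bytes, no bounds check

def fixed_heartbeat(payload: bytes, claimed_len: int) -> bytes:
    """Patched handler: silently discard messages whose claimed length lies."""
    if claimed_len > len(payload):
        return b""
    return payload[:claimed_len]

# An honest 4-byte ping echoes 4 bytes; claiming 40 leaks adjacent memory.
print(buggy_heartbeat(b"ping", 4))
print(buggy_heartbeat(b"ping", 40))   # leaks the "secret_key" next door
print(fixed_heartbeat(b"ping", 40))   # patched server returns nothing
```

Because the response is a perfectly well-formed heartbeat reply, nothing is logged and the leak leaves no trace, which is exactly what makes the real bug so insidious.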

Since the news broke, websites have responded by updating their versions of OpenSSL, one of the most widely used implementations of SSL/TLS. These protocols use asymmetric (public-key) cryptography to authenticate the parties and negotiate a unique session key for each connection, which is then used to encrypt the messages exchanged between them. But the flaw made data such as these keys, and the information they were intended to keep secret, far too easy to access. What’s more, the operators of the websites were completely unaware that this was possible.

How big a problem is this? Security expert Bruce Schneier, on his eponymous blog, says that, “…anything in memory—SSL private keys, user keys, anything—is vulnerable. And you have to assume that it is all compromised. All of it.” Not to put too fine a point on it, he added, “‘Catastrophic’ is the right word. On the scale of 1 to 10, this is an 11.”

Read More

Spiders Prompt Mazda to Recall Cars for Software Update

Back in 2011, Mazda had to recall some 65 000 Mazda6 cars in the U.S., Canada, and Mexico because yellow sac spiders—aka Cheiracanthium inclusum—were nesting in “tiny rubber hoses linked to fuel tank systems…[which] could cause pressurization and ventilation problems,” the LA Times reported at the time. In the worst case, Mazda indicated, the spider nests could clog the tubes, or more accurately, the evaporative canister vent lines. The resulting clogs could stress a car’s fuel tank to a point where it cracks, possibly leaks fuel, and potentially ignites.

Mazda installed a spring in the canister vent lines in an attempt to keep the pesky spiders out. In addition, it modified the vehicle’s powertrain control module (PCM) software to “minimize negative pressure of the fuel tank” for Mazda6s that were still on the production line. However, in a report (pdf) to the U.S. National Highway Traffic Safety Administration made public last week, Mazda indicated that some spiders had still managed to get through the springs and cause fuel line problems in a number of its customers’ refitted Mazda6s. The automaker did have some good news to report: Its PCM software modification was “effective” in avoiding the possibility of fuel tank cracking, even if a spider’s sac completely clogged the canister vent line.

So Mazda is now going to recall 42 000 U.S.-built Mazda6 cars with 2.5-liter engines from model years 2010 to 2012. These vehicles, built between September 2009 and May 2011, have had the spring installed, but not the PCM software update. Mazda says it will check for spider nests in the canister vent lines, and make needed repairs to any fuel-related parts that may have been damaged as a result of the spiders. It will also reprogram the PCM software to minimize negative pressure in the fuel tank. Affected Mazda6 owners should be getting their recall notices any time now.

Meanwhile, in a more run-of-the-mill action, Mazda also issued a global recall last week for 88 000 Mazda3, Mazda6 and CX-5 vehicles manufactured between October 2012 and January 2014 to reprogram their engine control computers. A “glitch” was found “in the computer program that checks whether the capacitor, a part of the brake energy regeneration system, is functioning properly,” the Economic Times reported. As a result, the vehicles may not accelerate correctly, or may even stall. No accidents related to the software problem have been reported, Mazda stated.

The hot water General Motors finds itself in for failing to recall cars with faulty ignition switches, and the $1.2 billion hit Toyota just took for hiding what it knew about the unintended acceleration problems some of its cars were having, may be providing the impetus to make “proactive” and “forthcoming” the auto industry’s new bywords. It used to be that car companies were afraid to issue recalls unless absolutely necessary. Perhaps we’ve reached a point where they’re afraid not to.

Nest Labs Suspends Sale of Smoke and Carbon Monoxide Detector until Software Fixed

IT Hiccups of the Week

Last week saw several interesting IT-related bugs, ooftas and malfunctions being reported. We start this week’s edition of IT Hiccups with a potentially serious software issue in the Nest Labs’ Wi-Fi connected smoke and carbon monoxide detector.

Nest Labs announced on its website last Thursday that it was going to suspend sales of its Nest Protect: Smoke + Carbon Monoxide alarm to fix a “unique combination of circumstances that caused us to question whether the Nest Wave (a feature that enables you to turn off your alarm with a wave of the hand) could be unintentionally activated. This could delay an alarm going off if there was a real fire.” In addition to suspending sales of the product, the company is remotely deactivating (where possible) the Nest Wave feature until a software fix is ready. The company notes that even with the Wave feature disabled, the detector will still work.

According to the New York Times, Nest Labs discovered during laboratory testing that the alarm’s software algorithm could misinterpret movement near the detector as an intentional “wave” command and shut the unit off. The feature, the ability to easily silence a false alarm with a wave of the hand from 0.6 to 2.4 meters away, is a major selling point of Nest Protect. But in its current implementation, the feature may be a life-threatening bug.
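Nest has not published the flawed algorithm, but the general failure mode, a single motion event treated as a deliberate command, and the usual remedy are easy to sketch. In the hypothetical Python below (all names and thresholds are invented), silencing requires several distinct wave events inside a short window, so one stray motion near the detector cannot shut it off:

```python
# Hypothetical gesture-debounce sketch. Nest's actual algorithm is not
# public; the event model and thresholds here are invented for illustration.

def should_silence(wave_timestamps, min_waves=3, window_s=2.0):
    """Silence only if at least min_waves events fall within window_s seconds.

    A single spurious motion (steam, a passing pet) produces one event and
    is ignored; a deliberate back-and-forth wave produces several in quick
    succession.
    """
    events = sorted(wave_timestamps)
    for i in range(len(events) - min_waves + 1):
        if events[i + min_waves - 1] - events[i] <= window_s:
            return True
    return False

print(should_silence([10.0]))              # one stray motion: keep alarming
print(should_silence([10.0, 10.4, 10.9]))  # deliberate wave: silence
print(should_silence([10.0, 20.0, 30.0]))  # scattered events: keep alarming
```

The design trade-off is the same one Nest faces: the stricter the gesture test, the harder it is to accidentally disable a life-safety device, but also the harder it is to silence a genuine false alarm.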

Nest Labs stated that it may take two to three months before a software fix is designed, tested and approved by safety agencies in the U.S., Canada, and the U.K. Once complete, the company will send out a software update to reactivate the Wave feature. Nest Labs also emphasized that no one has reported that a detector failed to sound an alarm in event of a fire.

Nest Labs, which is better known for its elegantly-designed Learning Thermostat, was purchased earlier this year by Google for $3.2 billion in cash. There have also been reports of a small number of Nest thermostat customers having wiring-related problems.

The issue with Nest Protect nicely highlights, as IEEE Spectrum’s Automaton editor Erico Guizzo noted to me, the paradox of how software can make things more complicated and worse (i.e., a flawed hand-waving feature), as well as how software can make things more complicated and better (i.e., a remote software update to fix the bug).

The Financial Times reemphasized Guizzo’s point in its article on the Nest Protect software bug. It noted that customers and companies can expect these types of incidents to occur more often as “smart” technology is added to simple devices in the rush to embrace the “Internet of Things.” The article also noted that companies’ customer-service operations had better be ready to respond when a problem is found. The FT noted, for instance, that just in January, Nest Labs was “forced to increase its customer-support hours after several customers complained about persistent false alarms” that were unrelated to the current software issue.

Eircom Says We’re “Embarrassed” Over the Error, But We Still Want Our Money Now

There are a lot of customers who are angry with Eircom, the largest telecom provider in the Republic of Ireland. Apparently, some 30 000 of its customers will be receiving bills of as much as €500 in their next billing cycle. Eircom is attempting to recoup funds it previously failed to collect because of what it is calling a “system error,” the company told Dublin’s Herald newspaper. The error relates to its implementation of the new Single Euro Payments Area (SEPA), a project meant to harmonize how retail payments are made and processed across 34 European countries. The affected customers, which the Herald describes as “a mix of phone and internet users, mainly those with bundle packages, who make monthly direct debit payments...did not have some or all of their monthly direct debit payments taken from their bank accounts for phone, broadband and TV service since January.” This was despite the fact that the customers’ bills showed those payments as having been made.
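The failure mode here, bills marked paid while the corresponding direct debits were never collected, is exactly what a routine reconciliation pass between the billing ledger and the bank's debit file should catch. A hypothetical sketch (Eircom's systems are not public; the record layouts below are invented):

```python
# Hypothetical billing-vs-bank reconciliation sketch. Eircom's actual
# systems and record formats are not public; this layout is invented.

def find_uncollected(billing_ledger, bank_debits):
    """Return invoice IDs marked paid in billing but absent from the bank file.

    billing_ledger: dict mapping invoice_id -> status ("paid" or "unpaid")
    bank_debits:    set of invoice_ids for which a direct debit actually
                    cleared at the bank
    """
    return sorted(
        inv for inv, status in billing_ledger.items()
        if status == "paid" and inv not in bank_debits
    )

ledger = {"INV-001": "paid", "INV-002": "paid", "INV-003": "unpaid"}
collected = {"INV-001"}
print(find_uncollected(ledger, collected))  # -> ['INV-002']
```

Had a check like this run monthly, the discrepancy would have surfaced after one billing cycle instead of accumulating into €500 bills over several.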

Eircom’s director of corporate affairs said that the incident was not only “regrettable,” but that “it’s embarrassing and we're very sorry that it's happened.”

However, Eircom wasn’t too embarrassed or sorry to insist that although it plans to reimburse the failed direct debit fee charge of €18.45, those 30 000 customers would still have to pay all the monies owed the telecom immediately.

Eircom’s hard stand has not endeared it to Ireland’s telecom regulator, which publicly stated that it is demanding a report on the billing error and how it happened. The regulator also wants to know why Eircom failed to inform it that a billing problem had occurred.

In Other News…

New Illinois Teacher Licensing System a Buggy Mess

Western Digital Finally Fixes Multi-day Cloud Outage

One of Melbourne’s Largest Hospitals Suffering Booking System Chaos

Georgia Seeks Food Stamp System Fixes

Kansas Municipal Court Mistakenly Mails Arrest Warrant Notices

Maine Homeless Man “Given” $37 000 in ATM Glitch

UK Supermarket Chain Asda Charges £50 for Cabbage

Russia’s GLONASS Satellite Navigation System Experiences Problems

Software Problems Hit French Soldiers’ Pay

Springdale, Arkansas, Residents Receive Tsunami Warning

Nissan Recalls Nearly 1 Million Cars for Air Bag Software Fix

IT Hiccups of the Week

Last week saw a marked increase in the number and types of IT-related errors, bugs and malfunctions being reported. However, as we have for the past few weeks, we again begin this week’s IT Hiccups edition with an auto-related IT issue.

According to the Washington Post, 989 701 Nissan and Infiniti 2013 and 2014 model year vehicles are being recalled to fix a problem in the software that controls air bag deployment for the front seat passenger. They include: 544 000 Altima sedans; 29 000 Leaf electric vehicles; 124 000 Pathfinder SUVs; 183 000 Sentra compact cars; 6700 NV200 taxis; and 104 000 Infiniti JX35, Q50 and QX60 vehicles.

The Post states that, “Unfortunately, the software installed on the vehicles…may incorrectly determine that the passenger seat is empty when it is, in fact, occupied. If that were to happen, and if the vehicle were subsequently involved in an accident, the passenger‐seat airbags would fail to deploy, increasing the possibility of injury or death.”

A New York Times article says that, “The automaker blamed the sensitivity of the software calibration, particularly when ‘a combination of factors such as high engine vibration at idle when the seat is initially empty and then becomes occupied’ or an ‘unusual’ seating posture are factors.”
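Nissan's calibration is proprietary, but the failure mode the Times describes, a noisy sensor reading at idle flipping the occupant state, suggests the standard fix: require a sustained, consistent reading before changing state. A hypothetical sketch (thresholds and sensor model invented):

```python
# Hypothetical occupant-classification sketch. Nissan's actual calibration
# is not public; the weight threshold and hold count here are invented.

def classify_seat(samples, threshold=5.0, hold=10):
    """Return final seat state (True = occupied) from weight readings in kg.

    The state changes only after `hold` consecutive samples disagree with
    it, so a brief vibration transient at idle cannot flip an occupied
    seat to "empty" (the bug mode behind the recall).
    """
    if not samples:
        return False
    state = samples[0] >= threshold
    streak = 0
    for w in samples:
        reading = w >= threshold
        if reading != state:
            streak += 1
            if streak >= hold:  # sustained disagreement: accept new state
                state = reading
                streak = 0
        else:
            streak = 0          # transient over: reset the counter
    return state

print(classify_seat([0] * 5 + [60] * 15))          # passenger sits down
print(classify_seat([60] * 5 + [0] * 3 + [60] * 12))  # vibration blip ignored
```

The trade-off is responsiveness versus robustness: a longer hold window filters more vibration noise but delays recognizing a genuine change in occupancy.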

Nissan indicated that it was aware of three accidents in which air bags failed to deploy, although no fatalities were reported. The company is working on a patch for the software, which should be available in the next few weeks.

In another software-related automobile recall, General Motors is recalling 656 of its 2014 Cadillac ELR vehicles in order to recalibrate software in the electronic brake control module that is part of its electronic stability control system. GM also announced last week that it was recalling another 824 000 vehicles for the defective ignition switch issues I discussed a few weeks ago, as well as 662 000 other vehicles for various mechanical issues. GM has now recalled over 4.8 million vehicles since the beginning of the year. This week, GM and the National Highway Traffic Safety Administration will face Congressional hearings on their roles in regard to the delayed ignition switch recall.

Iowa Mayor Unhappy About Disclosure of Crime Reporting Software Problem

Matt Walsh, the mayor of Council Bluffs, Iowa, is reportedly very unhappy that the state government was told by a local county board member that the new software system the local police use to record crime statistics was flawed. According to a World‐Herald News Service article, the Council Bluffs police department began using software last year provided by the Iowa Department of Transportation to enter local crime statistics. However, a software programming error “upgraded” many of the crimes entered. For instance, a simple assault reported by the police instead got changed by the software into a more serious aggravated assault.

The problem with the new software helped explain why Council Bluffs was recently listed as No. 56 on a list of the 100 most dangerous places to live in the U.S., the World-Herald said. Now, one would think the city’s mayor would be happy about the flaw being discovered and his city’s reputation as a criminal haven being rehabilitated. Yet, the Des Moines Register states that Mayor Walsh was incensed. Why? Well, the state provides crime enforcement grant funding to the city based on crime statistics, and the mayor is now worried the city may have to return some of the state grant money it received.

The Register quotes the mayor as saying, “What kind of individual runs to the state and tattles? …This money is to fight crime.” The mayor also claims that since the police department originally reported its crimes correctly, it isn’t the city’s fault that the software system provided by the state screwed up, so the money it previously received from the state is rightfully the city’s.

Hmm… maybe the teenager who found $31 000 mistakenly deposited in his account by the First Citizens Bank in Hull, Georgia, a few weeks ago and who decided to spend it should have used the same ethical reasoning, instead of lying and pretending that the money was deposited on purpose as part of his share of an inheritance from his grandmother’s estate.

Maryland Throws in the Towel on its Affordable Care Act Website

Today is the last day to sign up for health insurance under the Affordable Care Act (with some exceptions) until the next open enrollment season. To say the least, the introduction of the ACA exchanges on 1 October 2013 has been interesting to watch from an IT system risk mismanagement perspective, at both the federal and state level. Even now, the federal site is still reporting access issues.

However, five states—Maryland, Massachusetts, Nevada, Oregon, and Vermont—have given star performances in how not to create a state ACA website and supporting infrastructure systems. An Associated Press article provides a decent overview of the health insurance exchange implementation problems encountered in each state, as does another story at VTDigger.org that examines the issues confronting the Massachusetts and Vermont exchanges in greater depth.

However, for sheer incompetence, Maryland’s ACA implementation debacle really stands out (although Oregon's comes in a close second). After spending at least $125.5 million, Maryland has decided to basically abandon its exchange, the Washington Post reported. The state reportedly will be using the exchange system Connecticut has developed and is eager to sell to other states, which seems to work better than most.

Earlier this month, the U.S. Department of Health and Human Services launched an investigation into what went wrong with Maryland's health insurance exchange. However, it is unlikely that any results will be published before the state primary elections in June. The reason is that Democratic Lt. Gov. Anthony Brown, who once proudly proclaimed that he was in charge of the Maryland ACA implementation, is running for governor, and I doubt the federal government wants to be seen as possibly interfering with the election. To say that Brown was asleep at the health exchange switch would be to assume, given his role in the unfolding debacle and his very recent claims that the exchange implementation is a “success,” that he knew where the switch was in the first place.

In Other News…

FAA Instructs Boeing to Fix Critical 747-8 Software Flaw

System Issues Delay Bombardier Learjet 85 First Flight

Illinois Demands DUI Offenders Pay Fines Years After Computer Error

Soyuz Spacecraft Suffers Software Issue on Trip to Space Station

Northern Ireland Hospital Staff Hit by £400k Payroll Shortfall

Hundreds of Irish Motorists Receive Fines after M50 Toll Glitch

Price Glitch Charges £450 for Loaf of Bread in Wolverhampton, England

Denver-based Public Service Credit Union Experiences Four Days of Computer Problems

Allied Irish Bank Customers Double‐charged


Software Testing Problems Continue to Plague F-35 Joint Strike Fighter Program

The U.S. Government Accountability Office (GAO) earlier this week released its fifth annual report on the state of the F-35 Lightning II, aka the Joint Strike Fighter (JSF), aka the “most costly and ambitious” acquisition program ever. What the GAO found was foretold by a report earlier this year from the Department of Defense’s Director of Operational Test and Evaluation. The upshot: F-35 operational and support software development continues to be the major obstacle to the program's success.

In addition, the GAO report states that the projected cost of acquiring the planned 2443 F-35 aircraft (which come in three variants) threatens to consume some 20 to 25 percent of annual defense acquisition funds for the next twenty years or so. The GAO doesn’t explicitly say so, but the operations and maintenance costs of the program—currently estimated at $800 billion to $1 trillion or more over the next 50 years—will also consume a significant chunk of DoD’s annual weapon-system-related O&M budget.

The GAO report states that, “Challenges in development and testing of mission systems software continued through 2013, due largely to delays in software delivery, limited capability in the software when delivered, and the need to fix problems and retest multiple software versions.”  Further, the GAO notes that the F-35 program continues to “encounter slower than expected progress in developing the Autonomic Logistics Information System (ALIS),” which is the F-35’s advanced integrated maintenance and support system (pdf). In the latter case, Lt. Gen. Christopher Bogdan, the F-35 Program Executive Officer, conceded last month that the ALIS system was “way behind” where it should be and was “in catch-up mode.” This, the GAO indicates, was apparently at least partly because of a lack of testing facilities that remains a problem years after ALIS development began.

The GAO notes that as a result of the on-going software problems with the aircraft's mission and support systems, F-35 program officials and contractors alike believe that software development will continue to be the F-35 program’s “most significant risk area.”

Software-testing related issues involving the development and fielding mission systems were the main thrust of this year’s GAO report.  The F-35, you may recall, is delivering its mission capabilities in a series of  incremental “software blocks,” designated as Block 1A/B, Block 2A, Block 2B, Block 3i, and Block 3F.  Each block builds on the mission capability developed in the preceding block. As described by the report, “Blocks 1 and 2A provide training capabilities and are essentially complete, with some final development and testing still underway. Blocks 2B and 3i provide initial warfighting capabilities and are needed by the Marine Corps and Air Force, respectively, to achieve initial operational capability. Block 3F is expected to provide the full suite of warfighting capabilities, and is the block the Navy expects to have to achieve its initial operational capability.” According to Flightglobal, a software Block 4 is being planned as an eventual mission capability upgrade for which development will begin late this year or more likely early next.

However, the GAO report states that, “Developmental testing of Block 2B software is behind schedule and will likely delay the delivery of expected warfighting capabilities,” required by the Marines for their variant of the F-35  (the F-35B) that is scheduled for delivery by July 2015. As of January of this year, “the program planned to have verified the functionality of 27 percent of the software’s capability on-board the aircraft, but had only been able to verify 13 percent,” says the GAO report. In more than a bit of an understatement, the GAO says that, “This leaves a significant amount of work to be done before October 2014, which is when the program expects to complete developmental flight testing of this software block.”

The GAO notes—and seems to agree with—the Operational Test and Evaluation Director's view that a more realistic estimate for when Block 2B’s software functional verification will be completed is sometime closer to November 2015. The report also notes that such a delay would create a knock-on effect to the subsequent F-35 software blocks as well, increasing the cost of the acquisition, not to mention delaying the planned initial operational capability (IOC) of the aircraft (2016 for the Air Force F-35As, and 2018 for the Navy F-35Cs).

Yet, despite everything it saw, the GAO indicates that the F-35 program office and contractors, and especially the Marines, seem to be all whistling along to Bobby McFerrin’s song, “Don’t Worry, Be Happy.” The GAO states that, “Program and contractor officials have stated that while they recognize that the program faces software risks, they still expect to deliver all of the planned F-35 software capabilities to the military services as currently scheduled.” Why do they think so? Why, they are now going to introduce new approaches to gain “testing efficiency.” The plan: mainly by using “test results from one F-35 variant to close out test points for the other two variants in instances in which the variants have common functions.”  However, Bloomberg News quoted a recent RAND assessment of the F-35 program as stating that, “As of this writing, it is not clear how common the mission systems, avionics, software and engine will be among the three service variants,” so how much efficiency will in reality be gained remains to be seen.

In fact, in testimony before Congress yesterday, Lt. Gen. Bogdan was reported by Reuters as saying he was “pretty confident” that the Block 2B software would be delivered within 30 days of its current target date, allowing the Marines to reach initial operational capability by July of next year, as the software is “80 percent complete.” However, Bogdan also indicated that he was not as confident that even ten Marine F-35Bs would be IOC-ready, given that most of the 40-plus Marine F-35Bs will require some 96 engineering modifications by then.

Lt. Gen. Bogdan also disclosed at the hearing that “Block 3F [software] is dependent upon the timely release of Block 2B and 3I, and at present, 3F is tracking approximately four to six months late without taking steps to mitigate that delay.”

One does hope the program’s Block 2B software testing efficiency strategy is successful, since the GAO indicates the F-35 is scheduled to undergo operational testing in June of next year, “to determine that the aircraft variants can effectively perform their intended missions in a realistic threat environment.” If the new testing strategy is not successful, the GAO's view is that the cost of the F-35 acquisition and its future sustainment costs will just keep on escalating.

In response to the GAO report, the F-35 program office has agreed to deliver to Congress an assessment of the “risks of delivering required capabilities within the stated initial operational capability windows for each military service.” The GAO wants that assessment completed and the risks reported by July 2015, but the program hasn’t committed itself to any specific timetable to deliver a detailed assessment. As a Marine Corps Times article seems to suggest, future disclosures on the part of the program office concerning the risks of possible program schedule slips or cost increases will more than likely happen only in piecemeal fashion and by accident.

Of course, even if the F-35 Block 2B software is late—or one or more of the other software blocks are delayed for that matter—it really presages very little change in the general future direction of the program. Why? Well, in a CBS News 60 Minutes interview in February, Lt. Gen. Bogdan was asked, “Has the F-35 program passed the point of no return?” to which he replied, “I don't see any scenario where we're walking back away from this program.”

The GAO is officially scheduled to conduct one more annual review of the F-35 acquisition. The only purpose I can see for it is to warn current and future U.S. taxpayers, many of whom are not yet born, how much more money they will have to shell out over the next 50 years or more.

Photo: U.S. Department of Defense


Risk Factor

IEEE Spectrum's risk analysis blog, featuring daily news, updates and analysis on computing and IT projects, software and systems failures, successes and innovations, security threats, and more.

Willie D. Jones