We start this week’s installment of IT-related “ooftas” with United Airlines' third “computer outage” of the year. This time the problem began yesterday around 0830 EST and ended near 1030 EST and, according to the AP, involved the system used by dispatchers at the company's operations center in Chicago to communicate critical information such as aircraft weight and fuel loads to all of United’s operating locations around the world. A United spokesperson told the AP that “the airline has identified the specific problem, and said it won't happen again.”
Okay, regular United flyers, you can wipe those tears of laughter away now.
United told Reuters that the outage delayed fewer than 200 of the 5 679 United flights scheduled for yesterday (interestingly, United told the AP it was 250 flights), but I suspect that doesn’t count the number of flights that experienced knock-on effects from those 200 or so being delayed. The outage doesn’t help United’s quest to win back business customers who have fled United because of its IT system problems this year. Just three weeks ago, CEO Jeff Smisek said the company, which took a major financial hit because of its earlier botched IT-system integration effort, fully expected “to earn back those customers who took a detour” around the airline. That just got harder after yesterday.
United wasn’t the only airline to suffer from an IT-related outage this past week. Last Saturday morning, 10 November, the Navitaire reservation system used by Jetstar, Virgin Australia, Tiger Airways and Rex went down for three hours due to a power failure at its data center in Sydney, Australia, the Herald Sun reported. Both Virgin Australia and Jetstar are contemplating whether to demand compensation from Navitaire, which is owned by Accenture. Last year Navitaire reached a confidential settlement with Virgin for damages after a major Navitaire meltdown negatively affected Virgin Blue flights for days in 2010.
Apparently, for the third week out of the last four, U.K. retailer Tesco suffered yet another product pricing glitch, this time affecting the online prices for its new London-area “exclusive-to-Tesco” organic fruit and vegetable boxes called Soil & Seed. According to a story in The Grocer, small, medium and large vegetable boxes were being offered for £5, £10 and £15 instead of the true prices of £9, £13.50 and £18. Tesco said that it would honor the mistaken prices for customers who had ordered the vegetable boxes before the price glitch was corrected. However, there were no reports of stampedes of shoppers stocking up on the vegetable boxes as in previous Tesco pricing glitches involving beer, wine or cheese.
Also making a reappearance on the glitch list after a short time away was another trading glitch at the New York Stock Exchange. This time, a hardware problem forced the suspension Monday of trading in 216 stocks for the day, a story at the Wall Street Journal reported. The WSJ also reported that a “technical glitch” caused trading on the Mexican Stock Exchange to be suspended twice Monday, while Bloomberg News reported that a “software error” halted trading in Russian rubles late Wednesday.
Next, commuters on the new, £1.5 billion S Stock London Underground Metropolitan line trains have been ending up at stations they weren’t expecting due to a software error. According to a story in the Buckinghamshire Examiner, passengers who boarded what was “advertised as a Chesham train only [they] ended up in Amersham without warning and vice versa.”
The story said that “Transport for London (TfL)… is working with Derby-based Bombardier Transportation, which makes the trains, to resolve the issue,” which is being described as a “teething problem” with the digital destination board software, which shows the destination of the subway train.
More intriguing was a statement in the article by the Chairman of the Federation of the Metropolitan Line Users' Committees who said, “It's pretty rare, it only happens to about five per cent of journeys but I'm pleased it's being sorted.” An interesting definition of the term “rare.”
While arriving at the wrong destination may be annoying, it isn’t as stressful as receiving a text and voicemail from the police telling you that there has been a shooting on campus and that the suspect is still at large. This is what happened at Michigan’s Oakland University when a regularly scheduled campus Police Department test of its emergency communications procedures went wrong and a technical error caused the system to accidentally send out a pre-recorded real emergency message instead of the test message, the campus paper reported. The error was caught quickly, but not before students started to panic. The campus police said, “We regret the error and any confusion and inconvenience it may have caused,” after which it promised it wouldn’t happen again.
Finally, Maine’s Office of Program Evaluation and Government Accountability released its report (pdf) into the software bug in Maine's Integrated Health Management System (MIHMS) that led roughly 19 000 people to continue having their medical bills paid by Medicaid from September 2010 to March 2012 even though they were ineligible, costing the state over $10.6 million.
The short version of the report is that the bug was found in August 2010, and joined the list of 89 others also listed as severe. A lack of resources and a lack of a method to prioritize which bugs should be worked on first meant it wasn’t addressed until March 2011. Correcting the bug turned out to be much harder than expected, taking nearly a full year to fix, test and implement. The situation was exacerbated by a lack of communication between the IT department, which recognized the increasing financial impact of the uncorrected bug, and executive management, which was kept in the dark by IT about the ever-increasing costs. This led to the “surprise” $10.6 million in unanticipated Medicaid payments being “discovered” earlier this year.
Maine's Department of Health and Human Services, which is in charge of MIHMS, says it is now “changing its organizational culture to create an atmosphere of healthy communications and transparency” to avoid a similar issue in the future. Funny that: I would bet before this incident happened, DHHS would have proudly proclaimed that they already had an organizational culture that fostered an atmosphere of healthy communications and transparency.
Contributing Editor Robert N. Charette is an acknowledged international authority on information technology and systems risk management. A self-described “risk ecologist,” he is interested in the intersections of business, political, technological, and societal risks. Along with being editor for IEEE Spectrum’s Risk Factor blog, Charette is an award-winning author of multiple books and numerous articles on the subjects of risk management, project and program management, innovation, and entrepreneurship. A Life Senior Member of the IEEE, Charette was a recipient of the IEEE Computer Society’s Golden Core Award in 2008.