Hey there, human — the robots need you! Vote for IEEE’s Robots Guide in the Webby Awards.

Close bar

IT Hiccups of the Week: Sutter Health’s $1 Billion EHR System Crashes

Computer issue bites Japanese rocket launch, Nasdaq partially blames software bug for outage

4 min read

IT Hiccups of the Week: Sutter Health’s $1 Billion EHR System Crashes

After a torrid couple of months, last week saw a slowdown in the number of reported IT errors, miscalculations, and problems. We start off this week’s edition of IT Hiccups with the crash of a healthcare provider’s electronic health record system.

Sutter Health’s Billion Dollar EHR System Goes Dark

Last Monday, at about 0800 PDT, the nearly US $1 billion EPIC electronic health record (EHR) system used by Sutter Health of Northern California crashed. As a result, the Sacramento Business Journalreported, healthcare providers at seven major medical facilities, including Alta Bates Summit Medical Center facilities in Berkeley and Oakland, Eden Medical Center in Castro Valley, Mills Peninsula Health Services in Burlingame and San Mateo, Sutter Delta in Antioch, Sutter Tracy, Sutter Modesto and affiliated doctor’s offices and clinics, were unable to access patient medications or histories.

A software patch was applied Monday night, and EHR access was restored. Doctors and nurses no doubt spent most of the day Tuesday entering in all the handwritten patient notes they scribbled on Monday.

It still is unclear whether the crash was related to a planned system upgrade that was done the Friday evening before the crash, but if I were betting, I would lay some coin on that likelihood.

Nurses working at Sutter Alta Bates Summit Hospital have been complaining for months about problems with the EHR system, which was rolled out at the facility in April. Nurses at Sutter Delta Medical Center have also complained that hospital management there has threatened to discipline nurses for not using the EHR system; its system went live about the same time as Alta Bates Summit's, but for billing for chargeable items. Sutter management said that it was unaware of any of the issues the nurses were complaining about, and that any complaints they might have lodged were the result of an ongoing management-labor dispute.

Sutter is now about midway through its EHR system roll-out, an effort it first started in 2004 at a planned cost of $1.2 billion and completion date of 2013. It later backed off that aggressive schedule, and then “jump started” its EHR efforts once more in 2007. Sutter plans to complete the roll-out across all 15 of its hospitals by 2015 at a cost now approaching $1.5 billion.

Hospital management said in the aftermath of the incident, “We regret any inconvenience this may have caused patients.” It did not express regret to its nurses, however.

Computer Issue Scraps Japanese Rocket Launch

Last Tuesday, the launch of Japan’s new Epsilon rocket was scrubbed with 19 seconds to go because a computer aboard the rocket “detected a faulty sensor reading.” The Japan Aerospace Exploration Agency (JAXA) had spent US $200 million developing the rocket, which is supposed to be controllable from conventional desktop computers instead of massive control centers. This added convenience has resulted from the extensive use of AI to self-perform status-checks.

The Japan Times reported on Thursday that the problem was traced to a “computer glitch at the ground control center in which an error was mistakenly identified in the rocket’s positioning.”

The Times stated that, “According to JAXA project manager Yasuhiro Morita, the fourth-stage engine in the upper part of the Epsilon that is used to put a satellite in orbit, is equipped with a sensor that detects positioning errors. The rocket’s computer system starts calculating the rocket’s position based on data collected by the sensor 20 seconds before a launch. The results are then sent to a computer system at the ground control center, which judges whether the rocket is positioned correctly. On Tuesday, the calculation started 20 seconds before the launch, as scheduled, but the ground control computer determined the rocket was incorrectly positioned one second later based on data sent from the rocket’s computer.”

The root cause(s) of the problem are still unknown, although it is speculated that it was a transmission issue. JAXA says that it will be examining “the relevant computer hardware and software in detail.” The Times reported on Wednesday that speculation centered on a “computer programming error and lax preliminary checks.”

JAXA President Naoki Okumura apologized for the launch failure, which he said brought “disappointment to the nation and organizations involved.” A new launch date has yet to be announced.

Nasdaq Blames Software Bug For Outage

Two weeks ago, Nasdaq suffered what it called at the time a “mysterious trading glitch.” The problem shut down trading for three hours. After pointing fingers at rival exchange NYSE Arca, it admitted last week that perhaps it wasn’t all Arca’s fault after all.

A Reuters News story quoted Bob Greifeld, Nasdaq's chief executive, as saying Nasdaq’s backup system didn’t work because, “There was a bug in the system, it didn't fail over properly, and we need to work hard to make sure it doesn't happen again.”

However, Greifeld didn’t fully let Arca off the hook. A story at the Financial Times said that in testing, Nasdaq’s Securities Information Processor (SIP), the system that receives all traffic on quotes and orders for stocks on the exchange, “was capable of handling around 500,000 messages per second containing trades and quotes. However, in practice, Nasdaq said repeated attempts to connect to the SIP by NYSE Arca, a rival electronic trading platform, and streams of erroneous quotes from its rival eroded the system’s capacity in a manner similar to a distributed denial of service attack. Whereas the SIP had a capacity of 10,000 messages per data port, per second, it was overwhelmed by up to more than 26,000 messages per port, per second.”

Nasdaq said that it was now looking at design changes to make the SIP more resilient.

A detailed report looking into the cause of the failure will be released in about two weeks or so.

Of Other Interest…

Computer Error Causes False Weather Alert and Cancelled Classes at Slippery Rock University

UK’s HSBC Bank Suffers IT Glitch

NC Fast Computer System Can’t Shake Processing Problems

North Carolina DMV Computer System Now Back to Normal

Australia’s Telstra Faces Large Compensation Bill for Internet Problems

Data Glitch Hits CBOE Futures Exchange

China Fines Everbright Security US $85 million Over Trading Error

Photo: iStockphoto

The Conversation (0)