Two Capacitors and Poor Software Design Cause of Major NSW Rail Outage Last Month

On the morning of 12th of April at 0740, a problem at one of the signal boxes at the New South Wales (NSW) RailCorp's Sydenham Advanced Train Running Information Control System (ATRICS) signal box complex caused the loss of control for points, routes and controlled signals for all areas under the control of the complex. In short, all train signals turned red on six CityRail lines into and out of Sydney's Central Business District (CBD). As a result, for several hours over 100,000 passengers became stranded either on trains or waiting for trains.

The Sydenham signal box, after being turned off and then back on again, began to operate correctly at about 0915, but the train system wasn't completely back to normal operations until after 1600.

All in all, some 847 trains were delayed and another 240 trains were canceled during the day because of the single signal box problem. At the time, the commuting nightmare was blamed on a computer glitch.

Well, a report (PDF) that was released last week by NSW RailCorp provides more insight into the exact nature of the "glitch." Apparently, two faulty capacitors along with a poor software design were to blame.

The report states that:

"The ATRICS system is a computerised system that utilises a Local Area Network (LAN) and a Wide Area Network (WAN) to communicate between the various devices that make up the system. The network is protected from other networks in order to prevent external cyber attacks such as hacking, virus etc. The Sydenham LAN is made up from a combination of switches, routers and firewalls in order to achieve the required availability and integrity. The LAN comprises four switches which are connected in a ring topology to provide device and network fault tolerance and to be adaptive to network changes."
"At 7:36:52 on the 12th April 2011, one of the network switches that forms part of the ATRICS LAN at Sydenham (Sydhm_sw1) detected that the adjacent switch was no longer communicating. At 7:37:10 the same network switch detected that the new communicating switch had resumed communicating. This pattern repeated regularly for sw1 and for other switches connected to the network. This pattern indicates that there was not a complete failure but that one of the network switches was cycling from a failed to an operational state. As a result the Sydenham LAN became caught in a cycle where it was continually trying to reconfigure itself to address the changing state of the network."

RailCorp determined that "... the trigger for the event was the failure of two electrolytic capacitors in the Sydenham LAN switch sydhm_sw2. Due to the nature of the capacitor failure, the switch partially failed which placed the whole LAN into a partially failed state."

Full recovery was only possible by turning the system off and restarting it.

In addition,

"[While] the root cause of the incident was the failure of the Sydenham LAN however; the major contributors to the duration of the outage were the inability of the ATRICS software to manage the scenario created by the failure of the LAN, and the slowness of the Sydenham LAN, which are still under investigation."

RailCorp went on to say that the incident exposed a design weakness in the Sydenham LAN and the ATRICS software, and that it should be a priority to mitigate/eliminate the risk.

The switch that failed had been used for over 8 years without this problem showing up.

fail safe it australia software networks mass transit railroads

Topics

Sections

More

For IEEE Members

For IEEE Members

IEEE Spectrum

Follow IEEE Spectrum

Support IEEE Spectrum

Two Capacitors and Poor Software Design Cause of Major NSW Rail Outage Last Month

Fixing design vulnerability now a high priority

Video Friday: SpaceHopper

LED Touchscreen Is Also a PV Charger

50 Years Later, This Apollo-Era Antenna Still Talks to Voyager 2

Related Stories

Why Electronic Health Records Haven't Helped U.S. With Vaccinations

Minsk’s Teetering Tech Scene

How Estonia's Management of Legacy IT Has Helped It Weather the Pandemic

Topics

Sections

More

For IEEE Members

For IEEE Members

IEEE Spectrum

Follow IEEE Spectrum

Support IEEE Spectrum

Enjoy more free content and benefits by creating an account

Saving articles to read later requires an IEEE Spectrum account

The Institute content is only available for members

Downloading full PDF issues is exclusive for IEEE Members

Downloading this e-book is exclusive for IEEE Members

Access to Spectrum 's Digital Edition is exclusive for IEEE Members

Following topics is a feature exclusive for IEEE Members

Adding your response to an article requires an IEEE Spectrum account

Create an account to access more content and features on IEEE Spectrum , including the ability to save articles to read later, download Spectrum Collections, and participate in conversations with readers and editors. For more exclusive content and features, consider Joining IEEE .

Join the world’s largest professional organization devoted to engineering and applied sciences and get access to all of Spectrum’s articles, archives, PDF downloads, and other benefits. Learn more →

Join the world’s largest professional organization devoted to engineering and applied sciences and get access to this e-book plus all of IEEE Spectrum’s articles, archives, PDF downloads, and other benefits. Learn more →

Access Thousands of Articles — Completely Free

Create an account and get exclusive content and features: Save articles, download collections, and talk to tech insiders — all free! For full access and benefits, join IEEE as a paying member.

Two Capacitors and Poor Software Design Cause of Major NSW Rail Outage Last Month

Fixing design vulnerability now a high priority

Video Friday: SpaceHopper

LED Touchscreen Is Also a PV Charger

50 Years Later, This Apollo-Era Antenna Still Talks to Voyager 2

Related Stories

Why Electronic Health Records Haven't Helped U.S. With Vaccinations

Minsk’s Teetering Tech Scene

How Estonia's Management of Legacy IT Has Helped It Weather the Pandemic