There was a useful summary story over the weekend in the Washington Post that lists several still-unanswered questions about the Washington DC Metro crash two weeks ago. I have touched upon several of these questions in previous posts on the crash.
The questions raised are:
(1) What caused the track circuit to malfunction? Is the Wee-Z bond used in the Metro's train rail circuits to blame?
The Wee-Z bond is still the primary culprit, but so far, other components in the track circuit, such as the trackside cables, have not been ruled out. Metro has checked about 85% of the 3,000 rail circuits used in the system and has not found any problems with them.
(2) Is the work crew that made the repair to blame?
One question I have had is why the suspect rail track circuit was "repaired" 5 days before the crash. The reason is still a bit uncertain. One news report says Metro claimed that the repair was routine maintenance, and that afterward the track circuit was checked out and passed its tests.
Yet another report says that the track circuit was repaired after track engineers, during routine maintenance work, found it not to be working correctly. Furthermore, it is alleged, the track circuit continued to exhibit problems after being repaired. Moreover, it is claimed, this information was conveyed to Metro supervisors and entered into Metro's maintenance database.
If this is true, it would indicate a major lapse in operational safety. In fact, in Sunday's Washington Post, there was an opinion piece by Jim Hall, chairman of the National Transportation Safety Board (NTSB) from 1994 to 2001, questioning Metro's safety culture, something that has been a source of contention between the NTSB and Metro for over a decade.
(3) When did the operator see the train that was about to be struck, and when did the operator first hit the brakes?
The first part of this question will hopefully be answered within the next two weeks by the NTSB as it conducts tests, but the second part may take quite a bit longer to resolve. Indications are about 5 to 6 seconds before impact if the train was at full speed.
(4) Why was the operator of the train that was struck operating his train in manual and not automatic mode?
The Metro policy at the time of the crash was for train operators to operate their trains under automatic (computer) control during rush hour. The only exception is when an operator suspects there is a braking problem or the train is somehow not operating properly. However, train operators are supposed to get permission from the Metro system's central controllers before going from automatic to manual mode.
So, did that operator notice an anomaly in the crash area which he had just traversed? Did he get permission to go from automatic to manual mode from the system controllers? And does this information get communicated to other train operators?
As I read the Post story and reviewed the crash details published so far, a couple of other questions also came to mind.
For example, how are train operators trained to quickly recognize an automation problem with a train's operation and take corrective action?
What is an acceptable level of risk of a crash? No system is completely safe, and moving towards zero risk is not free. The Metro Board took a decision not to replace the older, less safe rail cars because of the cost and the perception that the likelihood of a fatal crash was very low. Was that an acceptable or unacceptable decision, and was the decision process open and transparent? And was the decision based on assumptions about the safety process and culture in Metro that were not based in fact?
In addition, why didn't Metro management take the risk mitigation efforts it is now taking, such as placing older cars in between new, sturdier rail cars, when the NTSB recommended years ago that the older cars be replaced because they were unsafe?
As a side note, there was a story late last week in the Financial Times of London about the new subway signaling system being introduced into the London Tube. I wonder whether the DC Metro crash is raising any questions about the system now being put in and tested.
Finally, two monorail trains collided at Disney World early Sunday, killing one of the drivers and injuring at least 5 others. It is unclear what the cause is and whether another automation error may be involved.
Robert N. Charette is a Contributing Editor to IEEE Spectrum and an acknowledged international authority on information technology and systems risk management. A self-described “risk ecologist,” he is interested in the intersections of business, political, technological, and societal risks. Charette is an award-winning author of multiple books and numerous articles on the subjects of risk management, project and program management, innovation, and entrepreneurship. A Life Senior Member of the IEEE, Charette was a recipient of the IEEE Computer Society’s Golden Core Award in 2008.