User and Computer-related Errors Involved in Two Australian Aircraft Incidents

Over the past few days, the Australian press has been reporting (e.g., here and here) on the Australian Transport Safety Bureau's (ATSB) publication of its final reports into two aircraft incidents, one dating from 2008 and the other from 2009.

The 2008 incident involved Qantas flight QF72, traveling from Singapore to Perth, which was forced to make an emergency landing at Learmonth air base in Western Australia (about 1,100 kilometers north of the state capital, Perth) after the Airbus A330-303 unexpectedly and rapidly climbed and then lost altitude. Some 110 of the 303 passengers and nine of the 12 crew members were injured, a dozen of them seriously.

According to the 313-page ATSB final report, the Airbus was at its cruising altitude of 37,000 feet when:

"... one of the aircraft’s three air data inertial reference units (ADIRUs) started outputting intermittent, incorrect values (spikes) on all flight parameters to other aircraft systems. Two minutes later, in response to spikes in angle of attack (AOA) data, the aircraft’s flight control primary computers (FCPCs) commanded the aircraft to pitch down..."

"Although the FCPC algorithm for processing AOA data was generally very effective, it could not manage a scenario where there were multiple spikes in AOA from one ADIRU that were 1.2 seconds apart. The occurrence was the only known example where this design limitation led to a pitch-down command in over 28 million flight hours on A330/A340 aircraft, and the aircraft manufacturer subsequently redesigned the AOA algorithm to prevent the same type of accident from occurring again."
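To make that kind of design limitation concrete, here is a deliberately simplified Python sketch of a "hold the last good value" spike filter. It is not the Airbus algorithm: the function name, the 2-degree step threshold, the 10-samples-per-second rate (so that 12 samples stand in for 1.2 seconds), and the behavior of accepting the input unchecked when the hold window expires are all assumptions made purely for illustration. It shows how a filter that masks a single spike can still pass a second spike that happens to be present just as the hold window ends.

```python
def filter_aoa(samples, max_step=2.0, hold_samples=12):
    """Hypothetical hold-last-good-value spike filter (illustrative only)."""
    out = []
    last_good = samples[0]
    hold_left = 0  # samples remaining in the current hold window
    for x in samples:
        if hold_left > 0:
            hold_left -= 1
            if hold_left > 0:
                out.append(last_good)  # still holding: ignore the input
                continue
            # Hold window expires on this sample. ASSUMED FLAW: the input
            # is accepted without being re-checked against last_good.
            last_good = x
            out.append(x)
            continue
        if abs(x - last_good) > max_step:
            hold_left = hold_samples  # spike detected: start holding
            out.append(last_good)
        else:
            last_good = x
            out.append(x)
    return out


# One spike is masked; a second spike 12 samples later slips through.
single = [2.0] * 5 + [50.0] + [2.0] * 20
double = [2.0] * 5 + [50.0] + [2.0] * 11 + [50.0] + [2.0] * 8
print(max(filter_aoa(single)), max(filter_aoa(double)))
```

In this toy version, re-validating the input when the hold window expires would close the gap; whether that resembles the manufacturer's actual redesign is not something the excerpt above tells us.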

You may remember that a different ADIRU problem caused uncommanded aircraft movements aboard Malaysia Airlines Flight 124, a Boeing 777-200, in August 2005.

The ATSB report further stated that:

"Each of the intermittent data spikes was probably generated when the LTN-101 ADIRU’s central processor unit (CPU) module combined the data value from one parameter with the label for another parameter. The failure mode was probably initiated by a single, rare type of internal or external trigger event combined with a marginal susceptibility to that type of event within a hardware component. There were only three known occasions of the failure mode in over 128 million hours of unit operation. At the aircraft manufacturer’s request, the ADIRU manufacturer has modified the LTN-101 ADIRU to improve its ability to detect data transmission failures."

Notice the words "probably initiated" above. The ATSB could not say definitively what caused the problem, although it did believe that it knew what probably didn't cause it:

"Some of the potential triggering events examined by the investigation included a software ‘bug’, software corruption, a hardware fault, physical environment factors (such as temperature or vibration), and electromagnetic interference (EMI) from other aircraft systems, other on-board sources, or external sources (such as a naval communication station located near Learmonth). Each of these possibilities was found to be unlikely based on multiple sources of evidence. The other potential triggering event was a single event effect (SEE) resulting from a high-energy atmospheric particle striking one of the integrated circuits within the CPU module. There was insufficient evidence available to determine if an SEE was involved, but the investigation identified SEE as an ongoing risk for airborne equipment."

The ATSB report also identified a number of lessons learned "... for the manufacturers of new complex, safety-critical systems to consider..." including a heightened awareness during system safety assessments (SSAs) and other design evaluation activities:

"... that ADIRUs and similar types of equipment can generate a wide range of patterns of incorrect data, including patterns not previously experienced."

"Where practicable for safety-critical functions, SSA and other design evaluation activities should consider the effects of different values of system inputs in each mode of operation, particularly during transitions between modes"

In addition, SSAs need to consider single event effects, especially when the aircraft's electronic systems use high-density integrated circuits. The ATSB report says that:

"Designers should consider the risk of SEE and include specific features in the system design to mitigate the effects of such events, especially in systems with a potentially significant influence on flight safety."

SEEs have also been speculated to be a cause of the Toyota unintended acceleration incidents, but this has never been proven.

The ATSB report also has a nice section on the risk that electromagnetic interference (EMI) poses to aircraft operations, a topic that has been getting a lot of press lately.

An animation of the incident can be found on the ATSB web site.

The other ATSB final report concerned an Emirates Airbus A340-500 that, while taking off from Melbourne, Australia, on the evening of 20 March 2009 on its way to Dubai, struck its tail on the runway three times and on a grassy area twice before it was able to climb. The aircraft sustained substantial damage and also damaged some airport lighting and the airport's instrument landing system. The aircraft returned to the airport for an emergency landing, which it made without incident.

As stated in the 176-page ATSB report:

"The pre-departure preparation included the use of an electronic flight bag laptop computer (EFB) to calculate the performance parameters (take-off reference speeds, and flap and engine settings) for the takeoff from runway 16. That calculation relied on the manual entry into the EFB of several pieces of data, including the aircraft’s take-off weight."

"The take-off weight of the aircraft (361.9 tonnes) was available from the aircraft’s flight management and guidance system (FMGS). The crew’s intention was to take this figure, add a 1-tonne allowance for last-minute weight changes, and enter the result (362.9 tonnes) into the EFB."

"When entering the take-off weight into the EFB, however, the first officer inadvertently entered 262.9 tonnes instead of the intended 362.9 tonnes and did not notice that error. The incorrect weight and the associated performance parameters were then transcribed onto the flight plan for later reference."
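A one-digit slip like this is the sort of error a simple automated cross-check could flag. The Python sketch below is only an assumption about what such a check might look like, not a description of the actual EFB software: the function name, the allowance parameter, and the 2-tonne tolerance are hypothetical. It compares the manually entered weight with the FMGS figure plus the planned allowance.

```python
def check_takeoff_weight(entered_t, fmgs_t, allowance_t=1.0, tolerance_t=2.0):
    """Hypothetical cross-check of a manually entered take-off weight.

    Returns (ok, difference_in_tonnes), where the difference is the entered
    weight minus the FMGS weight plus the planned allowance.
    """
    expected_t = fmgs_t + allowance_t
    diff_t = entered_t - expected_t
    return abs(diff_t) <= tolerance_t, diff_t


# Figures from the report: FMGS weight 361.9 t, entered weight 262.9 t.
ok, diff = check_takeoff_weight(262.9, 361.9)
print(ok, round(diff, 1))
```

With the figures from the report, the entered value is about 100 tonnes below what the FMGS weight implies, far outside any reasonable tolerance for last-minute changes.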

The captain, who was distracted during pre-flight checks, also did not detect the error when he was first supposed to confirm the figure. There were several other opportunities to discover the error, but these too were missed, the ATSB report states.

As a result, during takeoff "the aircraft did not respond as expected," but fortunately the crew was able to successfully ascend by applying full thrust to the aircraft's engines, even after the multiple tail strikes damaged the aircraft.

The ATSB also conducted a research study in conjunction with this incident to see how prevalent "simple data calculation or entry error by the flight crew(s)" was in reported take-off accidents. It found that there were "... 20 international and 11 Australian accidents and incidents (occurrences) identified between 1 January 1989 and 30 June 2009 where the calculation and entry of erroneous take-off performance parameters, such as aircraft weights and ‘V speeds’ were involved." The ATSB speculates that there were probably other incidents that were never formally reported.

The ATSB found that if such a data entry mistake is made and is not caught by the preflight procedures currently in place to uncover them, there is virtually no way for aircrews to detect their mistake during take-off other than how the takeoff "feels" in comparison to previous takeoff experiences. By the time the aircrew figures out that "something is not right," it may be too late to recover.

The ATSB report states that the Bureau is now investigating takeoff acceleration monitoring and alerting systems as a way to mitigate this data entry risk. Airbus has told the ATSB that it is undertaking a feasibility study into what it would take to develop such a system, and hopes to have one developed by 2015 if it is feasible to do so.
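The basic idea behind acceleration monitoring can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not any real or proposed system: the function name, the 85 percent alert threshold, and the thrust figure are hypothetical, and the physics is reduced to acceleration equals thrust divided by mass, ignoring drag, rolling resistance, wind, and runway slope. It compares the acceleration implied by the weight used in the performance calculation with the acceleration the aircraft would achieve at the FMGS weight.

```python
def acceleration_alert(measured_ms2, expected_ms2, min_ratio=0.85):
    """Hypothetical monitor: True if acceleration is well below expectation."""
    return measured_ms2 < min_ratio * expected_ms2


THRUST_N = 800_000.0  # hypothetical total thrust in newtons, not from the report

# Acceleration implied by the erroneous 262.9-tonne entry versus the
# acceleration at the 361.9-tonne FMGS weight, for the same thrust.
expected = THRUST_N / 262_900.0
actual = THRUST_N / 361_900.0
print(acceleration_alert(actual, expected))
```

Under these simplifications the ratio of actual to expected acceleration equals the ratio of assumed to actual mass, roughly 0.73 here, so even a coarse threshold would flag the discrepancy early in the takeoff roll.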

I wonder if it will be called, at least unofficially, the FFAS, or fat finger alert system.

You can look at an animation of the Emirates incident at the ATSB web site.

Risk Factor

IEEE Spectrum's risk analysis blog, featuring daily news, updates and analysis on computing and IT projects, software and systems failures, successes and innovations, security threats, and more.

Contributors

 
Contributor
Willie D. Jones
 
