Avoiding Future Disasters and NASA's Memory Problem

Three elderly male panelists and one younger female moderator sit on a stage above an audience. Another elderly man can be seen on a large video screen.
Photo: Johnson Space Center/NASA

Fifty years ago, on January 27, 1967, three astronauts climbed into an Apollo capsule perched atop a Saturn IB, the smaller cousin of the Saturn V that would later be used to send astronauts to the moon. The three astronauts (Gus Grissom, a Mercury program veteran; Ed White, the first American to walk in space; and Roger Chaffee, a spaceflight rookie) were not planning on going anywhere. They were conducting a test: the goal was simply to operate the spacecraft while disconnected from ground support equipment, as if it were in orbit rather than sitting on a launch pad at Kennedy Space Center in Florida. The capsule was sealed, and the astronauts began working through the test procedures. A few hours later, an electrical fire broke out and killed the crew before they could escape the capsule.

Last week, NASA held many commemorations for the anniversary of the Apollo 1 fire. But one forward-looking event, held at the astronaut base at the Johnson Space Center in Houston, stood out as particularly apposite. There, a panel of emeritus experts discussed the lessons of the Apollo 1 fire, and of the subsequent 1986 Challenger and 2003 Columbia space shuttle disasters, that space workers must stop forgetting.

The veteran program workers shared their insights in front of a packed house, and the emcee, a freshly minted astronaut from the class of 2012, drove home the need for such reminders with a simple request. After asking those in the audience who had worked on Apollo to rise (about 5 percent did, to applause), she asked those who had come to work after 2003, and so had not been present for any of the disasters, to rise next. Almost half of the gathering did so.

The immediate cause of disaster was different in each case: a fire in a cabin filled with pure oxygen for Apollo 1, a failed O-ring seal in a solid rocket booster for Challenger, and a strike by insulating foam on a wing's heat shield for Columbia. Nevertheless, “the commonality of the causes of all three catastrophes is sobering,” said panelist Gary Johnson.

Johnson is a retired safety expert who, as a 27-year-old electrical engineer in 1967, was thrown into the heart of the Apollo 1 fire investigation. He had been the only electrical specialist at the console in the control center in Houston during the routine test; he noticed a sudden “Main Bus A/B” alarm light, then heard shouts of “Fire!” Within minutes, Johnson recalled, the control room doors were locked, those present were each given one phone call to tell their families they would not be home that night, and the teams plunged into capturing all of the data that had been flowing to Houston from the test up to the moment of the catastrophe.

Within days, Johnson was crawling around inside the burnt-out capsule in Florida, examining the remains of cable trays and other wiring. He also pored meticulously over the close-out photos of the cabin taken before the test run, identifying frayed or even dangling insulation on cabling. And he helped set up test fires in a simulated capsule, with wiring matching what he had seen inside Apollo 1, in the same high-oxygen environment. He remembers being shocked by the ferocity of the flames that a single spark could trigger.

Johnson described how the fundamental design change made to the Apollo spacecraft in the wake of the fire, aside from a quick-opening hatch and the decision never to fill the cabin with pure oxygen at full pressure, was the installation of secure cable trays and conduits to prevent chafing of the insulation around wires. “Gemini [spacecraft] were constructed with all the wiring outside the crew cabin,” he recalled, “but in Apollo the new contractor ran wiring bundles all over the walls and floor of the spacecraft, wrapped in taped-on insulation bought at a local hardware store.” The wires were supposedly protected by temporary panels installed for maintenance, but that protection was haphazard at best. Grimly, post-fire analysis found too many potential sparking sites to even guess which one had started the fire.

In the case of Apollo 1, it was clear that the kind of tests Johnson performed after the fatal disaster should have been performed by any prudent design team before the astronauts climbed into the capsule. The “assumption of goodness,” the feeling that “it’ll be OK,” had become a rationalization for skipping such tests under the pressure of dominant goals, such as schedules.

Similar testing to challenge the assumption of goodness was also skipped in the lead-up to the two shuttle disasters, which were also commemorated last week: the anniversary of the destruction of Challenger and its seven-person crew is January 28, and the anniversary of the loss of Columbia, with seven more astronauts, is February 1. As a result, awareness of potentially fatal flaws eluded the teams in charge of those missions, too.

Most famously, Challenger was lost because of the assumption that the flexible O-ring seals in the solid rocket boosters would seat properly at ignition, even though the ambient temperature was below the range covered by pre-flight testing. Physicist Richard Feynman, a member of the investigation team, performed a simple experiment with a glass of ice water and a sample of the seal material to show that the assumption, which a shuttle team member had questioned just before launch, was not valid.

The “too late” test that could have prevented the breakup of Columbia was conducted several months after that disaster, under the leadership of investigation team scientist Scott Hubbard. As on earlier flights, a piece of insulation foam had been seen tearing off the fuel tank early in the flight; it struck the underside of the left wing’s leading edge. Using a previously flown thermal protection system panel as a target and a high-velocity airgun, investigators fired foam at the panel at the same angle and speed as in the Columbia impact, and tore a 50-centimeter hole in the target. Pre-flight impact testing had used only simulated grain-sized space debris, never the kind of foam that, for years, had been observed tearing free from the tanks.

Devising verification tests is fundamentally a challenge in operational engineering. But another panelist, Glynn Lunney, a flight director in mission control during the near-fatal Apollo 13 lunar mission who later played important roles in the shuttle program, stressed that giving safety teams enough authority to demand such tests, and to object when they were not thorough enough, was an organizational challenge. Whenever the policy backing the authority of safety teams weakened, it laid the foundations for imprudent decisions that led to new catastrophes. Though unable to attend due to illness, Frank Borman, the Gemini and Apollo astronaut who had been in charge of the Apollo 1 investigation and the bureaucratic reforms that followed, endorsed Lunney’s insights in a prerecorded set of answers to questions.

Borman demurred when asked whether schedule pressure had been a factor in omitting certain tests, affirming his belief that setting schedules was a constructive way to motivate the prioritization of problems to be solved. “You really have to manage time as a resource,” Lunney explained. “Big and small things come at you, prioritization of attention is what you have to be tuned into,” he added. Two decades later, after Challenger was lost, the question of schedule-induced carelessness came up again. This time, however, investigators found that rather than serving to prioritize problems, the pressure to fly stemmed from the need to impress Congress with the shuttle’s timeliness, in order to persuade lawmakers to rely on the shuttle for all satellite launches rather than fund alternative rockets for military launches.

Walt Cunningham, one of the astronauts on the Apollo 1 backup crew, admitted that the pilots had been realistic about the possibility of disaster. “We figured at some point we’d lose a crew, then learn from it and fix things and go on,” he told the hushed auditorium. NASA certainly did so after Apollo 1. But as the symposium stressed, the agency somehow had not figured out how to preserve those fixes in its organizational charts and in the minds of all its workers, and so it periodically had to relearn the same lessons at the same lamentable cost. Emotionally powerful events, such as those held in memory of Apollo 1’s fallen astronauts, may represent some of the best chances to avoid forgetting those lessons.


Tech Talk

IEEE Spectrum’s general technology blog, featuring news, analysis, and opinions about engineering, consumer electronics, and technology and society, from the editorial staff and freelance contributors.
