16 February 2012—The failure of Russia’s ambitious Phobos-Grunt sample-return probe has been shrouded in confusion and mystery, from the first inklings that something had gone wrong after its 9 November launch all the way to inconsistent reports of where it fell to Earth on 15 January.
What was never mysterious was how important the mission goal was—to land a probe on the Martian moon Phobos and then return soil samples to Earth. It was to have been the flagship mission that vaulted Russia back into prominence in interplanetary exploration after a quarter century of disappointment and delay, but it quickly turned into a heartbreaking debacle. On the heels of a woeful parade of other space failures, the mission cast an ominous shadow over the entire Russian space industry.
The release of the official accident investigation results on 3 February served only to fuel rumors of fundamental hardware and software design flaws and of blatant violations of safety standards. The report blames the loss of the probe on memory chips that became fatally damaged by cosmic rays. The probe died so suddenly that it didn’t even send an error message, but investigators concluded that the only plausible failure mechanism was the simultaneous disabling of two identical chips in the dual-computer control system, which caused both computers to restart at once. This in turn sent the autopilot into “safe mode,” in which it turned the spacecraft toward the sun and held that orientation. (The reorientation was observed in the ensuing days as thruster firings disturbed the probe’s orbit.)
Phobos-Grunt was supposed to await further instructions from Earth, but it never received them; in an incredible design oversight, the probe could receive emergency instructions only after a successful departure from parking orbit.
Section 2.3 of the report provides insight into where the computer malfunction that doomed the probe came from: “The most likely factor which caused a ‘double restart’ was a local influence of heavy charged particles from space.” Known as galactic cosmic rays, these particles are the nuclei of heavy atoms moving at near light speed after being spit from the hearts of supernovas. Earth’s magnetosphere and atmosphere provide protection from such radiation at the planet’s surface.
Press reports suggested that investigators suspected the chip failures were the result of counterfeit components: lesser circuits labeled with higher performance ratings than they actually had. But the final report does not mention this possibility. Vladimir Popovkin, head of Roskosmos, the Russian space agency, was careful to say in interviews (such as on the radio show “Echo of Moscow” on 2 February) that although chip counterfeiting was a widespread problem, “we cannot say that the chips there were counterfeit.”
The radiation environment of outer space can certainly be hazardous to space vehicles. To assess the credibility of the Russian conclusions, IEEE Spectrum contacted Steven McClure of NASA’s Jet Propulsion Laboratory (JPL), in Pasadena, Calif. McClure is the supervisor of the Radiation Effects Group, which is NASA’s first line of defense against the threat that Roskosmos says the probe fell victim to.
At Spectrum’s request, McClure read a translation of the official Russian report. He immediately recognized the specific component identified in the report as the likely locus of the double-hardware failure: the WS512K32, a single-package SRAM assembly organized as 512K words of 32 bits each, or 2 megabytes in total. “There are probably four chips in this by-32 device,” he explains. “They were identified in a report by Joe Benedetto [an industry specialist in radiation hardening] a few years ago as some of the most sensitive parts to single-event latch-up they had ever seen.” Single-event latch-up occurs when a charged particle passing through a semiconductor switches on a parasitic circuit path that lets a high current flow through the chip. Generally, the device stays stuck in that state until the chip’s power is cut off and turned back on again, but in some cases the chip may be permanently damaged.
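The recovery rule for latch-up can be captured as a small state machine. The following is a toy Python model with hypothetical class and state names; it ignores the device physics entirely and encodes only the behavior described above: a latched chip recovers after a power cycle, while a permanently damaged one does not.

```python
from enum import Enum, auto


class ChipState(Enum):
    OK = auto()       # normal operation
    LATCHED = auto()  # particle hit has switched on a high-current parasitic path
    DAMAGED = auto()  # the latch-up has permanently damaged the chip


class SramChip:
    """Hypothetical toy model of a latch-up-prone memory chip."""

    def __init__(self) -> None:
        self.state = ChipState.OK

    def particle_strike(self, causes_damage: bool = False) -> None:
        # A heavy ion passing through the silicon can trigger latch-up.
        # In this toy model the caller decides whether the event also
        # causes permanent damage.
        if self.state is ChipState.OK:
            self.state = ChipState.DAMAGED if causes_damage else ChipState.LATCHED

    def power_cycle(self) -> None:
        # Cutting power and restoring it clears a latch-up,
        # but cannot undo permanent damage.
        if self.state is ChipState.LATCHED:
            self.state = ChipState.OK

    def is_usable(self) -> bool:
        return self.state is ChipState.OK


chip = SramChip()
chip.particle_strike()
print(chip.is_usable())  # False: stuck in latch-up
chip.power_cycle()
print(chip.is_usable())  # True: power cycle cleared the latch-up
```

A real latch-up event can also damage a chip if the high current flows long enough before power is removed; the `causes_damage` flag stands in for that outcome rather than modeling it.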
The WS512K32 is “sold on the aircraft market to a military grade, not the space market,” says McClure. He points out that neither the original fabricators nor the commercial vendors test the part for radiation tolerance, and neither will supply radiation specifications. If this chip had been proposed for a critical component in a space-probe design at JPL, he assured Spectrum, “it would not likely be approved for use.” McClure says it would be acceptable for a space mission lasting a couple of days or for noncritical applications, but not for a years-long mission to Mars and back, which would typically “require a probability of failure of less than 1:10 000 [for the] entire mission.”
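McClure’s contrast between a two-day mission and a years-long one can be made concrete with a little arithmetic. The sketch below assumes a constant failure rate (the simple exponential model), which is an illustrative simplification rather than JPL’s actual reliability method; the 1,000-day figure for a Mars round trip is likewise an assumed round number, and the function names are hypothetical.

```python
import math

# Mission-level requirement quoted by McClure: probability of failure < 1:10 000.
P_MAX = 1e-4


def failure_probability(rate_per_day: float, days: float) -> float:
    """P(at least one failure) under an assumed constant failure rate."""
    return 1.0 - math.exp(-rate_per_day * days)


def max_rate_per_day(days: float, p_max: float = P_MAX) -> float:
    """Largest constant daily failure rate that keeps the whole mission under p_max."""
    return -math.log1p(-p_max) / days


# Assumed durations: a two-day mission vs. a roughly 1,000-day Mars round trip.
SHORT_MISSION_DAYS = 2
LONG_MISSION_DAYS = 1000

rate = max_rate_per_day(SHORT_MISSION_DAYS)
p_long = failure_probability(rate, LONG_MISSION_DAYS)
print(f"allowed rate for a {SHORT_MISSION_DAYS}-day mission: {rate:.2e} per day")
print(f"same rate over {LONG_MISSION_DAYS} days: P(failure) = {p_long:.3f}")
```

Under these assumptions, a part that just meets the 1:10 000 budget over two days would have just under a 5 percent chance of failing over 1,000 days, nearly 500 times the allowed risk.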