Today’s chips are marvels of mass production. Tens of thousands of silicon wafers can move through a single fab in a month, each one carted from tool to tool, blasted by heat, bombarded by ions, immersed in vapor, coated with chemicals, hit with radiation, and exposed to acid. It can take months and hundreds of steps to transform a plain piece of silicon into an array of chips. But at the end of this elaborate assembly line, chipmakers finally get a pile of identical devices that perform just as their designers intended.
At least that’s how it used to be. Then, about 10 years ago, chipmakers began to notice a problem: Even state-of-the-art manufacturing processes couldn’t produce chips with consistent properties. Nowadays two transistors, fabricated a few dozen nanometers apart on the same piece of silicon, will not have the same electrical properties. It’s one of the key barriers that the global chip industry—with sales of US $300 billion—must overcome to keep producing better, faster, cheaper, more energy-efficient chips.
The culprit is scaling. Chips have improved because their transistors and connecting wires have kept getting smaller, but now they’re so small that random differences in the placement of an atom can have a big impact on electrical properties. Some batches vary so much that more than half will run 30 percent slower than intended or consume 10 times as much power as they should when on standby.
Some of these defective chips can be sold at a discount, but if they’re for application-specific designs—say, for mobile phone communication or video encoding—they might find no better destination than the junkyard. And the defect rate will only get worse as transistors continue to shrink.
Chip variability is what the International Technology Roadmap for Semiconductors calls a “red brick” problem: one of a handful of important issues that lack any clear solution, forming a red brick wall that prevents forward progress.
But just because variability is here to stay doesn’t mean we can’t mitigate its effects. We could accomplish much by changing the way we design chips. This has traditionally been done by introducing a margin of error to account for the worst-case scenario. Now that ridiculously small defects have entered the mix, that approach no longer works, and chipmakers must overcompensate for the problem. The result is pessimistically designed chips that operate far slower and consume far more energy than they should.
Fortunately, a new family of design techniques promises to predict not only the worst-case scenario for a chip but also the likelihood that the scenario will happen. These approaches use statistical methods to make informed trade-offs between how fast the chips will run and how many good chips a given batch is likely to yield. Some makers of high-end microprocessors like IBM and foundries like the Taiwan Semiconductor Manufacturing Co. are already using some of these statistical techniques in their design flows. Although statistical tools are still far from being widely adopted, if we can push them along, these tools will help us make affordable chips that are as fast and efficient as those the semiconductor road map calls for—and perhaps then some.
Comments