Beat the Heat
Of all the issues facing chip and computer designers, none is more burning than the soaring levels of power flowing through integrated circuits
A little over a year ago, a Swedish scientist learned the hard way that laptop computers do not quite live up to their name. According to the British medical journal The Lancet, the mercifully anonymous man spent an evening writing a report, periodically shifting position to avoid heat from the machine. The next day he woke to find himself blistered in a very sensitive place. He'd been well and truly fried.
Anyone old enough to associate the word "computer" with 1950s-era images of the original UNIVAC, with its 5200 tubes cooled by water drawn from a river, probably won't be shocked by the news that a computer could inflict a second-degree burn. Indeed, the fabled machine once failed spectacularly when a wayward fish obstructed the water's flow. Nevertheless, engineers, lulled by the ubiquitous hum of their workstations' fans, can be forgiven for thinking that the heat thrown off by a computer's innards is no longer a burning issue.
But it is. Chip designers, computer makers, assorted university researchers, and chip-packaging specialists are uniting to tackle one of the most urgent, but overlooked, of the litany of potential showstoppers looming for the global semiconductor industry: the soaring densities of heat on integrated circuits, particularly high-performance microprocessors. Researchers are studying exotic new kinds of heat-conducting "goop" that suck the heat out of a chip and convey it to heat sinks, which radiate it into the air. Still, it is a measure of the seriousness of the problem that engineers are also pursuing concepts that have been considered too elaborate and far too expensive for such a mass-produced consumer product as a personal computer. Possibilities on the horizon include tiny, self-contained evaporative cooling systems and even devices that capture the heat and turn it directly into electricity.
What has led researchers to such measures? Basic physics: virtually all the power that flows into a chip comes out of it as waste heat. Today's standard-issue Pentium 4 throws off 100 watts, the same as the bulb in a child's Easy-Bake Oven and, as the hapless Swede learned, more than enough to cook meat. Divide by the area and you get a heat flux of around 30 watts per square centimeter--a power density several times higher than that of a kitchen hot plate.
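The division behind that figure can be sketched in a few lines of Python. The 100 W and 30 W/cm2 values come from the text; the area of roughly 3.3 cm2 is only what those two numbers imply when taken together, not a measured die size, and the function names are illustrative.

```python
def heat_flux_w_per_cm2(power_w: float, area_cm2: float) -> float:
    """Average heat flux: dissipated power divided by the area it leaves through."""
    if area_cm2 <= 0:
        raise ValueError("area must be positive")
    return power_w / area_cm2


def implied_area_cm2(power_w: float, flux_w_per_cm2: float) -> float:
    """Area implied by a stated power and a stated average heat flux."""
    if flux_w_per_cm2 <= 0:
        raise ValueError("flux must be positive")
    return power_w / flux_w_per_cm2


if __name__ == "__main__":
    chip_power_w = 100.0  # from the article
    chip_flux = 30.0      # W/cm2, from the article
    # Assumption: this area is inferred from the two figures above, not measured.
    area = implied_area_cm2(chip_power_w, chip_flux)
    print(f"Implied area: {area:.2f} cm2")
    print(f"Flux check:   {heat_flux_w_per_cm2(chip_power_w, area):.1f} W/cm2")
```

Because the flux is an average, any hot spot on the chip would run above it, which is part of why the interface and spreading problems described later matter so much.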
Addressing engineers at the 2001 IEEE International Solid State Circuits Conference, Patrick P. Gelsinger, the chief technology officer at Intel Corp., Santa Clara, Calif., said that if the trend toward ever more fiery chips were to continue unchecked, and surely it will not, microprocessors a decade from now would be pouring out as much power as the surface of the sun, some 10 000 W/cm². "We need a fresh approach," Gelsinger concluded.
Heat Hurts Performance because transistors run faster when they're cool rather than hot. That's why power-mad "overclockers," in search of an additional 20–30 percent of switching speed, clap custom heat sinks and cryogenic refrigeration systems onto the microprocessors in their souped-up PCs. Heat, or rather repeated cycling from hot to cool, also shortens the life of the chip. One way it does this is by inducing mechanical stress that can literally tear a chip apart. "Typically, it's not the silicon but the package that fails," says Avram Bar-Cohen, an IEEE fellow and professor of mechanical engineering at the University of Maryland in College Park. But the silicon suffers, too. Hot copper and aluminum interconnects on the chip are also more susceptible to disintegration in a phenomenon called electromigration, a serious reliability issue.
Supercomputer designers think nothing of adding chilled-water cooling and other refinements to their systems, but mass-market manufacturers have so far been unwilling to pay for such things. Garden-variety desktop computers today come with cooling equipment worth just US $3 to $5--basically a fan and a heat sink.
Engineers have a way to go yet before they exhaust the possibilities of fans. It is amazing to see how far air-cooling has come; indeed, you might say it's one of the computer industry's most successful kludges. "We can always put in a bigger fan--we are able to push the envelope further and further with air cooling," says Koneru Ramakrishna, a thermal engineer at Motorola Inc., Austin, Texas, and chair of the big thermal management conference ITherm, to be held next month in Las Vegas. "There is an end to it, but how far away it is, I don't know."
An extreme example is provided by one of the prime customers for Motorola's processors: Apple Computer Inc., Cupertino, Calif. Its top-of-the-line Power Mac G5 packs an incredible nine fans, in four separate cooling zones--for the processor, the peripheral component interconnect (PCI) cards, the storage, and the power supply. The operation in each zone is fine-tuned with the help of feedback from 21 temperature sensors [see photo, "Wind Tunnel"]. The tuning keeps noise to a minimum, and indeed, noise is perhaps the main drawback to fan cooling. It was for quiet, rather than coolness, that Hitachi Ltd., Tokyo, introduced its high-end water-cooled notebook computer in Japan two years ago. And researchers at Purdue University, West Lafayette, Ind., are developing a silent air-cooling technology with no moving parts. It relies on ionizing air, which it drags across the chip using electric fields.
Computing has coasted on the fan and heat sink for quite some time. Indeed, for many in the electronics industry during much of the last decade, there was little urgency in the quest for new thermal management technology. That was thanks to the switch, in the 1980s, from ICs built using bipolar transistors to chips using today's technology, CMOS.
CMOS set the clock back on the heat problem because, unlike transistors in bipolar technology, CMOS transistors draw power only when they switch from one state to another. "But by the late 1990s, we got to the same power dissipation levels we'd had with bipolars," says Bar-Cohen. "We had a 10-year free ride, using the technology we'd developed before. Now we need new ideas."
Perhaps the Biggest Bottleneck in air-cooling technology is getting the heat from the chip to the heat sink. Blocking the flow of heat are the interface between the chip itself and the lid of the chip package, if there is one, and the interface between the lid and the heat sink [see illustration, "Hot Stuff"]. Merely pressing the heat sink against the package lid won't do the trick, because microscopic roughness on both components makes for a joint full of air pockets, highly resistant to the flow of heat.
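One common way to reason about such a stack is to treat each interface as a thermal resistance in series, so that the chip temperature is the ambient temperature plus power times the sum of the resistances. The sketch below assumes that textbook model; every resistance value in it is a made-up placeholder chosen only to show the effect of a poor joint, not a figure from any real package.

```python
from typing import Sequence


def chip_temp_c(power_w: float, ambient_c: float,
                resistances_c_per_w: Sequence[float]) -> float:
    """Steady-state chip temperature for thermal resistances in series.

    Each resistance is in degrees C per watt; the heat passes through all of them.
    """
    return ambient_c + power_w * sum(resistances_c_per_w)


if __name__ == "__main__":
    power_w = 100.0    # chip power, as quoted earlier in the article
    ambient_c = 35.0   # hypothetical air temperature inside the case

    # Hypothetical values: chip-to-lid joint, lid-to-sink joint, sink-to-air.
    well_filled = [0.10, 0.10, 0.30]
    air_pockets = [0.50, 0.50, 0.30]  # same sink, but joints full of air gaps

    print(f"Filled joints: {chip_temp_c(power_w, ambient_c, well_filled):.0f} C")
    print(f"Air pockets:   {chip_temp_c(power_w, ambient_c, air_pockets):.0f} C")
```

With the heat sink unchanged, the two joints alone decide whether the chip sits at a tolerable temperature, which is why so much effort goes into what fills them.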
Historically, a common solution has been to fill one or both interfaces with solder, which is what the makers of power electronics systems still do. But this solution is not without its drawbacks. For one, you can't break a soldered connection without breaking the chip, which makes prototyping difficult. Even more troubling, a hard connection is liable to fail after a few thousand cycles of heat-induced expansion and contraction.
That's why most manufacturers resort to a thin layer of grease or "goop"--shorthand for thermal paste--as an interface that is soft enough to withstand expansion and contraction. Thermal paste consists of a bonding agent, say, mineral oil or epoxy, and a filler, such as silica or some more exotic substance. The filler does most of the job of conducting heat; the bonding agent holds it together and ensures that no microscopic air gaps remain between the chip and the lid or the lid and the heat sink. The problem is that the more filler you add to improve conductivity, the thicker the goop becomes, making it unable to fill all the gaps.
To improve the heat conductivity of goop by a factor of 10, in late 2002 the U.S. National Institute of Standards and Technology, in Gaithersburg, Md., put up half the money for a three-year, $7.2 million joint venture under the main sponsorship of General Electric Co., Fairfield, Conn. The focus is on fillers made of nanoscale structures, able to reach into all the crannies, says GE's Sandeep Tonapi, who declined to name his secret nanosauce.
Meanwhile, an independent researcher who hadn't even heard of the federal project may just have found a solution that uses a decidedly unexotic nanofiller: soot. Deborah D.L. Chung, a materials engineer at the State University of New York, Buffalo, takes plain polyethylene glycol, a water-soluble emulsifier used in everything from toothpaste to printing inks, and fills it with carbon black--superfine soot, that is. This makes a thin paste that she says conducts heat 50 percent better than tin-based solders do and a lot better than any existing brand of goop does.
"Everybody assumes that to get good paste, one must make its internal heat conductivity as high as possible, but I found out that what's important is spreadability," she says. "You can't use just any soot. You need the right grade--30-nanometer particles that form agglomerations that look like clusters of grapes and are squishable, so that they flatten under compression to conform well to the surface topography" of the chip and heat sink.
Experts in thermal dissipation agreed that Chung's discovery sounded interesting, but they cautioned that it is one thing to demonstrate great heat transfer between metal plates in a laboratory and another to prove that it can work in real products. Many also stressed that no magic material could solve all the industry's heat-related problems. "A 50 percent improvement over solder would definitely help, but not so much that I can drop what I'm doing," mopes Motorola's Ramakrishna. "No one factor makes the difference--you get a 10 percent improvement here, 3 percent there."
Still, he acknowledges, even a 3 percent overall improvement in cooling efficiency would increase the life span of the chip significantly. "That may not matter much for a cellphone, which the customer replaces every few years, but it certainly does for automotive electronics, which are supposed to last for 20 years," Ramakrishna says.
The Bane Of The Thermal Engineer is the cost of cooling. Designers of laptops and PCs are under extreme pressure to keep costs down and are unwilling to spring for much more than a heat sink and a fan. But you don't hear the supercomputer guys complaining about the heat--their customers are happy to pay for exotic technologies. Last year Hewlett-Packard Co., in Palo Alto, Calif., building on its expertise in inkjet printing, showed off a system to spray liquid onto processors, so that evaporation could carry away far more heat than mere convection can.
Another form of this evaporative cooling was implemented two years ago by Cray Inc., Seattle, in its X1 supercomputer, and today it is used in the SV2 model, as well. The system, from Parker Hannifin Corp., Cleveland, the main supplier of jet-fuel delivery systems to the aviation industry, sprays a fluorocarbon fluid, made by 3M, St. Paul, Minn., that has a boiling point of 56 °C. "As the microscopic droplets boil off, the bubbles create nucleation points" for more bubbles to form, says Greg Pautsch, a thermal-packaging engineer at Cray. Result: even faster boiling, letting the system sweat off 45 W/cm². Cray recently settled an intellectual property tiff over the technology with Isothermal Systems Research, Spokane, Wash.
How might this large system be mass-produced for use in PCs? Cristina Amon, director of the Institute for Complex Engineered Systems at Carnegie Mellon University, in Pittsburgh, is working on a miniaturized evaporative system that she hopes can eventually be produced for $20–$30 per machine (compared with the $5 or so it costs to air-cool today's standard-issue PCs). Her project, funded by the U.S. Defense Advanced Research Projects Agency (DARPA), uses microelectromechanical systems (MEMS) fabrication techniques to fashion a plate not much bigger than the chip itself but employing many tiny spray guns that bond directly to the chip [see illustration, "Chip-Scale Squirt Gun"].
Each nozzle shoots 100-micrometer droplets of a fluorine-based dielectric fluid at the chip's hot spots, metering the flow according to the temperature inferred from the switching speed of local transistors. The liquid boils, carrying off a big dollop of energy in the gas, which flows to a condenser. The condensate is then pumped back to the spray nozzles by a micropump.
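The article does not say what rule the nozzles use to meter their flow, so the following is only a guess at the simplest possible scheme: a proportional law in which each nozzle sprays nothing below a set temperature and sprays harder as its patch of the chip runs hotter. The setpoint, gain, and flow limit are hypothetical, and the flow units are arbitrary.

```python
from typing import List, Sequence


def nozzle_flow(temp_c: float, setpoint_c: float = 60.0,
                gain: float = 0.5, max_flow: float = 10.0) -> float:
    """Hypothetical proportional metering for one nozzle.

    No spray at or below the setpoint; above it, flow rises linearly
    with the excess temperature and is capped at max_flow.
    """
    excess = temp_c - setpoint_c
    return min(max(gain * excess, 0.0), max_flow)


def meter_array(temps_c: Sequence[float]) -> List[float]:
    """Apply the same rule to every nozzle, one temperature reading per nozzle."""
    return [nozzle_flow(t) for t in temps_c]


if __name__ == "__main__":
    # Hypothetical local temperatures, one per nozzle, in degrees C.
    readings = [50.0, 70.0, 100.0]
    print(meter_array(readings))
```

A real controller would also have to turn transistor switching speed into a temperature estimate, a step this sketch skips by taking temperatures as given.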
"We are removing 300 to 400 watts per square centimeter with our current prototype, all locally, on the chip," says Amon, an IEEE fellow. "But if you spread the heat a bit with conducting plates, you can easily double the amount." That would mean dissipating as much heat as even the high-performance chips are expected to produce in the foreseeable future.
Best of all, as a cooling system, the technology is self-governing, working especially well precisely where it is needed most. That's because a dielectric fluid with the proper boiling point provides cooling at just the right temperature, and because it boils off faster in the hotter areas, reducing the temperature differential across the chip. Such differentials cause some parts of the chip to expand more than others, pulling the circuitry apart at the seams. Moreover, surface tension tends to suck liquid to the hotter, faster-drying parts.
Amon's system would, however, require some basic rethinking. For one thing, to preserve the coolant, the package must be hermetically sealed. The slightest leak would cause the remaining coolant to boil off even faster, and the chip would fail catastrophically. For another, the nozzle array would have to be designed concurrently with the chip, both to ensure that the chip's hot spots are spread out and to optimize the control of each nozzle.
The system, together with other heat-conducting concepts, was backed by DARPA in part because the military wants wearable computers that won't get fouled by mud or dust, as they would if they depended on a fan. And what's good for your PC may be good for you, too, someday: a few of the concepts DARPA is studying may even pave the way to air-conditioned uniforms for desert commandos or urban firefighters, and air-conditioned clothing for hot, cranky city dwellers.
While some researchers focus on siphoning heat from chips, another group is constantly striving to minimize it in the first place. The biggest such improvement was the switch to CMOS from bipolar transistors in the late 1980s, and another big switch may soon be in the offing. In the last couple of years, major chip makers have been working on materials, such as hafnium dioxide, called high-k dielectrics. This class of materials saves power by essentially eliminating "vertical leakage"--that is, the seepage of current through the insulating layer on a transistor's gate, the part that turns it on and off.
The reason for such leakage is that as transistors shrink, the insulator--until now, silicon dioxide--has had to slim down, too, in order to maintain its electrical performance. But now it is only a few atomic layers thick. At those dimensions, there's no way to keep charge in the gate from tunneling through the insulator and, as a result, power goes to waste.
According to Intel, without a high-k dielectric, chips made just three to five years from now would be throwing off 200 W. But a thicker, leakproof layer of high-k dielectric can do the same job in the transistor as the leaky sliver of silicon dioxide, and it will cut the power dissipation in half while allowing for faster-switching transistors. Even so, that innovation does no more than buy a few years' time on the semiconductor industry road map.
Another way to reduce net power consumption would be to scavenge a bit of energy from the temperature differentials around a chip. It may sound far-fetched, but workers at the University of Maryland, in a project underwritten by Sony Corp., Tokyo, recently demonstrated a solid-state thermoelectric generator that sits under the chip, drawing off perhaps half a watt of power. These tiny generators are essentially thermocouples, two different materials joined together at the hot chip and at the comparatively cool heat sink. The materials draw some of the heat away from the chip and use it to drive a current through them.
The upshot is that for a typical high-performance microprocessor, there would be half a watt less waste heat--and half a watt more to run a tiny fan (or liquid pump). "But you'd have to do a global optimization, matching everything perfectly, to get this system to work," says Maryland's Bar-Cohen. In other words, designers would have to know a great deal about how hot the chips get during certain operations and how fast that heat would flow out of the chip under the influence of the heat sink, the generator, and the fan. Heat specialists, though, still lack the computerized design tools they need to model the problem.
A truly global approach, says Bar-Cohen, would examine the ultimate cost of cooling. You might start with the aluminum in a typical heat sink, which takes nearly a kilowatt-hour of electricity to smelt, and then move to the energy used by a computer's multiple fans. Throw in the cost of air-conditioning the office building--a megawatt here and a megawatt there--and pretty soon you're talking real energy.
And nowhere is that energy more obvious than in the racks and racks of servers in large data centers. Chandrakant Patel, a distinguished technologist at Hewlett-Packard Laboratories, calculates that future big data centers, 1000 racks or larger, might need 10 MW to run the computers and a further 5 MW just to keep them cool enough to operate. He and his colleagues at H-P were awarded a patent last summer on technology that could make a sizable dent in that amount, saving as much as 25 percent on cooling costs.
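To put those figures side by side, here is the arithmetic as a short Python sketch. The 10 MW, 5 MW, and 25 percent values are the ones quoted above; reading the 25 percent as a cut in cooling power (rather than in cooling cost by some other measure) is an assumption, and the names are illustrative.

```python
def cooling_savings_mw(cooling_mw: float, savings_fraction: float) -> float:
    """Cooling power saved, assuming the saving applies directly to cooling power."""
    if not 0.0 <= savings_fraction <= 1.0:
        raise ValueError("savings_fraction must be between 0 and 1")
    return cooling_mw * savings_fraction


if __name__ == "__main__":
    compute_mw = 10.0  # from the article
    cooling_mw = 5.0   # from the article
    fraction = 0.25    # "as much as 25 percent", taken at its upper bound

    saved = cooling_savings_mw(cooling_mw, fraction)
    total_before = compute_mw + cooling_mw
    total_after = total_before - saved

    print(f"Cooling overhead: {cooling_mw / compute_mw:.0%} of compute power")
    print(f"Saved:            {saved:.2f} MW")
    print(f"Total:            {total_before:.2f} MW -> {total_after:.2f} MW")
```

On these assumptions the best case trims about 1.25 MW from a 15 MW total, a meaningful saving but one that leaves the computers themselves as the dominant load.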
H-P's system, a thermal load balancer, allocates compute workloads to racks with the most power-efficient hardware for the given workload and then directs the needed flow of cool air to just those racks by opening air-conditioning vents beneath them and turning on fans. Systems not in use are put on standby or shut down. Expanding on the idea, even greater savings might be possible if computing jobs were shifted around the world to take advantage of lower outside temperatures at night. And ultimately, says Patel, H-P hopes to manage computing power resources down to the level of the individual microprocessor.
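The placement idea can be illustrated with a toy scheduler. This is not H-P's patented method, whose details the article does not give; it is a simple greedy sketch in which each rack carries a single, hypothetical watts-per-unit-of-work figure, whereas the real system is said to judge efficiency per workload. All class, field, and function names are invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Rack:
    name: str
    watts_per_unit: float  # hypothetical efficiency figure: lower is better
    capacity: int          # work units the rack can host
    load: int = 0

    @property
    def free(self) -> int:
        return self.capacity - self.load


def balance(racks: List[Rack], jobs: Dict[str, int]) -> Dict[str, str]:
    """Greedy placement: largest jobs first, each onto the most
    power-efficient rack that still has room for it."""
    placement: Dict[str, str] = {}
    for job, units in sorted(jobs.items(), key=lambda kv: -kv[1]):
        candidates = [r for r in racks if r.free >= units]
        if not candidates:
            raise RuntimeError(f"no rack can host {job}")
        best = min(candidates, key=lambda r: r.watts_per_unit)
        best.load += units
        placement[job] = best.name
    return placement


def cooling_plan(racks: List[Rack]) -> Dict[str, str]:
    """Send cool air only to racks that carry load; idle racks go to standby."""
    return {r.name: "vent open" if r.load > 0 else "standby" for r in racks}


if __name__ == "__main__":
    racks = [Rack("A", 2.0, 10), Rack("B", 3.0, 10), Rack("C", 5.0, 10)]
    jobs = {"web": 6, "db": 5, "batch": 3}  # hypothetical workloads, in work units
    print(balance(racks, jobs))
    print(cooling_plan(racks))
```

In this run the least efficient rack ends up with no work, so it can be put on standby and its vent left closed, which is the source of the cooling saving.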
It all boils down to cost, which is far harder to gauge than for other aspects of the computer industry, says Richard C. Chu, an IBM fellow in Poughkeepsie, N.Y. "Each company estimates cost differently, and I couldn't get the numbers for cooling technologies even within IBM--it's not a scientific number but one based on the market. Sometimes people give you the impression they know cost, but it's usually hot air."
Cosponsored by the IEEE, the premier conference for thermal management of electronics, ITherm, will be held in Las Vegas on 1–4 June [https://www.itherm.org/]. The best papers from the biennial conference are usually compiled in an issue of IEEE Transactions on Components and Packaging Technologies. Select papers from ITherm 2002 are in the March 2003 issue.