What does the mortgage crisis have to do with microprocessor architecture? It turns out that calculating prices for those financially dubious mortgage-backed securities is a division-intensive process, and division has long been the weak link in a microprocessor’s arithmetic operations. With Intel’s new crop of 45-nanometer processors, code-named Penryn, the company is making the first substantial upgrade in its processors’ divider since the original Pentium came out in 1993. The speedup doubles the number of bits calculated with each tick of the processor’s clock and will make a substantial difference to financial and scientific computing. And because Intel powers so much of the computer market, the development could tempt programmers to retreat from the less accurate but faster software tricks they’ve used as a substitute for division.
“Divide had become the long pole in the tent,” says Steve Fischer, the lead architect for Penryn. “We tried at least to chop the pole in half. It’s still long compared to some functions. But it’s a lot better.”
The new divider is a variation on the old one, known as SRT Radix-4. SRT (for Sweeney, Robertson, and Tocher) is basically a souped-up version of long division that generates two bits of the answer with each step. The new Radix-16 divider works fundamentally the same way but computes four bits in each step. Getting those other two bits was no simple task. “I would say that the divider really pushed the edge in terms of the max clock frequency performance for Penryn,” says Fischer. Of all the processor’s new architectural tricks, the divider was most dependent on the chip’s 45-nm features and redesigned transistors [see “The High-k Solution,” IEEE Spectrum, October 2007].
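The radix idea can be sketched in a few lines of Python. The version below is a plain long-division loop, not SRT itself: real SRT dividers choose each quotient digit from a redundant digit set with a small look-up table, which this sketch leaves out. The function name `digit_recurrence_divide` and its parameters are made up for illustration. It shows only that retiring four bits per step instead of two halves the number of steps.

```python
def digit_recurrence_divide(dividend, divisor, bits_per_step, width=32):
    """Unsigned long division that produces bits_per_step quotient bits per step.

    Returns (quotient, remainder, steps). Illustrative only; not Intel's circuit.
    """
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    if not 0 <= dividend < (1 << width):
        raise ValueError("dividend must fit in `width` bits")

    radix = 1 << bits_per_step            # 4 for radix-4, 16 for radix-16
    steps = -(-width // bits_per_step)    # ceiling of width / bits_per_step
    quotient, remainder = 0, 0

    for step in range(steps):
        # Bring down the next radix-sized digit of the dividend, top digit first.
        shift = (steps - 1 - step) * bits_per_step
        dividend_digit = (dividend >> shift) & (radix - 1)
        remainder = remainder * radix + dividend_digit

        # Pick the largest digit whose multiple of the divisor still fits.
        # Hardware would do this selection in one step, not with a loop.
        digit = 0
        while (digit + 1) * divisor <= remainder:
            digit += 1

        remainder -= digit * divisor
        quotient = quotient * radix + digit

    return quotient, remainder, steps


print(digit_recurrence_divide(1000, 7, bits_per_step=2))  # radix-4
print(digit_recurrence_divide(1000, 7, bits_per_step=4))  # radix-16
```

For a 32-bit dividend, `bits_per_step=2` takes 16 steps and `bits_per_step=4` takes 8, with the same quotient and remainder either way.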
The new divider is a nod to the importance of scientific and financial calculations, which require precise manipulation of large floating-point numbers. Floating point is a standard number format that includes a sign, an exponent, and a fraction, all in a 32-, 64-, or 80-bit package. “Sometimes software has avoided the use of the divide in [favor of] a look-up table or some approximation. Scientific work can’t rely on that,” says Fischer.
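To make the sign, exponent, and fraction concrete, here is a minimal Python sketch that pulls the three fields out of a 64-bit number. It assumes the common IEEE 754 double-precision layout (1 sign bit, 11 exponent bits, 52 fraction bits), which the article does not name explicitly, and the helper `decompose_double` is a hypothetical name used only here.

```python
import struct


def decompose_double(value):
    """Split a 64-bit float into (sign, biased_exponent, fraction) bit fields.

    Assumes the IEEE 754 double-precision layout: 1 + 11 + 52 bits.
    """
    # Reinterpret the 8 bytes of the float as one unsigned 64-bit integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", value))
    sign = bits >> 63
    biased_exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    return sign, biased_exponent, fraction


sign, biased_exponent, fraction = decompose_double(-6.5)
# The stored exponent carries a bias of 1023 in this format.
print(sign, biased_exponent - 1023, hex(fraction))
```

Because -6.5 equals -1.625 times 2 squared, the sketch reports a sign bit of 1 and an unbiased exponent of 2; the fraction field holds the .625 part.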
Peter Markstein, a retired computer-arithmetic expert who worked on the floating-point units for the Intel and HP Itanium architecture and the IBM Power architecture, thinks the new division rate might influence how software is written. “People who use the Intel architecture will, I think, be more inclined to use division and not look for ways to avoid it,” he says. Because they’ll be taking fewer inexact shortcuts, computer simulations and other scientific programs could come up with better answers. (The Power and Itanium architectures do division with software that relies on a circuit called a fused multiply adder.)
Penryn’s floating-point divider pulls it ahead of the division scheme used in processors made by its main rival, Advanced Micro Devices, for 32-bit quotients. But Penryn only matches AMD’s divider for 64-bit numbers, according to Chuck Moore, chief engineer for AMD’s next generation of processors. Since its Athlon chip debuted in 1999, AMD has been using a technique called convergence. Unlike SRT, which calculates quotient bits at a steady pace, convergence operates at an accelerating pace, says Debjit Das Sarma, principal member of the technical staff at AMD. Though convergence takes more clock cycles than SRT Radix-16 to get to 32 bits, it takes fewer cycles to go from 32 bits to 64 and would take fewer still to go to 80 or beyond.
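The accelerating pace can be seen in a small Python sketch. It uses the Newton-Raphson reciprocal iteration, one well-known convergence method, chosen here purely for illustration; the article does not say which convergence scheme AMD's hardware uses. The function name `reciprocal_by_convergence`, the restriction of the divisor to the range from 1/2 up to 1, and the linear first guess are all assumptions of this sketch. Exact fractions are used so the count of correct bits is not limited by Python's own 64-bit floats.

```python
from fractions import Fraction
import math


def reciprocal_by_convergence(divisor, iterations):
    """Approximate 1/divisor by Newton-Raphson; illustrative, not AMD's design.

    Returns (estimate, bits_history), where bits_history[i] is the number of
    correct bits after i iterations, measured as -log2(|1 - divisor*estimate|).
    """
    divisor = Fraction(divisor)
    if not Fraction(1, 2) <= divisor < 1:
        raise ValueError("this sketch expects 1/2 <= divisor < 1")

    def correct_bits(estimate):
        error = abs(1 - divisor * estimate)
        return math.inf if error == 0 else -math.log2(float(error))

    # A standard linear first guess, good to roughly 4 bits on this range.
    estimate = Fraction(48, 17) - Fraction(32, 17) * divisor
    bits_history = [correct_bits(estimate)]

    for _ in range(iterations):
        # Each pass squares the relative error, so correct bits double.
        estimate = estimate * (2 - divisor * estimate)
        bits_history.append(correct_bits(estimate))

    return estimate, bits_history


estimate, bits_history = reciprocal_by_convergence(Fraction(3, 4), 4)
print([round(bits, 1) for bits in bits_history])
```

Starting from about 4 correct bits, each pass roughly doubles the count, so the step from 32 bits to 64 costs one iteration, the same as the step from 16 to 32. A digit-by-digit divider, by contrast, needs twice as many steps for twice as many bits.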
In future processor architectures such as Bulldozer, due out by 2010, AMD does not expect the number of clock cycles required to finish a floating-point division to change much. But the company is going for “a substantial improvement” in the number of those divisions the processor can work on at once, says Moore.
Despite some dedicated effort by the two Silicon Valley rivals, “nobody is satisfied with these division times,” says David Matula, a professor of computer science at Southern Methodist University, in Dallas, who has consulted for and competed against AMD in the past. He believes that makers of scientific software would be satisfied only if division took no more than twice as long as multiplication. Still, “I’m glad Intel is in the game again,” he says.