Intel Makes A Big Jump In Computer Math

Long the ugly stepchild of computer arithmetic, division is getting a much-needed makeover

Samuel K. Moore is IEEE Spectrum’s semiconductor editor.

What does the mortgage crisis have to do with microprocessor architecture? It turns out that calculating prices for those financially dubious mortgage-backed securities is a division-intensive process, and division has long been the weak link in a microprocessor’s arithmetic operations. With Intel’s new crop of 45-nanometer processors, code-named Penryn, the company is making the first substantial upgrade in its processors’ divider since the original Pentium came out in 1993. The speedup doubles the number of bits calculated with each tick of the processor’s clock and will make a substantial difference to financial and scientific computing. And because Intel powers so much of the computer market, the development could tempt programmers to retreat from the less accurate but faster software tricks they’ve used as a substitute for division.

“Divide had become the long pole in the tent,” says Steve Fischer, the lead architect for Penryn. “We tried at least to chop the pole in half. It’s still long compared to some functions. But it’s a lot better.”

The new divider is a variation on the old one, known as SRT Radix-4. SRT (for Sweeney, Robertson, and Tocher) is basically a souped-up version of long division that generates two bits of the answer with each step. The new Radix-16 divider works fundamentally the same way but computes four bits in each step. Getting those other two bits was no simple task. “I would say that the divider really pushed the edge in terms of the max clock frequency performance for Penryn,” says Fischer. Of all the processor’s new architectural tricks, the divider was most dependent on the chip’s 45-nm features and redesigned transistors [see “The High-k Solution,” IEEE Spectrum, October 2007].
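To make the radix comparison concrete, here is a minimal Python sketch of long division that retires a fixed number of quotient bits per step. It is an illustration only, not Intel's circuit: a real SRT divider picks each digit from a small lookup table using a redundant, signed digit set, whereas this sketch finds the digit by plain comparison. The function name `digit_recurrence_divide` and its parameters are hypothetical.

```python
def digit_recurrence_divide(dividend, divisor, bits_per_step, total_bits=32):
    """Divide non-negative integers, retiring bits_per_step quotient bits per step.

    Simplified long division in radix 2**bits_per_step (radix 4 for 2 bits,
    radix 16 for 4 bits). Assumption: digit selection is done by comparison,
    not by the table lookup a hardware SRT divider would use.
    """
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    if total_bits % bits_per_step != 0:
        raise ValueError("total_bits must be a multiple of bits_per_step")
    if not 0 <= dividend < (1 << total_bits):
        raise ValueError("dividend does not fit in total_bits")

    radix = 1 << bits_per_step
    quotient = 0
    remainder = 0
    steps = 0
    # Walk the dividend from its most significant digit group to its least.
    for shift in range(total_bits - bits_per_step, -1, -bits_per_step):
        # "Bring down" the next group of dividend bits.
        group = (dividend >> shift) & (radix - 1)
        remainder = remainder * radix + group
        # Pick the largest digit whose multiple of the divisor still fits.
        digit = 0
        while (digit + 1) * divisor <= remainder:
            digit += 1
        remainder -= digit * divisor
        quotient = quotient * radix + digit
        steps += 1
    return quotient, remainder, steps


if __name__ == "__main__":
    for bits in (2, 4):
        q, r, steps = digit_recurrence_divide(1000, 7, bits)
        print(f"{bits} bits/step: quotient={q} remainder={r} steps={steps}")
```

For a 32-bit quotient, the two-bit version takes 16 steps and the four-bit version takes 8, which is the halving Fischer describes. The cost is that each step must choose among 16 possible digits instead of 4.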

The new divider is a nod to the importance of scientific and financial calculations, which require precise manipulation of large floating-point numbers. These use a standard number format that includes a sign, an exponent, and a fraction, all in a 32-, 64-, or 80-bit package. “Sometimes software has avoided the use of the divide in [favor of] a look-up table or some approximation. Scientific work can’t rely on that,” says Fischer.
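As a small illustration of that sign, exponent, and fraction layout, the Python sketch below splits a 64-bit value into its three fields. It assumes the IEEE 754 double-precision layout (1 sign bit, 11 exponent bits, 52 fraction bits); the helper name `decompose_double` is hypothetical.

```python
import struct


def decompose_double(x):
    """Return (sign, biased_exponent, fraction) for a 64-bit float.

    Assumes the IEEE 754 binary64 layout: 1 sign bit, 11 exponent bits
    (stored with a bias of 1023), and 52 fraction bits.
    """
    # Reinterpret the eight bytes of the double as one unsigned integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63
    biased_exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    return sign, biased_exponent, fraction


if __name__ == "__main__":
    for value in (1.0, -2.5):
        print(value, decompose_double(value))
```

A divider has to produce every bit of the fraction field, so the wider the format, the more quotient bits each division must generate.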

Peter Markstein, a retired computer-arithmetic expert who worked on the floating-point units for the Intel and HP Itanium architecture and the IBM Power architecture, thinks the new division rate might influence how software is written. “People who use the Intel architecture will, I think, be more inclined to use division and not look for ways to avoid it,” he says. Because they’ll be taking fewer inexact shortcuts, computer simulations and other scientific programs could come up with better answers. (The Power and Itanium architectures do division with software that relies on a circuit called a fused multiply adder.)

Penryn’s floating-point divider pulls it ahead of the division scheme used in processors made by its main rival, Advanced Micro Devices, for 32-bit quotients. But Penryn only matches AMD’s divider for 64-bit numbers, according to Chuck Moore, chief engineer for AMD’s next generation of processors. Since its Athlon chip debuted in 1999, AMD has been using a technique called convergence. Unlike in SRT, which calculates bits of quotient at a steady pace, convergence operates at an accelerating pace, says Debjit Das Sarma, principal member of the technical staff at AMD. Though convergence takes more clock cycles than SRT Radix-16 to get to 32 bits, it takes fewer cycles to go from 32 bits to 64 and would take fewer still to go to 80 or beyond.
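The accelerating pace of convergence can be seen in a short Python sketch. It uses Newton-Raphson iteration on the reciprocal, one well-known convergence method, as a stand-in; the article does not say which variant AMD uses, so treat this as an assumption. The function name `reciprocal_by_convergence` and the linear starting estimate are illustrative choices.

```python
import math


def reciprocal_by_convergence(d, iterations=5):
    """Approximate 1/d for d in [0.5, 1) by Newton-Raphson iteration.

    Returns a list of (estimate, correct_bits) pairs, one per iteration.
    Assumption: Newton-Raphson stands in for whatever convergence method
    a given processor actually implements.
    """
    if not 0.5 <= d < 1.0:
        raise ValueError("d must be scaled into [0.5, 1)")
    # Linear starting estimate; its relative error is at most 1/17 on [0.5, 1].
    x = 48.0 / 17.0 - (32.0 / 17.0) * d
    history = []
    for _ in range(iterations):
        # Each step roughly squares the relative error of the estimate.
        x = x * (2.0 - d * x)
        error = abs(1.0 - d * x)
        # Cap at 53 bits, the precision of the doubles used in this sketch.
        bits = 53.0 if error == 0.0 else min(53.0, -math.log2(error))
        history.append((x, bits))
    return history


if __name__ == "__main__":
    for step, (estimate, bits) in enumerate(reciprocal_by_convergence(0.75), 1):
        print(f"iteration {step}: 1/d ~ {estimate:.15f}, about {bits:.1f} correct bits")
```

Because the number of correct bits roughly doubles with every iteration, getting from 32 bits to 64 costs a single extra pass, while a digit-by-digit divider needs as many additional steps as it took to produce the first 32 bits.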

In future processor architectures such as Bulldozer, due out by 2010, AMD does not expect the number of clock cycles required to finish a floating-point division to change much. But the company is going for “a substantial improvement” in the number of those divisions the processor can work on at once, says Moore.

Despite some dedicated effort by the two Silicon Valley rivals, “nobody is satisfied with these division times,” says David Matula, a professor of computer science at Southern Methodist University, in Dallas, who has consulted for and competed against AMD in the past. He believes that makers of scientific software would be satisfied only if division took no more than twice as long as multiplication. Still, “I’m glad Intel is in the game again,” he says.
