Could Supercomputing Turn to Signal Processors (Again)?

Illustration: Gluekit

Building high-performance computers used to be all about maximizing flops, or floating-point operations per second. But the engineers designing today’s high-performance systems are keeping a close eye not just on the number of flops but also on flops per watt. Judged by that energy-efficiency metric, some digital signal processing (DSP) chips, the sophisticated signal conditioners that run our wireless networks and other systems, might make promising building blocks for future supercomputers, recent research suggests.

The DSPs that might make the jump to supercomputing come from Texas Instruments, which originally designed them for relatively modest applications. “We hadn’t even thought to look at high-performance computing or supercomputers,” says Arnon Friedmann, multicore business manager for TI. “It wasn’t on our radar.” TI’s DSP chips are typically used in embedded systems, most prominently cellular base stations. For such applications, power efficiency is vital, but until recently these systems didn’t require floating-point calculations, making do instead with just integer arithmetic. The advent of 4G cellular networks, however, increased the computing burden within base stations, making floating-point calculations essential.  

TI engineers added floating-point hardware to the TMS320C66 family of multicore DSPs late in 2010 without appreciably slowing these processors down or increasing the power consumed. But it was only after the new chips came out that some forward thinkers at TI realized that the eight-core C6678 DSP, which can perform as many as 12.8 gigaflops per watt running flat out, might be useful for general-purpose high-performance computing. 

“The question was whether we’d be able to extract that potential in the real world,” says Francisco D. Igual, now a postdoctoral researcher at Universidad Complutense de Madrid. He was working at the University of Texas at Austin with engineering professor Robert A. van de Geijn when TI approached them for help. Collaborating with TI, they wrote code for the new DSP to perform general matrix-matrix multiplication, something they felt would be representative of the kind of numerical weight lifting that high-performance systems are often asked to do. 
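To make the benchmark concrete, here is a minimal Python sketch of what a general matrix-matrix multiplication computes and how a flops-per-watt figure falls out of it. It is an illustration only, not the team’s optimized DSP code: the function names are hypothetical, and the 2 × m × n × k operation count is the conventional way of tallying the work in such a multiplication.

```python
def gemm(a, b):
    """Naive general matrix-matrix multiply, C = A * B, on lists of lists.

    Illustrative only: a tuned DSP kernel would block the loops for the
    chip's memory hierarchy and use its vector instructions.
    """
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for p in range(k):
            a_ip = a[i][p]
            for j in range(n):
                c[i][j] += a_ip * b[p][j]
    return c


def gemm_flops(m, n, k):
    """Conventional operation count: one multiply and one add per inner step."""
    return 2 * m * n * k


def gflops_per_watt(flops, seconds, watts):
    """Energy efficiency: operations per second, per watt, in units of 1e9."""
    return flops / seconds / watts / 1e9


product = gemm([[1, 2], [3, 4]], [[5, 6], [7, 8]])
work = gemm_flops(2, 2, 2)
```

Timing a large multiplication, dividing the operation count by the elapsed time, and then dividing by the measured power draw gives a gigaflops-per-watt number of the kind reported in this article.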

With that code in hand, the team compared the new DSP chip against some common supercomputer architectures. The DSP came out on top, at 7.4 gigaflops per watt. “We were happy, but we were not that surprised,” says Igual. 

But not everyone is swayed by those results. “It’s very impressive, but not a fair comparison,” says John Shalf of the National Energy Research Scientific Computing Center at Lawrence Berkeley National Laboratory, in California. Shalf points out that the comparison was done using single-precision (32-bit) arithmetic across the board. That’s because this DSP chip can complete a single-precision operation in one clock cycle, whereas double-precision calculations take four clock cycles and thus use four times as much energy. So the number-crunching circuits in each of the competing systems tested, which are configured for efficient double-precision arithmetic, were at a disadvantage. 

Texas Instruments hopes to improve the efficiency of double-precision operations on its multicore DSPs. But the energy used for double-precision calculations would, at the very least, still be double what researchers found in their single-precision tests, meaning these chips would at best be able to perform in the range of 3 to 4 gigaflops (double precision) per watt. 
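The arithmetic behind that estimate is simple enough to sketch in a few lines of Python. The 7.4-gigaflops-per-watt figure and the energy ratios of four and two come from the reporting above; the function name is hypothetical, and the calculation assumes, as a simplification, that efficiency scales inversely with the energy spent per operation.

```python
# Single-precision efficiency measured in the benchmark described above.
SINGLE_PRECISION_GFLOPS_PER_WATT = 7.4


def double_precision_estimate(single_gflops_per_watt, energy_ratio):
    """Scale single-precision efficiency down by the extra energy that a
    double-precision operation costs (simplifying assumption: efficiency
    is inversely proportional to energy per operation)."""
    if energy_ratio <= 0:
        raise ValueError("energy_ratio must be positive")
    return single_gflops_per_watt / energy_ratio


# Today's chip: a double-precision operation takes four cycles, so 4x the energy.
current_chip = double_precision_estimate(SINGLE_PRECISION_GFLOPS_PER_WATT, 4.0)

# Best case cited above: double precision costs at least 2x the energy.
best_case = double_precision_estimate(SINGLE_PRECISION_GFLOPS_PER_WATT, 2.0)

print(f"current chip: about {current_chip:.2f} gigaflops per watt")
print(f"best case:    about {best_case:.2f} gigaflops per watt")
```

Under these assumptions the best case lands inside the 3-to-4 range quoted above, while the four-cycle penalty on the current chip would put it below 2 gigaflops per watt.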

And individual chips do not a supercomputer make, Shalf stresses. Combining many of them in a high-performance computer, with its various electronic subsystems and cooling apparatus, would significantly lower the machine’s overall energy efficiency.

How, then, would it compare with today’s best supercomputers? The current world champion in flops is the Sequoia supercomputer at Lawrence Livermore National Laboratory, in California. With its IBM BlueGene/Q architecture, it performed just over 2 gigaflops (double precision) per watt in recent benchmark tests, similar to what you’d expect from a double-precision DSP-based machine, judging by the research results of Igual and his colleagues.

But perhaps this shouldn’t be so surprising, given BlueGene’s ancestry. Two decades ago, TI produced a line of DSPs that, like the company’s new multicore family, contained floating-point hardware. In 1998, physicists from Columbia University, in New York City, ganged thousands of them together to construct a special-purpose supercomputer for performing calculations in quantum chromodynamics, dubbing it QCDSP, for Quantum Chromodynamics on Digital Signal Processors. Texas Instruments later dropped that DSP line, but Alan Gara, one of the three physicists who pioneered the design of the QCDSP, retained some of its lessons when he moved to IBM, where he became the chief architect for the BlueGene supercomputers.

You might guess that this groundbreaking DSP-based supercomputer is now enshrined in a computer museum. But in July, when IEEE Spectrum caught up with Columbia physicist Robert Mawhinney, who worked alongside Gara on that project, he told us that it met a sadder fate: “We threw it out last week, literally.”
