Breaking the Multicore Bottleneck

Researchers at North Carolina State University and at Intel have come up with a solution to one of the modern microprocessor’s most persistent problems: communication among the processor’s many cores. Their answer is a dedicated set of logic circuits they call the Queue Management Device, or QMD. In simulations, integrating the QMD with the processor’s on-chip network at least doubled core-to-core communication speed and, in some cases, boosted it much further. Even better, the speedup became more pronounced as the number of cores increased.

In the last decade, microprocessor designers started putting multiple copies of processor cores on a single die as a way to continue the rate of performance improvement computer makers had enjoyed without causing chip-killing hot spots to form on the CPU. But that solution comes with complications. For one, it means that software programs have to be written so that work is divided among processor cores. The result: Sometimes different cores need to work on the same data or must coordinate the passing of data from one core to another.

To prevent the cores from wantonly overwriting one another’s information, processing data out of order, or committing other errors, multicore processors use lock-protected software queues. These are data structures that coordinate the movement of and access to information according to software-defined rules. But all that extra software comes with significant overhead, which only gets worse as the number of cores increases. “Communications between cores is becoming a bottleneck,” says Yan Solihin, a professor of electrical and computer engineering who led the work at NC State, in Raleigh.
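To make the overhead concrete, here is a minimal sketch of a lock-protected software queue, with Python threads standing in for cores. The class and function names (`LockedQueue`, `run_demo`) are illustrative assumptions and do not come from the researchers' work. Note that every enqueue or dequeue is a multistep operation: take the lock, inspect the shared state, modify it, release the lock.

```python
import threading
import time
from collections import deque


class LockedQueue:
    """Bounded FIFO shared by several threads, guarded by a single lock."""

    def __init__(self, capacity):
        self._items = deque()
        self._capacity = capacity
        self._lock = threading.Lock()

    def enqueue(self, item):
        # Acquire the lock; every other thread touching the queue waits here.
        with self._lock:
            # Check for free space while holding the lock.
            if len(self._items) >= self._capacity:
                return False
            # Update the shared structure; the lock is released on exit.
            self._items.append(item)
            return True

    def dequeue(self):
        with self._lock:
            if not self._items:
                return None
            return self._items.popleft()


def run_demo(num_producers=4, items_each=250):
    """Several producers feed one consumer through the shared queue."""
    queue = LockedQueue(capacity=64)
    total = num_producers * items_each
    received = []

    def producer(pid):
        for i in range(items_each):
            while not queue.enqueue((pid, i)):
                time.sleep(0)  # queue full: yield and retry

    def consumer():
        while len(received) < total:
            item = queue.dequeue()
            if item is None:
                time.sleep(0)  # queue empty: yield and retry
            else:
                received.append(item)

    threads = [threading.Thread(target=producer, args=(p,))
               for p in range(num_producers)]
    threads.append(threading.Thread(target=consumer))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received
```

Because all producers and the consumer contend for the same lock, adding more threads adds more waiting, which is the scaling problem the article describes.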

The solution, born of a discussion with Intel researchers and executed by Solihin’s student, Yipeng Wang, at Intel and at NC State, was to turn the software queue into hardware. This effectively turned three multistep software-queue operations into three simple instructions: Add data to the queue, take data from the queue, and put data close to where it’s going to be needed next. Compared with the software-only approach, the QMD sped up a sample task, packet processing of the kind network nodes perform on the Internet, by a wider and wider margin as more cores were involved. With 16 cores, the QMD worked 20 times as fast as the software could.
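The three operations can be pictured with a toy software model. This is only an illustration of the idea under stated assumptions: the article does not give the QMD's actual instruction names, operands, or internal design, so `ToyQueueDevice`, `enqueue`, `dequeue`, and `stage` are all hypothetical, and the model is single-threaded because the point of the hardware is that software no longer manages a lock.

```python
from collections import deque


class ToyQueueDevice:
    """Software stand-in for a hardware queue manager (hypothetical API)."""

    def __init__(self, num_queues):
        self._queues = [deque() for _ in range(num_queues)]
        # (queue_id, core_id) -> items already moved "next to" that core.
        self._staged = {}

    def enqueue(self, queue_id, data):
        """Operation 1: add data to a queue."""
        self._queues[queue_id].append(data)

    def stage(self, queue_id, core_id):
        """Operation 3: move the head item close to the core that will use it."""
        queue = self._queues[queue_id]
        if not queue:
            return False
        slot = self._staged.setdefault((queue_id, core_id), deque())
        slot.append(queue.popleft())
        return True

    def dequeue(self, queue_id, core_id):
        """Operation 2: take data, preferring items staged for this core."""
        staged = self._staged.get((queue_id, core_id))
        if staged:
            return staged.popleft()
        queue = self._queues[queue_id]
        return queue.popleft() if queue else None


def demo():
    dev = ToyQueueDevice(num_queues=1)
    for packet in ("pkt0", "pkt1", "pkt2"):
        dev.enqueue(0, packet)
    dev.stage(0, core_id=1)             # pkt0 is placed near core 1
    first = dev.dequeue(0, core_id=2)   # core 2 takes the next item, pkt1
    second = dev.dequeue(0, core_id=1)  # core 1 finds pkt0 already staged
    third = dev.dequeue(0, core_id=1)   # nothing staged, so pkt2 from the queue
    return [first, second, third]
```

Each method here is a single call from the caller's point of view, in contrast to the lock, check, update, unlock sequence a software queue requires.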

Once they achieved this result, the researchers realized that the QMD might be able to do a few other tricks—such as turning more software into hardware. They added more logic to the QMD and found it could speed up several other core-communications-dependent functions, including MapReduce, a technology Google pioneered for distributing work to different cores and collecting the results.
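For readers unfamiliar with the MapReduce pattern mentioned above, the following is a minimal word-count sketch. It uses a thread pool from Python's standard library as a stand-in for cores; it is not the researchers' benchmark, and the function names (`map_chunk`, `merge`, `word_count`) are illustrative assumptions. The map step runs independently on each chunk, and the reduce step is where partial results must be passed between workers and combined.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(lines):
    """Map step: count the words in one worker's share of the input."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge(total, partial):
    """Reduce step: fold one partial count into the running total."""
    total.update(partial)
    return total


def word_count(lines, workers=4):
    # Split the input into one interleaved chunk per worker.
    chunks = [lines[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    # Collect the partial results into a single answer.
    return reduce(merge, partials, Counter())
```

Calling `word_count(["a b a", "b c", "a"])` gives a count of 3 for "a", 2 for "b", and 1 for "c". The hand-off of `partials` from workers to the collector is the kind of core-to-core traffic that queueing hardware is meant to speed up.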

Srini Devadas, an expert in cache control systems at MIT, says the QMD addresses “a very important problem.” Devadas’s own solution for the use of caches by multiple cores, or even multiple processors, is more radical than the QMD. Called Tardis, it’s a complete rewrite of the cache management rules, and so it is a solution aimed at processors and systems of processors further in the future. But QMD, Devadas says, has nearer-term potential. “It’s the kind of work that would motivate Intel: putting in a small piece of hardware for a significant improvement.”

The Intel researchers involved couldn’t comment on whether QMD would find its way into future processors. However, they are actively researching its potential. (Wang is now a research scientist at Intel.) The researchers hope that QMD, among other extensions of the concept, can simplify communication among the cores and the CPU’s input/output system.

Solihin, meanwhile, is inventing other types of hardware accelerators. “We have to improve performance by improving energy efficiency. The only way to do that is to move some software to hardware. The challenge is to figure out which software is used frequently enough that we could justify implementing it in hardware,” he says. “There is a sweet spot.”

