New System Could Break Bottleneck in Microprocessors

Turning burdensome software into simple hardware speeds core-to-core communication in microprocessors

Intel and other multicore processor makers want cores to be able to communicate with each other faster. More cores (the Haswell-EX Xeon E7-8890 V3 shown here has 18) typically means much more time coordinating communications.
Photo: Intel

Engineers at North Carolina State University and at Intel have come up with a solution to one of the modern microprocessor’s most persistent problems: communication between the processor’s many cores. Their answer is a dedicated set of logic circuits they call the Queue Management Device, or QMD. In simulations, integrating the QMD with the processor’s on-chip network at a minimum doubled core-to-core communication speed, and in some cases boosted it much further. Even better, as the number of cores increased, the speedup became more pronounced.

In the last decade, microprocessor designers started putting multiple copies of processor cores on a single die as a way to sustain the performance gains computer makers had long enjoyed without forming chip-killing hot spots on the CPU. But that solution comes with complications. For one, it meant that software had to be written so that work was divided among the processor cores. The result: Sometimes different cores need to work on the same data, or have to coordinate passing data from one core to another.

“We have to improve performance by improving energy efficiency. The only way to do that is to move some software to hardware”

To prevent the cores from wantonly overwriting each other’s information, processing data out of order, or committing other errors, multicore processors use lock-protected software queues. These are data structures that coordinate the movement of and access to information according to software-defined rules. But all that extra software comes with significant overhead, which only gets worse as the number of cores increases. “Communications between cores is becoming a bottleneck,” says Yan Solihin, a professor of electrical and computer engineering who led the work at NC State.
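A lock-protected software queue of the kind described above can be sketched in a few lines of Python. This toy version, with threads standing in for cores, is only an illustration of the idea, not the researchers' implementation:

```python
import threading
from collections import deque

class LockedQueue:
    """Minimal lock-protected software queue. Every operation must
    acquire the lock, so threads (standing in for cores) serialize on
    it; that serialization is the overhead the article describes."""

    def __init__(self):
        self._lock = threading.Lock()
        self._items = deque()

    def enqueue(self, item):
        with self._lock:  # contention on this lock grows with core count
            self._items.append(item)

    def dequeue(self):
        with self._lock:
            return self._items.popleft() if self._items else None
```

In a real multicore workload, producer cores call `enqueue` while consumer cores call `dequeue`; as the thread count rises, an increasing share of time is lost waiting on the lock rather than doing useful work.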

The solution—born of a discussion with Intel engineers and carried out by Solihin’s student, Yipeng Wang, at NC State and at Intel—was to turn the software queue into hardware. This effectively collapsed three multistep software queue operations into three simple instructions: add data to the queue, take data from the queue, and place data near where it will be needed next. Compared with the software-only approach, the QMD sped up a sample task such as packet processing—the kind of work network nodes do on the Internet—by a growing margin as more cores were involved. With 16 cores, the QMD worked 20 times as fast as the software could.
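Those three operations can be pictured as a narrow API. The sketch below is purely illustrative: the class, its method names, and the staging model are assumptions for exposition, not the QMD's actual instruction set, which is implemented in logic circuits rather than software.

```python
class ToyQMD:
    """Illustrative software model of the three operations the article
    says the QMD reduces to single instructions. All names here are
    hypothetical; the real device implements them in hardware."""

    def __init__(self):
        self._queue = []
        self._near = {}  # models data placed near a consuming core

    def enq(self, item):
        """Add data to the queue."""
        self._queue.append(item)

    def deq(self):
        """Take data from the queue."""
        return self._queue.pop(0) if self._queue else None

    def place_near(self, core_id, item):
        """Put data near where it's needed next (e.g. a core's cache)."""
        self._near.setdefault(core_id, []).append(item)

    def staged_for(self, core_id):
        """Return the data staged for a given core."""
        return self._near.get(core_id, [])
```

The point of the hardware version is that each of these becomes one instruction with no lock acquisition, so cores no longer serialize on shared queue state.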

Once they realized this result, the engineers reasoned that the QMD might be able to do a few other tricks—turning more software into hardware. They added more logic to the QMD and found that it could speed up several other core communications-dependent functions, including MapReduce, a technology Google pioneered for distributing work to different cores and collecting the results.
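The MapReduce pattern the QMD was extended to accelerate can be illustrated with a minimal word-count sketch in plain Python; this shows only the communication structure of the pattern, not anything about the hardware itself:

```python
from collections import Counter

def map_phase(chunks):
    """Each worker (a core, in the article's setting) counts words in
    its own chunk independently, with no communication needed."""
    return [Counter(chunk.split()) for chunk in chunks]

def reduce_phase(partials):
    """Partial results are merged; collecting them is the
    core-to-core communication step an accelerator would speed up."""
    total = Counter()
    for partial in partials:
        total += partial
    return total
```

The map phase parallelizes cleanly, but the reduce phase forces cores to exchange their partial results, which is why queue-management overhead shows up here.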

They aren’t done yet. “The next step is to figure out other types of hardware accelerators that would be useful,” says Solihin. “We have to improve performance by improving energy efficiency. The only way to do that is to move some software to hardware. The challenge is to figure out which software is used frequently enough that we could justify implementing it in hardware. There is a sweet spot,” he says.

Intel engineer Ren Wang is presenting the QMD speed-up results at the 25th Annual Conference on Parallel Architectures and Compilation Techniques, in Haifa, Israel, this week.
