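Chinese Chip Wins Energy-Efficiency Crown

Though slower than competitors, the energy-saving Godson-3B is destined for the next Chinese supercomputer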

The Dawning 6000 supercomputer, which Chinese researchers expect to unveil in the third quarter of 2011, will have something quite different under its hood. Unlike its forerunners, which used chips from American companies, this machine will run on China's homegrown high-end processor, the Godson-3B. With a peak frequency of 1.05 gigahertz, the Godson is slower than its competitors' wares, at least one of which operates at more than 5 GHz, but the chip still turns heads with its record-breaking energy efficiency. It can execute 128 billion floating-point operations per second while drawing just 40 watts, which gives it double or more the performance per watt of competing chips.
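The efficiency claim follows directly from the two peak figures quoted above. The short Python sketch below does the arithmetic; the function names are illustrative, and it uses only the article's peak numbers (128 billion operations per second at 40 watts), so sustained efficiency on real workloads would likely be lower.

```python
# Peak figures quoted for the Godson-3B; sustained numbers on real workloads
# would likely be lower.
GODSON_GFLOPS = 128.0  # billions of floating-point operations per second
GODSON_WATTS = 40.0


def gflops_per_watt(gflops: float, watts: float) -> float:
    """Performance per watt, in GFLOPS/W."""
    return gflops / watts


def picojoules_per_op(gflops: float, watts: float) -> float:
    """Energy per floating-point operation, in picojoules.

    Watts are joules per second and gflops * 1e9 is operations per second,
    so their ratio is joules per operation.
    """
    return watts / (gflops * 1e9) * 1e12


if __name__ == "__main__":
    eff = gflops_per_watt(GODSON_GFLOPS, GODSON_WATTS)
    energy = picojoules_per_op(GODSON_GFLOPS, GODSON_WATTS)
    print(f"Godson-3B: {eff:.1f} GFLOPS/W, {energy:.1f} pJ per operation")
    # "Double or more the performance per watt" implies that competing chips
    # deliver at most about half of the Godson's figure.
    print(f"Implied ceiling for competitors: {eff / 2:.1f} GFLOPS/W")
```

At peak, that works out to 3.2 GFLOPS/W, or about 312 picojoules per operation, so for the "double or more" claim to hold, competing chips would need to sit at or below roughly 1.6 GFLOPS/W.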

The Godson's eccentric interconnect, the structure that relays messages among its processor cores, also garners attention. While Intel and IBM are commercializing chips that shuttle communications between cores merry-go-round style on a "ring interconnect," the Godson connects its cores using a modified version of a gridlike interconnect system called a mesh network. The processor's designers, led by Weiwu Hu at the Chinese Academy of Sciences, in Beijing, seem to be placing their bets on a new kind of layout for future high-end computer processors.

A mesh design goes hand in hand with saving energy, says Matthew Mattina, chief architect at the San Jose, Calif.–based Tilera Corp., a chipmaker now shipping 36- and 64-core processors using on-chip mesh interconnects.

Imagine a ring interconnect as a traffic roundabout. Getting to some exits requires you to drive nearly around the entire circle. Traveling away from your destination before getting there, says Mattina, requires more transistor switching and therefore consumes more energy. A mesh network is more like a city's crisscrossed streets. "In a mesh, you always traverse the minimum amount of wire—you're never going the wrong way," he says.
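Mattina's roundabout picture can be made concrete by counting hops. The sketch below assumes a unidirectional ring (traffic flows one way, as on a roundabout) and a 2D mesh that uses dimension-order routing, where a message first travels along x and then along y. The core numbering and the routing policy are illustrative assumptions, not details of the Godson, Tilera, Intel, or IBM designs.

```python
from __future__ import annotations


def ring_hops_one_way(src: int, dst: int, n: int) -> int:
    """Hops from src to dst on a unidirectional ring of n stops."""
    return (dst - src) % n


def xy_route(src: tuple[int, int], dst: tuple[int, int]) -> list[tuple[int, int]]:
    """Dimension-order route on a 2D mesh: move along x first, then along y.

    Every step moves one hop closer to dst, so the route never goes "the
    wrong way" and its length equals the Manhattan distance.
    """
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def manhattan(a: tuple[int, int], b: tuple[int, int]) -> int:
    """Minimum number of mesh links between two grid positions."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


if __name__ == "__main__":
    # 16 cores: a 16-stop ring versus a 4 x 4 mesh.
    print("ring, stop 1 -> stop 0:", ring_hops_one_way(1, 0, 16), "hops")
    route = xy_route((1, 0), (0, 0))
    print("mesh, (1,0) -> (0,0):", len(route) - 1, "hop via", route)
    # Confirm that XY routing is minimal for every pair in the 4 x 4 mesh.
    cells = [(x, y) for x in range(4) for y in range(4)]
    assert all(len(xy_route(a, b)) - 1 == manhattan(a, b) for a in cells for b in cells)
```

On the one-way ring, reaching the stop just "behind" you takes 15 hops, while the mesh route takes 1. A bidirectional ring would cut that particular trip to 1 hop, but its worst case is still n // 2 = 8 hops, against 6 hops for corner to corner on the 4 x 4 mesh.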

On the 8-core Godson chip, 4 cores form a tightly bound unit—each core sits on a corner of a square of interconnects, as in a usual mesh. Godson researchers have also connected each corner to its opposite, using a pair of diagonal interconnects to form an X through the square's center. A "crossbar" interconnect then serves as an overpass, linking this 4-core neighborhood to a similar 4-core setup nearby.
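The payoff of the two diagonals can be checked with a breadth-first search over one four-core cluster. In the sketch below, the corners are numbered 0 to 3 around the square; that numbering is an assumption, and the crossbar link to the second cluster is deliberately left out because the article does not say how it attaches.

```python
from collections import deque
from itertools import combinations

# Corners of the square, numbered clockwise 0-1-2-3 (an illustrative choice).
SQUARE = [(0, 1), (1, 2), (2, 3), (3, 0)]
DIAGONALS = [(0, 2), (1, 3)]  # the "X" through the square's center


def adjacency(n: int, links: list) -> dict:
    """Undirected adjacency sets for cores 0..n-1."""
    adj = {v: set() for v in range(n)}
    for a, b in links:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def hops(adj: dict, src: int, dst: int) -> int:
    """Fewest links between two cores, found by breadth-first search."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        v = queue.popleft()
        if v == dst:
            return dist[v]
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    raise ValueError(f"core {dst} is unreachable from core {src}")


def hop_stats(adj: dict) -> tuple:
    """(worst-case hops, average hops) over all pairs of distinct cores."""
    counts = [hops(adj, a, b) for a, b in combinations(sorted(adj), 2)]
    return max(counts), sum(counts) / len(counts)


if __name__ == "__main__":
    for name, links in [("square only", SQUARE), ("square + diagonals", SQUARE + DIAGONALS)]:
        worst, avg = hop_stats(adjacency(4, links))
        print(f"{name}: worst case {worst} hops, average {avg:.2f} hops")
```

With the diagonals in place, every core in the cluster is one hop from every other, whereas opposite corners of a plain square are two hops apart, so the average drops from about 1.33 hops to exactly 1.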

Godson developers believe that their modified mesh's scalability will prove a key advantage, as chip designers cram more cores onto future chips. Yunji Chen, a Godson architect, says that competitors' ring interconnects may have trouble squeezing in more than 32 cores.

Indeed, one of the ring's benefits could prove its future liability. Linking new cores to a ring is fairly easy, says K.C. Smith, an emeritus professor of electrical and computer engineering at the University of Toronto. After all, there's only one path to send information—or two in a bidirectional ring. But sharing a common communication path also means that each additional core adds to the length of wire that messages must travel and increases the demand for that path. With a large number of cores, "the timing around this ring just gets out of hand," Smith says. "You can't get service when you need it."

Of course, adding more cores in a mesh also stresses the system. Even if you have a grid of paths providing multiple communication channels, more cores increase the demand for the network, and more demand makes traveling long distances difficult: Try driving across New York City at rush hour. Still, the bandwidth scaling of a mesh interconnect is superior to that of a ring, Tilera's Mattina says. He notes that the total bandwidth available with a mesh interconnect increases as you add cores, but with a ring interconnect, the total bandwidth remains constant even as the core count increases. Latency—the time it takes to get a message from one core to another—is also more favorable in a mesh design, Chen says. In a ring interconnect, latency increases linearly with the core count, he says, while in a mesh design it increases with the square root of the number of cores.
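The scaling claims in this paragraph can be checked on idealized topologies. The Python sketch below compares a bidirectional ring of N stops with a k-by-k mesh of the same N = k * k cores. It uses average hop count as a stand-in for latency and the number of links crossing a cut through the middle of the chip (the bisection width) as one common stand-in for total bandwidth. Both proxies are assumptions, and the model ignores router delay, wire length, and contention.

```python
from itertools import product


def ring_links(n: int) -> list:
    """Links of a ring with stops 0..n-1 (each link carries traffic both ways)."""
    return [(i, (i + 1) % n) for i in range(n)]


def mesh_links(k: int) -> list:
    """Links of a k-by-k mesh; the core at row r, column c is numbered r*k + c."""
    links = []
    for r, c in product(range(k), repeat=2):
        v = r * k + c
        if c + 1 < k:
            links.append((v, v + 1))  # east neighbor
        if r + 1 < k:
            links.append((v, v + k))  # south neighbor
    return links


def links_across(links: list, half: set) -> int:
    """Count links with exactly one endpoint in `half`."""
    return sum((a in half) != (b in half) for a, b in links)


def ring_avg_hops(n: int) -> float:
    """Average shortest-path hops between distinct stops of a bidirectional ring."""
    total = 0
    for a, b in product(range(n), repeat=2):
        d = (b - a) % n
        total += min(d, n - d)  # take the shorter way around
    return total / (n * (n - 1))


def mesh_avg_hops(k: int) -> float:
    """Average Manhattan distance between distinct cores of a k-by-k mesh."""
    cells = list(product(range(k), repeat=2))
    total = sum(abs(r1 - r2) + abs(c1 - c2)
                for (r1, c1), (r2, c2) in product(cells, repeat=2))
    n = k * k
    return total / (n * (n - 1))


if __name__ == "__main__":
    print(f"{'cores':>5} {'ring hops':>10} {'mesh hops':>10} {'ring cut':>9} {'mesh cut':>9}")
    for k in (4, 6, 8, 16):
        n = k * k
        ring_cut = links_across(ring_links(n), set(range(n // 2)))
        left_half = {r * k + c for r in range(k) for c in range(k // 2)}
        mesh_cut = links_across(mesh_links(k), left_half)
        print(f"{n:>5} {ring_avg_hops(n):>10.2f} {mesh_avg_hops(k):>10.2f} "
              f"{ring_cut:>9} {mesh_cut:>9}")
```

In this model the ring's average hop count grows roughly as N/4 while its cut stays at 2 links, and the mesh's average hop count is exactly 2k/3 (that is, proportional to the square root of N) while its cut grows as k. That matches the linear-versus-square-root latency and constant-versus-growing bandwidth behavior that Chen and Mattina describe.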

Reid Riedlinger, a principal engineer at Intel, points out that a ring interconnect has its own scalability benefits. Intel's recently unveiled 8-core Poulson design employs a ring not only to add more cores but also to add easy-to-access on-chip memory, or cache. As long as the chip has the power and the space, Riedlinger says, a ring makes it easy to add each core and its cache as a module, a move that would require more complicated validation and logic changes in a mesh. "Adding the additional ring stop has a very small impact on latency, and the additional cache capacity will provide performance benefits for many applications," he says.

For those who are not building a national supercomputer, Riedlinger also points out that a ring setup is more easily scalable in a different direction. "You might start with an 8-core design," he says, "and then, to suit a different market segment, you might chop 4 cores out of the middle and sell it as a different product."

This article originally appeared in print as "China's Godson Gamble".