China’s Homegrown Supercomputers

In 2012, China’s chips will power the Dawning 6000

Great Wall: The Sunway BlueLight supercomputer is the first deployed using processors designed in China.
Photo: Xu Suhui/Xinhua/Landov

In late October 2011, the Sunway BlueLight MPP made headlines as China’s first high-performance computer to harness the power of a homegrown chip, the ShenWei SW1600. And the Dawning 6000, scheduled to come on line in December 2011, will use another indigenous processor, the Godson-3B. These are supercomputers that China can truly call its own.

At press time, engineers were busy optimizing the Dawning 6000’s ability to run Linpack—the benchmark software library used to rank computers in the Top500 list. The Sunway BlueLight, a petaflops-level supercomputer, has already been put through its Linpack paces and claimed 14th place in the November Top500 ranking. But don’t be fooled: Neither machine is a speed demon. Consider them rather as steps toward technological independence.

“The Dawning 6000 is really trying to master the tricks of this domain so that the Chinese have the ability to develop their own chips, their own IT from the ground up,” says IEEE Fellow Tarek El-Ghazawi, a professor of electrical and computer engineering at George Washington University and a codirector of the NSF Center for High-Performance Reconfigurable Computing. Given that today’s exotic supercomputer components are tomorrow’s quotidian hardware for personal computers, El-Ghazawi predicts that these research projects will prove a boon for Chinese commercial chips, which he expects to become widespread in China’s marketplace in around 10 years. “Then, in the next 20 years,” he says, “they may be selling chips to the world, including the U.S.”

That would be a swapping of roles. The Dawning 5000 line, released in 2008, relied on U.S.-made AMD Opteron CPUs. And even China’s Tianhe-1A, for a few months the world’s top-ranked supercomputer, owed a good part of its 2.57-petaflops performance to Western chips: 7168 Nvidia Tesla GPUs complemented by 14 336 Intel Xeon CPUs.

“The Tianhe was opportunistic,” El-Ghazawi says. “They looked at the top-performing chips out there and applied them. With Dawning, from the ground up, they are building a machine with careful consideration to each level of the architecture—chip, node, and system—with the requirements of the software in the back of their minds.”

The Tianhe-1A did not sacrifice all innovation in the race for the top. The machine was also celebrated for its indigenous interconnect system, the channels for shuttling information between computer nodes. The interconnect system, called Arch, was developed by China’s National University of Defense Technology. Capable of 160 gigabits per second, Arch had greater bandwidth than commercially available alternatives, such as InfiniBand.

“If you’re developing your own supercomputer, you would have to build both your own processors and interconnect to connect them together,” says Jack Dongarra, a professor of electrical engineering and computer science at the University of Tennessee who helps to compile the Top500 ranking. “I would guess that the Chinese would want to move toward a system that they have developed themselves....They want to be in a position where they can develop an industry that can generate computers for China and the rest of the world rather than relying on Western components.”

Different supercomputers represent different strategies, argues David K. Kahaner, founding director of the Asian Technology Information Program, headquartered in Albuquerque. For example, the Tianhe-1A, still the fastest machine in China, and the Sunway BlueLight have roots in defense research, while the Dawning 6000 might be thought of as an academic research supercomputer. “China is a big country with a tremendous number of capable people, and they are striking out in a number of directions,” Kahaner says. “Competition is good for everybody.” Right now, he sees the BlueLight as the most indigenous, noting its use of both homegrown chips and a unique water-cooling system.

Photo: Xu Suhui/Xinhua/Landov
Homespun Logic: The ShenWei SW1600, which was designed in China, powers the Sunway BlueLight supercomputer.

None of the machines has completely broken away from Western influences. For example, the Dawning 6000’s Godson-3B processors, from the Chinese Academy of Sciences, appear to use a Western instruction set—the CPU’s overarching architecture as it relates to programming. The MIPS instruction set, which Godson uses, is more commonly found in the microprocessors inhabiting television set-top boxes. Mark Pittman, MIPS Technologies’ vice president for Asia and Pacific sales, says the Chinese Academy of Sciences was one of the first groups to target the MIPS instruction set for high-performance computing. Early in the project, the Godson researchers used the instruction set without a license, but Pittman notes that this issue has been resolved, adding that since 2009 his company has directly licensed MIPS to the academy’s Institute of Computing Technology.

“Leading researchers in China feel it is better to innovate a microprocessor with an existing instruction set rather than take a long time to develop an instruction set and then innovate a processor based on that,” Pittman says. Developing a new instruction set would require porting existing operating systems, programs, and drivers. “All the things that run on MIPS would have to be re-created for a new instruction set, and even in China that’s an expensive proposition.”

The BlueLight’s ShenWei SW1600, made by the National Research Center of Parallel Computer Engineering and Technology, is rumored to have taken one step further toward independence. Though earlier ShenWei chips used a modified Alpha instruction set, designers claim this processor uses an architecture of their own design, Kahaner says.

Both machines may also still use Western interconnect systems. Kahaner confirms that the BlueLight uses a modified InfiniBand interconnect system. El-Ghazawi says a prototype node of the Dawning 6000 also used a modified InfiniBand network, combining it with a specialized network for performing frequent tasks more efficiently. “In the end, we may see some new, efficient kind of Chinese switching that adheres to the InfiniBand standards,” he says, adding that the Dawning’s interconnect system is “very forward looking.”

The designers of the processors were also prescient in the stress they’ve put on conserving power. “The race to exaflop computing will be a race to energy efficiency,” says Steve Scott, now chief technology officer of Nvidia’s Tesla unit and formerly a senior vice president of Cray, a pioneering supercomputer company.

The Godson-3B, capable of 128 billion flops using just 40 watts, claims almost double the peak power efficiency of some U.S. competitors. At press time, however, the Godson’s energy efficiency had yet to be tested using a standard benchmark like Linpack. And, Dongarra points out, the processor is only one part of an energy budget that also includes interconnects and memory. The BlueLight’s complete system also turned heads with its efficiency. The system can perform 741 megaflops per watt, compared to 636 Mflops/W for the Tianhe-1A, Dongarra says.
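For readers who want to check the arithmetic behind these efficiency figures, here is a minimal sketch in Python. It uses only the numbers quoted above; the function and variable names are illustrative and do not come from any benchmark suite. Note that the Godson-3B figure is a chip-level peak, while the BlueLight and Tianhe-1A figures are whole-system numbers, so the two pairs should not be compared directly.

```python
def gflops_per_watt(gflops: float, watts: float) -> float:
    """Efficiency in gigaflops per watt (hypothetical helper name)."""
    return gflops / watts


# Chip-level peak quoted for the Godson-3B: 128 gigaflops at 40 watts.
godson_3b_peak = gflops_per_watt(128.0, 40.0)  # 3.2 Gflops/W

# System-level figures quoted by Dongarra, in megaflops per watt.
bluelight_mflops_per_watt = 741.0
tianhe_1a_mflops_per_watt = 636.0

# Ratio of the two system-level efficiencies (BlueLight relative to Tianhe-1A).
system_ratio = bluelight_mflops_per_watt / tianhe_1a_mflops_per_watt

print(f"Godson-3B peak: {godson_3b_peak:.1f} Gflops/W")
print(f"BlueLight vs. Tianhe-1A: {system_ratio:.2f}x")
```

On these figures, the BlueLight system comes out roughly one-sixth more efficient than the Tianhe-1A.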

Researchers at the Chinese Academy of Sciences are already upping the efficiency of the next class of processors, the Godson-3C. According to one of the chip’s architects, Yunji Chen, the 3C will have an even higher performance-to-power ratio, mainly because the processor will be built using a 32-nanometer fabrication process as opposed to the 3B’s older, 65-nm process and because the new chip will feature an improved three-level cache memory.

Even more energy could be saved by moving from CPUs to GPUs, as other high-performance computers have done. Such graphics processing chips—now used in general computing—can do simple operations on great gobs of data in parallel, rather than in a more one-at-a-time fashion as a CPU core does. That parallelism economizes on energy. Nvidia’s Scott says that an Intel Westmere CPU takes about 1.7 nanojoules per operation at peak performance, while an Nvidia Fermi GPU takes less than a seventh of that. Though China has produced some homegrown midlevel GPUs, the Chinese Academy of Sciences appears to be focusing its efforts on the Godson line of CPUs.
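Energy per operation and operations per second per watt are two views of the same quantity, which makes Scott’s figures easy to convert. The sketch below assumes only the numbers he cites; the helper name is hypothetical, and because the GPU figure is stated as “less than a seventh,” the result for Fermi is a lower bound rather than a measured value.

```python
def gigaops_per_watt(nanojoules_per_op: float) -> float:
    """Convert energy per operation into billions of operations per second per watt.

    One watt is one joule per second, so 1 nJ per operation corresponds
    to 1e9 operations per joule, that is, 1 giga-op per second per watt.
    """
    return 1.0 / nanojoules_per_op


# Figure cited for an Intel Westmere CPU at peak performance.
westmere_nj_per_op = 1.7
westmere_rate = gigaops_per_watt(westmere_nj_per_op)

# "Less than a seventh" gives an upper bound on Fermi's energy per operation,
# and therefore a lower bound on its operations per watt.
fermi_nj_upper_bound = westmere_nj_per_op / 7
fermi_rate_lower_bound = gigaops_per_watt(fermi_nj_upper_bound)

print(f"Westmere: about {westmere_rate:.2f} giga-ops per second per watt")
print(f"Fermi: at least {fermi_rate_lower_bound:.2f} giga-ops per second per watt")
```

These are peak figures for individual chips and say nothing about the memory and interconnect power that a full system also spends.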

But favoring CPUs may just be another part of China’s strategy, says El-Ghazawi. He notes that it isn’t nearly as easy to program for GPUs, and Chinese supercomputer makers are looking not merely for speed records but for market share. “Although they are a latecomer,” he says, “they are really hitting the ground running.”

About the Author

Joseph Calamia is a freelance writer based in New Haven, Conn. Despite the geographic disadvantage, he was the natural choice to write this article, because he’d reported on the development of the key microprocessors involved less than a year ago. A frequent contributor to IEEE Spectrum and an alumnus of the MIT Graduate Program in Science Writing, Calamia has also written for Discover and Popular Mechanics.
