When computer scientist Jack Dongarra traveled to China in May to see the Tianhe-2 supercomputer, he was ready to be impressed. And the machine, built by the National University of Defense Technology, in Changsha, didn’t disappoint. By early June, Tianhe-2 had demonstrated a peak speed of 33.86 petaflops: some 34 million billion floating-point operations per second, far more than it needed to place first on June’s Top500, a biannual list of the fastest supercomputers, compiled and adjudicated by Dongarra and his colleagues.
But Tianhe-2 isn’t just fast; it could also be a bellwether for the field of scientific supercomputing. The Intel-based machine is a hybrid that uses 32 000 multicore central processing units (CPUs) and 48 000 “coprocessors”—separate “accelerator” chips that incorporate dozens of cores that are specially designed to churn through the floating-point arithmetic operations needed for simulations of climate and the early universe, among many others.
Advanced Micro Devices (AMD) and Nvidia Corp., which make general-purpose graphics-processing units (GPGPUs) that are well suited to this sort of application, have previously dominated the accelerator market. But Intel is making a strong bid for the space, with a different kind of accelerator it calls the Xeon Phi coprocessor.
Seven systems on the Top500 list were already using the Intel chip by the time it made its official debut in November 2012. Among top supercomputers, Intel’s accelerator share is still small; machines using Nvidia’s GPUs outnumber those with Intel’s coprocessors 31 to 11. But Tianhe-2 has had a big impact on the landscape. If you tally up all the petaflops on the list, “the aggregated performance delivered by Intel Xeon Phi coprocessors is now bigger than the performance delivered by GPU accelerators,” says Intel spokesperson Radoslaw Walczyk. “It is a big win,” says Sergis Mushell, an analyst at the technology research firm Gartner.
Intel’s chips contain up to 61 cores and are built using the company’s 22-nanometer manufacturing process, which is a generation ahead of the competition. The company says its coprocessors have a few advantages over GPGPUs: They can operate independently of CPUs and they don’t require special code to program. But it will take some time to see how Intel’s venture will fare.
The market for accelerators is still small. As of June, just 53 of the 500 computers on the list use them, down from 62 in November. But this small decline may be temporary. “They deliver much more bang for your kilowatt,” says Michael Shuey, a systems architect in high-performance computing at Purdue University. And even though just a tenth of the machines on the list incorporate accelerators, Dongarra says, the machines account for a third of the aggregated performance. “It’s a small number that has a large impact,” he says.
One stumbling block to more widespread adoption has been programming complexity, says William Gropp at the University of Illinois at Urbana-Champaign. The university hosts the U.S. National Science Foundation’s most powerful machine, Blue Waters. The CPU- and GPU-based system sustains speeds of more than 1 petaflops.
As with any machine that pairs CPUs and accelerators, Blue Waters’ programmers must find effective ways to transport data between the chips, a process that can consume a lot of power. “Anything you do has to be a big enough lump of work in order to amortize moving the data over,” Gropp says. This makes programming more difficult: “You have to figure out how to aggregate [work] into a lump whose size is not based on the characteristics of your problem but the characteristics of your hardware.”
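Gropp's point about amortization can be illustrated with a rough back-of-the-envelope model. The Python sketch below is not drawn from Blue Waters or any real system: the function names, the simple cost model (a fixed link latency plus a per-byte transfer cost), and every number in the usage note are hypothetical assumptions chosen only to show why a "lump" of work must exceed some hardware-determined size before offloading pays off.

```python
import math


def cpu_time(n_bytes, flops_per_byte, cpu_flops):
    """Seconds to process n_bytes on the CPU alone (hypothetical model)."""
    return n_bytes * flops_per_byte / cpu_flops


def offload_time(n_bytes, flops_per_byte, accel_flops,
                 link_bytes_per_s, link_latency_s):
    """Seconds to ship n_bytes to an accelerator and compute there.

    Assumed cost model: one fixed link latency, plus transfer time
    proportional to the data size, plus the accelerator's compute time.
    """
    transfer = link_latency_s + n_bytes / link_bytes_per_s
    compute = n_bytes * flops_per_byte / accel_flops
    return transfer + compute


def break_even_bytes(flops_per_byte, cpu_flops, accel_flops,
                     link_bytes_per_s, link_latency_s):
    """Smallest lump size (in bytes) at which offloading matches the CPU.

    Returns math.inf when the work per byte is too low for offloading
    to ever win under this model.
    """
    # Time saved per byte by computing on the accelerator,
    # minus the time spent moving that byte across the link.
    gain_per_byte = (flops_per_byte / cpu_flops
                     - flops_per_byte / accel_flops
                     - 1.0 / link_bytes_per_s)
    if gain_per_byte <= 0:
        return math.inf
    return link_latency_s / gain_per_byte
```

With made-up figures such as a 100-gigaflops CPU, a 1-teraflops accelerator, an 8-gigabyte-per-second link, and a 10-microsecond latency, calling `break_even_bytes(100, 1e11, 1e12, 8e9, 1e-5)` gives a threshold that depends only on those hardware numbers, not on the scientific problem being solved. In this toy model, lowering the work per byte from 100 to 10 operations means offloading never wins at all.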
“We’re making significant strides in dealing with the added complexity,” Gropp adds. “In the short term, you’ll see more and more of these systems with accelerators. [But] I don’t think that they’ll completely populate the list.” CPU-only supercomputers, such as IBM’s Blue Gene machines, he says, may remain competitive for a while yet.
What could ultimately take over supercomputing are computers that have both CPUs and specialized cores on the same chip. AMD, Intel, and Nvidia make such chips for mobile devices and personal computers. Although this sort of “heterogeneous integration” has yet to emerge in high-performance systems, Intel says a version of its next coprocessor, Knights Landing, could be installed directly into supercomputer motherboard sockets. For some computer scientists, integration is just a matter of time. “We’re going to see things get closer together,” says Dongarra. “It’s a natural trend.”
This article originally appeared in print as "Intel Strikes Back."
Rachel Courtland, an unabashed astronomy aficionado, is a former senior associate editor at Spectrum. She now works in the editorial department at Nature. At Spectrum, she wrote about a variety of engineering efforts, including the quest for energy-producing fusion at the National Ignition Facility and the hunt for dark matter using an ultraquiet radio receiver. In 2014, she received a Neal Award for her feature on shrinking transistors and how the semiconductor industry talks about the challenge.