Better Benchmarking for Supercomputers
The usual yardstick is not a good metric
Photo: Argonne National Laboratory
Gigateps Ahead: The Intrepid supercomputer is at the top of the Graph 500 list.
Imagine a world in which a car's performance is judged solely by the time it takes to go from 0 to 100 kilometers per hour, ignoring fuel efficiency and other metrics. This, in essence, is the state of supercomputing today, says a group of U.S. computer scientists. People today typically judge supercomputers in terms of their raw number-crunching power, for example by asking how many linear algebra problems they can solve in a second. But, the scientists argue, the lion's share of challenging supercomputing problems in the 2010s requires quick and efficient processing of petabyte and exabyte-size data sets. And good number crunchers are sometimes bad exascale sifters.
It's time, the researchers say, for high-performance computers to be rated not just in petaflops (quadrillions of floating-point operations per second) but also in "gigateps" (billions of traversed edges per second).
An "edge" here is a connection between two data points. For instance, when you buy Michael Belfiore's Department of Mad Scientists from Amazon.com, one edge is the link in Amazon's computer system between your user record and the Department of Mad Scientists database entry. One necessary but CPU-intensive job Amazon continually does is to traverse such edges, following them from a book to its buyers and on to those buyers' other purchases, which enables it to say that 4 percent of customers who bought Belfiore's book also bought Alex Abella's Soldiers of Reason and 3 percent bought John Edwards's The Geeks of War.
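The "also bought" calculation can be sketched in a few lines of Python. This is only an illustration of the idea, not Amazon's method: the customer names, the purchase list, and the `also_bought` function are all hypothetical, and a real system would work on billions of edges spread across many machines.

```python
from collections import Counter

# Hypothetical purchase records: each (customer, book) pair is one edge.
purchases = [
    ("ann", "Department of Mad Scientists"),
    ("ann", "Soldiers of Reason"),
    ("bob", "Department of Mad Scientists"),
    ("bob", "The Geeks of War"),
    ("cat", "Department of Mad Scientists"),
    ("dan", "Soldiers of Reason"),
]

def also_bought(purchases, title):
    """Return {other_title: share of buyers of `title` who also bought it}."""
    books_by_customer = {}
    buyers_by_book = {}
    for customer, book in purchases:
        books_by_customer.setdefault(customer, set()).add(book)
        buyers_by_book.setdefault(book, set()).add(customer)

    buyers = buyers_by_book.get(title, set())
    counts = Counter()
    for customer in buyers:                        # hop 1: book -> its buyers
        for other in books_by_customer[customer]:  # hop 2: buyer -> other books
            if other != title:
                counts[other] += 1
    # An unknown title has no buyers, so counts is empty and nothing is divided.
    return {other: n / len(buyers) for other, n in counts.items()}

shares = also_bought(purchases, "Department of Mad Scientists")
for other, share in sorted(shares.items()):
    print(f"{share:.0%} of buyers also bought {other}")
```

Almost all the work here is hopping from one record to another through memory. Very little of it is arithmetic, which is why a machine tuned for flops can still struggle with it.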
"What we're most interested in is being able to traverse the whole memory of the machine," says Richard Murphy, a senior researcher at Sandia National Laboratory, in Albuquerque, N.M. "There's no equivalent measure for these problems that's accepted industry-wide."
So Murphy and his colleagues from other U.S. national laboratories, academia, and industry have put together a benchmark they're calling the Graph 500. The name comes from the field of mathematics (graph theory) that the benchmark draws most heavily from. And the 500 is, Murphy says, an "aspirational" figure representing what they hope someday will be a "top 500" ratings list of the highest-performing supercomputers around the world, measured in gigateps instead of gigaflops.
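To make the unit concrete, here is a minimal sketch of how a traversed-edges-per-second figure could be measured with a breadth-first search, the kind of graph search the Graph 500 times. It is not the official reference code: the function names, the toy ring graph, and the convention of counting every adjacency-list entry examined as one traversed edge are assumptions made for illustration.

```python
import time
from collections import deque

def bfs_count_edges(adjacency, source):
    """Breadth-first search from `source`.

    Returns (vertices reached, adjacency entries examined).
    """
    visited = {source}
    queue = deque([source])
    edges_traversed = 0
    while queue:
        vertex = queue.popleft()
        for neighbor in adjacency.get(vertex, ()):
            edges_traversed += 1          # assumed counting convention
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return len(visited), edges_traversed

def measure_teps(adjacency, source):
    """Time one search and return (edges per second, vertices reached, edges)."""
    start = time.perf_counter()
    reached, edges = bfs_count_edges(adjacency, source)
    elapsed = time.perf_counter() - start
    teps = edges / elapsed if elapsed > 0 else float("inf")
    return teps, reached, edges

# Toy undirected graph: a ring of 6 vertices stored as adjacency lists.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
teps, reached, edges = measure_teps(ring, 0)
print(f"reached {reached} vertices over {edges} edge visits, {teps:,.0f} TEPS")
```

A graph this small fits entirely in a processor's cache, so the printed rate says nothing about a real machine. The measurement becomes meaningful only when the graph fills memory and each neighbor lookup lands somewhere unpredictable.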
The twice-yearly Top 500 supercomputers list recently made headlines when China's Tianhe-1A took the top position, coming in at 2.57 petaflops. The supercomputers on the list are ranked using a benchmark package of calculation speed tests called the High-Performance Linpack.
Crucially, Murphy says, the point of the Graph 500 is not to run a horse race on a new racetrack. Rather, he says, they've designed the benchmark to spur both researchers and industry toward mastering architectural problems of next-generation supercomputers. And the only way to know if you've solved those problems is for the industry to include those problems in its metrics.
In fact, by a Graph 500–type standard, supercomputers have actually been getting slower, says computer science and electrical engineering professor Peter Kogge of the University of Notre Dame. For the past 15 years, he says, every thousandfold increase in flops has brought with it a tenfold decrease in the memory accessible to each processor in each clock cycle. (For more on this problem, see Kogge's feature article in next month's issue of IEEE Spectrum.)
This means bigger and bigger supercomputers actually take longer and longer to access their memory. And for a problem like sifting through whole genomes or simulating the cerebral cortex, that means newer computers aren't always better.
"Big machines get embarrassingly bad gigateps results for their size," Kogge says.
Today only nine supercomputers have been rated in gigateps. The top machine, Argonne National Laboratory's IBM Blue Gene–based Intrepid, clocked in at 6.6 gigateps. But to score this high, Intrepid had to be scaled back to 20 percent of its normal size. (At full size, Intrepid ranks No. 13 on the conventional Top 500 list, at 0.46 petaflops.)
"I think Graph 500 is a far better measure for machines of the future than what we have now," Kogge says. Supercomputing, he says, needs benchmarks that measure performance across both memory and processing.
However, Jack Dongarra, professor of electrical engineering and computer science at the University of Tennessee and one of the developers of the Top 500 list, notes that the Graph 500 isn't the first new benchmark to challenge the High-Performance Linpack. The Defense Advanced Research Projects Agency, the U.S. Department of Energy, and the U.S. National Science Foundation have put forward a different group of benchmarks called the HPC Challenge, aimed at testing both computing power and widespread memory accessibility. Moreover, a coalition of industry partners—the Standard Performance Evaluation Corp.—has also assembled the SPEC set of computing benchmarks, aimed at better measuring the performance of more everyday components like Web servers.
Dongarra says that the Graph 500 may add to the list of metrics that rate a supercomputer's performance. But a Graph 500 score shouldn't be seen as some definitive number any more than the Linpack score used today. "If Graph 500 was the only benchmark we had, we'd criticize that too," he says.