Oak Ridge National Lab may be the first to reach 1,000,000,000,000,000,000 operations per second
In 2018, a new supercomputer called Summit was installed at Oak Ridge National Laboratory, in Tennessee. Its theoretical peak capacity was nearly 200 petaflops, or 200 thousand trillion floating-point operations per second. At the time, it was the most powerful supercomputer in the world, beating out the previous record holder, China’s Sunway TaihuLight, by a comfortable margin, according to the well-known Top500 ranking of supercomputers. (Summit is currently No. 2; a Japanese supercomputer called Fugaku has since overtaken it.)
In just four short years, though, demand for supercomputing services at Oak Ridge has outstripped even this colossal machine. “Summit is four to five times oversubscribed,” says Justin Whitt, who directs ORNL’s Leadership Computing Facility. “That limits the number of research projects that can use it.”
The obvious remedy is to get a faster supercomputer. And that’s exactly what Oak Ridge is doing. The new supercomputer being assembled there is called Frontier. When complete, it will have a peak theoretical capacity in excess of 1.5 exaflops.
The remarkable thing about Frontier is not that it will be more than seven times as powerful as Summit, stunning as that figure is. The remarkable thing is that it will use only twice the power. That’s still a lot of power—Frontier is expected to draw 29 megawatts, enough to power a town the size of Cupertino, Calif. But it’s a manageable amount, both in terms of what the grid there can supply and what the electricity bill will be.
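The efficiency claim comes down to simple arithmetic on the article’s own figures. The Python sketch below is an illustration, not ORNL’s methodology: it assumes Summit draws exactly half of Frontier’s 29 megawatts, as “only twice the power” implies, and it compares theoretical peak capacity rather than measured performance. It uses the SI prefixes behind the headline’s 1,000,000,000,000,000,000 operations per second: peta is 10^15 and exa is 10^18.

```python
# Figures quoted in the article. Summit's power draw is inferred from
# "only twice the power" and is an approximation, not a published spec.
PETA = 10**15
EXA = 10**18

summit_peak_flops = 200 * PETA                  # "nearly 200 petaflops"
frontier_peak_flops = 1.5 * EXA                 # "in excess of 1.5 exaflops"
frontier_power_watts = 29e6                     # "expected to draw 29 megawatts"
summit_power_watts = frontier_power_watts / 2   # assumption: exactly half

speedup = frontier_peak_flops / summit_peak_flops
power_ratio = frontier_power_watts / summit_power_watts


def gflops_per_watt(flops: float, watts: float) -> float:
    """Peak billions of floating-point operations per second, per watt."""
    return flops / watts / 1e9


print(f"speedup:         {speedup:.1f}x")
print(f"power ratio:     {power_ratio:.1f}x")
print(f"efficiency gain: {speedup / power_ratio:.2f}x")
print(f"Summit:   {gflops_per_watt(summit_peak_flops, summit_power_watts):.1f} GFLOPS/W")
print(f"Frontier: {gflops_per_watt(frontier_peak_flops, frontier_power_watts):.1f} GFLOPS/W")
```

Under those assumptions, peak capacity rises 7.5-fold while power only doubles, so peak operations per watt improve by roughly 3.75 times. Real-world efficiency depends on sustained benchmark results and measured power draw, which these peak figures do not capture.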
“The efficiency comes from putting more computer hardware in smaller and smaller spaces,” says Whitt. “Each of these [computer] cabinets weighs as much as a full-sized pickup.” That’s because they are stuffed with what ORNL’s spec sheet describes as “high density compute blades powered by HPC- and AI-optimized AMD EPYC processors and Radeon Instinct GPU accelerators purpose-built for the needs of exascale computing.”
Building a supercomputer of this capacity is hard enough. But doing so during a pandemic has been especially challenging. “Supply-chain issues were broad,” says Whitt, including shortages of many things that aren’t special to building a high-performance supercomputer. “It could just be sheet metal or screws.”
Supply issues are indeed the reason Frontier will become operational in 2022 ahead of another planned supercomputer, Aurora, which will be installed at Argonne National Laboratory, in Illinois. Aurora was to come first, but its construction has been delayed, because Intel is having difficulty supplying the processors and GPUs needed for this machine.
At the time of this writing, technicians at Oak Ridge were assembling and testing parts of Frontier in hopes that the giant machine will come together before the end of 2021 and with the intention of making it fully operational and available for users in 2022. Will we then be able to call it the world’s first exascale supercomputer?
That depends on your definition. “[Japan’s Fugaku supercomputer] actually achieved 2 exaflops with a different benchmark,” says Jack Dongarra of the University of Tennessee, one of the specialists behind the Top500 list. Those rankings, he explains, are based on a benchmark that involves 64-bit floating-point calculations, the kind used to solve three-dimensional partial differential equations as required for many physical simulations. “That’s the bottom line of what supercomputers are being used for,” says Dongarra. But he also points out that supercomputers are increasingly used to train deep neural networks, where 16-bit precision can suffice.
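To see why the precision behind a benchmark matters, the Python sketch below stores numbers in IEEE 754 half (16-bit), single (32-bit), and double (64-bit) precision using the standard library’s struct module, whose “e”, “f”, and “d” format codes pack those formats. The helper names round_trip and accumulate are illustrative. This is a toy demonstration of rounding error, not a model of how the Top500 benchmark or any neural-network framework works; rounding a double-precision sum after each step only approximates arithmetic carried out natively at the lower precision.

```python
import struct

# struct format codes for IEEE 754 binary formats ("e" needs Python 3.6+).
FORMATS = {16: "e", 32: "f", 64: "d"}


def round_trip(x: float, bits: int) -> float:
    """Store x at the given precision and read it back as a Python float."""
    fmt = FORMATS[bits]
    return struct.unpack(fmt, struct.pack(fmt, x))[0]


def accumulate(step: float, n: int, bits: int) -> float:
    """Add step to a running total n times, rounding the total to the
    given precision after every addition."""
    total = 0.0
    for _ in range(n):
        total = round_trip(total + step, bits)
    return total


# How well can each format hold 1/3?
for bits in (16, 32, 64):
    print(f"{bits}-bit  1/3 -> {round_trip(1 / 3, bits)!r}")

# A long chain of dependent additions; the exact answer is 1,000.
for bits in (16, 32, 64):
    print(f"{bits}-bit  sum of 0.1 x 10,000 -> {accumulate(0.1, 10_000, bits)}")
```

In half precision, 1/3 is stored as 0.333251953125, good to about three decimal digits. The running sum fares worse: once the total reaches 256, neighboring half-precision numbers are 0.25 apart, so adding 0.1 rounds back to the same value and the sum stalls at 256 instead of reaching 1,000. Long chains of dependent arithmetic, as in time-stepped physical simulations, are where this kind of drift hurts, whereas deep-learning training can often get by with 16 bits. That is one reason an exaflop measured at 16-bit precision is not directly comparable to one measured at 64 bits.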
And then there’s Folding@Home, a distributed-computing project that simulates protein folding. “I would call that a specialized computer,” says Dongarra, one that can do its job because the calculations involved are “embarrassingly parallel.” That is, separate computers can perform the required calculations independently, or at least largely so, and what little communication they need is carried over the Internet. In March of 2020, the Folding@Home project proudly announced on Twitter, “We’ve crossed the exaflop barrier!”
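What “embarrassingly parallel” means can be sketched with a classic toy problem in place of protein folding: estimating pi by Monte Carlo sampling. In the Python sketch below, the functions work_unit and estimate_pi are hypothetical, and threads stand in for separate volunteer computers; this is not how Folding@Home itself is implemented. Each work unit has its own random seed and needs no data from the others, and the only exchange is a single count returned at the end.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def work_unit(seed: int, samples: int) -> int:
    """One independent job: count random points in the unit square that
    land inside the quarter circle. It needs nothing from any other job."""
    rng = random.Random(seed)  # private generator, so units share no state
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(units: int = 8, samples_per_unit: int = 25_000) -> float:
    """Farm out independent work units, then combine their results once."""
    # Each unit could run on a different machine; the only "communication"
    # is one integer per unit sent back at the end.
    with ThreadPoolExecutor() as pool:
        hits = pool.map(work_unit, range(units), [samples_per_unit] * units)
        total_hits = sum(hits)
    # The quarter circle covers pi/4 of the unit square.
    return 4.0 * total_hits / (units * samples_per_unit)


print(estimate_pi())
```

Because Python threads share one interpreter lock, this version shows the shape of the computation rather than a speedup. Swapping in ProcessPoolExecutor (with the usual if __name__ == "__main__" guard), or separate machines, would let the units truly run at once, and the answer would not change, since each unit’s seed fixes its samples.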
But if you stick with the usual benchmark, the one used for the Top500 ranking, no supercomputer yet qualifies as an exascale machine. Frontier may be the first. Or rather, says Dongarra, it’s on track to be the first known exascale supercomputer. He explains that before the June 2021 Top500 ranking came out, a rumor emerged that China had at least one, if not two, supercomputers already running at exascale.
Why would Chinese engineers construct such a machine without telling anyone about it? At the time, Dongarra says, he thought that maybe they were waiting for the 100th anniversary of the founding of the Chinese Communist Party. But that date came and went in July. He now speculates that Chinese officials may be worried that making the machine’s existence public would exacerbate geopolitical rivalries and prompt the United States to restrict the export of certain technologies to China.
Perhaps that explains it. But it’s going to be increasingly difficult for Chinese researchers not to let this cat, if it truly exists, out of the bag. For the moment, anyway, with only rumors to go on, this exascale rival to Frontier is a Schrödinger’s cat—both here and not here at the same time.
This article appears in the January 2022 print issue as “The Exascale Era Is Upon Us.”