Industry Out of Phase With Supercomputers

Chip-industry changes threaten U.S. supercomputing


The Frontier supercomputer at the Oak Ridge National Laboratory in Tennessee.

Oak Ridge National Laboratory

Technical and economic changes in the semiconductor industry threaten to stifle U.S. development of the next generation of high-performance computers, warns a new report from the National Research Council.

With Moore’s Law and the scaling of transistors waning, the industry is turning to chip designs that don’t work for the supercomputing that’s used in massive simulations. The report, written for the National Nuclear Security Administration, focuses on defense use in modeling the physics of nuclear weapons. But the changes also would affect simulations including those used for climate modeling and weather forecasting.

The NNSA, responsible for the U.S. nuclear stockpile, “needs to fundamentally rethink its advanced computing research, engineering, acquisition, deployment, and partnership strategy,” warns the report.

NNSA has developed massive and sophisticated codes that run on supercomputers to verify the continued security and performance of nuclear weapons designed decades ago. Keeping them up to date requires new generations of supercomputers that can run more complex models faster than the months required on today’s machines. But industry, which has shelled out big bucks for state-of-the-art fabs, is targeting big, profitable markets like cloud computing.

Nuclear weapons designers used computers to understand the physics of nuclear weapons long before the U.S. stopped underground nuclear testing in 1992. Since then, powerful computer models have been their primary tools for maintaining the country’s nuclear capability via NNSA’s Stockpile Stewardship program.

Federal spending on supercomputers for the weapons program complemented industry investment in chip production for decades. NNSA’s most powerful machine currently in operation is the Frontier computer, which began operation last year at the Oak Ridge National Laboratory, in Tennessee. It can perform 10^18 (a quintillion) floating-point operations per second (flops), making it the first “exascale” computer. Custom-built by Hewlett Packard Enterprise (HPE), it can, in theory, perform 2 exaflops. HPE is building two other exascale supercomputers, one for NNSA that will be deployed at its Lawrence Livermore National Laboratory and the other at the Office of Science’s Argonne National Laboratory.

But those easy days are over, says Kathy Yelick of the University of California at Berkeley. “The NNSA has had a really successful run over the last 30 years with a combination of high-end computing facilities and expertise in computational science that make its labs a critical national resource,” the chair of the panel that wrote the NRC report said at a 14 April online press conference. In addition to challenges in technology, she says, “the rapidly evolving geopolitical situation...reinforces the need for computing leadership as an element of deterrence.”

Industry trends are worrying. Most semiconductor manufacturing has moved outside the United States. Only a single domestic developer of supercomputers remains since the 2019 Hewlett Packard Enterprise purchase of Cray. Industry is developing technology for high-volume markets like cloud computing, which won’t transfer easily to the much smaller supercomputing market. The hot technology frontiers are artificial intelligence and quantum computing.

“Business as usual will not be adequate” for NNSA, the report says. The agency needs an aggressive road map to develop new computing technology. The report urges stressing “high-risk, high-reward research” in math and computer science “to cultivate radical innovation.” The report also says both artificial intelligence and quantum computing have promise and deserve serious investigation, but it warns that neither is likely to replace the massive computation essential to traditional simulations. Although the report was addressed to NNSA, its recommendations also apply to supercomputing by DoE’s Office of Science. The two agencies are collaborating on supercomputing at Argonne, Oak Ridge, and Livermore.

NNSA now plans to follow its new exascale computers with a new, higher-capacity system based at Los Alamos in four to five years, says Rob Neely, program director for advanced simulation and computing at the Lawrence Livermore National Laboratory, in California. A second such system will follow around 2030 at Livermore. “Early discussions with vendors about their road maps have begun,” Neely says. “We are also already well underway in implementing some of the NAS recommendations at LLNL, in particular by increasing our partnerships with cloud providers.” Livermore and Amazon Web Services are exploring common interests in cloud and high-performance computing technology.

What happens next “will depend a lot on where overall technology trends are headed in that time frame, and how well we can adapt our codes to those changes without sacrificing mission needs,” says Neely. He expects AI and the cloud to influence the post-exascale systems—if NNSA can adapt its codes to the new technology. That’s a big if. Having just spent a decade adapting its codes to GPUs, the NNSA brain trust is “not anxious to divest from the GPU accelerated approach just yet.”

Both NNSA and the authors of the report think quantum computing is farther off. “They will not replace classical computers for our primary mission of large, complex, and integrated weapons design codes anytime in the next 10 to 15 years,” says Neely.

The concerns extend beyond huge and highly specialized weapons codes. A government program identified more than 20 applications requiring exascale computing, many of which would benefit from even larger scales.

Update 22 Apr. 2023: The story was corrected to convey the eventual location of two-exaflop machines that HPE is developing for U.S. labs—not Los Alamos, as the story originally reported, but rather Lawrence Livermore and Argonne National Labs.

Update 16 May 2023: The story was corrected to clarify where the next exascale computers are planned (at LLNL and Argonne) as well as the roles of the National Nuclear Security Administration and the Office of Science, both parts of the Department of Energy.

The Conversation (3)
James Brady, 28 Apr, 2023

The article has assertions and conjecture but not much else. How about a little 'why'? It would be nice to know why the old bombs have this massive need for increased compute power to determine they are still safe.

Stephen Herbein, 21 Apr, 2023

Some nitpicks: ORNL is not an NNSA lab, so Frontier isn’t an NNSA machine. The first exascale NNSA machine (to my knowledge) will be El Capitan at LLNL, which will be built by Cray. I don’t believe Cray is building any machines for LANL currently.
