Supercomputers Flex Their AI Muscles


Scientific supercomputing is not immune to the wave of machine learning that's swept the tech world. Those using supercomputers to uncover the structure of the universe, discover new molecules, and predict the global climate are increasingly using neural networks to do so. And as is long-standing tradition in the field of high-performance computing, it's all going to be measured down to the last floating-point operation.

Twice a year, Top500.org publishes a ranking of raw computing power using a value called Rmax, derived from benchmark software called Linpack. By that measure, it's been a bit of a dull year. The ranking of the top nine systems is unchanged from June, with Japan's Supercomputer Fugaku on top at 442,010 trillion floating-point operations per second. That leaves the Fujitsu-built system still short of the long-sought goal of exascale computing: one-million trillion 64-bit floating-point operations per second, or one exaflops.

But by another measure, one more related to AI, Fugaku and its competitor, the Summit supercomputer at Oak Ridge National Laboratory, have already passed the exascale mark. That benchmark, called HPL-AI, measures a system's performance using the lower-precision numbers (16 bits or fewer) common to neural network computing. Using that yardstick, Fugaku hits 2 exaflops (no change from June 2021) and Summit reaches 1.4 exaflops (a 23 percent increase).
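To put the Linpack and HPL-AI figures above on one scale, here is a minimal Python sketch of the unit arithmetic. The variable and function names are illustrative, and the ratio it prints compares two benchmarks that run at different numerical precisions, so it is a rough indication rather than a like-for-like speedup.

```python
# Unit conversion for the figures quoted in the article.
# All names here are illustrative, not part of any benchmark tooling.
TERA = 10**12  # "trillion" floating-point operations per second
EXA = 10**18   # "one-million trillion" floating-point operations per second


def teraflops_to_exaflops(tflops: float) -> float:
    """Convert a rate in teraflops to exaflops."""
    return tflops * TERA / EXA


fugaku_rmax_tflops = 442_010  # Linpack Rmax (64-bit), as reported
fugaku_rmax_exa = teraflops_to_exaflops(fugaku_rmax_tflops)

fugaku_hpl_ai_exa = 2.0  # HPL-AI (16 bits or fewer), as reported
summit_hpl_ai_exa = 1.4  # HPL-AI, as reported

print(f"Fugaku Rmax: {fugaku_rmax_exa:.3f} exaflops")                       # 0.442
print(f"Fugaku HPL-AI / Rmax: {fugaku_hpl_ai_exa / fugaku_rmax_exa:.1f}x")  # 4.5x
```

On these numbers, Fugaku's 64-bit Linpack result is a little under half an exaflops, while its lower-precision HPL-AI result is roughly 4.5 times higher.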

By one benchmark, related to AI, Japan's Fugaku and the U.S.'s Summit supercomputers are already doing exascale computing.

But HPL-AI isn't really how AI is done in supercomputers today. Enter MLCommons, the industry organization that's been setting realistic tests for AI systems of all sizes. It released results from version 1.0 of its high-performance computing benchmarks, called MLPerf HPC, this week.

The suite of benchmarks measures the time it takes to train real scientific machine learning models to agreed-on quality targets. Compared to MLPerf HPC version 0.7, basically a warmup round from last year, the best results in version 1.0 showed a 4- to 7-fold improvement. Eight supercomputing centers took part, producing 30 benchmark results.

As in MLPerf's other benchmarking efforts, there were two divisions: "Closed" submissions all used the same neural network model to ensure a more apples-to-apples comparison; "open" submissions were allowed to modify their models.

The three neural networks trialed were:

CosmoFlow uses the distribution of matter in telescope images to predict things about dark energy and other mysteries of the universe.
DeepCAM tests the detection of cyclones and other extreme weather in climate data.
OpenCatalyst, the newest benchmark, predicts the quantum mechanical properties of catalyst systems to discover and evaluate new catalyst materials for energy storage.

In the closed division, there were two ways of testing these networks. Strong scaling allowed participants to use as much of the supercomputer's resources as they needed to achieve the fastest neural network training time. Because it's not really practical to use an entire supercomputer's worth of CPUs, accelerator chips, and bandwidth on a single neural network, strong scaling shows what researchers think the optimal distribution of resources can do. Weak scaling, in contrast, divides the entire supercomputer among hundreds of identical copies of the same neural network, trained simultaneously, to figure out what the system's AI abilities are in total.
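The weak scaling setup described above amounts to carving a machine into equal slices, one per model copy. The following Python sketch illustrates that bookkeeping only. The function name and the machine size are hypothetical, and real submissions also have to account for network topology and job scheduling, which this ignores.

```python
from typing import Tuple


def plan_weak_scaling(total_accelerators: int, per_instance: int) -> Tuple[int, int]:
    """Split a machine into identical training instances.

    Returns (number of concurrent instances, accelerators left idle).
    Hypothetical helper for illustration; not part of MLPerf HPC.
    """
    if per_instance <= 0:
        raise ValueError("per_instance must be positive")
    return divmod(total_accelerators, per_instance)


# Hypothetical machine: 4,096 accelerators, 128 given to each model copy.
instances, idle = plan_weak_scaling(4096, 128)
print(f"{instances} concurrent instances, {idle} accelerators idle")
```

In strong scaling, by contrast, there is a single instance and the tuning question is how many accelerators to give it before communication overhead stops the training time from shrinking.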

Here's a selection of results:

Argonne National Laboratory used its Theta supercomputer to measure strong scaling for DeepCAM and OpenCatalyst. Using 32 CPUs and 129 Nvidia GPUs, Argonne researchers trained DeepCAM in 32.19 minutes and OpenCatalyst in 256.7 minutes. Argonne says it plans to use the results to develop better AI algorithms for two upcoming systems, Polaris and Aurora.

The Swiss National Supercomputing Centre used Piz Daint to train OpenCatalyst and DeepCAM. In the strong scaling category, Piz Daint trained OpenCatalyst in 753.11 minutes using 256 CPUs and 256 GPUs. It finished DeepCAM in 21.88 minutes using 1024 of each. The center will use the results to inform algorithms for its upcoming Alps supercomputer.

Fujitsu and RIKEN used 512 of Fugaku's custom-made processors to train CosmoFlow in 114 minutes. They then used half of the complete system (82,944 processors) to run the weak scaling benchmark on the same neural network. That meant training 637 instances of CosmoFlow, which the system managed at an average of 1.29 models per minute, for a total of 495.66 minutes (a little over 8 hours).
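The models-per-minute figure for Fugaku follows directly from the two reported numbers. A short Python check, with illustrative variable names:

```python
# Weak scaling throughput = models trained / total wall-clock minutes.
# Inputs are the Fugaku CosmoFlow figures reported in the article.
models_trained = 637
total_minutes = 495.66

throughput = models_trained / total_minutes
print(f"{throughput:.2f} models per minute")  # 1.29
```

The same division underlies the models-per-minute figures reported for the other weak scaling entries, although the article gives only the final rate for those.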

Helmholtz AI, a joint effort of Germany's largest research centers, tested both the JUWELS and HoreKa supercomputers. HoreKa's best effort was to chug through DeepCAM in 4.36 minutes using 256 CPUs and 512 GPUs. JUWELS did it in as little as 2.56 minutes using 1024 CPUs and 2048 GPUs. For CosmoFlow, its best effort was 16.73 minutes using 512 CPUs and 1024 GPUs. In the weak scaling benchmark, JUWELS used 1536 CPUs and 3072 GPUs to plow through DeepCAM at a rate of 0.76 models per minute.

Lawrence Berkeley National Laboratory used the Perlmutter supercomputer to conquer CosmoFlow in 8.5 minutes (256 CPUs and 1024 GPUs), DeepCAM in 2.51 minutes (512 CPUs and 2048 GPUs), and OpenCatalyst in 111.86 minutes (128 CPUs and 512 GPUs). It used 1280 CPUs and 5120 GPUs for the weak scaling effort, yielding 0.68 models per minute for CosmoFlow and 2.06 models per minute for DeepCAM.

The (U.S.) National Center for Supercomputing Applications ran its benchmarks on the Hardware Accelerated Learning (HAL) system. Using 32 CPUs and 64 GPUs, researchers there trained OpenCatalyst in 1021.18 minutes and DeepCAM in 133.91 minutes.

Nvidia, which made the GPUs used in every entry except RIKEN's, tested its DGX A100 systems on CosmoFlow (8.04 minutes using 256 CPUs and 1024 GPUs) and DeepCAM (1.67 minutes with 512 CPUs and 2048 GPUs). In weak scaling, the system was made up of 1024 CPUs and 4096 GPUs, and it plowed through 0.73 CosmoFlow models per minute and 5.27 DeepCAM models per minute.

Texas Advanced Computing Center's Frontera-Longhorn system tackled CosmoFlow in 140.45 minutes and DeepCAM in 76.9 minutes using 64 CPUs and 128 GPUs.
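Because DeepCAM was run by every GPU-based entrant listed above, its strong scaling results can be lined up side by side. The Python sketch below sorts them by training time and also prints GPU-minutes (GPUs multiplied by minutes). That second column is an assumption added here as a rough indicator of resources consumed. It is not an MLPerf HPC metric, and it ignores differences in GPU generation and CPU count.

```python
from typing import NamedTuple


class Result(NamedTuple):
    system: str
    gpus: int
    minutes: float


# DeepCAM strong scaling figures as reported in the article.
deepcam = [
    Result("Theta", 129, 32.19),
    Result("Piz Daint", 1024, 21.88),
    Result("HoreKa", 512, 4.36),
    Result("JUWELS", 2048, 2.56),
    Result("Perlmutter", 2048, 2.51),
    Result("HAL", 64, 133.91),
    Result("DGX A100", 2048, 1.67),
    Result("Frontera-Longhorn", 128, 76.9),
]


def gpu_minutes(r: Result) -> float:
    """GPUs x minutes: an illustrative cost proxy, not an official metric."""
    return r.gpus * r.minutes


# Fastest time-to-train first, which is how strong scaling is scored.
ranked = sorted(deepcam, key=lambda r: r.minutes)

for r in ranked:
    print(f"{r.system:<18} {r.gpus:>5} GPUs {r.minutes:>8.2f} min "
          f"{gpu_minutes(r):>10.0f} GPU-min")
```

Sorting by the second column instead would give a different ordering, which is one reason the benchmark reports time-to-train together with the system size used.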

Editor's note 1 Dec 2021: This post incorrectly defined exaflop as "one-thousand trillion 64-bit floating-point operations per second." It now correctly defines it as one-million trillion floating-point operations per second.
