AI Training Is Outpacing Moore’s Law

The new set of MLPerf results proves it

[Image: Nvidia systems were on top again in the MLPerf training benchmarks. Credit: Nvidia]

The days and sometimes weeks it took to train AIs only a few years ago were a big reason behind the launch of billions of dollars’ worth of new computing startups over the last few years, including Cerebras Systems, Graphcore, Habana Labs, and SambaNova Systems. Google, Intel, Nvidia, and other established companies, meanwhile, made internal investments (and sometimes acquisitions) on a similar scale. With the newest edition of the MLPerf training benchmark results, there’s clear evidence that the money was worth it.

The gains to AI training performance since MLPerf benchmarks began “managed to dramatically outstrip Moore’s Law,” says David Kanter, executive director of the MLPerf parent organization MLCommons. The increase in transistor density alone would account for a little more than a doubling of performance between the early version of the MLPerf benchmarks and those from June 2021. But improvements to software as well as to processor and computer architecture produced a 6.8- to 11-fold speedup for the best benchmark results. In the newest tests, called version 1.1, the best results improved by up to 2.3 times over those from June.
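
To put numbers on that comparison, here’s a minimal back-of-the-envelope sketch in Python. The 24-month doubling cadence and the roughly 2.5-year window are assumptions for illustration, not MLCommons figures; the 6.8- and 11-fold values are the measured gains cited above.

```python
# Back-of-the-envelope comparison of Moore's-Law scaling with the measured
# MLPerf training speedups. The elapsed time and the 24-month doubling
# period are illustrative assumptions, not MLCommons figures.

years_elapsed = 2.5            # roughly the first MLPerf rounds (late 2018) to June 2021
doubling_period_years = 2.0    # classic Moore's-Law cadence, assumed here

# Gain you'd expect if training speed only tracked transistor density.
moores_law_gain = 2 ** (years_elapsed / doubling_period_years)

# Best measured MLPerf improvements over the same window, per the article.
measured_gains = (6.8, 11.0)

print(f"Density scaling alone: ~{moores_law_gain:.1f}x")
for gain in measured_gains:
    print(f"A measured {gain}x gain is ~{gain / moores_law_gain:.1f}x beyond that")
```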

For the first time, Microsoft entered its Azure cloud AI offerings into MLPerf, muscling through all eight of the test networks using a variety of resources. These ranged in scale from 2 AMD Epyc CPUs and 8 Nvidia A100 GPUs up to 512 CPUs and 2048 GPUs. Scale clearly mattered: the largest configurations trained AIs in less than a minute, while the two-CPU, eight-GPU combination often needed 20 minutes or more.
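
As a quick sketch of what that scaling implies, the arithmetic below uses the article’s round numbers (8 versus 2048 GPUs, about 20 minutes versus roughly a minute); the efficiency figure it prints is illustrative only, not an official MLPerf metric.

```python
# Rough scaling arithmetic for the Azure entries described above. The times
# are the article's round numbers, so the efficiency figure is illustrative only.

small = {"gpus": 8,    "minutes": 20.0}   # 2 Epyc CPUs + 8 A100 GPUs, ~20 minutes
large = {"gpus": 2048, "minutes": 1.0}    # 512 CPUs + 2048 GPUs, under a minute

speedup = small["minutes"] / large["minutes"]
scale_up = large["gpus"] / small["gpus"]
efficiency = speedup / scale_up

print(f"{scale_up:.0f}x more GPUs gave roughly {speedup:.0f}x faster training "
      f"(~{efficiency:.0%} parallel efficiency on these round numbers)")
```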

[Chart: Moore's Law can only do so much. Software and other progress have made the difference in training AIs. Credit: MLCommons]

Nvidia worked closely with Microsoft on the benchmark tests, and, as in previous MLPerf lists, Nvidia GPUs were the AI accelerators behind most of the entries, including those from Dell, Inspur, and Supermicro. Nvidia itself topped all the results for commercially available systems, relying on the unmatched scale of its Selene AI supercomputer. Selene is made up of commercially available modular DGX SuperPod systems. In its most massive effort, Selene brought to bear 1080 AMD Epyc CPUs and 4320 A100 GPUs to train the natural language processor BERT in less than 16 seconds, a feat that took most smaller systems about 20 minutes.

According to Nvidia the performance of systems using A100 GPUs has increased more than 5-fold in the last 18 months and 20-fold since the first MLPerf benchmarks three years ago. That’s thanks to software innovation and improved networks, the company says. (For more, see Nvidia's blog.)

Given Nvidia’s pedigree and performance on these AI benchmarks, it’s natural for new competitors to compare themselves to it. That’s what UK-based Graphcore is doing when it notes that its base computing unit, the Pod16 (1 CPU and 16 IPU accelerators), beats Nvidia’s base unit, the DGX A100 (2 CPUs and 8 GPUs), by nearly a minute.

[Image: Graphcore brought its bigger systems out to play. Credit: Graphcore]

For this edition of MLPerf, Graphcore debuted image classification and natural language processing benchmarks for its combinations of those base units, the Pod64, Pod128, and (you saw this coming, right?) Pod256. The latter, made up of 32 CPUs and 256 IPUs, was the fourth fastest system behind Nvidia’s Selene and Intel’s Habana Gaudi to finish ResNet image classification training, at 3:48. For natural language processing, the Pod256 and Pod128 were third and fourth on the list, again behind Selene, finishing in 6:54 and 10:36. (For more, see Graphcore's blog.)

You might have noticed that the CPU-to-accelerator chip ratios are quite different between Nvidia-based offerings—about 1 to 4—and Graphcore’s systems—as low as 1 to 32. That’s by design, say Graphcore engineers. The IPU is designed to depend less on a CPU’s control when operating neural networks.

You can see the opposite with Habana Labs, which Intel purchased for about US $2 billion in 2019. For example, for its high-ranking training on image classification, Intel used 64 Xeon CPUs and 128 Habana Gaudi accelerators to train ResNet in less than 5:30. It used 32 CPUs and 64 accelerators to train the BERT natural language neural net in 11:52. (For more, see Habana's blog.)

Google’s contribution to this batch of benchmark scores was a bit unusual. Rather than demonstrate commercial or cloud systems with the company’s TPU v4 processor technology, Google engineers submitted results for two hugely outsized natural language processing neural nets.

Using its publicly available TPU v4 cloud, the company ran a version of Lingvo, an NLP model with a whopping 480 billion parameters, compared with BERT’s 110 million. The cloud platform used 1024 AMD Epyc CPUs and 2048 TPUs to complete the training task in just under 20 hours. Using a research system consisting of 512 AMD Rome CPUs and 1024 TPUs, Google trained a 200-billion-parameter version of Lingvo in 13.5 hours. (It took 55 hours and 44 hours, respectively, to do the whole process end-to-end, including steps needed to get the training started, Google reports.)
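
For a sense of scale, here’s a minimal sketch comparing those model sizes and reported training times; the parameter counts and hours are the approximate figures quoted above.

```python
# How much larger Google's Lingvo runs were than the BERT benchmark model,
# using the approximate parameter counts and wall-clock times quoted in the article.

bert_params = 110e6   # BERT: 110 million parameters

lingvo_runs = {
    "Lingvo, 480B parameters (cloud, 2048 TPUs)":    {"params": 480e9, "hours": 20.0},
    "Lingvo, 200B parameters (research, 1024 TPUs)": {"params": 200e9, "hours": 13.5},
}

for name, run in lingvo_runs.items():
    ratio = run["params"] / bert_params
    print(f"{name}: ~{ratio:,.0f}x the parameters of BERT, "
          f"trained in about {run['hours']} hours")
```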

Structurally, Lingvo is similar enough to BERT to fit into that category, but it also resembles the other really large conversational AIs, such as LaMDA and GPT-3, that computing giants have been working on. Google thinks that huge-model training should eventually become part of future MLPerf commercial benchmarks. (For more, see Google's blog.)

However, MLCommons’ Kanter points out that the expense of training such systems is high enough to exclude many participants.

The Conversation (1)

James Brady, 09 Dec 2021

The comparison to Moore's Law is a little weak, as the 1965 form of Moore's Law required doubling each year and improved only by scaling, whereas MLPerf(tm) gets improvement in many dimensions, including scaling, parallelism, algorithms, and software path-length reductions.

Otherwise it was interesting to see what was happening in this arena.