Benchmark Shows AIs Are Getting Speedier

This week, AI industry group MLCommons released a new set of results for AI performance. The new list, MLPerf Version 1.1, follows the first official set of benchmarks by five months and includes more than 1,800 results from 20 organizations, with 350 measurements of energy efficiency. The majority of systems improved by 5 to 30 percent from earlier this year, with some more than doubling their previous performance, according to MLCommons. The new results come on the heels of last week's announcement of a new machine-learning benchmark, called TPCx-AI.

In MLPerf's inferencing benchmarks, systems made up of combinations of CPUs and GPUs or other accelerator chips are tested on up to six neural networks performing a variety of common functions: image classification, object detection, speech recognition, 3D medical imaging, natural language processing, and recommendation. Commercially available datacenter-based systems were tested under two conditions: a simulation of real datacenter activity, where queries arrive in bursts, and "offline" activity, where all the data is available at once. Computers meant to work onsite instead of in the data center (what MLPerf calls the edge) were measured in the offline state and as if they were receiving a single stream of data, such as from a security camera.
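To make the difference between the two datacenter conditions concrete, here is a minimal Python sketch. It is not MLPerf's test harness: the single-worker system, the fixed service time, and the use of exponentially distributed gaps between queries to stand in for bursty traffic are all assumptions made for illustration, and the function names are hypothetical.

```python
import random


def offline_throughput(num_samples, service_time_s):
    """Offline: every sample is available up front, so the worker is never idle."""
    total_time_s = num_samples * service_time_s
    return num_samples / total_time_s


def server_latencies(num_queries, arrival_rate_qps, service_time_s, seed=0):
    """Server-style load: queries arrive at random times and may have to queue.

    Assumption: gaps between arrivals are exponentially distributed, which
    produces occasional bursts of closely spaced queries.
    """
    rng = random.Random(seed)
    arrival = 0.0   # arrival time of the current query
    free_at = 0.0   # time at which the single worker next becomes idle
    latencies = []
    for _ in range(num_queries):
        arrival += rng.expovariate(arrival_rate_qps)
        start = max(arrival, free_at)        # wait if the worker is busy
        free_at = start + service_time_s
        latencies.append(free_at - arrival)  # queueing delay plus service time
    return latencies


def percentile(values, pct):
    """Simple nearest-rank style percentile over a list of numbers."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, int(pct / 100 * len(ordered)))
    return ordered[index]


if __name__ == "__main__":
    service_time_s = 0.01  # hypothetical: 10 ms per inference
    print("offline samples/s:", offline_throughput(1000, service_time_s))
    lat = server_latencies(1000, arrival_rate_qps=80, service_time_s=service_time_s)
    print("server median latency (s):", percentile(lat, 50))
    print("server 99th percentile latency (s):", percentile(lat, 99))
```

In the offline case the only figure of interest is throughput. Under server-style load the same hypothetical system also has to be judged on how long queries wait when several arrive close together, which is why tail latency is printed alongside the median.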

There were datacenter-class submissions from Dell, HPE, Inspur, Intel, LTech Korea, Lenovo, Nvidia, Neuchips, Qualcomm, and others, and most of them used Nvidia AI accelerator chips. The exceptions were Qualcomm and Neuchips, which used their own accelerators, and Intel, which used no accelerator chip at all, instead demonstrating the performance of its CPUs alone. Neuchips participated only in the recommendation benchmark, as its accelerator, the RecAccel, is designed specifically to speed up recommender systems, which are used for recommending e-commerce items and for ranking search results.

Figure: MLPerf tests six common AIs under several conditions. (Chart titled "MLPerf Inference 1.1: Diverse data center and edge use cases and scenarios." Credit: NVIDIA)

For the results Nvidia submitted itself, the company used software improvements alone to eke out as much as a 50 percent performance improvement over the past year. The systems tested were usually made up of one or two CPUs along with as many as eight accelerators. On a per-accelerator basis, systems with Nvidia A100 accelerators showed about double or more the performance of those using the lower-power Nvidia A30. A30-based computers edged out systems based on Qualcomm's Cloud AI 100 in four of six tests in the server scenario.

However, Qualcomm senior director of product management John Kehrli points out that his company's accelerators were deliberately limited to a datacenter-friendly 75-watt power envelope per chip. Even so, in the offline image recognition task they managed to speed past some Nvidia A100-based computers whose accelerators had peak thermal designs of 400 W each.

Nvidia senior product manager for AI inferencing Dave Salvator pointed to two other outcomes for the company's accelerators: First, for the first time Nvidia A100 accelerators were paired with server-class Arm CPUs instead of x86 CPUs. The results were nearly identical between Arm and x86 systems across all six benchmarks. "That's an important milestone for Arm," says Salvator. "It's also a statement about the readiness of our software stack to be able to run the Arm architecture in a datacenter environment."

Figure: Nvidia has made gains in AI using only software improvements. (Chart titled "Comparing MLPerf 0.7 to MLPerf 1.1 on NVIDIA A100," showing speedups over v0.7 submissions of 101% to 150% by topic. Credit: NVIDIA)

Separately from the formal MLPerf benchmarks, Nvidia showed off a new software technique called multi-instance GPU (MIG), which allows a single GPU to act as if it's seven separate chips from the point of view of software. When the company ran all six benchmarks simultaneously plus an extra instance of object detection (just as a flex, I assume), the results were 95 percent of the single-instance value.

Nvidia A100-based systems also cleaned up in the edge server category, where systems are designed for places like stores and offices. These computers were tested on most of the same six benchmarks but with the recommender system swapped out for a low-res version of object detection. In this category, though, there was a wider range of accelerators on offer, including Centaur's AI Integrated Coprocessor, Qualcomm's AI 100, Edgecortix's DNA-F200 v2, Nvidia's Jetson Xavier, and FuriosaAI's Warboy.

Figure: Qualcomm topped the efficiency ranking for a machine vision test. (Chart titled "Inference power efficiency"; Qualcomm's Cloud AI 100 PCIe is labelled 197.40, well above the others, which range from 48.22 to 112.03. Credit: Qualcomm)

With six tests under two conditions each in two commercial categories, using systems that vary in number of CPUs and accelerators, MLPerf performance results don't really lend themselves to the kind of simple ordered list that Top500.org achieves with supercomputing. The parts that come closest are the efficiency tests, which can be boiled down to inferences per second per watt for the offline component. Qualcomm systems were tested for efficiency on object recognition, object detection, and natural language processing in both the datacenter and edge categories. In terms of inferences per second per watt, they beat the Nvidia-backed systems at the machine vision tests, but not on language processing. Nvidia-accelerated systems took all the rest of the spots.
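The efficiency figure is simple arithmetic: offline throughput divided by average power draw. The short Python sketch below shows the calculation, along with the per-accelerator normalization that makes systems with different chip counts comparable. The function names and the input numbers are hypothetical and are not taken from any MLPerf submission.

```python
def inferences_per_second_per_watt(throughput_ips, avg_power_w):
    """Efficiency metric: offline throughput divided by average power draw.

    Since a watt is a joule per second, this is the same as inferences per joule.
    """
    if avg_power_w <= 0:
        raise ValueError("average power must be positive")
    return throughput_ips / avg_power_w


def per_accelerator(throughput_ips, num_accelerators):
    """Normalize system throughput by the number of accelerator chips."""
    if num_accelerators <= 0:
        raise ValueError("need at least one accelerator")
    return throughput_ips / num_accelerators


if __name__ == "__main__":
    # Hypothetical system: 8 accelerators, 12,000 inferences/s, 300 W average draw.
    print(inferences_per_second_per_watt(12_000, 300))  # 40.0
    print(per_accelerator(12_000, 8))                   # 1500.0
```

A single ratio like this is what allows the efficiency results to be ranked at all; raw throughput alone would favor whichever system packs in the most, or the most power-hungry, accelerators.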

In seeming opposition to MLPerf's multidimensional nature, a new benchmark was introduced last week that aims for a single number. The Transaction Processing Performance Council says the TPCx-AI benchmark:

Generates and processes large volumes of data
Trains preprocessed data to produce realistic machine learning models
Conducts accurate insights for real-world customer scenarios based on the generated models
Can scale to large distributed configurations
Allows for flexibility in configuration changes to meet the demands of the dynamic AI landscape.

The benchmark is meant to capture the complete end-to-end process of machine learning and AI, explains Hamesh Patel, chair of the TPCx-AI committee and principal engineer at Intel. That includes parts of the process that aren't included in MLPerf such as preparing the data and optimization. "There was no benchmark that emulates an entire data science pipeline," he says. "Customers have said it can take a week to prep [the data] and two days to train" a neural network.

Big differences between MLPerf and TPCx-AI include the latter's dependence on synthetic data—data that resembles real data but is generated on the fly. MLPerf uses sets of real data for both training and inference, and MLCommons executive director David Kanter was skeptical about the value of results from synthetic data.

The memberships of MLCommons and TPC overlap considerably, so it remains to be seen which, if either, of the two benchmarks gains credibility over the other. MLPerf certainly has the advantage for the moment: computer system makers are already being asked for MLPerf data as part of requests for proposals, at least two MLPerf participants report.
