Nvidia Still on Top in Machine Learning; Intel Chasing

Large language models like Llama 2 and ChatGPT are where much of the action is in AI. But how well do today’s data center–class computers execute them? Pretty well, according to the latest set of benchmark results for machine learning, with the best able to summarize more than 100 articles in a second. MLPerf’s twice-a-year data delivery was released on 11 September and included, for the first time, a test of a large language model (LLM), GPT-J. Fifteen computer companies submitted performance results in this first LLM trial, adding to the more than 13,000 other results submitted by a total of 26 companies. In one of the highlights of the data-center category, Nvidia revealed the first benchmark results for its Grace Hopper—an H100 GPU linked to the company’s new Grace CPU in the same package as if they were a single “superchip.”

Sometimes called “the Olympics of machine learning,” MLPerf consists of seven benchmark tests: image recognition, medical-imaging segmentation, object detection, speech recognition, natural-language processing, a new recommender system, and now an LLM. This set of benchmarks tested how well an already-trained neural network executed on different computer systems, a process called inferencing.


The LLM, called GPT-J and released in 2021, is on the small side for such AIs. It’s made up of some 6 billion parameters, compared with GPT-3’s 175 billion. But going small was deliberate, according to MLCommons executive director David Kanter, because the organization wanted the benchmark to be achievable by a big swath of the computing industry. It’s also in line with a trend toward more compact but still capable neural networks.
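For a rough sense of what that size gap means for hardware, here is a back-of-envelope sketch of the memory needed just to hold each model's weights. It assumes 2 bytes per parameter (16-bit weights), which is an assumption for illustration and not something the benchmark results state; the parameter counts are the approximate figures quoted above, and the function name is hypothetical.

```python
def weight_memory_gb(num_params, bytes_per_param=2):
    """Approximate memory for model weights alone, in gigabytes (10^9 bytes).

    bytes_per_param=2 assumes 16-bit weights; this is an illustrative
    assumption, not a detail reported in the MLPerf results.
    """
    return num_params * bytes_per_param / 1e9


GPT_J_PARAMS = 6_000_000_000    # "some 6 billion" (approximate)
GPT_3_PARAMS = 175_000_000_000  # 175 billion

print(f"GPT-J weights: ~{weight_memory_gb(GPT_J_PARAMS):.0f} GB")  # ~12 GB
print(f"GPT-3 weights: ~{weight_memory_gb(GPT_3_PARAMS):.0f} GB")  # ~350 GB
```

Under that assumption, the smaller model's weights would fit in the memory of a single accelerator, while the larger model's would have to be spread across several. The estimate ignores activations and other runtime memory.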

This was version 3.1 of the inferencing contest, and as in previous iterations, Nvidia dominated both in the number of machines using its chips and in performance. However, Intel’s Habana Gaudi2 continued to nip at the Nvidia H100’s heels, and Qualcomm’s Cloud AI 100 chips made a strong showing in benchmarks focused on power consumption.

Nvidia Still on Top

This set of benchmarks saw the arrival of the Grace Hopper superchip, an Arm-based 72-core CPU fused to an H100 through Nvidia’s proprietary C2C link. Most other H100 systems rely on Intel Xeon or AMD Epyc CPUs housed in a separate package.

The nearest comparable system to the Grace Hopper was an Nvidia DGX H100 computer that combined two Intel Xeon CPUs with an H100 GPU. The Grace Hopper machine beat that in every category by 2 to 14 percent, depending on the benchmark. The biggest difference was achieved in the recommender system test and the smallest difference in the LLM test.

Dave Salvator, director of AI inference, benchmarking, and cloud at Nvidia, attributed much of the Grace Hopper advantage to memory access. Through the proprietary C2C link that binds the Grace chip to the Hopper chip, the GPU can directly access 480 gigabytes of CPU memory, and there is an additional 16 GB of high-bandwidth memory attached to the Grace chip itself. (The next generation of Grace Hopper will add even more memory capacity, climbing to 140 GB from its 96 GB total today, Salvator says.) The combined chip can also steer extra power to the GPU when the CPU is less busy, allowing the GPU to ramp up its performance.

Besides Grace Hopper’s arrival, Nvidia had its usual fine showing, as you can see in the charts below of all the inference performance results for data center–class computers.

MLPerf Data-center Inference v3.1 Results

[Chart: MLPerf data-center inference v3.1 results, shown as a bar chart with 7 tall green bars and a variety of smaller ones. Caption: Nvidia is still the one to beat in AI inferencing. Credit: Nvidia]

Things could get even better for the GPU giant. Nvidia announced a new software library that effectively doubled the H100’s performance on GPT-J. Called TensorRT-LLM, it wasn’t ready in time for MLPerf v3.1 tests, which were submitted in early August. The key innovation is something called inflight batching, says Salvator. The work involved in executing an LLM can vary a lot. For example, the same neural network can be asked to turn a 20-page article into a one-page essay or summarize a one-page article in 100 words. TensorRT-LLM basically keeps these queries from stalling each other, so small queries can get done while big jobs are still in progress.
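The idea can be illustrated with a toy scheduler. The sketch below is not TensorRT-LLM's implementation or API; the function names are hypothetical, and it assumes each query needs a known number of equal-cost steps (at least one) and that the hardware can work on `batch_size` queries at once. It compares a fixed batch, which holds every query until the longest one in its batch finishes, with a batch whose free slots are refilled as soon as a query completes.

```python
from collections import deque


def static_batching(lengths, batch_size):
    """Finish step per query when each batch runs until its longest member is done."""
    finish = []
    clock = 0
    for start in range(0, len(lengths), batch_size):
        batch = lengths[start:start + batch_size]
        clock += max(batch)  # every query in the batch waits for the longest one
        finish.extend([clock] * len(batch))
    return finish


def inflight_batching(lengths, batch_size):
    """Finish step per query when a freed slot is refilled immediately."""
    queue = deque(enumerate(lengths))
    finish = [0] * len(lengths)
    active = {}  # query index -> steps remaining
    clock = 0
    while queue or active:
        # Admit waiting queries into any free slots before the next step.
        while queue and len(active) < batch_size:
            idx, steps = queue.popleft()
            active[idx] = steps
        clock += 1
        for idx in list(active):  # copy the keys so entries can be deleted
            active[idx] -= 1
            if active[idx] == 0:
                finish[idx] = clock
                del active[idx]
    return finish


# One long job followed by five short ones, two slots available.
work = [8, 1, 1, 1, 1, 1]
print(static_batching(work, 2))    # [8, 8, 9, 9, 10, 10]
print(inflight_batching(work, 2))  # [8, 1, 2, 3, 4, 5]
```

In this toy case the long job finishes at the same step either way, but the short queries no longer wait behind it, and the whole workload completes in 8 steps instead of 10.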

Intel Closes In

Intel’s Habana Gaudi2 accelerator stalked the H100 in previous rounds of benchmarks. This time, Intel trialed only a single 2-CPU, 8-accelerator computer, and only on the LLM benchmark. That system trailed Nvidia’s fastest machine by between 8 and 22 percent at the task.

“In inferencing we are at almost parity with H100,” says Jordan Plawner, senior director of AI products at Intel. Customers, he says, are coming to see the Habana chips as “the only viable alternative to the H100,” which is in enormously high demand.

He also noted that Gaudi2 is a generation behind the H100 in terms of chip-manufacturing technology. The next generation will use the same chip technology as H100, he says.

Intel has also historically used MLPerf to show how much can be done using CPUs alone, albeit CPUs that now come with a dedicated matrix-computation unit to help with neural networks. This round was no different. Six systems of two Intel Xeon CPUs each were tested on the LLM benchmark. While they didn’t perform anywhere near GPU standards—the Grace Hopper system was often 10 times as fast as any of them or even faster—they could still spit out a summary every second or so.

Data-center Efficiency Results

Only Qualcomm and Nvidia chips were measured for this category. Qualcomm has previously emphasized its accelerators’ power efficiency, but Nvidia H100 machines competed well, too.
