Here’s How Google’s TPU v4 AI Chip Stacked Up in Training Tests

Next-generation chip doubles processing power


Samuel K. Moore is IEEE Spectrum’s semiconductor editor.

Google CEO Sundar Pichai presenting the company’s latest AI chip, the TPU v4 (Tensor Processing Unit version 4). Image: Google

Google CEO Sundar Pichai says the company’s latest AI chip, the TPU v4 (Tensor Processing Unit version 4), is capable of more than double the processing power of its predecessor. At Google’s I/O event this week, Pichai detailed the chip’s performance, stating that when combined into a 4096-processor “pod” (an interconnected, liquid-cooled cluster of servers), TPU v4s were capable of crunching through a billion billion operations per second, or an exaflop, a long-sought milestone.
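As a back-of-envelope check (my arithmetic, not a figure from Google), spreading an exaflop across 4,096 chips implies each TPU v4 sustains on the order of a couple hundred teraflops:

```python
# Back-of-envelope sketch (not an official Google figure): what per-chip
# throughput does "an exaflop from a 4,096-chip pod" imply?

EXAFLOP = 1e18          # operations per second for the whole pod
CHIPS_PER_POD = 4096    # TPU v4 accelerators in one pod

per_chip = EXAFLOP / CHIPS_PER_POD
print(f"{per_chip:.3e} ops/s per chip")   # ~2.441e+14, i.e. roughly 244 teraflops
```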

“This is the fastest system we’ve ever deployed at Google and a historic milestone for us,” Pichai said. Top supercomputers have yet to reach an exaflop, but the TPU v4 pod isn’t really in their league, because it calculates with lower-precision numbers than scientific supercomputers do.
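To illustrate the precision gap (an illustrative sketch, not from the article): supercomputer rankings are based on 64-bit floating-point math, while AI accelerators lean on much coarser formats. Even 32-bit arithmetic, which is still finer than the bfloat16 format TPUs favor, visibly loses information:

```python
import numpy as np

# Illustrative sketch of the precision gap (not from the article).
# Scientific supercomputers are ranked on 64-bit math (~15-16 significant
# digits); AI accelerators like the TPU lean on far coarser formats such
# as bfloat16 (~2-3 significant digits). float32 already shows the effect:

big, tiny = 1.0, 1e-10
print(np.float64(big) + np.float64(tiny))  # 1.0000000001 -- tiny survives
print(np.float32(big) + np.float32(tiny))  # 1.0          -- tiny vanishes
```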

Powerful AI hardware is particularly needed for training the large neural networks that run the prediction systems and natural language processing integral to digital commerce today. Google previewed the TPU v4’s abilities in that area last year, when it benchmarked three TPU v4 configurations on MLPerf’s v0.7 training benchmarks, released in July 2020.

The tests included three vision systems (ResNet-50, SSD, and Mask R-CNN), a natural language processor (BERT), two English-to-German translation networks (NMT and Transformer), a recommender system (DLRM), and a reinforcement learning network that plays Go (Minigo).

The three configurations differed in the number of TPU v4 accelerators (8, 64, or 256), and results are measured in minutes needed to complete training, so lower is better.

| Benchmark | 8 TPU v4s | 64 TPU v4s | 256 TPU v4s | Top-ranked commercial system |
| --- | --- | --- | --- | --- |
| Image classification (ResNet) | 30.7 | 4.5 | 1.82 | 0.76 (1840 Nvidia accelerators) |
| Object detection (SSD) | 8.68 | 1.43 | 1.06 | 0.82 (1024 Nvidia accelerators) |
| Object detection (Mask R-CNN) | 103.1 | 15.5 | 9.95 | 10.46 (256 Nvidia accelerators) |
| Translation (NMT) | 8.03 | 2.08 | 1.29 | 0.71 (1024 Nvidia accelerators) |
| Translation (Transformer) | 9.01 | 1.63 | 0.78 | 0.62 (480 Nvidia accelerators) |
| Natural language processing (BERT) | 45.57 | 5.73 | 1.82 | 0.81 (2048 Nvidia accelerators) |
| Recommendation (DLRM) | 4.42 | 1.21 | – | 3.33 (8 Nvidia accelerators) |
| Reinforcement learning (Minigo) | 150.9 | – | – | 17.07 (1792 Nvidia accelerators) |
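One way to read the table (my own arithmetic on the numbers above, not a claim from Google): going from 8 to 64 chips is an 8x increase in hardware, so perfect scaling would cut training time by 8x. The actual speedups show how efficiently each workload scales:

```python
# Sketch: scaling efficiency implied by the table above (my arithmetic,
# not Google's). Perfect scaling from 8 to 64 chips would be an 8x speedup.

times_8_to_64 = {            # minutes at 8 chips -> minutes at 64 chips
    "ResNet": (30.7, 4.5),
    "SSD": (8.68, 1.43),
    "Mask R-CNN": (103.1, 15.5),
    "NMT": (8.03, 2.08),
    "Transformer": (9.01, 1.63),
    "BERT": (45.57, 5.73),
}

for name, (t8, t64) in times_8_to_64.items():
    speedup = t8 / t64
    efficiency = speedup / 8   # fraction of the ideal 8x speedup
    print(f"{name:12s} {speedup:4.1f}x speedup, {efficiency:.0%} scaling efficiency")
# BERT scales almost perfectly (~99%); NMT drops to roughly half (~48%).
```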

The company uses TPU infrastructure for its own AI and also rents it out to Google Cloud customers. Other big cloud companies, such as Amazon and Microsoft, have either deployed or are working on their own AI chips. Several other AI chip companies have launched with products aimed at customers needing data-center-based systems. Cerebras recently detailed its second-generation system for high-performance computing, which is powered by the world’s largest single chip, a 2.6-trillion-transistor wafer-scale processor. Other companies, such as SambaNova and Graphcore, have reached valuations in the billions of dollars.
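For Cloud customers, the practical interface to this hardware is a framework such as JAX or TensorFlow. A minimal JAX sketch (assuming a Cloud TPU VM with JAX installed; not code from the article) that lists the attached accelerators and runs the kind of low-precision matrix math the pod’s exaflop figure counts:

```python
# Minimal sketch (assumes a Cloud TPU VM with JAX installed; not from the
# article): list the attached TPU accelerators and run a bfloat16 matmul,
# the kind of low-precision math behind the pod's exaflop figure.
import jax
import jax.numpy as jnp

print(jax.default_backend())          # 'tpu' on a Cloud TPU VM
for d in jax.devices():
    print(d.platform, d.device_kind)  # platform and chip generation

x = jnp.ones((4096, 4096), dtype=jnp.bfloat16)
y = jnp.dot(x, x)                     # runs on the TPU's matrix units
print(y.dtype, y.shape)
```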
