Here’s How Google’s TPU v4 AI Chip Stacked Up in Training Tests

Next generation chip doubles processing power

2 min read
Google CEO Sundar Pichai presenting the company’s latest AI chip, the TPU V4 (Tensor Processing Unit version 4).
GOOGLE

Google CEO Sundar Pichai says the company’s latest AI chip, the TPU V4 (Tensor Processing Unit version 4), delivers more than double the processing power of its predecessor. At Google’s I/O event this week, Pichai detailed the chip’s performance, stating that when combined into a 4,096-processor “pod”—an interconnected, liquid-cooled cluster of servers—TPU V4s can crunch through a billion billion operations per second, or an exaflop, a long-sought milestone in computing.
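The arithmetic behind that claim is straightforward. As a back-of-the-envelope sketch (the per-chip figure below is derived from the article's numbers, not a published Google spec):

```python
# An exaflop is 10^18 operations per second; the article says a pod
# reaches it with 4,096 TPU V4 chips working together.
pod_ops_per_sec = 1e18      # one exaflop
chips_per_pod = 4096        # TPU V4 pod size from the article

# Implied per-chip throughput, assuming perfectly even distribution
per_chip = pod_ops_per_sec / chips_per_pod
print(f"{per_chip / 1e12:.0f} teraops per chip")  # roughly 244 teraops
```

Note this is lower-precision arithmetic (the kind neural networks use), which is why, as Pichai's caveat below suggests, the figure isn't directly comparable to supercomputer exaflops measured on 64-bit math.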

“This is the fastest system we’ve ever deployed at Google and a historic milestone for us,” Pichai said. Top supercomputers have yet to reach an exaflop, but the TPU V4 pod isn’t really in their league, because it calculates with lower-precision numbers than scientific supercomputers.

Powerful AI hardware is particularly needed for training the large neural networks behind the prediction systems and natural-language processing integral to digital commerce today. Google previewed the TPU V4’s abilities in that area last year, when it benchmarked three systems on MLPerf’s v0.7 training suite, released in July 2020.

The tests included three vision systems (ResNet-50, SSD, and Mask R-CNN), a natural language processor (BERT), two English-to-German translation networks (NMT and Transformer), a recommender system (DLRM), and a reinforcement learning network that plays Go.

The systems differed in the number of TPU V4 accelerators they used, and results were measured in the minutes needed to complete training. (So lower is better.)

Training time in minutes                8 TPUs   64 TPUs   256 TPUs   Top-ranked commercial system
Image classification (ResNet-50)        30.7     4.5       1.82       0.76 (1,840 Nvidia accelerators)
Object detection (SSD)                  8.68     1.43      1.06       0.82 (1,024 Nvidia accelerators)
Object detection (Mask R-CNN)           103.1    15.5      9.95       10.46 (256 Nvidia accelerators)
Translation (NMT)                       8.03     2.08      1.29       0.71 (1,024 Nvidia accelerators)
Translation (Transformer)               9.01     1.63      0.78       0.62 (480 Nvidia accelerators)
Natural language processing (BERT)      45.57    5.73      1.82       0.81 (2,048 Nvidia accelerators)
Recommendation (DLRM)                   4.42     1.21      —          3.33 (8 Nvidia accelerators)
Reinforcement learning (Minigo)         —        —         150.9      17.07 (1,792 Nvidia accelerators)

The company uses TPU infrastructure for its own AI and also rents it out to Google Cloud customers. Other big cloud companies, such as Amazon and Microsoft, have either deployed or are working on their own AI chips. Several other AI chip companies have launched with products aimed at customers needing data-center-based systems. Cerebras recently detailed its 1.4-trillion-transistor second-generation system for high-performance computing, which is powered by the world’s largest single chip. Other companies, such as SambaNova and Graphcore, have reached valuations in the billions of dollars.


The Future of Deep Learning Is Photonic

Computing with light could slash the energy needs of neural networks

10 min read

This computer rendering depicts the pattern on a photonic chip that the author and his colleagues have devised for performing neural-network calculations using light.

Alexander Sludds

Think of the many tasks to which computers are being applied that in the not-so-distant past required human intuition. Computers routinely identify objects in images, transcribe speech, translate between languages, diagnose medical conditions, play complex games, and drive cars.

The technique that has empowered these stunning developments is called deep learning, a term that refers to mathematical models known as artificial neural networks. Deep learning is a subfield of machine learning, a branch of computer science based on fitting complex models to data.
