Google Details Tensor Chip Powers

In January’s special Top Tech 2017 issue, I wrote about various efforts to produce custom hardware tailored for performing deep-learning calculations. Prime among those is Google’s Tensor Processing Unit, or TPU, which Google has deployed in its data centers since early in 2015.

In that article, I speculated that the TPU was likely designed for performing what are called “inference” calculations. That is, it’s designed to quickly and efficiently calculate whatever it is that the neural-network it’s running was created to do. But that neural network would also have to be “trained,” meaning that its many parameters would be tuned to carry out the desired task. Training a neural network normally takes a different set of computational skills: In particular, training often requires the use of higher-precision arithmetic than does inference.

Yesterday, Google released a fairly detailed description of the TPU and its performance relative to CPUs and GPUs. I was happy to see that the surmise I had made in January was correct: The TPU is built for doing inference, having hardware that operates on 8-bit integers rather than higher-precision floating-point numbers.

Yesterday afternoon, David Patterson, an emeritus professor of computer science at the University of California, Berkeley and one of the co-authors of the report, presented these findings at a regional seminar of the National Academy of Engineering, held at the Computer History Museum in Menlo Park, Calif. The abstract for his talk summed up the main point nicely. It reads in part: “The TPU is an order of magnitude faster than contemporary CPUs and GPUs and its relative performance per watt is even larger.”

Google’s blog post about the release of the report shows how much of a difference in relative performance there can be, particularly in regard to energy efficiency. For example, compared with a contemporary GPU, the TPU is said to offer 83 times the performance per watt. That might be something of an exaggeration, because the report itself claims only that there’s a range of between 41 times and 83 times. And that’s for a quantity the authors call incremental performance. The range of improvement for total performance is less: from 14 to 16 times better for the TPU compared with that of a GPU.

The benchmark tests used to reach these conclusions are based on a half dozen of the actual kinds of neural-network programs that people are running at Google data centers. So it’s unlikely that anyone would critique these results on the basis of the tests not reflecting real-world circumstances. But it struck me that a different critique might well be in order.

The problem is this: These researchers are comparing their 8-bit TPU with higher-precision GPUs and CPUs, which are just not well suited to inference calculations. The GPU exemplar Google used in its report is Nvidia’s K80 board, which performs both single-precision (32-bit) and double-precision (64-bit) calculations. While they’re often important for training neural networks, such levels of precision aren’t typically needed for inference.

In my January story, I noted that Nvidia’s newer Pascal family of GPUs can perform “half-precision” (16-bit) operations and speculated that the company may soon produce units fully capable of 8-bit operations, in which case they might be much more efficient when carrying out inference calculations for neural-network programs.

The report’s authors anticipated such a criticism in the final section of their paper; there they considered the assertion (which they label a fallacy) that “CPU and GPU results would be comparable to the TPU if we used them more efficiently or compared to newer versions.” In discussing this point, they say they had tested only one CPU that could support 8-bit calculations, and the TPU was 3.5 times better. But they don’t really address the question of how GPU’s tailored for 8-bit calculations would fare—an important question if such GPUs soon became widely available.

Should that come to pass, I hope that these Googlers will re-run their benchmarks and let us know how TPUs and 8-bit-capable GPUs compare.

tpu data center neural networks hardware tensor processing unit software google machine learning

Topics

Sections

More

For IEEE Members

For IEEE Members

IEEE Spectrum

Follow IEEE Spectrum

Support IEEE Spectrum

Google Details Tensor Chip Powers

Google researchers explain the benefits of the company's mysterious Tensor Processing Unit

Kyocera's Optical Tech Boosts Underwater Data Speeds

To Prevent a Shark Attack, Try Electric Fields

Digital Twin Center Loses Federal Funding

Related Stories

Google’s Quantum Computer Exponentially Suppresses Errors

The Future of Deep Learning Is Photonic

Topics

Sections

More

For IEEE Members

For IEEE Members

IEEE Spectrum

Follow IEEE Spectrum

Support IEEE Spectrum

Enjoy more free content and benefits by creating an account

Saving articles to read later requires an IEEE Spectrum account

The Institute content is only available for members

Downloading full PDF issues is exclusive for IEEE Members

Downloading this e-book is exclusive for IEEE Members

Access to Spectrum 's Digital Edition is exclusive for IEEE Members

Following topics is a feature exclusive for IEEE Members

Adding your response to an article requires an IEEE Spectrum account

Create an account to access more content and features on IEEE Spectrum , including the ability to save articles to read later, download Spectrum Collections, and participate in conversations with readers and editors. For more exclusive content and features, consider Joining IEEE .

Join the world’s largest professional organization devoted to engineering and applied sciences and get access to all of Spectrum’s articles, archives, PDF downloads, and other benefits. Learn more about IEEE →

Join the world’s largest professional organization devoted to engineering and applied sciences and get access to this e-book plus all of IEEE Spectrum’s articles, archives, PDF downloads, and other benefits. Learn more about IEEE →

Access Thousands of Articles — Completely Free

Create an account and get exclusive content and features: Save articles, download collections, and post comments — all free! For full access and benefits, subscribe to Spectrum.

Google Details Tensor Chip Powers

Google researchers explain the benefits of the company's mysterious Tensor Processing Unit

Kyocera's Optical Tech Boosts Underwater Data Speeds

To Prevent a Shark Attack, Try Electric Fields

Digital Twin Center Loses Federal Funding

Related Stories

Cloud Computing’s Coming Energy Crisis

Google’s Quantum Computer Exponentially Suppresses Errors

The Future of Deep Learning Is Photonic