Expect Deeper and Cheaper Machine Learning

Supercharged hardware will speed up deep learning in everything from tiny devices to massive data centers

Illustration of a book opened over a laptop computer
Photo-illustration: Edmon de Haro


Last March, Google’s computers roundly beat the world-class Go champion Lee Sedol, marking a milestone in artificial intelligence. The winning computer program, created by researchers at Google DeepMind in London, used an artificial neural network that took advantage of what’s known as deep learning, a strategy by which neural networks involving many layers of processing are configured in an automated fashion to solve the problem at hand.

Unknown to the public at the time was that Google had an ace up its sleeve. You see, the computers Google used to defeat Sedol contained special-purpose hardware—a computer card Google calls its Tensor Processing Unit.

Norm Jouppi, a hardware engineer at Google, announced the existence of the Tensor Processing Unit two months after the Go match, explaining in a blog post that Google had been outfitting its data centers with these new accelerator cards for more than a year. Google has not shared exactly what is on these boards, but it’s clear that the TPU represents an increasingly popular strategy to speed up deep-learning calculations: using an application-specific integrated circuit, or ASIC.

[Chart] Deep-Learning Software Revenues (US $, billions): Revenues from deep-learning software should soon exceed $1 billion. Source: Tractica

Another tactic being pursued (primarily by Microsoft) is to use field-programmable gate arrays (FPGAs), which provide the benefit of being reconfigurable if the computing requirements change. The more common approach, though, has been to use graphics processing units, or GPUs, which can perform many mathematical operations in parallel. The foremost proponent of this approach is GPU maker Nvidia.

Indeed, advances in GPUs kick-started artificial neural networks back in 2009, when researchers at Stanford showed that such hardware made it possible to train deep neural networks in reasonable amounts of time.

“Everybody is doing deep learning today,” says William Dally, who leads the Concurrent VLSI Architecture group at Stanford and is also chief scientist for Nvidia. And for that, he says, perhaps not surprisingly given his position, “GPUs are close to being as good as you can get.”

Dally explains that there are three separate realms to consider. The first is what he calls “training in the data center.” He’s referring to the first step for any deep-learning system: adjusting perhaps many millions of connections between neurons so that the network can carry out its assigned task.
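To make “adjusting connections” concrete, here is a toy training loop in Python with NumPy. It is my own minimal sketch of gradient-descent training, not code from Google or Nvidia: a single linear “neuron” whose two connection weights are nudged, step by step, until the network fits its training data.

```python
import numpy as np

# Toy training set: the target function is y = 2*x1 + 3*x2.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [0.0, 1.0]])
y = np.array([8.0, 7.0, 15.0, 3.0])

w = np.zeros(2)   # the "connections" to be adjusted during training
lr = 0.01         # learning rate: how far each adjustment moves

for _ in range(2000):                    # the training loop
    pred = X @ w                         # forward pass
    grad = X.T @ (pred - y) / len(y)     # gradient of mean-squared error
    w -= lr * grad                       # adjust the connections

print(np.round(w, 2))                    # → [2. 3.]
```

A real deep network repeats this same loop over millions of weights and many layers, which is why training is the most computationally demanding of the three realms Dally describes.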

In building hardware for that, a company called Nervana Systems, which was recently acquired by Intel, has been leading the charge. According to Scott Leishman, a computer scientist at Nervana, the Nervana Engine, an ASIC deep-learning accelerator, will go into production in early to mid-2017. Leishman notes that another computationally intensive task—bitcoin mining—went from being run on CPUs to GPUs to FPGAs and, finally, to ASICs because of the gains in power efficiency from such customization. “I see the same thing happening for deep learning,” he says.

A second and quite distinct job for deep-learning hardware, explains Dally, is “inference at the data center.” The word inference here refers to the ongoing operation of cloud-based artificial neural networks that have previously been trained to carry out some job. Every day, Google’s neural networks are making an astronomical number of such inference calculations to categorize images, translate between languages, and recognize spoken words, for example. Although it’s hard to say for sure, Google’s Tensor Processing Unit is presumably tailored for performing such computations.
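Inference, by contrast, is only the forward pass: the trained weights are frozen, and each query simply flows through the network once. Here is a minimal sketch, using my own illustrative two-layer network with made-up weights, not Google’s actual models:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

# Weights of a tiny two-layer network, assumed already trained.
W1 = np.array([[0.5, -0.2], [0.1, 0.8]])
b1 = np.array([0.0, 0.1])
W2 = np.array([0.3, -0.6])
b2 = 0.2

def infer(x):
    """Forward pass only: no gradients, no weight updates."""
    h = relu(x @ W1 + b1)
    return h @ W2 + b2

print(f"{infer(np.array([1.0, 2.0])):.2f}")  # prints -0.49
```

Because nothing is updated, inference hardware can drop all the machinery needed for computing gradients and concentrate on raw multiply-accumulate throughput.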

Pedal to Metal: Google’s Tensor Processing Unit accelerates deep-learning calculations on the company’s servers. Photo: Google

Training and inference often demand very different capabilities from hardware. For training, the computer typically must calculate with relatively high precision, often using 32-bit floating-point operations. For inference, precision can be sacrificed in favor of greater speed or lower power consumption. “This is an active area of research,” says Leishman. “How low can you go?”
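One common answer to “how low can you go” is to quantize trained 32-bit weights down to 8-bit integers for inference. The sketch below shows the general idea of linear quantization; it is a generic illustration, not the scheme Google or Nervana actually uses:

```python
import numpy as np

# Trained weights stored in 32-bit floating point.
w_fp32 = np.array([0.42, -1.30, 0.07, 0.91, -0.55], dtype=np.float32)

# Linear quantization to signed 8-bit integers:
# map the largest weight magnitude onto 127.
scale = float(np.abs(w_fp32).max()) / 127.0
w_int8 = np.round(w_fp32 / scale).astype(np.int8)   # values: 41, -127, 7, 89, -54

# At inference time, the int8 weights are rescaled on the fly.
w_restored = w_int8.astype(np.float32) * scale
print(np.max(np.abs(w_restored - w_fp32)))  # worst-case error, below scale/2
```

Each weight now occupies one byte instead of four, and the multiplications can run on small, cheap integer units, which is exactly the speed-and-power trade-off Leishman describes.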

Although Dally declines to divulge Nvidia’s specific plans, he points out that the company’s GPUs have been evolving. Nvidia’s earlier Maxwell architecture could perform double- (64-bit) and single- (32-bit) precision operations, whereas its current Pascal architecture adds the capability to do 16-bit operations at twice the throughput and efficiency of its single-precision calculations. So it’s easy to imagine that Nvidia will eventually release GPUs able to perform 8-bit operations, which could be ideal for inference calculations done in the cloud, where power efficiency is critical to keeping costs down.

Dally adds that “the final leg of the tripod for deep learning is inference in embedded devices,” such as smartphones, cameras, and tablets. For those applications, the key will be low-power ASICs. Over the coming year, deep-learning software will increasingly find its way into applications for smartphones, where it is already used, for example, to detect malware or translate text in images.

And the drone manufacturer DJI is already using something akin to a deep-learning ASIC in its Phantom 4 drone, which uses a special visual-processing chip made by California-based Movidius to recognize obstructions. (Movidius is yet another neural-network company recently acquired by Intel.) Qualcomm, meanwhile, built special circuitry into its Snapdragon 820 processors to help carry out deep-learning calculations.

Although there is plenty of incentive these days to design hardware to accelerate the operation of deep neural networks, there’s also a huge risk: If the state of the art shifts far enough, chips designed to run yesterday’s neural nets will be outdated by the time they are manufactured. “The algorithms are changing at an enormous rate,” says Dally. “Everybody who is building these things is trying to cover their bets.”

This article appears in the January 2017 print issue as “Deeper and Cheaper Machine Learning.”
