
Expect Deeper and Cheaper Machine Learning

Supercharged hardware will speed up deep learning in everything from tiny devices to massive data centers

Illustration of a book opened over a laptop computer
Photo-illustration: Edmon de Haro


Top Tech 2017 logo

Last March, Google’s computers roundly beat the world-class Go champion Lee Sedol, marking a milestone in artificial intelligence. The winning computer program, created by researchers at Google DeepMind in London, used an artificial neural network that took advantage of what’s known as deep learning, a strategy by which neural networks involving many layers of processing are configured in an automated fashion to solve the problem at hand.

Unknown to the public at the time was that Google had an ace up its sleeve. You see, the computers Google used to defeat Lee contained special-purpose hardware: a computer card Google calls its Tensor Processing Unit.

Norm Jouppi, a hardware engineer at Google, announced the existence of the Tensor Processing Unit two months after the Go match, explaining in a blog post that Google had been outfitting its data centers with these new accelerator cards for more than a year. Google has not shared exactly what is on these boards, but it’s clear that they represent an increasingly popular strategy to speed up deep-learning calculations: using an application-specific integrated circuit, or ASIC.

Chart: Deep-Learning Software Revenues (US $, billions). Revenues from deep-learning software should soon exceed $1 billion. Source: Tractica

Another tactic being pursued (primarily by Microsoft) is to use field-programmable gate arrays (FPGAs), which provide the benefit of being reconfigurable if the computing requirements change. The more common approach, though, has been to use graphics processing units, or GPUs, which can perform many mathematical operations in parallel. The foremost proponent of this approach is GPU maker Nvidia.
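To make the parallelism point concrete, here is a minimal Python sketch, assuming a single fully connected layer with made-up weights. Each output neuron’s value is an independent dot product, which is why hardware that performs many multiply-adds at once suits the job. The thread pool only demonstrates that the rows can be computed independently and in any order; it is not a model of how a GPU actually schedules work.

```python
# Illustrative sketch (not any vendor's implementation): a layer's
# matrix-vector product splits into independent dot products, one per
# output neuron. A GPU can assign each to its own hardware thread; here
# a thread pool merely shows that the rows do not depend on each other.
from concurrent.futures import ThreadPoolExecutor


def dot(row, x):
    """Multiply-accumulate for a single output neuron."""
    return sum(w * xi for w, xi in zip(row, x))


def matvec_serial(weights, x):
    return [dot(row, x) for row in weights]


def matvec_parallel(weights, x, workers=4):
    # Each row is an independent task; pool.map returns results in row order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda row: dot(row, x), weights))


if __name__ == "__main__":
    W = [[1.0, 2.0, 3.0],
         [0.5, -1.0, 0.0],
         [2.0, 0.0, -2.0]]   # hypothetical layer weights
    x = [1.0, 1.0, 1.0]      # hypothetical input
    print(matvec_serial(W, x))    # [6.0, -0.5, 0.0]
    print(matvec_parallel(W, x))  # same values, computed as separate tasks
```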

Indeed, advances in GPUs helped kick-start the modern revival of artificial neural networks in 2009, when researchers at Stanford showed that such hardware made it possible to train deep neural networks in a reasonable amount of time.

“Everybody is doing deep learning today,” says William Dally, who leads the Concurrent VLSI Architecture group at Stanford and is also chief scientist for Nvidia. And for that, he says, perhaps not surprisingly given his position, “GPUs are close to being as good as you can get.”

Dally explains that there are three separate realms to consider. The first is what he calls “training in the data center.” He’s referring to the first step for any deep-learning system: adjusting perhaps many millions of connections between neurons so that the network can carry out its assigned task.
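What “adjusting connections” means can be sketched in a few lines. This is a toy under stated assumptions: one linear neuron with a single weight and bias, a squared-error loss, and plain gradient descent on hypothetical data generated from y = 2x + 1. Deep networks apply the same basic idea to millions of weights at once, using backpropagation to compute the gradients.

```python
# Minimal training sketch (assumption for illustration: one linear neuron,
# mean-squared-error loss, plain gradient descent). Real deep-learning
# training adjusts millions of weights the same basic way.


def train(samples, lr=0.05, epochs=2000):
    w, b = 0.0, 0.0  # the "connections" to be adjusted
    n = len(samples)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in samples:
            err = (w * x + b) - y       # prediction error
            grad_w += 2 * err * x / n   # d(mean squared error)/dw
            grad_b += 2 * err / n       # d(mean squared error)/db
        w -= lr * grad_w                # step against the gradient
        b -= lr * grad_b
    return w, b


if __name__ == "__main__":
    # Hypothetical training data drawn from y = 2x + 1.
    data = [(x, 2 * x + 1) for x in [-2, -1, 0, 1, 2]]
    w, b = train(data)
    print(round(w, 3), round(b, 3))  # converges toward 2 and 1
```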

In building hardware for that, a company called Nervana Systems, which was recently acquired by Intel, has been leading the charge. According to Scott Leishman, a computer scientist at Nervana, the Nervana Engine, an ASIC deep-learning accelerator, will go into production in early to mid-2017. Leishman notes that another computationally intensive task, bitcoin mining, went from being run on CPUs to GPUs to FPGAs and, finally, to ASICs because of the gains in power efficiency from such customization. “I see the same thing happening for deep learning,” he says.

A second and quite distinct job for deep-learning hardware, explains Dally, is “inference at the data center.” The word inference here refers to the ongoing operation of cloud-based artificial neural networks that have previously been trained to carry out some job. Every day, Google’s neural networks are making an astronomical number of such inference calculations to categorize images, translate between languages, and recognize spoken words, for example. Although it’s hard to say for sure, Google’s Tensor Processing Unit is presumably tailored for performing such computations.
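Inference, by contrast, is only the forward pass. In this minimal sketch the weights (W1, B1, W2, B2) and the labels are hypothetical stand-ins for a small network already trained elsewhere; nothing is updated when infer() runs.

```python
# Inference sketch: the parameters below are hypothetical stand-ins for a
# network that was already trained. Inference only runs the forward pass;
# no weights change.


def relu(v):
    return [max(0.0, a) for a in v]


def dense(weights, bias, x):
    # One fully connected layer: weighted sum of inputs plus bias per unit.
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


# Hypothetical "trained" parameters: 2 inputs -> 3 hidden units -> 2 classes.
W1 = [[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]]
B1 = [0.0, 0.0, 0.0]
W2 = [[1.0, 0.0, -1.0], [-1.0, 0.0, 1.0]]
B2 = [0.0, 0.0]
LABELS = ["class A", "class B"]


def infer(x):
    hidden = relu(dense(W1, B1, x))
    scores = dense(W2, B2, hidden)
    best = max(range(len(scores)), key=lambda i: scores[i])  # argmax
    return LABELS[best]


if __name__ == "__main__":
    print(infer([2.0, 0.0]))  # class A
    print(infer([0.0, 2.0]))  # class B
```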

Pedal to Metal: Google’s Tensor Processing Unit accelerates deep-learning calculations on the company’s servers. Photo: Google

Training and inference often make very different demands on hardware. For training, the computer typically must calculate with relatively high precision, often using 32-bit floating-point operations. For inference, precision can be sacrificed in favor of greater speed or lower power consumption. “This is an active area of research,” says Leishman. “How low can you go?”
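One common way to trade precision for efficiency is to quantize 32-bit weights to 8-bit integers. The minimal sketch below assumes symmetric linear quantization with a single scale factor per tensor and uses made-up weights; it is one illustrative scheme, not the method any particular chip uses.

```python
# Sketch of 8-bit symmetric linear quantization (an illustrative assumption,
# not any vendor's method). Each real value is mapped to an integer in
# [-127, 127]; one shared scale factor maps the integers back.


def quantize(values, bits=8):
    qmax = 2 ** (bits - 1) - 1                         # 127 for 8 bits
    scale = max(abs(v) for v in values) / qmax or 1.0  # avoid a zero scale
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    return [qi * scale for qi in q]


if __name__ == "__main__":
    weights = [0.82, -0.31, 0.05, -1.27, 0.64]  # hypothetical weights
    q, scale = quantize(weights)
    approx = dequantize(q, scale)
    worst = max(abs(a - b) for a, b in zip(weights, approx))
    print(q)      # small integers that fit in 8 bits
    print(worst)  # rounding error, at most about half a quantization step
```

The design choice here is the simplest one: a single scale per tensor. Schemes that use per-channel scales or an offset (zero point) can reduce the error further at the cost of extra bookkeeping.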

Although Dally declines to divulge Nvidia’s specific plans, he points out that the company’s GPUs have been evolving. Nvidia’s earlier Maxwell architecture could perform double- (64-bit) and single- (32-bit) precision operations, whereas its current Pascal architecture adds the capability to do 16-bit operations at twice the throughput and efficiency of its single-precision calculations. So it’s easy to imagine that Nvidia will eventually release GPUs able to perform 8-bit operations, which could be ideal for inference calculations done in the cloud, where power efficiency is critical to keeping costs down.
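The storage and precision side of this trade-off can be checked with Python’s standard struct module, whose "e", "f" and "d" format codes pack IEEE 754 half-, single- and double-precision values. This is only an analogy: fitting two 16-bit values into the space of one 32-bit value hints at why halving precision can double throughput, but it says nothing about how Pascal’s arithmetic units are actually built.

```python
# Compare IEEE 754 half (16-bit), single (32-bit) and double (64-bit)
# precision using the struct module's 'e', 'f' and 'd' format codes.
import struct


def round_trip(x, fmt):
    """Store x in the given binary format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]


if __name__ == "__main__":
    for name, fmt in [("fp64", "<d"), ("fp32", "<f"), ("fp16", "<e")]:
        size = struct.calcsize(fmt)
        err = abs(round_trip(0.1, fmt) - 0.1)
        print(f"{name}: {size} bytes, error storing 0.1 = {err:.1e}")
    # Two 16-bit values occupy the same space as one 32-bit value.
    print(struct.calcsize("<ee") == struct.calcsize("<f"))  # True
```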

Dally adds that “the final leg of the tripod for deep learning is inference in embedded devices,” such as smartphones, cameras, and tablets. For those applications, the key will be low-power ASICs. Over the coming year, deep-learning software will increasingly find its way into applications for smartphones, where it is already used, for example, to detect malware or translate text in images.

And the drone manufacturer DJI is already using something akin to a deep-learning ASIC in its Phantom 4 drone, which uses a special visual-processing chip made by California-based Movidius to recognize obstructions. (Movidius is yet another neural-network company recently acquired by Intel.) Qualcomm, meanwhile, built special circuitry into its Snapdragon 820 processors to help carry out deep-learning calculations.

Although there is plenty of incentive these days to design hardware to accelerate the operation of deep neural networks, there’s also a huge risk: If the state of the art shifts far enough, chips designed to run yesterday’s neural nets will be outdated by the time they are manufactured. “The algorithms are changing at an enormous rate,” says Dally. “Everybody who is building these things is trying to cover their bets.”

This article appears in the January 2017 print issue as “Deeper and Cheaper Machine Learning.”
