Expect Deeper and Cheaper Machine Learning

Supercharged hardware will speed up deep learning in everything from tiny devices to massive data centers

Illustration of a book opened over a laptop computer. Photo-illustration: Edmon de Haro

Last March, Google’s computers roundly beat the world-class Go champion Lee Sedol, marking a milestone in artificial intelligence. The winning computer program, created by researchers at Google DeepMind in London, used an artificial neural network that took advantage of what’s known as deep learning, a strategy by which neural networks involving many layers of processing are configured in an automated fashion to solve the problem at hand.

Unknown to the public at the time was that Google had an ace up its sleeve. You see, the computers Google used to defeat Sedol contained special-purpose hardware—a computer card Google calls its Tensor Processing Unit.

Norm Jouppi, a hardware engineer at Google, announced the existence of the Tensor Processing Unit two months after the Go match, explaining in a blog post that Google had been outfitting its data centers with these new accelerator cards for more than a year. Google has not shared exactly what is on these boards, but it’s clear that they represent an increasingly popular strategy to speed up deep-learning calculations: using an application-specific integrated circuit, or ASIC.

Deep-Learning Software Revenues (US $, billions): Revenues from deep-learning software should soon exceed $1 billion. Source: Tractica

Another tactic being pursued (primarily by Microsoft) is to use field-programmable gate arrays (FPGAs), which provide the benefit of being reconfigurable if the computing requirements change. The more common approach, though, has been to use graphics processing units, or GPUs, which can perform many mathematical operations in parallel. The foremost proponent of this approach is GPU maker Nvidia.

Indeed, advances in GPUs kick-started the current boom in deep learning back in 2009, when researchers at Stanford showed that such hardware made it possible to train deep neural networks in reasonable amounts of time.

“Everybody is doing deep learning today,” says William Dally, who leads the Concurrent VLSI Architecture group at Stanford and is also chief scientist for Nvidia. And for that, he says, perhaps not surprisingly given his position, “GPUs are close to being as good as you can get.”

Dally explains that there are three separate realms to consider. The first is what he calls “training in the data center.” He’s referring to the first step for any deep-learning system: adjusting perhaps many millions of connections between neurons so that the network can carry out its assigned task.
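What that adjustment looks like can be made concrete with a few lines of code. The sketch below, written in plain Python with NumPy purely for illustration (it is not Google’s, Nvidia’s, or anyone’s production code), trains a tiny two-layer network on a toy task by repeatedly nudging its connection weights in the direction that reduces its error; real systems do the same thing with millions of weights and far larger data sets, which is why accelerators matter.

```python
# Minimal illustrative sketch of "training": repeatedly adjust the
# connection weights so the network's output approaches the target.
# The task (learning XOR) and network size are toy-scale on purpose.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: inputs and target outputs for the XOR function.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)
y = np.array([[0], [1], [1], [0]], dtype=np.float32)

# The "connections between neurons": two weight matrices plus biases.
W1 = rng.normal(0, 1, (2, 8)).astype(np.float32)
b1 = np.zeros(8, dtype=np.float32)
W2 = rng.normal(0, 1, (8, 1)).astype(np.float32)
b2 = np.zeros(1, dtype=np.float32)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5  # learning rate: how big a nudge each weight gets per step
for step in range(5000):
    # Forward pass: compute the network's current predictions.
    h = np.tanh(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)

    # Backward pass: gradient of the cross-entropy error for each weight.
    dz = (p - y) / len(X)
    dW2 = h.T @ dz
    db2 = dz.sum(axis=0)
    dh = (dz @ W2.T) * (1 - h ** 2)
    dW1 = X.T @ dh
    db1 = dh.sum(axis=0)

    # The adjustment itself: a small step against the gradient.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(np.round(p, 2))  # should approach [[0], [1], [1], [0]]
```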

In building hardware for that, a company called Nervana Systems, which was recently acquired by Intel, has been leading the charge. According to Scott Leishman, a computer scientist at Nervana, the Nervana Engine, an ASIC deep-learning accelerator, will go into production in early to mid-2017. Leishman notes that another computationally intensive task—bitcoin mining—went from being run on CPUs to GPUs to FPGAs and, finally, to ASICs, because of the gains in power efficiency from such customization. “I see the same thing happening for deep learning,” he says.

A second and quite distinct job for deep-learning hardware, explains Dally, is “inference at the data center.” The word inference here refers to the ongoing operation of cloud-based artificial neural networks that have previously been trained to carry out some job. Every day, Google’s neural networks are making an astronomical number of such inference calculations to categorize images, translate between languages, and recognize spoken words, for example. Although it’s hard to say for sure, Google’s Tensor Processing Unit is presumably tailored for performing such computations.
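By contrast with training, inference leaves the network untouched: the weights stay frozen, and each new input simply flows forward through them. Continuing the illustrative sketch above (the weight names are carried over from that toy example, not from any real system), inference is nothing more than the forward pass, repeated for every request.

```python
# Illustrative sketch of "inference": apply frozen, already-trained
# weights to new inputs. Nothing is adjusted here.
import numpy as np

def infer(batch, W1, b1, W2, b2):
    """Run a batch of inputs through the frozen network and return predictions."""
    h = np.tanh(batch @ W1 + b1)
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))

# In a data center this forward pass runs an astronomical number of times
# a day, so even small savings in speed or power per call add up.
```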

Pedal to Metal: Google’s Tensor Processing Unit accelerates deep-learning calculations on the company’s servers. Photo: Google

Training and inference place very different demands on hardware. Typically for training, the computer must be able to calculate with relatively high precision, often using 32-bit floating-point operations. For inference, precision can be sacrificed in favor of greater speed or less power consumption. “This is an active area of research,” says Leishman. “How low can you go?”

Although Dally declines to divulge Nvidia’s specific plans, he points out that the company’s GPUs have been evolving. Nvidia’s earlier Maxwell architecture could perform double- (64-bit) and single- (32-bit) precision operations, whereas its current Pascal architecture adds the capability to do 16-bit operations at twice the throughput and efficiency of its single-precision calculations. So it’s easy to imagine that Nvidia will eventually release GPUs able to perform 8-bit operations, which could be ideal for inference calculations done in the cloud, where power efficiency is critical to keeping costs down.
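To see why lower precision can be acceptable, consider the rough sketch below. It stores a set of 32-bit weights at 16-bit and 8-bit precision and measures how much the result of a matrix multiplication drifts. The scale-to-the-maximum int8 scheme here is an illustration only, not any chip maker’s actual quantization method.

```python
# Illustrative precision experiment: how much error do 16-bit and 8-bit
# versions of "trained" 32-bit weights introduce into a matrix multiply?
import numpy as np

rng = np.random.default_rng(1)
w32 = rng.normal(0, 0.1, (256, 256)).astype(np.float32)  # stand-in for trained weights
x = rng.normal(0, 1, (1, 256)).astype(np.float32)         # one input example

# 16-bit: a straight cast; Pascal-class GPUs run such operations at
# higher throughput than single precision.
w16 = w32.astype(np.float16)

# 8-bit: map each weight onto 255 signed integer levels, remembering the scale.
scale = np.abs(w32).max() / 127.0
w8 = np.round(w32 / scale).astype(np.int8)

ref = x @ w32
out16 = (x.astype(np.float16) @ w16).astype(np.float32)
out8 = x @ (w8.astype(np.float32) * scale)

print("fp16 max error:", np.abs(out16 - ref).max())
print("int8 max error:", np.abs(out8 - ref).max())
```

For many inference workloads the small errors this introduces have little effect on the network’s final answer, which is what makes the trade of precision for speed and power so attractive in the cloud.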

Dally adds that “the final leg of the tripod for deep learning is inference in embedded devices,” such as smartphones, cameras, and tablets. For those applications, the key will be low-power ASICs. Over the coming year, deep-learning software will increasingly find its way into applications for smartphones, where it is already used, for example, to detect malware or translate text in images.

And the drone manufacturer DJI is already using something akin to a deep-learning ASIC in its Phantom 4 drone, which uses a special visual-processing chip made by California-based Movidius to recognize obstructions. (Movidius is yet another neural-network company recently acquired by Intel.) Qualcomm, meanwhile, built special circuitry into its Snapdragon 820 processors to help carry out deep-learning calculations.

Although there is plenty of incentive these days to design hardware to accelerate the operation of deep neural networks, there’s also a huge risk: If the state of the art shifts far enough, chips designed to run yesterday’s neural nets will be outdated by the time they are manufactured. “The algorithms are changing at an enormous rate,” says Dally. “Everybody who is building these things is trying to cover their bets.”

This article appears in the January 2017 print issue as “Deeper and Cheaper Machine Learning.”
