IBM’s New Do-It-All Deep-Learning Chip

The field of deep learning is still in flux, but some things have started to settle out. In particular, experts recognize that neural nets can get a lot of computation done with little energy if a chip approximates an answer using low-precision math. That’s especially useful in mobile and other power-constrained devices. But some tasks, especially training a neural net to do something, still need precision. IBM recently revealed its newest solution, still a prototype, at the IEEE VLSI Symposia: a chip that does both equally well.

The disconnect between the needs of training a neural net and having that net execute its function, called inference, has been one of the big challenges for those designing chips that accelerate AI functions. IBM’s new AI accelerator chip is capable of what the company calls scaled precision. That is, it can do both training and inference at 32, 16, or even 1 or 2 bits.

“The most advanced precision that you can do for training is 16 bits, and the most advanced you can do for inference is 2 bits,” explains Kailash Gopalakrishnan, the distinguished member of the technical staff at IBM’s Yorktown Heights research center who led the effort. “This chip potentially covers the best of training known today and the best of inference known today.”

The chip’s ability to do all of this stems from two innovations that are both aimed at the same outcome—keeping all the processor components fed with data and working.

“One of the challenges that you have with traditional [chip] architectures when it comes to deep learning is that the utilization is typically very low,” says Gopalakrishnan. That is, even though a chip might be capable of a very high peak performance, typically only 20 to 30 percent of its resources can really be brought to bear on a problem. IBM aimed for 90 percent, for all tasks, all the time.
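To put those percentages in perspective, here is a back-of-the-envelope sketch in Python. The peak figure of 1 trillion operations per second is a made-up number chosen for illustration, not an IBM specification, and the function name is hypothetical; only the 20, 30, and 90 percent utilization values come from the article.

```python
def effective_throughput(peak_ops_per_s: float, utilization: float) -> float:
    """Sustained throughput = peak throughput x fraction of resources kept busy."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return peak_ops_per_s * utilization


PEAK = 1.0e12  # hypothetical peak (ops/s), for illustration only

typical_low = effective_throughput(PEAK, 0.20)   # low end of the typical range
typical_high = effective_throughput(PEAK, 0.30)  # high end of the typical range
target = effective_throughput(PEAK, 0.90)        # IBM's stated goal

# How much more work the same silicon delivers at 90 percent utilization
speedup_range = (target / typical_high, target / typical_low)
print(speedup_range)
```

Under these assumptions, a chip with the same peak rating would sustain roughly three to four and a half times as much useful work at 90 percent utilization as at 20 to 30 percent.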

Low utilization is usually due to bottlenecks in the flow of data around the chip. To break through these information infarctions, Gopalakrishnan’s team came up with a “customized” data flow system. The data flow system is a network scheme that speeds the movement of data from one processing engine to the next. It is customized according to whether it’s handling learning or inference and for the different scales of precision.

The second innovation was the use of a specially designed “scratch pad” form of on-chip memory instead of the traditional cache memory found on a CPU or GPU. Caches are built to obey certain rules that make sense for general computing but cause delays in deep learning. For example, there are certain situations where a cache would push a chunk of data out to the computer’s main memory (evict it), but if that data’s needed as part of the neural network’s inferencing or learning process, the system will then have to wait until it can be retrieved from main memory.

A scratch pad doesn’t follow the same rules. Instead, it’s built to keep data flowing through the chip’s processing engines, making sure the data is at the right spot at just the right time. To get to 90 percent utilization, IBM had to design the scratch pad with a huge read/write bandwidth, 192 gigabytes per second.
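The difference between the two kinds of memory can be seen in a toy simulation. This is a sketch under loud assumptions, not a model of IBM's design: the cache is assumed to use a least-recently-used eviction rule, the scratch pad is assumed to know the access order in advance and to finish every prefetch in time, and the function names and tile counts are invented for the example.

```python
from collections import OrderedDict


def lru_stalls(accesses, capacity):
    """Count accesses that stall because an LRU cache no longer holds the tile."""
    cache = OrderedDict()
    stalls = 0
    for tile in accesses:
        if tile in cache:
            cache.move_to_end(tile)        # mark as most recently used
        else:
            stalls += 1                    # wait for main memory
            cache[tile] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used tile
    return stalls


def scratchpad_stalls(accesses, capacity):
    """Count stalls when software loads the next tile one step ahead of its use.

    Assumes the access order is known up front and that a prefetch issued
    one step early always completes in time (a simplification).
    """
    if capacity < 2:
        raise ValueError("need room for the current tile and the next one")
    resident = []  # tiles currently held, oldest first
    stalls = 0
    for i, tile in enumerate(accesses):
        if tile not in resident:
            stalls += 1                    # only the very first access is cold
            resident.append(tile)
        if i + 1 < len(accesses):
            nxt = accesses[i + 1]
            if nxt not in resident:
                if len(resident) >= capacity:
                    # make room by dropping the oldest tile that is not in use
                    victim = next(t for t in resident if t != tile)
                    resident.remove(victim)
                resident.append(nxt)       # prefetch for the next step
    return stalls


tiles = [0, 1, 2, 3, 4] * 20                 # five tiles reused in a loop
print(lru_stalls(tiles, capacity=4))         # 100
print(scratchpad_stalls(tiles, capacity=4))  # 1
```

With five tiles cycling through room for four, the least-recently-used rule always evicts the tile that is needed next, so every access stalls. The explicitly scheduled memory stalls only once, on the first load.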

The resulting chip can perform all three of today’s main flavors of deep learning AI: convolutional neural networks (CNN), multilayer perceptrons (MLP), and long short-term memory (LSTM). Together these techniques dominate speech, vision, and natural language processing, explains Gopalakrishnan. At 16-bit precision, typical for training, IBM’s new chip cranks through 1.5 trillion floating point operations per second; at 2-bit precision, best for inference, that leaps to 12 trillion operations per second.
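A quick check of those two figures, written as a few lines of Python, shows how the gain lines up with the drop in bit width. The variable names are made up for the example, and the comparison is only an observation about the quoted numbers, not a statement about how the chip achieves them.

```python
OPS_16_BIT = 1.5e12  # operations per second at 16-bit precision (from the article)
OPS_2_BIT = 12e12    # operations per second at 2-bit precision (from the article)

throughput_gain = OPS_2_BIT / OPS_16_BIT  # how much faster the 2-bit mode runs
bit_width_ratio = 16 / 2                  # how much narrower each operand is

print(throughput_gain, bit_width_ratio)   # 8.0 8.0
```

In other words, the quoted throughput rises by the same factor of eight by which the operand width shrinks.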

Gopalakrishnan points out that because the chip is made using an advanced silicon CMOS manufacturing process (GlobalFoundries’ 14-nanometer process), all those operations per second are packed into a pretty small area. For inferencing a CNN, the chip can perform an average of 1.33 trillion operations per second per square millimeter. That figure is important “because in a lot of applications you are cost constrained by size,” he says.

The new architecture also proves something IBM researchers have been exploring for a few years: Inference at really low precision doesn’t work well if the neural nets are trained at much higher precision. “As you go below 8 bits, training and inference start to directly impact each other,” says Gopalakrishnan. A neural net trained at 16 bits but deployed as a 1-bit system will result in unacceptably large errors, he says. So, the best results come from training a network at a similar precision to how it will ultimately be executed.
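A small Python sketch gives a feel for why squeezing high-precision weights into 1 or 2 bits after the fact is so lossy. It assumes simple uniform rounding of weights in the range -1 to 1, which is a stand-in chosen for illustration and not the number format IBM's chip uses; the function names and the random test weights are likewise invented.

```python
import random


def quantize(values, bits):
    """Round each value in [-1, 1] to the nearest of 2**bits evenly spaced levels."""
    levels = 2 ** bits
    step = 2.0 / (levels - 1)  # spacing between adjacent levels
    return [-1.0 + round((v + 1.0) / step) * step for v in values]


def mean_abs_error(values, bits):
    """Average distance between the original values and their quantized versions."""
    quantized = quantize(values, bits)
    return sum(abs(a - b) for a, b in zip(values, quantized)) / len(values)


rng = random.Random(0)  # fixed seed so the run is repeatable
weights = [rng.uniform(-1.0, 1.0) for _ in range(1000)]  # stand-in "trained" weights

errors = {bits: mean_abs_error(weights, bits) for bits in (16, 8, 2, 1)}
for bits, err in errors.items():
    print(f"{bits:>2} bits: mean rounding error {err:.6f}")
```

At 16 bits the rounding error is negligible, but at 1 bit each weight collapses to either -1 or 1, and the average error is a large fraction of the weights themselves. The sketch shows only the rounding damage; it does not model the fix the article describes, which is to train at a precision close to the one used for deployment.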

No word on when this technology might be commercialized in Watson or another form, but Gopalakrishnan’s boss, Mukesh Khare, IBM’s vice president of semiconductor research, says to expect it to evolve and improve. “This is the tip of the iceberg,” he says. “We have many more innovations in the pipeline.”

Editor’s note: This story was updated on 2 July 2018.

This article appears in the August 2018 print issue as “IBM’s New Do-It-All AI Chip.”
