First Programmable Memristor Computer

Michigan team builds memristors atop standard CMOS logic to demo a system that can do a variety of edge-computing AI tasks

A close-up of the memristor computer. Photo: Robert Coelius/Michigan Engineering Communications & Marketing

Hoping to speed AI and neuromorphic computing and cut down on power consumption, startups, scientists, and established chip companies have all been looking to do more computing in memory rather than in a processor’s computing core. Memristors and other nonvolatile memories seem to lend themselves to the task particularly well. However, most demonstrations of in-memory computing have been in standalone accelerator chips that are either built for a particular type of AI problem or need the off-chip resources of a separate processor in order to operate. University of Michigan engineers are claiming the first memristor-based programmable computer for AI that can work all on its own.

“Memory is really the bottleneck,” says University of Michigan professor Wei Lu. “Machine learning models are getting larger and larger, and we don’t have enough on-chip memory to store the weights.” Going off-chip for data, to DRAM, say, can take 100 times as much computing time and energy. Even if you do have everything you need stored in on-chip memory, moving it back and forth to the computing core also takes too much time and energy, he says. “Instead, you do the computing in the memory.”

His lab has been working with memristors (also called resistive RAM, or RRAM), which store data as resistance, for more than a decade, and it has demonstrated their potential to efficiently perform AI computations such as the multiply-and-accumulate operations at the heart of deep learning. Arrays of memristors can do these tasks efficiently because the computations happen in the analog domain rather than digitally.
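To see why, consider an idealized sketch of a crossbar in plain Python (the sizes here assume the chip’s 5,832 devices form a 54-by-108 grid, and all values are illustrative): each weight is stored as a device’s conductance, inputs arrive as row voltages, and by Ohm’s and Kirchhoff’s laws the current collecting on each column is already the multiply-and-accumulate result.

```python
import numpy as np

rng = np.random.default_rng(0)

# Idealized crossbar: one memristor per weight, stored as a conductance
# (in siemens). Assuming the 5,832 devices form a 54 x 108 grid.
G = rng.uniform(1e-6, 1e-4, size=(54, 108))   # weights as conductances
V = rng.uniform(0.0, 0.2, size=54)            # inputs as row voltages

# Ohm's law per device and Kirchhoff's current law per column mean the
# column currents *are* the multiply-and-accumulate result, in one step:
I = V @ G

# The equivalent digital computation needs one multiply-add per weight:
I_digital = np.zeros(108)
for i in range(54):
    for j in range(108):
        I_digital[j] += V[i] * G[i, j]

assert np.allclose(I, I_digital)
```

The digital loop spends one multiply and one add per stored weight; the analog array produces the whole matrix-vector product in a single read.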

The new chip combines an array of 5,832 memristors with an OpenRISC processor. A total of 486 specially designed digital-to-analog converters, 162 analog-to-digital converters, and two mixed-signal interfaces act as translators between the memristors’ analog computations and the main processor.
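Those converters bracket every analog operation. Here is a hedged sketch of that round trip, with made-up bit widths and voltage and current ranges (the article doesn’t specify the real ones):

```python
import numpy as np

def dac(codes, bits=4, v_max=0.2):
    """Digital input codes -> row drive voltages (hypothetical 4-bit DAC)."""
    return codes / (2**bits - 1) * v_max

def adc(currents, bits=8, i_max=1e-3):
    """Column currents -> digital codes (hypothetical 8-bit ADC)."""
    scaled = np.clip(currents, 0.0, i_max) / i_max
    return np.round(scaled * (2**bits - 1)).astype(int)

rng = np.random.default_rng(1)
G = rng.uniform(1e-6, 1e-4, size=(54, 108))   # weights stored as conductances

x = rng.integers(0, 16, size=54)              # digital activations from the CPU
v = dac(x)                                    # 1) DACs drive the rows
i_col = v @ G                                 # 2) crossbar does the analog MAC
y = adc(i_col)                                # 3) ADCs hand results back to the CPU
```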

“All the functions are implemented on chip,” says Lu, an IEEE Fellow. “To show the promise, you can’t just build the individual pieces.”

At its maximum frequency, the chip consumed just over 300 milliwatts while delivering 188 billion operations per second per watt (GOPS/W). That doesn’t compare well with, say, Nvidia’s latest research AI accelerator chip, at 9.09 trillion operations per second per watt (TOPS/W), though that figure doesn’t account for the energy cost and latency of transferring data from DRAM. But Lu points out that the CMOS portion of his chip was built using the two-decade-old 180-nanometer semiconductor manufacturing process. Moving it to a newer process, such as 2008-era 40-nanometer tech, would drop power consumption to 42 milliwatts and boost performance to 1.37 TOPS/W, with no need to transfer data from DRAM. Nvidia’s chip was made using a 16-nanometer process that debuted in 2014.
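As a back-of-envelope check using only the figures quoted above, the 40-nanometer projection keeps throughput roughly constant while cutting power about sevenfold:

```python
# Back-of-envelope throughput implied by the quoted efficiency figures.
gops_180nm = 188e9 * 0.300       # 188 GOPS/W at roughly 300 mW
gops_40nm = 1.37e12 * 0.042      # projected 1.37 TOPS/W at 42 mW

print(f"180 nm chip:      {gops_180nm / 1e9:.0f} GOPS")   # ~56 GOPS
print(f"40 nm projection: {gops_40nm / 1e9:.0f} GOPS")    # ~58 GOPS
```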

Wei Lu of the University of Michigan and collaborator Seung Hwan Lee, an electrical engineering PhD student, built a programmable memristor array, which Lee holds here. Photo: Robert Coelius/Michigan Engineering Communications & Marketing

Lu’s team put the chip through three tests to prove its programmability and its ability to handle a wide variety of machine learning tasks. The most straightforward test used a perceptron, a simple network for classifying information. For that task, the memristor computer had to recognize Greek letters even when the images of them were noisy.
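For a sense of what the array’s weights do in such a test, here is a minimal perceptron on synthetic data; the toy 25-pixel patterns below are stand-ins, not the actual Greek-letter images:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy stand-in for the noisy-letter task: classify 25-pixel patterns
# into two classes. The labels come from a hidden linear rule.
X = rng.normal(size=(200, 25))
hidden_w = rng.normal(size=25)
y = (X @ hidden_w > 0).astype(int)

# The perceptron learning rule: nudge the weights toward each mistake.
# On the chip, w would live in the memristor array as conductances.
w = np.zeros(25)
b = 0.0
for epoch in range(20):
    for xi, yi in zip(X, y):
        pred = int(xi @ w + b > 0)
        err = yi - pred              # -1, 0, or +1
        w += err * xi
        b += err

accuracy = np.mean(((X @ w + b) > 0).astype(int) == y)
print(f"training accuracy: {accuracy:.2f}")
```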

The second and more difficult task was a sparse coding problem. In sparse coding, you try to build the most efficient network of artificial neurons that will get the job done. That means that as the network learns its task, its neurons compete with one another for a place in the network. The losers are excised, leaving a more brain-like and efficient neural network with only the connections that are absolutely needed. Lu demonstrated memristor-based sparse coding on a smaller array in 2017.
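One common way to stage that competition in software is iterative soft thresholding, in which any coefficient that falls below a threshold is zeroed out, so only a few winners survive to represent each input. The sketch below is generic ISTA-style sparse coding, not the Michigan implementation:

```python
import numpy as np

def sparse_code(x, D, lam=0.1, steps=100):
    """Find a sparse code a such that D @ a ~= x (ISTA iterations)."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(steps):
        grad = D.T @ (D @ a - x)           # reconstruction-error gradient
        a = a - grad / L
        a = np.sign(a) * np.maximum(np.abs(a) - lam / L, 0.0)  # soft threshold
    return a

rng = np.random.default_rng(3)
D = rng.normal(size=(64, 128))             # dictionary of learned features
D /= np.linalg.norm(D, axis=0)             # unit-norm dictionary atoms
x = 2.0 * D[:, 5] - 1.5 * D[:, 42]         # signal built from just two atoms

a = sparse_code(x, D)
# Most coefficients lose the competition and end up exactly zero.
print(f"nonzero coefficients: {np.count_nonzero(a)} of {a.size}")
```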

The final task was a two-layer neural network capable of what’s called unsupervised learning. Rather than being presented with a set of labeled images to learn from, the chip was given a collection of mammography test scores. The neural network first worked out which features of the combined scores mattered and then used them to distinguish malignant from benign tumors with 94.6 percent accuracy.
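As a rough illustration of that two-stage idea, the sketch below pairs an unsupervised feature-extraction layer (PCA, as a stand-in) with unsupervised clustering on synthetic data; the dataset, dimensions, and methods here are assumptions for illustration, not those of the actual experiment:

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic stand-in for mammography score vectors: two overlapping
# clusters of 9-dimensional scores, one "benign", one "malignant".
benign = rng.normal(loc=0.0, scale=1.0, size=(150, 9))
malignant = rng.normal(loc=1.5, scale=1.0, size=(150, 9))
X = np.vstack([benign, malignant])
labels = np.array([0] * 150 + [1] * 150)   # held out; used only for scoring

# Layer 1 (unsupervised): project onto the top two principal components.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
features = Xc @ Vt[:2].T

# Layer 2 (unsupervised): 2-means clustering on the learned features.
centers = features[rng.choice(len(features), 2, replace=False)]
for _ in range(50):
    assign = np.argmin(np.linalg.norm(features[:, None] - centers, axis=2), axis=1)
    centers = np.array([features[assign == k].mean(axis=0) for k in range(2)])

# Cluster ids are arbitrary, so score against both possible labelings.
acc = max(np.mean(assign == labels), np.mean(assign != labels))
print(f"unsupervised accuracy: {acc:.3f}")
```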

The next version of the chip, which Lu says will be done next year, will have both faster, more-efficient CMOS and multiple memristor arrays. “We will use multiple arrays to show you can tie them together to form larger networks,” he says.

Lu has formed a startup called MemryX with the aim of commercializing the chip. His previous RRAM startup, Crossbar, is also chasing the AI space. Last year Crossbar inked a deal with aerospace chipmaker Microsemi and demoed a chip that did face recognition and read license plates.
