First Programmable Memristor Computer

Michigan team builds memristors atop standard CMOS logic to demo a system that can do a variety of edge computing AI tasks

Closeup image of the Memristor computer
Photo: Robert Coelius/Michigan Engineering Communications & Marketing

Hoping to speed AI and neuromorphic computing and cut down on power consumption, startups, scientists, and established chip companies have all been looking to do more computing in memory rather than in a processor’s computing core. Memristors and other nonvolatile memory seem to lend themselves to the task particularly well. However, most demonstrations of in-memory computing have been in standalone accelerator chips that are either built for a particular type of AI problem or need the off-chip resources of a separate processor in order to operate. University of Michigan engineers are claiming the first memristor-based programmable computer for AI that can work entirely on its own.

“Memory is really the bottleneck,” says University of Michigan professor Wei Lu. “Machine learning models are getting larger and larger, and we don’t have enough on-chip memory to store the weights.” Going off-chip for data, to DRAM, say, can take 100 times as much computing time and energy. Even if you do have everything you need stored in on-chip memory, moving it back and forth to the computing core also takes too much time and energy, he says. “Instead, you do the computing in the memory.”

His lab has been working with memristors (also called resistive RAM, or RRAM), which store data as resistance, for more than a decade and has demonstrated their potential to efficiently perform AI computations such as the multiply-and-accumulate operations at the heart of deep learning. Arrays of memristors can do these tasks efficiently because the operations are carried out as analog computations rather than digital ones.
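To make the analog multiply-and-accumulate concrete, here is a minimal Python sketch of an idealized memristor crossbar. It assumes the standard textbook picture rather than any detail of Lu's chip: each weight is stored as a conductance, inputs arrive as row voltages, Ohm's law sets the current through each device, and Kirchhoff's current law sums those currents on each column. The function name, the array size, and every value are hypothetical.

```python
def crossbar_mac(voltages, conductances):
    """Column currents of an ideal crossbar: I[j] = sum_i V[i] * G[i][j].

    voltages     -- one input voltage per row (volts)
    conductances -- rows x columns grid of device conductances (siemens)
    """
    n_rows = len(conductances)
    n_cols = len(conductances[0])
    if len(voltages) != n_rows:
        raise ValueError("need one input voltage per row")
    currents = [0.0] * n_cols
    for i in range(n_rows):
        for j in range(n_cols):
            # Ohm's law per device; the column wire adds the currents up.
            currents[j] += voltages[i] * conductances[i][j]
    return currents


# Hypothetical 3x2 array with made-up values.
G = [[1e-6, 2e-6],
     [3e-6, 4e-6],
     [5e-6, 6e-6]]
V = [0.1, 0.2, 0.3]
I = crossbar_mac(V, G)
print(I)  # roughly 2.2e-06 A and 2.8e-06 A
```

In a physical array the whole sum happens in one step as currents merge on the column wires; the nested loop here only emulates that result in software.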

The new chip combines an array of 5,832 memristors with an OpenRISC processor. Acting as translators between the memristors’ analog computations and the main processor are 486 specially designed digital-to-analog converters, 162 analog-to-digital converters, and two mixed-signal interfaces.

“All the functions are implemented on chip,” says Lu, an IEEE Fellow. “To show the promise, you can’t just build the individual pieces.”

At its maximum frequency, the chip consumed just over 300 milliwatts while performing 188 billion operations per second per watt (GOPS/W). That doesn’t compare well to, say, Nvidia’s latest research AI accelerator chip at 9.09 trillion operations per second per watt (TOPS/W), although that figure does not account for the energy cost and latency of transferring data from DRAM. But Lu points out that the CMOS portion was built using the two-decade-old 180-nanometer semiconductor manufacturing process. Moving it to a newer process, such as 2008-era 40-nanometer tech, would drop power consumption to 42 mW and boost performance to 1.37 TOPS/W without needing to transfer data from DRAM. Nvidia’s chip was made using a 16-nanometer process that debuted in 2014.
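The figures above can be related with a little arithmetic. The short Python sketch below multiplies efficiency by power to estimate raw throughput and compares the efficiency numbers. The throughput values are an inference and are not stated in the article, and "just over 300 milliwatts" is rounded to 0.3 W as an assumption.

```python
def throughput_gops(efficiency_gops_per_watt, power_watts):
    """Operations per second (in GOPS) = efficiency (GOPS/W) x power (W)."""
    return efficiency_gops_per_watt * power_watts


# Figures quoted in the article, converted to GOPS/W and watts.
MEASURED_EFF = 188.0      # 180-nm chip, GOPS/W
MEASURED_POWER = 0.300    # assumption: "just over 300 mW" rounded to 0.3 W
PROJECTED_EFF = 1370.0    # 40-nm projection, 1.37 TOPS/W
PROJECTED_POWER = 0.042   # 40-nm projection, 42 mW
NVIDIA_EFF = 9090.0       # 9.09 TOPS/W

measured = throughput_gops(MEASURED_EFF, MEASURED_POWER)
projected = throughput_gops(PROJECTED_EFF, PROJECTED_POWER)
nvidia_ratio = NVIDIA_EFF / MEASURED_EFF
projected_gain = PROJECTED_EFF / MEASURED_EFF

print(f"measured throughput:  about {measured:.1f} GOPS")
print(f"projected throughput: about {projected:.1f} GOPS")
print(f"Nvidia efficiency lead over 180-nm chip: about {nvidia_ratio:.0f}x")
print(f"efficiency gain from 180 nm to 40 nm:    about {projected_gain:.1f}x")
```

Under these assumptions both configurations come out near 56 to 58 GOPS, which suggests the 40-nanometer projection reflects lower power at similar throughput rather than a faster chip.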

Wei Lu of the University of Michigan and first author Seung Hwan Lee, an electrical engineering PhD student, built a programmable memristor array, which Lee holds here. Photo: Robert Coelius/Michigan Engineering Communications & Marketing

Lu’s team put the chip through three tests to prove its programmability and ability to handle a wide variety of machine learning tasks. The most straightforward one is called a perceptron, which is used to classify information. For that task, the memristor computer had to recognize Greek letters even when the image of them was noisy.
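For readers unfamiliar with perceptrons, here is a minimal Python sketch of the idea: a single layer of weights trained with the classic perceptron update, then tested on patterns with one pixel flipped. It is not the team's implementation. The three 3x3 glyphs are hypothetical stand-ins for letters, the model has no bias term, and it runs in ordinary software rather than on memristors.

```python
# Three hypothetical 3x3 glyphs (stand-ins for letters), pixels coded +1 / -1.
PATTERNS = {
    "T": [1, 1, 1, -1, 1, -1, -1, 1, -1],
    "L": [1, -1, -1, 1, -1, -1, 1, 1, 1],
    "X": [1, -1, 1, -1, 1, -1, 1, -1, 1],
}
LABELS = list(PATTERNS)


def predict(weights, pixels):
    """Return the index of the class whose weight row scores highest."""
    scores = [sum(w * p for w, p in zip(row, pixels)) for row in weights]
    return max(range(len(scores)), key=lambda k: scores[k])


def train(patterns, max_epochs=100):
    """Multi-class perceptron: on a mistake, reward the true class row
    and penalize the wrongly chosen row by the input pattern."""
    n_pixels = len(next(iter(patterns.values())))
    weights = [[0] * n_pixels for _ in patterns]
    for _ in range(max_epochs):
        mistakes = 0
        for target, name in enumerate(LABELS):
            pixels = patterns[name]
            guess = predict(weights, pixels)
            if guess != target:
                mistakes += 1
                for i, p in enumerate(pixels):
                    weights[target][i] += p
                    weights[guess][i] -= p
        if mistakes == 0:  # every clean pattern is classified correctly
            break
    return weights


def flip(pixels, index):
    """Return a noisy copy of the pattern with one pixel inverted."""
    noisy = list(pixels)
    noisy[index] = -noisy[index]
    return noisy


weights = train(PATTERNS)
clean_ok = all(predict(weights, PATTERNS[n]) == k for k, n in enumerate(LABELS))
trials = [(k, flip(PATTERNS[n], i)) for k, n in enumerate(LABELS) for i in range(9)]
noisy_ok = sum(predict(weights, x) == k for k, x in trials)
print(clean_ok, noisy_ok, len(trials))
```

With these toy patterns the clean glyphs are all recognized, and 24 of the 27 single-pixel corruptions are still classified correctly. A classifier this small is not robust to every flip, which is why real systems train on larger images and noisy examples.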

The second, and more difficult, task was a sparse coding problem. In sparse coding, you are trying to build the most efficient network of artificial neurons that will get the job done. That means that as the network learns its task, its neurons compete with each other for a place in the network. The losers are excised, leaving a more brain-like and efficient neural network with only the connections that are absolutely needed. Lu demonstrated memristor-based sparse coding in 2017 on a smaller array.

The final task was a two-layer neural network capable of what’s called unsupervised learning. Rather than being presented with a set of labeled images to learn from, the chip was given a set of mammography test scores. The neural network first worked out what the important features of the combination of scores were and then distinguished malignant from benign tumors with 94.6 percent accuracy.

The next version of the chip, which Lu says will be done next year, will have both faster, more-efficient CMOS and multiple memristor arrays. “We will use multiple arrays to show you can tie them together to form larger networks,” he says.

Lu has formed a startup called MemryX with the aim of commercializing the chip. His previous RRAM startup, Crossbar, is also chasing the AI space. Last year Crossbar inked a deal with aerospace chipmaker Microsemi and demoed a chip that did face recognition and read license plates.
