New Type of DRAM Could Accelerate AI

Capacitorless DRAM using oxide semiconductors could be built in 3D layers above a processor’s silicon


Samuel K. Moore is IEEE Spectrum’s semiconductor editor.

The transistors in the capacitorless DRAM developed by U.S.-based researchers include a tungsten-doped indium oxide [orange] semiconductor, palladium top and bottom gates [yellow], nickel source and drain electrodes [green], and hafnium oxide dielectrics [blue].
Image: University of Notre Dame

One of the biggest problems in computing today is the “memory wall”—the difference between processing time and the time it takes to shuttle data over to the processor from separate DRAM memory chips. The increasing popularity of AI applications has only made that problem more pronounced, because the huge networks that find faces, understand speech, and recommend consumer goods rarely fit in a processor’s on-board memory.

In December at the IEEE International Electron Devices Meeting (IEDM), separate research groups from the United States and Belgium argued that a new kind of DRAM might be the solution. The new DRAM, made from oxide semiconductors and built in the layers above the processor, holds bits hundreds or thousands of times longer than commercial DRAM and could provide huge area and energy savings when running large neural nets, they say.

The DRAM memory cells in your computer are made from a single transistor and a single capacitor each, a so-called 1T1C design. To write a bit to the cell, the transistor is turned on and charge is pushed into (1) or removed from (0) the capacitor. To read from it, the charge (if there is any) is withdrawn and measured. This system is superfast and cheap, and it consumes little power, but it has some downsides. For one, reading the bit drains the capacitor, so reading means writing the bit back to memory. What’s more, even if you don’t read the bit, charge will eventually leak out of the capacitor through the transistor. So all the cells need to be periodically refreshed just to keep the data. In modern DRAM chips, that’s done every 64 milliseconds.
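To make that refresh-and-rewrite behavior concrete, here is a minimal sketch of how a memory controller has to treat a 1T1C cell. It is purely illustrative: the class, its methods, and the retention limit are assumptions chosen to mirror the description above, not anyone’s actual controller code.

```python
# Illustrative model of a 1T1C DRAM cell (an assumption-laden sketch, not real
# controller code): reads are destructive, so each read forces a write-back,
# and charge leaks away unless the cell is refreshed within its retention limit.

class Cell1T1C:
    RETENTION_LIMIT_MS = 64.0          # mirrors the 64-millisecond refresh cycle cited above

    def __init__(self) -> None:
        self.charge = 0                # capacitor state: 1 (charged) or 0 (empty)
        self.ms_since_write = 0.0      # time since the bit was last written or refreshed

    def write(self, bit: int) -> None:
        """Turn on the access transistor and set the capacitor's charge."""
        self.charge = bit
        self.ms_since_write = 0.0

    def read(self) -> int:
        """Destructive read: sensing drains the capacitor, so write the bit back."""
        bit = self.charge if self.ms_since_write < self.RETENTION_LIMIT_MS else 0
        self.charge = 0                # the read empties the capacitor...
        self.write(bit)                # ...so the controller must restore the bit
        return bit

    def tick(self, ms: float) -> None:
        """Advance time; without a refresh, the charge eventually leaks away."""
        self.ms_since_write += ms


cell = Cell1T1C()
cell.write(1)
cell.tick(10)
print(cell.read())    # 1, but the read forced a write-back
cell.tick(100)        # no refresh for longer than the retention limit...
print(cell.read())    # 0: the leaked charge means the bit is gone
```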

Embedding DRAM in a processor chip is done commercially, but it has its limits. “The challenge with the monolithic 1T1C design has always been the difficulty of building the capacitor as well as fabricating a transistor with ultra-low leakage,” using a manufacturing process meant for logic transistors, says Arijit Raychowdhury, a professor of electrical and computer engineering at Georgia Tech, who worked with researchers at the University of Notre Dame and the Rochester Institute of Technology on a new embedded DRAM. Good capacitors are difficult to make in manufacturing processes built for logic circuits.

The new embedded DRAM is instead made from two transistors only, no capacitor (2T0C). This works because a transistor’s gate is a natural, though small, capacitor. So the charge representing the bit can be stored there. This design has some key benefits, especially for AI.

Unlike ordinary DRAM, which is made of a transistor and a capacitor, 2T0C embedded DRAM is made up of two transistors. The bit is stored in the capacitance of the right-hand transistor, and it’s placed there by the left-hand device. Charge on the right-hand device’s gate means current can flow through it. Therefore, separate transistors control reading and writing. Image: University of Notre Dame

One is that writing and reading involve separate devices, explains Raychowdhury. So you can read from a 2T0C DRAM cell without destroying the data and having to rewrite it. All you have to do is see if current flows through the transistor whose gate is holding the charge. If the charge is there, it will turn the transistor on. Current flows. If there’s no charge there, current is stopped.
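A companion sketch shows why the 2T0C arrangement avoids that dance: the bit sits on the read transistor’s gate, and a read only checks whether that transistor conducts, so nothing has to be drained or written back. Again, the class and its methods are illustrative assumptions, not the researchers’ design.

```python
# Illustrative model of a 2T0C cell (assumptions, not the teams' design): one
# transistor writes charge onto the other transistor's gate, and reading just
# senses whether that second transistor conducts, leaving the stored bit intact.

class Cell2T0C:
    def __init__(self) -> None:
        self.gate_charge = 0           # bit stored on the read transistor's gate

    def write(self, bit: int) -> None:
        """The write transistor pushes charge onto (or pulls it off of) the gate."""
        self.gate_charge = bit

    def read(self) -> int:
        """Non-destructive read: check whether the read transistor turns on."""
        current_flows = self.gate_charge == 1
        return 1 if current_flows else 0   # the stored charge is left untouched


cell = Cell2T0C()
cell.write(1)
print(cell.read(), cell.read(), cell.read())   # 1 1 1: repeated reads, no write-back
```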

Easy reading is especially important for AI, because neural networks tend to read at least three times for every write, Jorge Gomez, a graduate student at Notre Dame in the laboratory of Suman Datta, told IEDM attendees.

But a 2T0C arrangement doesn’t work well with silicon logic transistors, says Raychowdhury. Any bit would drain away immediately because the transistor gate capacitance is too low and the leakage through the transistors is too high. So researchers are turning to devices made with amorphous oxide semiconductors, like those used to control pixels in some displays.

These have several admirable qualities. Notably, they can drive a lot of current, which makes writing quicker, and when they are off they leak very little charge, which makes bits last longer. The U.S.-based team used indium oxide doped to about 1 percent with tungsten as their semiconductor, IWO for short. The device’s on currents “are some of the best reported for oxide transistors,” says Raychowdhury. “It gives you enough read/write speed for logic operations. At the same time, off currents are really small…two to three orders smaller than the best you get with silicon.” In fact, the team had to build an extra-large version of the device in order to get any measurement of current leakage at all.

Just as important is that oxides like these can be processed at (relatively) low temperatures. That means devices made from them can be constructed in the layers of interconnects above a processor’s silicon without damaging the silicon devices below. Building memory cells there gives a direct, high-bandwidth path for data to get to processing elements on the silicon, effectively knocking down the memory wall.

In simulations of three common neural networks, the team compared one-, four-, and eight-layer versions of their technology to 22-nanometer 1T1C embedded DRAM, a technology used in IBM Power8 processors. Because controlling the 2T0C embedded DRAM takes up a certain amount of logic on the processor, using only a single layer of the new memory doesn’t actually give you an advantage in terms of the chip area needed for all the neural net data. But the four-layer 2T0C DRAM cut the chip area needed for embedded memory by about 3.5 times, and the eight-layer version delivered a 7.3-fold reduction.
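The scaling argument is easy to see with a toy footprint model: the control logic occupies a fixed patch of silicon no matter how many memory layers sit above it, so stacking is what buys back area. The sketch below is purely hypothetical; every number in it is an invented placeholder, not a figure from the team’s simulations.

```python
# Hypothetical footprint model illustrating the argument above, not the team's
# simulation: fixed controller area plus a cell array whose silicon footprint
# shrinks as it is spread across more stacked layers. All numbers are placeholders.

def embedded_dram_area_mm2(array_footprint_mm2: float,
                           layers: int,
                           controller_mm2: float) -> float:
    """Silicon area consumed: fixed control logic plus the stacked cell array."""
    return controller_mm2 + array_footprint_mm2 / layers


BASELINE_1T1C_MM2 = 1.0    # placeholder footprint for a 1T1C embedded-DRAM macro
ARRAY_MM2 = 0.9            # placeholder single-layer 2T0C array footprint
CONTROLLER_MM2 = 0.15      # placeholder control-logic overhead on the processor

for layers in (1, 4, 8):
    area = embedded_dram_area_mm2(ARRAY_MM2, layers, CONTROLLER_MM2)
    print(layers, "layers:", round(BASELINE_1T1C_MM2 / area, 2), "x area advantage")
# With one layer the fixed controller eats the benefit; with four or eight
# layers the advantage grows, which is the shape of the result reported above.
```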

Similarly, the 2T0C embedded DRAM showed a performance advantage over 1T1C embedded DRAM when there was more than one layer of it. For example, with one square millimeter of either four or eight layers of embedded DRAM, the ResNet-110 neural net never once had to go off-chip for data. That’s potentially a huge savings in time and energy over the 1T1C design, which required off-chip data about 70 percent of the time.

Researchers at Belgium’s Imec unveiled a similar 2T0C embedded DRAM scheme at IEDM using indium gallium zinc oxide (IGZO) as the semiconductor. Imec senior scientist Attilio Belmonte pointed out that the IGZO must be annealed in the presence of oxygen in order to heal defects in the material caused by oxygen vacancies. This has the effect of reducing the number of free electrons in the IGZO that can contribute to the flow of current, but without it, the devices don’t act like switches.

The need for this “oxygen passivation” has several knock-on effects concerning the design of the IGZO DRAM devices—including the choice and position of the dielectrics involved. The optimized device Imec developed had the IGZO lying atop a layer of silicon dioxide and was topped with aluminum oxide. That combination worked especially well to control the leakage that drains the bit away. The 2T0C memory cell had a mean retention time of 200 seconds, and 25 percent of the cells held their bits for more than 400 seconds, thousands of times longer than ordinary DRAM cells achieve. In follow-up research, the Imec team is hoping to use a different phase of IGZO to push retention time to more than 100 hours, he told engineers at IEDM.
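For scale, a quick back-of-the-envelope check using only the numbers in this article shows how those retention times compare with the 64-millisecond refresh interval of conventional DRAM.

```python
# Back-of-the-envelope comparison using the figures quoted in this article.

conventional_refresh_s = 0.064    # ordinary DRAM is refreshed every 64 milliseconds
imec_mean_retention_s = 200.0     # Imec's mean 2T0C retention time
imec_top_quartile_s = 400.0       # 25 percent of cells held bits longer than this

print(imec_mean_retention_s / conventional_refresh_s)   # ~3,100 times longer
print(imec_top_quartile_s / conventional_refresh_s)     # ~6,250 times longer
```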

That kind of retention time puts the device in the realm of non-volatile memories, such as resistive RAM and magnetic RAM. Many groups are focused on using embedded RRAM and MRAM to speed AI. But Raychowdhury says 2T0C embedded DRAM has an advantage over them. Those two require a lot of current to write, and for now that current has to come from transistors in the processor’s silicon, so there is less space saving to be had. What’s worse, they’re bound to be slower to switch than DRAM. “Anything based on charge is typically going to be faster, at least for the write process,” he says. Proof of how much faster will have to wait for construction of full arrays of embedded 2T0C DRAM on processors. But that’s coming, he says.
