“Processors are overdesigned for most applications,” says University of Illinois electrical and computer engineering professor Rakesh Kumar. It’s a well-known and necessary truth: In order to have programmability and flexibility, there’s simply going to be more stuff on a processor than any one application will use. That’s especially true of the ultralow-power microcontrollers that drive the newest embedded computing platforms, such as wearables and Internet of Things sensors. These often run one fairly simple application and nothing else (not even an operating system), which means that a large fraction of the circuits on a chip never, ever see a single bit of data.
Kumar, University of Minnesota assistant professor John Sartori (a former student of Kumar’s), and their students decided to do something about all that waste. Their method starts with the design of a general-purpose microcontroller. They identify which individual logic gates are never exercised by the application the chip will run, and strip away all of those excess gates. The result is what Kumar calls a “bespoke processor”: a physically smaller, less complex version of the original microcontroller that is designed to run only the application needed.
Kumar and Sartori will be detailing the bespoke processor project at the 44th International Symposium on Computer Architecture, in Toronto next week.
“Our approach was to figure out all the hardware that an application is guaranteed not to use irrespective of the input,” says Kumar. What’s left is “a union, or superset, of all possible paths that data can take. Then we take away the hardware that’s not touched.”
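To make the idea concrete, here is a minimal Python sketch of pruning gates that an application can never toggle. Everything in it is hypothetical: the four-gate netlist, the gate names, and the `unused_gates` helper are invented for illustration. Rather than reproducing whatever analysis the researchers’ tool actually performs, it simply enumerates every value of the inputs the application does not control, which is only feasible for tiny circuits. The control input `in_op` stands in for a feature (say, a multiply path) that the program never enables.

```python
from itertools import product

# Hypothetical toy netlist, listed in topological order:
# gate name -> (operation, (input, input)).
# Names starting with "in_" are primary inputs; "out" is the primary output.
NETLIST = {
    "g_and": ("AND", ("in_a", "in_b")),
    "g_xor": ("XOR", ("in_a", "in_b")),
    "g_mul": ("AND", ("in_op", "g_and")),  # only active when in_op == 1
    "out":   ("OR",  ("g_xor", "g_mul")),
}

OPS = {
    "AND": lambda x, y: x & y,
    "OR":  lambda x, y: x | y,
    "XOR": lambda x, y: x ^ y,
}


def simulate(netlist, inputs):
    """Evaluate every gate for one input assignment."""
    values = dict(inputs)
    for gate, (op, (x, y)) in netlist.items():
        values[gate] = OPS[op](values[x], values[y])
    return values


def unused_gates(netlist, free_inputs, fixed_inputs, outputs=("out",)):
    """Return the gates that can be removed for this application."""
    seen = {g: set() for g in netlist}
    # Try every value of the inputs the application does not control.
    for bits in product((0, 1), repeat=len(free_inputs)):
        assignment = {**fixed_inputs, **dict(zip(free_inputs, bits))}
        values = simulate(netlist, assignment)
        for g in netlist:
            seen[g].add(values[g])

    # A gate whose output never toggles can be replaced by a constant.
    removable = {g for g, vals in seen.items() if len(vals) == 1}

    # A gate whose consumers have all been removed no longer drives anything.
    changed = True
    while changed:
        changed = False
        for g in netlist:
            if g in removable or g in outputs:
                continue
            consumers = [c for c, (_, ins) in netlist.items() if g in ins]
            if consumers and all(c in removable for c in consumers):
                removable.add(g)
                changed = True
    return removable


# The application never sets in_op, so in_op is fixed at 0.
print(sorted(unused_gates(NETLIST, ["in_a", "in_b"], {"in_op": 0})))  # ['g_and', 'g_mul']
# If in_op may take any value, every gate can toggle and nothing is removed.
print(sorted(unused_gates(NETLIST, ["in_a", "in_b", "in_op"], {})))   # []
```

With `in_op` held at 0, the sketch finds `g_mul` stuck at a constant and then marks `g_and` as dead because its only consumer is gone; leaving `in_op` free keeps every gate. Tying stuck gates to constants and then sweeping away logic that no longer drives anything is a toy version of “take away the hardware that’s not touched.”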
Starting with an openMSP430 microcontroller, they produced bespoke designs for applications such as the fast Fourier transform, autocorrelation, and interpolation filtering that use fewer than half of the logic gates in the original microcontroller design. In fact, none of the 15 common microcontroller apps they studied needed more than 60 percent of the gates. On average, the resulting chips would be 62 percent smaller and consume 50 percent less power. By also exploiting the timing savings that come from signals traveling shorter distances, the designs raise the average power savings to 65 percent.
“It’s surprising,” Sartori says. “Most people think that in such a small, simple processor pretty much everything gets used all the time. But for a given application, there’s actually a lot of logic that can be completely eliminated, and the software still works perfectly.”
An analysis of the gates not used by two applications, intFilt and Scrambled intFilt, on an openMSP430 microcontroller. Grey dots are gates used by neither application. Red dots are gates unused only by that application. Illustration: University of Illinois/ACM
The method also works if you want the processor to perform two or more applications, and it can even handle an operating system plus an application. When run by itself, the real-time OS they tested, FreeRTOS, left 57 percent of gates completely untouched. Pairing FreeRTOS with any of the 15 apps still left at least 27 percent of the gates unused, though Kumar points out that these applications typically run “bare metal,” with no operating system needed.
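Supporting several programs amounts to keeping every gate that any of them touches, which is why an OS plus an app leaves fewer gates idle than either one alone. The sketch below assumes each program’s touched gates are already known as a set; the 10-gate design and the gate IDs are made up for illustration and are not figures from the study.

```python
def unused_fraction(total_gates: int, *used_sets: set[int]) -> float:
    """Fraction of gates that none of the given programs ever touches."""
    # A multi-program design must keep the union of every program's gates.
    kept = set().union(*used_sets)
    return 1 - len(kept) / total_gates


# Hypothetical gate IDs in a hypothetical 10-gate design (illustrative only).
app_gates = {0, 1, 2, 3}
os_gates = {2, 3, 4, 5, 6}

print(unused_fraction(10, app_gates))            # app alone
print(unused_fraction(10, os_gates))             # OS alone
print(unused_fraction(10, app_gates, os_gates))  # app + OS together
```

Because the two programs share gates 2 and 3, the combined design keeps 7 gates rather than 9, but it still leaves fewer gates unused than either program would on its own.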
Why not just order up an ASIC (application-specific integrated circuit)? In a word: cost. These embedded microcontrollers are used for such low-volume, low-profit-margin purposes that it would cost too much to do the ground-up design and testing an ASIC requires, says Kumar. Starting from a standard microcontroller design makes the process simpler and cheaper.
It’s like “a black box,” says Kumar. “Input the app, and it outputs the processor design.”
This post was updated on 16 June to add comment from John Sartori.