We could save energy in everything from smartphones to supercomputers by letting them make mistakes
Voltaire once wrote, “The best is the enemy of the good.”
That’s just one of the many ways of saying that perfectionism is not always a virtue. Yet computers today are relentless perfectionists, because even rare flaws in their output can be expensive, if not disastrous. A bug in Intel’s P5 Pentium chip, for example, cost the company almost half a billion U.S. dollars, even though it caused only slightly inaccurate answers in about one in 9 billion computations.
The extraordinary reliability of computers makes them useful for many tasks, especially ones that we error-prone humans would never be able to do well. But this wonderful quality comes at a price—energy. You see, to ensure that errors are so rare that you can safely assume they never happen, computers consume gobs of energy.
That near-perfect correctness was a foregone conclusion by 1971, when the first commercial microprocessor, the Intel 4004, hit the market. For the next three decades, the main goal of microprocessor designers was to maintain that attribute while stuffing ever more transistors onto each chip; energy efficiency was, if anything, an after thought. Today, however, energy use is front and center in chip designers’ minds, for several reasons.
First off, the processors that go into the mobile computers we’ve all grown so fond of, like smartphones and tablets, must be energy efficient to conserve battery life. Those found in supercomputers and data centers need to be similarly thrifty with power, because the electricity to run these facilities often costs their owners millions of dollars a year. And in contrast to the situation in 1971, designers can no longer count on advances in semiconductor technology to bring drastically better per-transistor energy efficiency year after year.
Indeed, they now face the “dark silicon” problem: While Moore’s Law remains intact, consistently delivering more transistors per chip with each new generation of manufacturing, microprocessors can’t use all these additional transistors at the same time. Only a portion of them can be powered up before the chip becomes impossible to cool. So chip designers must arrange things to leave much of the microprocessor unpowered, or “dark,” at any given moment—the rolling blackouts of the silicon world.
In this energy-constrained era, computer scientists of all stripes need to reexamine each of the things computers devote power to—including error-free operation. Absolute correctness was a great attribute when energy didn’t matter. But now it’s time to embrace the occasional slipup.
Compromising on correctness may seem to be a dangerous strategy. After all, when computers make mistakes, the results can be catastrophic. So if you let them make more mistakes, wouldn’t they run the risk of becoming useless? Not really. The trick is controlling when the goofs can happen. Sometimes they won’t pose a problem, because many of the things we use computers for today don’t demand strict correctness.
Let’s say you’re on a long flight, watching a movie on your tablet computer. When the original movie file was compressed, the software that encoded it threw away unimportant details from each frame to produce a file that’s a lot smaller. If the software decoder running on your tablet messes up a few pixels while playing the movie back, you probably won’t object—especially if it means more battery life remains by the time the credits roll.
This situation is not unique to media players. For many kinds of software—speech recognition, augmented reality, machine learning, big-data analytics, and game graphics, to give a few examples—perfection in the output isn’t the goal. Indeed, completely correct answers to the problem at hand may be impossible or infeasible to compute. All that’s desired is an approximate answer. And when today’s perfectionist computers execute these fundamentally approximate programs, they squander energy.
To seize the opportunity this extravagance presents, computer designers could build machines that can switch into an energy-saving—albeit somewhat error-prone—mode on demand. The machine might turn down the CPU voltage, for example, at the expense of causing arithmetic errors every once in a while. Reducing the refresh rate on dynamic random-access memory (DRAM) chips could also save energy, with the trade-off being a few unwanted bit flips. And wireless devices could cut back on the power they draw if some communication errors were allowed.
Many researchers are working on such energy-saving hardware modifications. We have been exploring how programmers could make use of them. To that end, our research group at the University of Washington, in Seattle, has developed a computer language that lets benign errors occur every now and then while preventing the catastrophic ones. We call it EnerJ. It’s our contribution to a new approach for boosting energy efficiency called approximate computing.
The key difficulty with approximate computing is that even an approximate program sometimes needs to produce absolutely correct results. Consider a photo-manipulation program. While a handful of incorrect pixels won’t mar a large image, a single wrong bit in the file’s JPEG header could render the output useless.
This dichotomy is common to many kinds of software: Some parts of a program can tolerate occasional mistakes or imprecision, while other parts must always be executed precisely and without error. Approximate computers need to support both modes, and approximate programs must be written to use the energy-efficient mode wherever possible while avoiding errors that would lead to catastrophic failures.
But how does a computer distinguish between the parts of a program that can tolerate approximation and those that can’t? At this stage at least, the programmer needs to instruct the computer to do that, using a language that offers some mechanism for making the distinction.
Our prototype language, EnerJ, works by letting the programmer mark data as either approximate or precise. Note that we’re using those terms somewhat loosely here. The two kinds of data could differ in just numerical precision—that is, in the number of bits devoted to holding a value. Or they could differ in how prone they are to errors: An “approximate” datum would have a small but non-negligible chance of being garbage, whereas a “precise” datum can for all practical purposes be considered error free. Some of the hardware we’ve been investigating mix these two energy-saving approaches, allowing occasional errors in the least significant bits while making sure that the most significant ones are always correct.
EnerJ is an extension of the Java programming language, but its overall design can be applied to most languages with data types that the programmer declares explicitly. Such declarations are used to indicate whether a data element is meant to hold a Boolean (true/false) value, a byte, a 32-bit integer, a 64-bit integer, a 32-bit floating-point number, a 64-bit floating-point number, or various other possibilities. Everywhere the programmer declares a Java data element, he or she can mark it as approximate by writing “
@Approx” before the declaration. Data types declared without such an annotation are implicitly made precise. In other words, the system uses approximation only where the programmer has specifically allowed it.
Illustration: Jude Buffum
As a concrete example, imagine that you want to write some code that calculates the average shade of a black-and-white image that measures 1000 by 1000 pixels. Your code might start with the first pixel of the first row and note its value. It would then go on to the second pixel in the first row and add its value to the first. Then it would do the same with the third pixel, and so on, each time adding the pixel’s value to a running sum it’s maintaining in some variable—let’s call it
TOTAL. When it gets to the 1000th pixel in row one, the program starts all over on the second row of pixels and continues in that manner until it finishes up with the 1000th pixel of the 1000th row. At the end, it divides
TOTAL by 1 million to calculate the average pixel value. Easy enough.
In addition to the variable
TOTAL, the code requires two counter variables: one to keep track of the row number and one for the column number of the pixel it’s adding to the sum. After the column counter gets to 999, it resets to 0 and progresses to the next row. When both row and column counters reach 999, the program can divide by a million to get the average.
If along the way one of the million pixel values is a little off or doesn’t get added correctly to the sum, well, it’s no big deal: The answer will be affected, but not by much. If, however, one of the row or column counters doesn’t increase correctly, the program might throw an error, terminate prematurely, or even go into an infinite loop.
With EnerJ, the programmer simply marks
TOTAL and the array holding the pixel values with “
@Approx” when he or she declares them. The two counter variables—let’s call them
Illustration: Jude Buffum
The yet-to-be-built energy-saving computer this EnerJ program runs on would then be allowed to use approximate computation indiscriminately—as long as it didn’t affect the program’s precise data. For example, the computer might store the individual pixel values in unreliable, low-refresh-rate DRAM, but it would have to use the normal, reliable part of its memory to store
J. Or the addition operations that add the pixel values to
TOTAL could be run at a lower voltage level because both of the operands are approximate, but the operations that increase the two counter variables would have to be run at the normal voltage so that they were always computed exactly. That is, just by annotating the type declarations in the code, the programmer specifies where approximate storage and operations can be used and where such approximation is forbidden.
EnerJ also helps programmers avoid bugs that would let approximate computations contaminate data that need to stay precise. Specifically, it forbids the program from putting approximate data into precise variables. Continuing with our example, this assignment would be illegal:
I = TOTAL;
Because I is precise, EnerJ guarantees that no approximation can be used to compute it.
TOTAL is approximate, so copying its value into
I would introduce approximation into a part of the program that the programmer wants to keep precise and free of errors. On the other hand, the opposite assignment has no such issues:
TOTAL = I;
Precise-to-approximate assignments like this one do not violate any guarantees of precision because they affect only approximate data.
By allowing one kind of assignment while prohibiting the other, EnerJ ensures that data move in just one direction. This constraint acts like a one-way valve: Precise data can flow freely into the approximate part of the program, but not vice versa.
The researchers who pioneered systems for enforcing unidirectional data movement, called information-flow tracking, were not interested in saving energy. They were concerned about software security. They wanted their programs to distinguish between low-integrity (possibly compromised) data and data deemed to be of high integrity.
Information-flow languages can, for example, prevent a kind of computer attack called SQL injection. (Structured Query Language, or SQL, is used for database management.) Here, the attacker provides the system with a database command when it’s expecting just some ordinary input data. The system then carries out that command— perhaps doing great mischief in the process. Information-flow tracking can prevent such attacks by assuring that no inputs from the user flow to SQL statements. EnerJ borrows this idea to keep imprecision or rare errors from contaminating good data.
There’s another way, though, that approximate data can affect precise data: through control-flow statements. Consider this code:
if (TOTAL > 0.0)
I = 1;
I = 2;
There are no illegal assignments, but the approximate value
TOTAL still influences the precise variable
I. This situation is called implicit information flow and must also be avoided.
EnerJ prevents implicit information flow simply enough: It forbids approximate conditions in any control-flow statement. The expression “
TOTAL > 0.0” is approximate because one of the operands is approximate. So the compiler—the software that translates EnerJ statements into the low-level code that runs on whatever machine you’re using— issues an error when it sees that expression as the condition in an “if” statement.
The strict isolation that EnerJ enforces between the approximate and the precise can in some cases be overly limiting, though. Sometimes programs produce approximate data that then need to move into a precise part of the program—say, for checking or output. For example, we experimented with a smartphone app for reading bar codes that used several approximation-tolerant image-analysis algorithms to produce a string of bits from a photo of a QR code. It then calculated a “checksum”—a number used to ensure the integrity of some other block of data.
Checksums are very handy for detecting errors. Here’s a simple example: Say you’ve got a million bytes (a byte is a group of 8 bits) that you want to transmit in a way that is less than perfectly reliable. You might start by adding those million bytes together and then throwing away all but the least significant 8 bits of the answer. The result is the checksum, which you send after you send the data. This single-byte checksum can now be used to judge data integrity after transmission. Just compute the checksum again at the far end and compare it with the value that was sent. If even a single bit of the data got corrupted during transmission, the original and recalculated checksums will not match.
For our bar-code-reading app, we wanted to keep the checksum computation precise to catch any errors that might crop up because of the approximations used elsewhere. But that required the output of an approximate computation (the bar-code-reading algorithm) to flow into a precise computation—the checksum. Normally, that wouldn’t be possible. But EnerJ provides an escape hatch in the form of something called endorsements—markers that indicate that the programmer gives permission to break the strict isolation EnerJ otherwise enforces.
EnerJ’s data-type annotations, endorsements, and the isolation guarantees they provide make it possible to write programs that take advantage of approximate computation while retaining enough precise computation to stay reliable. With them, programmers can safely trade precision and strict correctness for increased energy efficiency. Just as programmers are accustomed to optimizing software for performance, EnerJ lets them optimize for energy savings.
How much savings? It’s a little hard to say, because nobody has yet built the kind of two-mode, energy-saving computers EnerJ needs to run on. But our group has written many EnerJ programs and run them in an environment that simulates hardware with various energy-saving features in its memory or central processing. Depending on what kind of program it is and how aggressively the simulated hardware seeks to reduce energy consumption, EnerJ saves anywhere from 10 to 40 percent.
We were happy that our newly minted computer language did so well in these trials. We were also pleasantly surprised to discover the great variety of programs that have parts where approximation makes sense. Image rendering and image recognition are both natural fits for EnerJ. Signal-processing algorithms and geometry problems in 3-D gaming were less obvious candidates, but they too could be readily split into approximate and precise components. Indeed, when you have EnerJ or another approximate-computing hammer at your disposal, every program starts to look like a nail.
We now need realistic CPU designs that can save energy when running EnerJ programs. With our former University of Washington colleague Hadi Esmaeilzadeh, who is now at Georgia Tech, and Doug Burger at Microsoft Research, we have begun designing processors that can run at two different voltage levels. When a program needs error-free behavior, it uses the higher voltage. For calculations that can tolerate the occasional glitch, the processor switches to the lower voltage and saves energy. We’ve also been exploring some more radical designs for hardware that can act as dedicated approximate coprocessors.
To make approximate computing blossom, programmers will also need tools to help them understand the trade-offs involved in sacrificing precision for energy savings. Programming-language researchers, including those in our group, are beginning to design tools that are analogous to today’s debuggers and performance profilers. With them, a programmer can find the answer to a question like “What will happen to the playback quality of a video file if I mark this part of my media player as approximate?” The right tools should let the programmer save quite a bit of energy without the user even noticing a difference.
Indeed, the energy that computers expend avoiding mistakes—even the rarest of small errors—looks increasingly wasteful when so much of today’s software already incorporates imperfections in the form of error-prone data or approximate output. To rectify things, the computing industry needs some new approximate programming languages, some new approximate hardware designs, and tools to help programmers understand how best to use them. Together, they should help computers become a little more like human brains, which are less than perfect but amazingly energy efficient.
This article originally appeared in print as “Good-Enough Computing.”
About the Authors
Lead author Adrian Sampson studies computer science and engineering at the University of Washington, in Seattle, where his two coauthors, Luis Ceze and Dan Grossman, are his Ph.D. advisors. One specializes in programming languages, the other in computer architecture. Sampson likens their often disparate guidance to “having an angel perched on one shoulder and the devil on the other,” though he won’t say which is which.