How the Spectre and Meltdown Hacks Really Worked
An in-depth look at these dangerous exploitations of microprocessor vulnerabilities and why there might be more of them out there
We're used to thinking of computer processors as orderly machines that proceed from one simple instruction to the next with complete regularity. But the truth is that for decades now, they've been doing their tasks out of order and just guessing at what should come next. They're very good at it, of course. So good, in fact, that this ability, called speculative execution, has underpinned much of the improvement in computing power during the last 25 years or so. But on 3 January 2018, the world learned that this trick, which had done so much for modern computing, was now one of its greatest vulnerabilities.
Throughout 2017, researchers at Cyberus Technology, Google Project Zero, Graz University of Technology, Rambus, University of Adelaide, and University of Pennsylvania, as well as independent researchers such as cryptographer Paul Kocher, separately worked out attacks that took advantage of speculative execution. Our own group had discovered the original vulnerability behind one of these attacks back in 2016, but we did not put all the pieces together.
These types of attacks, called Meltdown and Spectre, were no ordinary bugs. At the time it was discovered, Meltdown could hack all Intel x86 microprocessors and IBM Power processors, as well as some ARM-based processors. Spectre and its many variations added Advanced Micro Devices (AMD) processors to that list. In other words, nearly the whole world of computing was vulnerable.
And because speculative execution is largely baked into processor hardware, fixing these vulnerabilities has been no easy job. Doing so without causing computing speeds to grind into low gear has made it even harder. In fact, a year on, the job is far from over. Security patches were needed not just from the processor makers but from those further down the supply chain, such as Apple, Dell, Linux, and Microsoft. The first computers powered by chips that are intentionally designed to be resistant to even some of these vulnerabilities arrived only recently.
Spectre and Meltdown are the result of the difference between what software is supposed to do and the processor's microarchitecture—the details of how it actually does those things. These two classes of hacks have uncovered a way for information to leak out through that difference. And there's every reason to believe that more ways will be uncovered. We helped find two, BranchScope and SpectreRSB, last year.
If we're going to keep the pace of computing improvements going without sacrificing security, we're going to have to understand how these hardware vulnerabilities happen. And that starts with understanding Spectre and Meltdown.
How a Pipeline Speeds Computing Using Speculative Execution
In modern computing systems, software programs written in human-understandable languages such as C++ are compiled into assembly-language instructions—fundamental operations that the computer processor can execute. To speed execution, modern processors use an approach called pipelining. Like an assembly line, the pipeline is a series of stages, each of which is a step needed to complete an instruction. Some typical stages for an Intel x86 processor include those that bring in the instruction from memory and decode it to understand what the instruction means. Pipelining basically brings parallelism down to the level of instruction execution: When one instruction is done using a stage, the next instruction is free to use it.
Since the 1990s, microprocessors have relied on two tricks to speed up the pipeline process: out-of-order execution and speculation. If two instructions are independent of each other—that is, the output of one does not affect the input of another—they can be reordered and their result will still be correct. That's helpful, because it allows the processor to keep working if an instruction stalls in the pipeline. For example, if an instruction requires data that is out in DRAM main memory rather than in the cache memory located in the CPU itself, it might take a few hundred clock cycles to get that data. Instead of waiting, the processor can move another instruction through the pipeline.
The second trick is speculation. To understand it, start with the fact that some instructions necessarily lead to a change in which instructions come next. Consider a program containing an "if" statement: It checks for a condition, and if the condition is true, the processor jumps to a different location in the program. This is an example of a conditional-branch instruction, but there are other instructions that also lead to changes in the flow of instructions.
Now consider what happens when such a branch instruction enters a pipeline. It's a situation that leads to a conundrum. When the instruction arrives at the beginning of the pipeline, we do not know its outcome until it has progressed fairly deep into the pipeline. And without knowing this outcome, we cannot fetch the next instruction. A simple but naive solution is to prevent new instructions from entering the pipeline until the branch instruction reaches a point at which we know where the next instruction will come from. Many clock cycles are wasted in this process, because pipelines typically have 15 to 25 stages. Even worse, branch instructions come up quite often, accounting for upwards of 20 percent of all the instructions in many programs.
To avoid the high performance cost of stalling the pipeline, modern processors use a microarchitectural unit called a branch predictor to guess where the next instruction, after a branch, will come from. The purpose of this predictor is to speculate about a couple of key points. First, will a conditional branch be taken, causing the program to veer off to a different section of the program, or will it continue on the existing path? And second, if the branch is taken, where will the program go—what will be the next instruction? Armed with these predictions, the processor pipeline can be kept full.
Because the instruction execution is based on a prediction, it is being executed "speculatively": If the prediction is correct, performance improves substantially. But if the prediction proves incorrect, the processor must be able to undo the effects of any speculatively executed instructions relatively quickly.
The design of the branch predictor has been extensively researched in the computer-architecture community for many years. Modern predictors use the history of execution within a program as the basis for their results. This scheme achieves accuracies in excess of 95 percent on many different kinds of programs, leading to dramatic performance improvements, compared with a microprocessor that does not speculate. Misspeculation, however, is possible. And unfortunately, it's misspeculation that the Spectre attacks exploit.
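To get a rough feel for what prediction buys, here is a back-of-the-envelope sketch. The 20 percent branch share and the 95 percent accuracy come from the figures above; the 15-cycle penalty and the base rate of one instruction per cycle are assumptions picked only for illustration, and the function name is ours.

```python
def average_cpi(branch_fraction, penalty_cycles, mispredict_rate, base_cpi=1.0):
    """Average cycles per instruction when only mispredicted branches pay the penalty."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles

BRANCH_FRACTION = 0.20   # from the text: about 20 percent of instructions are branches
PENALTY_CYCLES = 15      # assumption: cycles lost per stalled or mispredicted branch

# Stalling on every branch is the same as paying the penalty 100 percent of the time.
no_prediction = average_cpi(BRANCH_FRACTION, PENALTY_CYCLES, mispredict_rate=1.0)
# A 95 percent accurate predictor pays the penalty on only 5 percent of branches.
with_prediction = average_cpi(BRANCH_FRACTION, PENALTY_CYCLES, mispredict_rate=0.05)
speedup = no_prediction / with_prediction

print(f"always stall: {no_prediction:.2f} cycles per instruction")
print(f"95% accurate predictor: {with_prediction:.2f} cycles per instruction")
print(f"speedup: {speedup:.2f}x")
```

Under these assumed numbers, the stalling design spends 4 cycles per instruction and the predicting one 1.15, which is why designers have been so reluctant to give speculation up.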
Another form of speculation that has led to problems is speculation within a single instruction in the pipeline. That's a pretty abstruse concept, so let's unpack it. Suppose that an instruction requires permission to execute. For instance, an instruction could direct the computer to write a chunk of data to the portion of memory reserved for the core of the operating system. You wouldn't want that to happen, unless it was sanctioned by the operating system itself, or you'd risk crashing the computer. Prior to the discovery of Meltdown and Spectre, the conventional wisdom was that it is okay to start executing the instruction speculatively even before the processor has reached the point of checking whether or not the instruction has permission to do its work.
In the end, if the permission is not satisfied—in our example, the operating system has not sanctioned this attempt to fiddle with its memory—the results are thrown out and the program indicates an error. In general, the processor may speculate around any part of an instruction that could cause it to wait, provided that the condition is eventually resolved and any results from bad guesses are, effectively, undone. It's this type of intra-instruction speculation that's behind all variants of the Meltdown bug, including its arguably more dangerous version, Foreshadow.
The insight that enables speculation attacks is this: During misspeculation, no change occurs that a program can directly observe. In other words, there's no program you could write that would simply display any data generated during speculative execution. However, the fact that speculation is occurring leaves traces by affecting how long it takes instructions to execute. And, unfortunately, it's now clear that we can detect these timing signals and extract secret data from them.
What is this timing information, and how does a hacker get hold of it? To understand that, you need to grasp the concept of side channels. A side channel is an unintended pathway that leaks information from one entity to another (usually both are software programs), typically through a shared resource such as a hard drive or memory.
As an example of a side-channel attack, consider a device that is programmed to listen to the sound emanating from a printer and then uses that sound to deduce what is being printed. The sound, in this case, is a side channel.
In microprocessors, any shared hardware resource can, in principle, be used as a side channel that leaks information from a victim program to an attacker program. In a commonly used side-channel attack, the shared resource is the CPU's cache. The cache is a relatively small, fast-access memory on the processor chip used to store the data most frequently needed by a program. When a program accesses memory, the processor first checks the cache; if the data is there (a cache hit), it is recovered quickly. If the data is not in the cache (a miss), the processor has to wait until it is fetched from main memory, which can take several hundred clock cycles. But once the data arrives from main memory, it's added to the cache, which may require tossing out some other data to make room. The cache is divided into segments called cache sets, and each location in main memory has a corresponding set in the cache. This organization makes it easy to check if something is in the cache without having to search the whole thing.
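The set organization is easier to see in a toy model. The sketch below is not any real processor's cache: the line size, the number of sets, the two lines per set, and the least-recently-used eviction rule are all assumptions chosen to keep the example small.

```python
from collections import OrderedDict

LINE_SIZE = 64   # bytes per cache line (assumed)
NUM_SETS = 64    # number of cache sets (assumed)
WAYS = 2         # lines each set can hold (assumed)

class ToyCache:
    def __init__(self):
        # One LRU-ordered collection of line numbers per set.
        self.sets = [OrderedDict() for _ in range(NUM_SETS)]

    def set_index(self, address):
        """Every memory address maps to exactly one set."""
        return (address // LINE_SIZE) % NUM_SETS

    def access(self, address):
        """Return True on a hit; on a miss, load the line, evicting the oldest if full."""
        line = address // LINE_SIZE
        lines = self.sets[self.set_index(address)]
        if line in lines:
            lines.move_to_end(line)      # mark as most recently used
            return True
        if len(lines) >= WAYS:
            lines.popitem(last=False)    # toss out the least recently used line
        lines[line] = True
        return False

cache = ToyCache()
first = cache.access(0x1000)    # miss: nothing is cached yet
second = cache.access(0x1008)   # hit: same 64-byte line as 0x1000
# 0x1000, 0x2000, and 0x3000 all map to set 0, so the third line evicts the first.
cache.access(0x2000)
cache.access(0x3000)
after_eviction = cache.access(0x1000)
print(first, second, after_eviction)   # False True False
```

Note that only one set has to be examined on each access, and that one program's accesses can push another program's data out of a set. That second property is what cache side channels build on.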
Cache-based attacks had been extensively researched even before Spectre and Meltdown appeared on the scene. Although the attacker cannot directly read the victim's data—even when that data sits in a shared resource like the cache—the attacker can get information about the memory addresses accessed by the victim. These addresses may depend on sensitive data, allowing a clever attacker to recover this secret data.
How does the attacker do this? There are several possible ways. One variation, called Flush and Reload, begins with the attacker removing shared data from the cache using the "flush" instruction. The attacker then waits for the victim to access that data. Because it's no longer in the cache, any data the victim requests must be brought in from main memory. Later, the attacker accesses the shared data while timing how long this takes. A cache hit—meaning the data is back in the cache—indicates that the victim accessed the data. A cache miss indicates that the data has not been accessed. So, simply by measuring how long it took to access data, the attacker can determine which cache sets were accessed by the victim. It takes a bit of algorithmic magic, but this knowledge of which cache sets were accessed and which were not can lead to the discovery of encryption keys and other secrets.
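The three steps of Flush and Reload can be sketched as a simulation. Nothing here touches real hardware: the latencies, the threshold, and the class and function names are invented for illustration, and the "victim" simply touches one shared line selected by its secret.

```python
HIT_CYCLES = 40     # assumed latency of a cached access
MISS_CYCLES = 300   # assumed latency of a main-memory access
THRESHOLD = 100     # anything faster than this is treated as a cache hit

class SharedCache:
    """Toy model: a set of cached line numbers shared by attacker and victim."""
    def __init__(self):
        self.lines = set()

    def flush(self, line):
        self.lines.discard(line)

    def timed_access(self, line):
        """Access a line and return the simulated latency in cycles."""
        latency = HIT_CYCLES if line in self.lines else MISS_CYCLES
        self.lines.add(line)             # any access leaves the line cached
        return latency

def victim(cache, secret):
    # The victim's memory access pattern depends on its secret value.
    cache.timed_access(secret)

def flush_and_reload(cache, secret, num_lines=16):
    for line in range(num_lines):        # 1. flush every shared line
        cache.flush(line)
    victim(cache, secret)                # 2. let the victim run
    # 3. reload each line and time it; a fast access reveals the victim's choice
    return [line for line in range(num_lines)
            if cache.timed_access(line) < THRESHOLD]

recovered = flush_and_reload(SharedCache(), secret=11)
print(recovered)   # [11]
```

On real systems the timing is noisy, so attackers repeat the measurement many times; the simulation skips that detail.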
Meltdown, Spectre, and their variants all follow the same pattern. First, they trigger speculation to execute code desired by the attacker. This code reads secret data without permission. Then, the attacks communicate the secret using Flush and Reload or a similar side channel. That last part is well understood and similar in all of the attack variations. Thus, the attacks differ only in the first component, which is how to trigger and exploit speculation.
Meltdown attacks exploit speculation within a single instruction. Although assembly-language instructions are typically simple, a single instruction often consists of multiple operations that can depend on one another. For example, memory-read operations are often dependent on the instruction satisfying the permissions associated with the memory address being read. An application usually has permission to read only from memory that's been assigned to it, not from memory allocated to, say, the operating system or some other user's program. Logically, we should check the permissions before allowing the read to proceed, which is what some microprocessors do, notably those from AMD. However, provided the final result is correct, CPU designers assumed that they were free to speculatively execute these operations out of order. Therefore, Intel microprocessors read the memory location before checking permissions, but only "commit" the instruction—making the results visible to the program—when the permissions are satisfied. But because the secret data has been retrieved speculatively, it can be discovered using a side channel, making Intel processors vulnerable to this attack.
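The difference between checking first and reading first can be shown in a purely conceptual model. This is a sketch of the idea, not of how any real processor is built: the ToyCPU class, its check_first switch, and the memory contents are all hypothetical, and a Python set stands in for the cache state that a side channel would measure.

```python
class Fault(Exception):
    """Raised when a load fails its permission check."""

class ToyCPU:
    def __init__(self, memory, kernel_addresses, check_first):
        self.memory = memory
        self.kernel_addresses = kernel_addresses
        self.check_first = check_first     # True: check permissions before reading
        self.cached_probe_lines = set()    # cache state that survives a squashed instruction

    def user_load_and_probe(self, address):
        """A user-mode load whose value is then used to touch one probe line."""
        allowed = address not in self.kernel_addresses
        if self.check_first and not allowed:
            raise Fault(address)           # nothing runs speculatively
        value = self.memory[address]       # the read happens before the check resolves
        self.cached_probe_lines.add(value) # a dependent access leaves a cache footprint
        if not allowed:
            raise Fault(address)           # the result is never committed...
        return value                       # ...but the footprint is not undone

def attack(cpu, address):
    try:
        cpu.user_load_and_probe(address)
    except Fault:
        pass                               # the attacker expects and absorbs the fault
    return sorted(cpu.cached_probe_lines)  # stands in for a Flush and Reload scan

memory = {0x10: 7, 0xF0: 42}               # 0xF0 holds a kernel secret (assumed values)
leaky = attack(ToyCPU(memory, {0xF0}, check_first=False), 0xF0)
safe = attack(ToyCPU(memory, {0xF0}, check_first=True), 0xF0)
print(leaky, safe)   # [42] []
```

In both cases the program itself sees only a fault. The secret escapes solely through the state that the squashed instruction left behind.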
The Foreshadow attack is a variation of the Meltdown vulnerability. This attack affects Intel microprocessors because of a weakness that Intel refers to as L1 Terminal Fault (L1TF). While the original Meltdown attack relied on a delay in checking permissions, Foreshadow relies on speculation that occurs during a stage of the pipeline called address translation.
Software views the computer's memory and storage assets as a single contiguous stretch of virtual memory all its own. But physically, these assets are divided up and shared among different programs and processes. Address translation turns a virtual memory address into a physical memory address.
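A minimal sketch of address translation follows, assuming a single-level page table held in a Python dictionary and a 4,096-byte page. Real page tables have several levels and carry permission bits; the names and frame numbers here are invented for illustration.

```python
PAGE_SIZE = 4096  # bytes per page (a common size, assumed here)

def translate(page_table, virtual_address):
    """Map a virtual address to a physical one using a per-process page table."""
    page_number, offset = divmod(virtual_address, PAGE_SIZE)
    if page_number not in page_table:
        raise KeyError(f"page fault: virtual page {page_number} is not mapped")
    return page_table[page_number] * PAGE_SIZE + offset

# Two processes use the same virtual address but land in different physical frames.
process_a = {0: 5, 1: 9}   # virtual page -> physical frame
process_b = {0: 2, 1: 7}
print(hex(translate(process_a, 0x1234)))   # 0x9234
print(hex(translate(process_b, 0x1234)))   # 0x7234
```

The offset within the page passes through unchanged; only the page number is looked up.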
Specialized circuits on the microprocessor help with the virtual-to-physical memory-address translation, but it can be slow, requiring multiple memory lookups. To speed things up, Intel microprocessors allow speculation during the translation process, allowing a program to speculatively read the contents of a part of the cache called L1 regardless of who owns that data. The attacker can do this, and then disclose the data using the side-channel approach we already described.
In some ways Foreshadow is more dangerous than Meltdown; in other ways it is less so. Unlike Meltdown, Foreshadow can read the contents only of the L1 cache, because of the specifics of Intel's implementation of its processor architecture. However, Foreshadow can read any contents in L1—not just data addressable by the program.
Spectre attacks manipulate the branch-prediction system. This system has three parts: the branch-direction predictor, the branch-target predictor, and the return stack buffer.
The branch-direction predictor predicts whether a conditional branch, such as one used to implement an "if" statement in a programming language, will be taken or not taken. To do this, it tracks the previous behavior of similar branches. For example, if a branch has been taken twice in a row, the predictor may guess that it will be taken the next time as well.
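One textbook way to track a branch's past behavior is a two-bit saturating counter, sketched below. It is offered as an illustration of the idea, not as the design used in any particular processor, whose predictors are far more elaborate; the class name and the loop pattern are our own.

```python
class TwoBitPredictor:
    """Saturating counter: values 0-1 predict not taken, values 2-3 predict taken."""
    def __init__(self):
        self.counter = 0                   # start at "strongly not taken"

    def predict(self):
        return self.counter >= 2

    def update(self, taken):
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)

def accuracy(outcomes):
    """Fraction of branch outcomes the predictor guesses correctly."""
    predictor = TwoBitPredictor()
    correct = 0
    for taken in outcomes:
        correct += predictor.predict() == taken
        predictor.update(taken)
    return correct / len(outcomes)

# A loop branch: taken nine times, then not taken once, repeated 100 times.
loop_branch = ([True] * 9 + [False]) * 100
print(f"{accuracy(loop_branch):.3f}")   # 0.898
```

Starting from zero, the counter needs two taken branches before it predicts taken, and after that only the loop exit is mispredicted. The same training behavior is what lets an attacker steer the predictor with a carefully chosen sequence of branches.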
The branch-target predictor predicts the target memory address of what are called indirect branches. In a conditional branch, the address of the next instruction is spelled out, but for an indirect branch that address has to be computed first. The system that predicts these results is a cache structure called the branch-target buffer. Essentially, it keeps track of the last computed targets of the indirect branches and uses them to predict where the next indirect branch should lead.
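A branch-target buffer can be approximated as a small table of last-seen targets. In the sketch below, the table size and the choice to index it by the low bits of the branch address are simplifying assumptions; real processors use more complicated and largely undocumented indexing.

```python
class BranchTargetBuffer:
    """Toy last-target predictor indexed by the low bits of the branch address."""
    def __init__(self, entries=16):
        self.entries = entries
        self.targets = {}

    def _index(self, branch_address):
        return branch_address % self.entries

    def predict(self, branch_address):
        return self.targets.get(self._index(branch_address))  # None if no history

    def update(self, branch_address, actual_target):
        self.targets[self._index(branch_address)] = actual_target

btb = BranchTargetBuffer()
first_guess = btb.predict(0x400)      # no history yet
btb.update(0x400, 0x9000)             # the branch actually went to 0x9000
second_guess = btb.predict(0x400)     # now predicts the last target
# Because only low address bits are used, a different branch at 0x410 shares the entry.
aliased_guess = btb.predict(0x410)
print(first_guess, hex(second_guess), hex(aliased_guess))   # None 0x9000 0x9000
```

The last line shows the weakness: a branch can inherit a prediction that some other branch put into the table.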
The return stack buffer is used to predict the target of a "return" instruction. When a subroutine is "called" during a program, the return instruction makes the program resume work at the point from which the subroutine was called. Trying to predict the right point to return to based only on prior return addresses won't work, because the same function may be called from many different locations in the code. Instead, the system uses the return stack buffer, a piece of memory on the processor that keeps the return addresses of functions as they are called. It then uses these addresses when a return is encountered in the subroutine's code.
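The return stack buffer behaves like a small stack that is pushed on every call and popped on every return. The sketch below assumes a capacity of four entries and assumes that an empty buffer simply offers no prediction; real processors differ on both points.

```python
class ReturnStackBuffer:
    """Toy fixed-size stack of predicted return addresses."""
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.stack = []

    def on_call(self, return_address):
        if len(self.stack) == self.capacity:
            self.stack.pop(0)            # overflow: the oldest entry is lost
        self.stack.append(return_address)

    def on_return(self):
        """Predict the return target; None means the buffer has nothing to offer."""
        return self.stack.pop() if self.stack else None

rsb = ReturnStackBuffer()
# The same function is called from two different places.
rsb.on_call(0x1004)
from_first_site = rsb.on_return()
rsb.on_call(0x2008)
from_second_site = rsb.on_return()
print(hex(from_first_site), hex(from_second_site))   # 0x1004 0x2008

# Calls nested deeper than the capacity lose the outermost return address.
for depth in range(5):
    rsb.on_call(0x3000 + depth)
predictions = [rsb.on_return() for _ in range(5)]
print(predictions[-1])   # None
```

The buffer gets both call sites right, which a last-target table could not do. The final line shows the corner case: once the buffer runs dry, the processor has to fall back on some other guess.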
Each of these three structures can be exploited in two different ways. First, the predictor can be deliberately mistrained. In this case, the attacker executes seemingly innocent code designed to befuddle the system. Later, the attacker deliberately executes a branch that will misspeculate, causing the program to jump to a piece of code chosen by the attacker, called a gadget. The gadget then sets about stealing data.
A second manner of Spectre attack is termed direct injection. It turns out that under some conditions the three predictors are shared among different programs. What this means is that the attacking program can fill the predictor structures with carefully chosen bad data as it executes. When an unwitting victim executes their program either at the same time as the attacker or afterward, the victim will wind up using the predictor state that was filled in by the attacker and unwittingly set off a gadget. This second attack is particularly worrisome because it allows a victim program to be attacked from a different program. Such a threat is especially damaging to cloud-service providers because they cannot then guarantee that their client data is protected.
The Spectre and Meltdown vulnerabilities presented a conundrum to the computing industry because the vulnerability originates in hardware. In some cases the best we can do for existing systems—which make up the bulk of installed servers and PCs—is to rewrite software to try to limit the damage. But these solutions are ad hoc, incomplete, and often result in a big hit to computer performance. At the same time, researchers and CPU designers have started thinking about how to design future CPUs that keep speculation without compromising security.
One defense, called kernel page-table isolation (KPTI), is now built into Linux and other operating systems. Recall that each application views the computer's memory and storage assets as a single contiguous stretch of virtual memory all its own. But physically, these assets are divided up and shared among different programs and processes. The page table is essentially the operating system's map, telling it which virtual memory addresses correspond to which physical memory addresses. The kernel page table is responsible for doing this for the core of the operating system. KPTI and similar systems defend against Meltdown by making secret data in memory, such as that of the OS, inaccessible when a user's program (and potentially an attacker's program) is running. They do this by removing the forbidden parts from the page table. That way, even speculatively executed code cannot access the data. However, this solution means extra work for the operating system to map these pages when it executes and unmap them afterward.
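The idea behind KPTI can be sketched with two dictionaries standing in for page tables. This is a deliberate simplification: a real kernel keeps a small set of entry and exit mappings visible in user mode, and the page numbers and function names below are invented.

```python
KERNEL_PAGES = {100: 7, 101: 8}   # virtual page -> physical frame (assumed values)
USER_PAGES = {0: 3, 1: 4}

def build_page_tables(kpti_enabled):
    """Return (table active in user mode, table active in kernel mode)."""
    kernel_mode_table = {**USER_PAGES, **KERNEL_PAGES}
    if kpti_enabled:
        # Kernel pages are removed entirely while user code runs.
        user_mode_table = dict(USER_PAGES)
    else:
        # Kernel pages stay mapped and are guarded only by permission bits.
        user_mode_table = dict(kernel_mode_table)
    return user_mode_table, kernel_mode_table

def speculatively_reachable(page_table, virtual_page):
    """In this toy model, a page can leak only if a translation for it exists."""
    return virtual_page in page_table

user_table, kernel_table = build_page_tables(kpti_enabled=True)
legacy_user_table, _ = build_page_tables(kpti_enabled=False)
print(speculatively_reachable(legacy_user_table, 100))  # True
print(speculatively_reachable(user_table, 100))         # False
print(speculatively_reachable(kernel_table, 100))       # True
```

The cost mentioned above shows up as the need to switch between the two tables every time a program enters or leaves the operating system.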
Another class of defenses gives programmers a set of tools to limit dangerous speculation. For example, Google's Retpoline patch rewrites the kind of branches that are vulnerable to Spectre Variant 2, so that it forces speculation to target a benign, empty gadget. Programmers can also add an assembly-language instruction that limits Spectre v1, by restricting speculative memory reads that follow conditional branches. Conveniently, this instruction is already present in the processor architecture and is used to enforce the correct ordering between memory operations originating on different processor cores.
As the processor designers, Intel and AMD had to go deeper than a regular software patch. Their fixes update the processor's microcode. Microcode is a layer of instructions that fits between the assembly language of regular software and the processor's actual circuitry. Microcode adds flexibility to the set of instructions a processor can execute. It also makes it simpler to design a CPU because when using microcode, complex instructions are translated to multiple simpler instructions that are easier to execute in a pipeline.
Basically, Intel and AMD adjusted their microcode to change the behavior of some assembly-language instructions in ways that limit speculation. For example, Intel engineers added options that interfere with some of the attacks by allowing the operating system to empty the branch-predictor structures in certain circumstances.
Some Speculative-Execution Vulnerabilities
|Vulnerability|Exploited mechanism|Disclosed|
|---|---|---|
|Spectre Variant 1|Branch-direction predictor|1/2018|
|Spectre Variant 2|Branch-target buffer|1/2018|
|Spectre Variant 4|Speculative store bypass|5/2018|
|SpectreRSB/Ret2Spec|Return stack buffer|6/2018|
|Lazy FPU|Lazy restore of floating-point unit registers|6/2018|
A different class of solutions attempts to interfere with the attacker's ability to transmit the data out using side channels. For example, MIT's DAWG technology securely divides up the processor cache so that different programs don't share any of its resources. Most ambitiously, there are proposals for new processor architectures that would introduce structures on the CPU that are dedicated to speculation and separate from the processor's cache and other hardware. This way, any operations that are executed speculatively but are not eventually committed are never visible. If the speculation result is confirmed, the speculative data is sent to the processor's main structures.
Speculation vulnerabilities have lain dormant in processors for over 20 years, and they remained, so far as anyone knows, unexploited. Their discovery has substantially shaken the industry and highlighted how cybersecurity is a problem not only for software systems but for hardware as well. Since the initial discovery, around a dozen variants of Spectre and Meltdown have been revealed, and it is likely that there are more. Spectre and Meltdown are, after all, side effects of core design principles that we have relied on to improve computer performance, making it difficult to eliminate such vulnerabilities in current system designs. It is likely that new CPU designs will evolve to retain speculation, while preventing the type of side-channel leakage that enables these attacks. Nevertheless, future computer-system designers, including those designing processor chips, must be aware of the security implications of their decisions, and no longer optimize only for performance, size, and power.
About the Author
Nael Abu-Ghazaleh is chair of the computer engineering program at the University of California, Riverside. Dmitry Evtyushkin is an assistant professor of computer science at the College of William and Mary, in Williamsburg, Va. Dmitry Ponomarev is a computer science professor at the State University of New York at Binghamton.
To Probe Further
Paul Kocher and the other researchers who collectively disclosed Spectre first explained it in their original paper. Moritz Lipp explained Meltdown in a talk at Usenix Security '18. Foreshadow was detailed at the same conference.
A group of researchers including one of the authors has come up with a systematic evaluation of Spectre and Meltdown attacks that uncovers additional potential attacks. IBM engineers did something similar, and Google engineers recently came to the conclusion that side-channel and speculative execution attacks are here to stay.