9 February 2010—Chips that can simulate a supernova or predict a hurricane are yesterday’s goal, if Intel’s recently unveiled 48-core research chip is any indication. Today’s goal is squeezing all the simple but extensive work of a data center onto a single chip. Big IT firms have huge, sophisticated networks of servers, says Justin Rattner, Intel’s chief technology officer and director of Intel Labs. But “they’re not computing the mass of a proton,” he says. “They’re searching for the needle in a haystack.”
In response to the need for better, faster data mining, Intel Labs has developed what it’s calling a single-chip cloud computer. The 1.3-billion-transistor research chip, the company reported yesterday at the IEEE International Solid-State Circuits Conference, consists of 48 Pentium-class IA-32 cores arranged in a network of 24 tiles. Each tile holds two cores plus a router that handles intercore communication. The keys to its efficiency at needle-in-a-haystack tasks are two new software-based techniques: one for rapidly transferring data between the many cores and another for controlling the power those cores consume.
Normally, for data to get from one core to another, it must leave the first core, zip over to main memory, and then make its way to the second core, says Intel’s Jason Howard, who authored the paper presented at ISSCC, in San Francisco. That process takes time, so the Intel researchers gave each core the ability to transfer data directly. A 16-kilobyte message-passing buffer on each tile (384 KB total on the die) moves data from one core’s cache to another’s without running it all the way out to main memory.
But keeping one cache from interfering with data in another traditionally requires complicated, power-hungry hardware. So Intel chose to rely on software instead of hardware to keep the data from being corrupted. The software tells the sending cache to discard its copy of the data after sending it and tells the receiver to delete old copies before grabbing the new message. “So there’s only one owner of data at any time,” says Howard. These adaptations speed up message passing roughly 15-fold compared with previous techniques.
The chip uses software to control another key feature as well: power. According to Nitin Borkar, the director of advanced microprocessor research at Intel Labs, the biggest challenge for the future will be “controlling power consumption as you put more and more cores on a chip.”
In today’s generation of multicore processors, for example, parts that aren’t in use remain powered on, wasting energy. And because voltage and frequency are typically governed by the operating system, it can take a long time for the system to change power levels, or even to realize that they need changing.
Intel’s design lets the application developer, rather than the OS, control how to tune the machine’s performance, Borkar says. The chip is divided into voltage and frequency “islands,” which allow different tiles to be controlled independently. With a simple adjustment in the programming code, a developer can throttle each island as needed, scaling the chip’s total power draw anywhere from 25 to 125 watts.
That could be useful for tasks like running a video-processing program overnight, Borkar says. If you don’t need the program to finish in half an hour, it could run at higher power efficiency instead of simply higher power. Or a program could start up using all 48 cores at once, at high voltage and high frequency, and then throttle each core back as it finishes its computations while the others stay on full blast. This feature provides the “compute on demand” function of a traditional data center, Borkar says. And because “change” commands are processed quickly, within seven clock cycles, they can keep power running at optimum levels at all times.
Giving that control to application designers is so new, Howard adds, that some aren’t sure how to use it, though many have shown an interest in learning.
Intel is partnering with researchers in academia and industry to design experiments with the chip to help alter its architecture, so that it will be most useful “for data-intensive apps, versus compute-intensive apps,” Rattner says.
The research chip has already booted Linux and runs standard, commercially available software, unlike previous high-performance research chips, which required proprietary applications.