Just three weeks before the 2006 Game Developers Conference in San Jose, IBM had a problem. The company desperately needed a boffo, unforgettable piece of computer-generated imagery to demonstrate the power of the new Cell nine-core microprocessor, which Big Blue had just developed with Sony and Toshiba. The chip, produced at a cost of US $400 million, was set to debut in Sony’s new PlayStation 3 game console in November, but developers who had been tearing their hair out trying to program games for the Cell’s new architecture didn’t yet have any seriously flashy footage to present at the March show.
So IBM turned to the chicken wrangler.
Actually, he’s a 39-year-old computer science professor and software entrepreneur named Michael McCool. In just one weekend, his company, RapidMind, in Waterloo, Ont., Canada, used the programming platform that McCool has been working on for nearly a decade to create a crowd simulation of 16 000 individual chickens.
Imagine the biggest flock of virtual fowl ever assembled. Each chicken is controlled by a simple artificial intelligence program, operating according to a handful of rules. Each chicken wants to move toward the rooster but must avoid collisions with other chickens, fences, and the barn. To do so, each one must constantly check the position of its nearest neighbors and other objects in its environment and then decide how to move.
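Those rules amount to a classic flocking (or "boids"-style) steering computation. As a rough illustration only: the structure, names, and constants below are assumptions for the sketch, not RapidMind's actual code.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Minimal 2-D vector helpers for the sketch.
struct Vec2 { float x, y; };
static Vec2 sub(Vec2 a, Vec2 b) { return {a.x - b.x, a.y - b.y}; }
static Vec2 add(Vec2 a, Vec2 b) { return {a.x + b.x, a.y + b.y}; }
static Vec2 scale(Vec2 a, float s) { return {a.x * s, a.y * s}; }
static float len(Vec2 a) { return std::sqrt(a.x * a.x + a.y * a.y); }
static Vec2 norm(Vec2 a) { float l = len(a); return l > 0 ? scale(a, 1.0f / l) : a; }

// One decision step for a single chicken: attraction toward the rooster,
// plus repulsion from any neighbor closer than avoidRadius. Returns a
// unit-length heading. The same function would also repel from fences and
// the barn; neighbors stand in for all obstacles here.
Vec2 steer(Vec2 self, Vec2 rooster, const std::vector<Vec2>& neighbors,
           float avoidRadius) {
    Vec2 dir = norm(sub(rooster, self));              // seek the rooster
    for (const Vec2& n : neighbors) {
        Vec2 away = sub(self, n);
        float d = len(away);
        if (d > 0 && d < avoidRadius)                 // too close: push away
            dir = add(dir, scale(norm(away), (avoidRadius - d) / avoidRadius));
    }
    return norm(dir);
}
```

Because every chicken runs the same small function over its own data, 16 000 of these steps form an ideal data-parallel workload, which is exactly what a many-core chip like the Cell is built for.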
If that doesn’t sound all that impressive to you, consider this: all 16 000 of those faux chickens are doing this maneuvering at the same time on a single Cell microprocessor. It is a chore that would tax a rack full of conventional servers.
After viewing the virtual barnyard at the IBM booth during the game conference, one new fan gave the RapidMind team a rubber chicken. The company’s developers stashed the gag gift near an air-hockey table in the office rec room. Now, every time programmers hit a new performance benchmark, one of them grabs the chicken and squeezes until it emits an unholy scream.
The masterminds at RapidMind thoroughly abused that poor bird as they prepared for last month’s release of the RapidMind Development Platform 2.0, the first software tool to help programmers write code for microprocessor chips like the Cell as well as for graphics processors from ATI Technologies, Nvidia Corp., and other companies. What the processors have in common is that they are all multicore chips—that is, each individual chip has several or even dozens of processing units, called “cores.” By the middle of this year, RapidMind plans to release version 3.0 of the platform, designed to support multicore CPUs from Intel and Advanced Micro Devices.
RapidMind’s timing couldn’t be better. While the Moore’s Law–decreed doubling of transistors goes on unabated every 18 months, AMD, IBM, Intel, and others have determined that all those transistors can’t switch on and off much faster than they already do. Clock speeds top out at around 4 gigahertz, beyond which a microprocessor starts getting hot enough to spontaneously combust. So instead of making smaller chips that run faster, the near-term strategy is to keep chips the same size but put more processor cores in them.
What the Experts Say
NICK TREDENNICK: Efforts to extend standards-based, serial programming languages with features to describe parallel constructs are likely to fail. What is more likely to succeed are languages that raise the level of abstraction in algorithm description.
The multicore revolution started several years ago with graphics processing units (GPUs) made by ATI and Nvidia. Today, graphics chips sport dozens of cores. Now other kinds of multicore chips are establishing themselves in the mainstream: the Cell is already available in the PlayStation 3 and is moving quickly into servers, televisions, and other applications. And four-core CPUs from AMD and Intel are scheduled to ship within the next few weeks.
There’s more to come: Intel unveiled a prototype chip with 80 cores in September, part of a research project whose goal is to create a single chip capable of processing 1 trillion floating-point operations per second.
Of course, there’s a catch. The tantalizing possibilities of multicore chips—stunningly realistic and densely populated games, faster scientific computations, more accurate modeling of seismic, medical, and financial data—all depend on the ability of programmers to routinely solve programming challenges beyond those they face today. Specifically, programmers are going to have to write programs that are divided into parts that run in parallel on several processors simultaneously, a chore that has proven fiendishly difficult in the past.
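In its simplest form, the division of labor the previous paragraph describes means carving a loop's index range into chunks and handing each chunk to its own core. The sketch below shows that shape with ordinary C++ threads; the chunking scheme and thread count are illustrative choices, not a tuned implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <thread>
#include <vector>

// Sum a large array by splitting its index range among worker threads.
// Each thread writes only to its own slot in `partial`, so no locking
// is needed; the final reduction happens serially after the join.
long long parallelSum(const std::vector<int>& data, unsigned numThreads) {
    std::vector<long long> partial(numThreads, 0);
    std::vector<std::thread> workers;
    size_t chunk = (data.size() + numThreads - 1) / numThreads;
    for (unsigned t = 0; t < numThreads; ++t) {
        workers.emplace_back([&, t] {
            size_t begin = t * chunk;
            size_t end = std::min(begin + chunk, data.size());
            for (size_t i = begin; i < end; ++i)
                partial[t] += data[i];
        });
    }
    for (auto& w : workers) w.join();
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

A toy reduction like this parallelizes cleanly; the "fiendishly difficult" part arrives when the chunks share mutable state or must synchronize mid-computation, which is precisely where hand-rolled threading goes wrong.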
“We’re in a period of pain and turbulence for application designers,” says Carl Claunch, vice president of research and advisory services at Gartner Research, in San Jose. “Trying to do more and more in parallel adds stress, and we don’t have good tools for it right now.”
Developers are accustomed to writing programs that execute functions one after another in serial fashion on one or maybe two microprocessor cores. Before the debut of the Cell chip a year ago, parallel programming was largely confined to niches in high-performance computing and academic computer science. So until now, programmers hacking out the multicore version of a game or three-dimensional simulation have been literally left to their own devices.
The results aren’t shabby, but they’re far from optimal. Developers at Insomniac Games, the Burbank, Calif., developer of Resistance: Fall of Man for the PlayStation 3, had to create their own programming tools and teach themselves how to allocate different programming tasks to the Cell’s nine different cores [see ”The Insomniacs,” IEEE Spectrum, December 2006]. Their bootstrapping methods took them only so far, however. For instance, because their software couldn’t automatically allocate tasks to whichever core was available, Insomniac programmers had to dedicate two cores to handle collisions in situations where carnage and chaos among men, monsters, and machines needed to be approximated in real time and in living color.
Hina Shah, director of IBM’s Cell Ecosystem and Solutions Development unit, has heard from customers about the new challenges the Cell presents and has a full-time job seeking solutions to ease their pain. “Today, if a developer is going to program for Cell directly, they would have to change relevant parts of their application and manage all aspects of porting it to Cell,” she says.
“The nice thing about RapidMind is that you don’t need to change your whole program,” Shah adds. “You can just pick parts of your application that should be accelerated, and instead of changing that code to program all of Cell’s cores by hand, you simply use a programming interface that handles a lot of the complications on its own.”
Theoretically, RapidMind’s platform could help programmers code their entire applications to run on multiple cores. In practice, users have fed the RapidMind platform the most computationally intensive portions of their programs. The platform accelerates these chunks by breaking them up into smaller pieces and running them in parallel on several processor cores at once.
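RapidMind's actual interface is not reproduced here, but the pattern Shah describes can be sketched: the program stays serial, and only one computationally intensive, data-parallel chunk is handed to a helper that fans the per-element work out across cores. This `parallelMap` and its signature are hypothetical stand-ins, not RapidMind's API.

```cpp
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Apply `kernel` to every element of `in`, spreading the work across
// `numThreads` threads. Thread t handles elements t, t+N, t+2N, ... so
// each output slot is written by exactly one thread (no locking needed).
std::vector<float> parallelMap(const std::vector<float>& in,
                               const std::function<float(float)>& kernel,
                               unsigned numThreads) {
    std::vector<float> out(in.size());
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < numThreads; ++t) {
        workers.emplace_back([&, t] {
            for (size_t i = t; i < in.size(); i += numThreads)
                out[i] = kernel(in[i]);
        });
    }
    for (auto& w : workers) w.join();
    return out;
}
```

The appeal of putting such a helper behind a platform, rather than writing it by hand, is that the runtime can decide how many cores to use and how to slice the data, so the same application code can target a nine-core Cell, a GPU with dozens of cores, or a quad-core CPU.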