Breathing life into digitally animated faces
Imagine Marlon Brando in Godfather IV, Sylvester Stallone boxing with a younger version of himself in Rocky VIII, or Marilyn Monroe playing opposite Johnny Depp. Such scenes may hit the big screen a lot sooner than you think.
Realistic human bodies created by computers already travel naturally across movie and game screens. Portraits rendered by computers are equally believable. To date, however, when animators have tried to make those portraits move, the illusion breaks down. Computer graphics have not been able to conquer the human countenance. Synthetic faces on the big screen have been more odd than realistic. Crossing that frontier will take not only advances by technologists but also the brilliance of artists and perhaps a lawyer or two.
On the technical front, developers have, during the past year or so, focused on motion-capture technology (methods for converting the performance of a live actor into the framework for a computer-generated being) as the final step that will make a digitally rendered human believable. Two recent advances have filmmakers excited. But to understand the possibilities of the techniques, we first have to consider how one builds a computer version of an actor and why the results so far have fallen short of directors' visions.
Generating a realistic image of a human on screen was a science-fiction dream for many years. Hollywood's quest began in earnest in the early 1990s, with the 1991 release of Terminator 2 and the 1993 release of Jurassic Park. These movies, and the money they made, excited the film industry's interest in the possibilities of computer graphics.
Previously, computer graphics had played only supporting roles, generating scenery or special-effects sequences. After Terminator 2 and Jurassic Park, computer graphics moved to the forefront of mainstream cinema. In The Mask (1994), the director put a digitized version of Jim Carrey's head at the center of the screen, doing impossible things in great green detail. In 2002, Gollum's computer-generated face started waxing lyrical in The Lord of the Rings trilogy.
And last year, few moviegoers who saw Pirates of the Caribbean: Dead Man's Chest noticed that the head of Davy Jones, a strange, part-human, part-cephalopod being, was a digital creation.
But the digital creatures in The Mask, The Lord of the Rings, and the second Pirates of the Caribbean movie had no counterparts in reality. We know that magical masks, creepy Gollums, and humanoid octopuses aren't real, so we don't overanalyze them; we don't compare what we see with our expectations. We just focus on the entertainment. The same principle makes Saturday morning cartoons entertaining: in a stylized medium, we ignore the faults and just have fun.
But the industry couldn't stop there. People thought that if something simple worked so well, something incredibly realistic had to be better. Ironically, however, the more realistic the computer-generated human, the more difficult it has been for an audience to accept it.
Masahiro Mori, a Japanese roboticist, described the phenomenon in 1970 as the Uncanny Valley. He observed that, up to a certain point, as a robot looks and behaves more human, a person's emotional response to it becomes increasingly positive and empathic. Then, at some point along the path toward realism, the reaction flip-flops, and a person interacting with the robot finds it repulsive. Make the robot even more realistic, and the typical response flips once again, becoming empathic once more. The reason, Mori wrote, is that in a robot that is mostly unlike a human, the human characteristics stand out and generate empathy. But if the robot appears almost human, the nonhuman characteristics stand out and create a feeling of strangeness.
The animated characters in the popular TV show "The Simpsons" and in the movie The Incredibles (2004) are nonhuman enough that their human foibles generate empathy. However, the digitally animated movies Final Fantasy: The Spirits Within (2001) and The Polar Express (2004) fell into the Uncanny Valley.
In Final Fantasy, the best talent in the industry worked on the best computer systems of the time and created still images that looked lifelike and bodily motions and clothes that looked photo-realistic. But viewers found the final moving picture eerie because of the characters' faces. Although skin stretched realistically over bone and muscle, muscle pulled skin, and flesh reddened as it flushed with blood, something wasn't right. The Polar Express, taking advantage of an entirely new generation of graphics technology, was a roller-coaster ride of amazing effects and beautiful digitally rendered vistas. But the computer-generated human faces just didn't ring true. Reviewer after reviewer described them as zombies. "Those human characters in the film come across as downright creepy," reviewer Paul Clinton said on CNN.com. It was as if the movie were filled with mannequins or the walking dead; it brought up dark nightmares. Because the art was so close to life, when it didn't quite come alive, it made viewers think of death.
To understand the technical challenge, consider what goes into creating a computer-generated image of a person today. The process is essentially the same whether the computer actor is appearing in a game or a movie. But game figures don't look as realistic as movie characters, because the game systems cut corners to create their animations in real time, whereas movie systems can take hours to render each frame before moving on.
First comes modeling. A computer-graphics system can make a model out of a variety of fundamental units. It can start with polygons, creating a complex surface out of many simple shapes: picture playground climbing bars covered with a tightly stretched tarp. Or it can use nonuniform rational B-splines (NURBS). Think of a square piece of cloth stretched into a basic shape, like a plane, a cylinder, or a sphere; the artist tugs on the virtual cloth to stretch and compress it in various directions. Or the artist can use a combination of techniques.
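The polygon representation described above can be sketched in a few lines of code: a surface is stored as shared vertices plus faces that index into the vertex list. The names here (Mesh, face_normal) are illustrative, not taken from any real graphics package.

```python
# Minimal sketch of a polygon model: shared vertices plus faces that
# index into the vertex list. A face's normal, used for shading, comes
# from the cross product of two of its edges.

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

class Mesh:
    def __init__(self, vertices, faces):
        self.vertices = vertices  # list of (x, y, z) points
        self.faces = faces        # list of vertex-index tuples

    def face_normal(self, f):
        """Unnormalized normal of face f from two of its edges."""
        a, b, c = (self.vertices[i] for i in self.faces[f][:3])
        return cross(sub(b, a), sub(c, a))

# A single square "tarp" panel built from one quad face.
quad = Mesh(vertices=[(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)],
            faces=[(0, 1, 2, 3)])
print(quad.face_normal(0))  # points along +z
```

A production face model is the same idea scaled up to tens of thousands of such faces, with smooth subdivision applied on top.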
After artists create a virtual sculpture of the person being simulated, they apply "shaders," which define the visual properties of the model: the look. How shiny is it? Is it transparent? Is it bumpy, smooth, translucent, or glowing? What color is it? What is its texture: hair, flesh, rock?
Next, to allow the face to move, the artist builds a "rig," that is, an internal, mechanically accurate skeleton that fits inside the model. The animator then uses the graphics system to connect rules governing the motion of real bones to the rig. Finally, the artist indicates where the computer-graphics system should attach the modeled surfaces to the rig, a process called binding.
Between the rig and the skin, the artist can select "deformation" tools, which act like virtual air bladders, simulating the flexing of muscles beneath the skin. For example, a "sculpt" deformer under a biceps swells whenever the elbow bends.
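The binding step described above is usually implemented as linear blend skinning, the textbook technique for attaching a modeled surface to a rig (the article doesn't name the exact algorithm studios use; this two-bone, two-dimensional version is only a sketch). Each skin vertex carries a weight per bone, and its deformed position is the weighted blend of each bone's transform applied to it.

```python
import math

# Sketch of linear blend skinning in 2-D: each vertex stores one weight
# per bone; its skinned position is the weighted sum of the positions the
# bones would individually carry it to.

def rotate(point, pivot, angle):
    """Rotate a 2-D point about a pivot by angle radians."""
    x, y = point[0] - pivot[0], point[1] - pivot[1]
    c, s = math.cos(angle), math.sin(angle)
    return (pivot[0] + c*x - s*y, pivot[1] + s*x + c*y)

def skin(vertex, bones, weights):
    """bones: list of (pivot, angle); weights: one per bone, summing to 1."""
    x = y = 0.0
    for (pivot, angle), w in zip(bones, weights):
        px, py = rotate(vertex, pivot, angle)
        x += w * px
        y += w * py
    return (x, y)

# Upper arm held fixed; forearm bent 90 degrees at an elbow at (1, 0).
bones = [((0.0, 0.0), 0.0), ((1.0, 0.0), math.pi / 2)]

print(skin((2.0, 0.0), bones, (0.0, 1.0)))  # vertex bound fully to the forearm
print(skin((1.0, 0.0), bones, (0.5, 0.5)))  # elbow vertex, blended 50/50
```

Vertices near the elbow get intermediate weights so the skin bends smoothly rather than creasing; the deformers mentioned above then add muscle bulge on top of this blended result.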
With all those components of the computer figure linked together, the artist can at last make it move. There are three choices. In "hand keying," the animator interacts directly with the rig, posing it in key positions for the movie or game sequence; the computer system fills in the steps between. Another approach, "crowd-sim," allows the computer to take control and move the characters automatically. But because the programming required for crowd-sim is time-consuming, computer artists typically use it only for rendering scenes of large crowds. Crowd-sim helped animate the huge battle sequences in The Lord of the Rings movies.
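The way a system "fills in the steps between" hand-keyed poses can be sketched as interpolation between key frames. Production packages use spline curves for smoother ease-in and ease-out; the linear version below, with an invented jaw-angle channel, shows only the principle.

```python
# Sketch of in-betweening for hand-keyed animation: the animator sets
# values only at key frames; every other frame is interpolated. Linear
# interpolation here; real systems typically use splines.

def interpolate(keys, frame):
    """keys: sorted list of (frame, value) pairs; returns value at frame."""
    if frame <= keys[0][0]:
        return keys[0][1]
    if frame >= keys[-1][0]:
        return keys[-1][1]
    for (f0, v0), (f1, v1) in zip(keys, keys[1:]):
        if f0 <= frame <= f1:
            t = (frame - f0) / (f1 - f0)
            return v0 + t * (v1 - v0)

# The animator keys the jaw at 0 degrees on frame 0 and 20 degrees on
# frame 10; the frames in between are computed, never posed by hand.
jaw_keys = [(0, 0.0), (10, 20.0)]
print([interpolate(jaw_keys, f) for f in range(0, 11, 2)])
```

Motion capture produces data in essentially the same key-frame form, which is why (as described below for Image Metrics) captured curves remain editable by hand afterward.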
Instead of hand-keying or crowd-sim, an artist can use a live actor dressed in black spandex and covered with perhaps a hundred tiny white balls to record the animation in a process called motion-capture. The director films the actor going through the appropriate motions, and the computer system converts the digital images into movements of the already created digital skeleton. The Polar Express team widely publicized its use of motion-capture. [See photos.]
Computer-graphics tools for modeling, shading, rigging, and deforming are mature. That is, advances in the techniques are incremental. But motion-capture technology is changing dramatically. And tiny white balls may soon be history.
A San Francisco start-up called Mova, founded by entrepreneur Steve Perlman of WebTV fame, demonstrated a motion-capture technology in July that relies on phosphorescent makeup. A makeup artist sponges the actor's face with the cream, essentially the kind of glow-in-the-dark substance some kids wear on Halloween. The filmmakers light the stage with fluorescent fixtures that flash some 100 times per second, too quickly for the moments of darkness to be perceived. So for the actors, the room appears normally lit.
Two sets of cameras (as many as 28 in all) focus on the actor. One set captures the motions in normal lighting, when the fluorescent lights are on; the other set records the phosphorescent patterns, created by the natural irregularities in the makeup, when the lights are off. From the images, the motion-capture system creates a three-dimensional surface model of the actor, using 100 000 polygons or more [see photos].
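The geometry behind this kind of multi-camera capture reduces, in its simplest form, to stereo triangulation: a speckle in the phosphorescent pattern seen by two cameras shifts between the two views by a disparity, and its depth follows directly from that shift. Mova's 28-camera rig is far more elaborate; the cameras and numbers below are invented purely to show the principle.

```python
# Principle of recovering 3-D surface points from two camera views:
# for rectified cameras, depth = focal * baseline / disparity.
# All values here are made up for illustration.

def project(point, focal, cam_x):
    """Pinhole projection of (x, y, z) by a camera at (cam_x, 0, 0) looking along +z."""
    x, y, z = point
    return (focal * (x - cam_x) / z, focal * y / z)

def depth_from_disparity(focal, baseline, disparity):
    return focal * baseline / disparity

focal = 50.0     # focal length, in pixel units (illustrative)
baseline = 0.2   # separation between the two cameras, in meters

point = (0.1, 0.05, 2.0)                 # a speckle on the actor's cheek
left = project(point, focal, 0.0)
right = project(point, focal, baseline)
disparity = left[0] - right[0]
print(depth_from_disparity(focal, baseline, disparity))  # recovers z = 2.0
```

Repeating this match-and-triangulate step for every visible speckle, across many camera pairs and 100 or so times per second, is what yields the dense moving surface model.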
Mova says the resolution of the model is close to that of high-definition television. The system generates a preview, mapping the moving surface geometry it has captured onto a normal video view, to allow the technicians to check the motion and correct any errors. Then it attaches the images captured in standard lighting onto the digital model. The Mova system, though optimal for the face, can be used for capturing body motion as well. In such cases, the actor wears clothes treated with phosphorescent dye.
Mova says movie and game makers have begun using its system, and the public will begin seeing the results in 2008.
Although the quality of the computer images displayed in video games tends to lag behind what's seen in movies, the latest generation of gaming systems, the PlayStation 3 and the Xbox 360, is allowing some video-game developers to take advantage of detailed images captured by systems like Mova's. The most recent Tiger Woods golf game by Electronic Arts for Xbox 360, for example, showcases facial animation for Woods and other well-known golfers. The developers used an array of cameras to record images of Woods and the three other pro golfers from different angles and created a model, using software developed at Electronic Arts' EAX Studios, in Vancouver, B.C., Canada. The process produced the most detailed facial animation in a game to date [see photo].
Image Metrics, of Santa Monica, Calif., is taking a different approach, decoding motion from video images instead of capturing it from live actors. The developers-Alan Brett, Gareth Edwards, and Kevin Walker-were postdoctoral researchers at the University of Manchester, England, where they worked on problems in image analysis. In 2000, they started Image Metrics to develop applications for their facial-recognition and image-analysis software. So far they have announced products intended for use in security systems, medical imaging and, most recently, entertainment graphics.
Image Metrics analyzed thousands of faces to build a statistical model of the human countenance and how it moves. Using that information, the Image Metrics system can analyze a video recording of an actor's performance, frame by frame, extracting information about the position and movements of facial features, including the eyes, eyebrows, tongue, and skin creases. Running on standard PCs with high-end graphics cards, the software takes 2 to 3 hours to complete 10 to 15 seconds of cinema-quality video, the length of a typical single-camera shot. From the collected data, Image Metrics generates key frames used to animate the facial rig built by artists. The key frames, indicating positions of the jaw or the eyelid, for example, are nearly identical to what someone would painstakingly create if animating the rig by hand keying. The data, therefore, are easy for animators to edit if some part of the facial motion needs to be adjusted either to seem more real or to suit the computer-generated image better.
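A statistical model of the face, in the spirit of the shape models the founders' research group was known for, can be sketched as a mean shape plus learned modes of variation; decoding a video frame then means solving a least-squares fit of the mode coefficients to the tracked feature positions. The single "jaw-open" mode and three-landmark face below are invented for illustration, not Image Metrics' actual model.

```python
# Toy statistical face model: observed landmark positions are explained
# as mean shape + coefficient * mode. Fitting one frame is a one-line
# least-squares projection onto the mode.

mean_shape = [0.0, 0.0, 1.0]      # e.g. y-coordinates of three lip landmarks
jaw_open_mode = [0.0, 0.5, 1.0]   # how each landmark moves as the jaw opens

def fit_coefficient(observed):
    """Least-squares fit: c = <mode, observed - mean> / <mode, mode>."""
    num = sum(m * (o - mu)
              for m, o, mu in zip(jaw_open_mode, observed, mean_shape))
    den = sum(m * m for m in jaw_open_mode)
    return num / den

# A tracked video frame in which the jaw is half open:
frame = [mu + 0.5 * m for mu, m in zip(mean_shape, jaw_open_mode)]
print(fit_coefficient(frame))  # recovers the coefficient 0.5
```

Coefficients like this one, recovered frame by frame, are what become the editable key frames driving the artists' facial rig.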
The Image Metrics system can be used with previously recorded film, raising the possibility of re-creating actors of the past.
Lair, to be introduced by Sony this month for the PlayStation 3, is the latest video game to use the system Image Metrics pioneered. Foodfight!, due to be released in November by Lions Gate, is the first movie to use the technology. And David Barton, a producer at Image Metrics, says that a yet-to-be-announced movie now in production is using the first version of a scheme that combines images from three cameras filming from different angles simultaneously; he predicts that the results will be stunning.
The new tools will allow animators to get closer and closer to creating a realistic human face. The next hurdle is a legal one.
Let's go back to the making of The Polar Express. Viewing the footage, the animators knew that the images, though captured and rendered accurately, were not looking as they had anticipated. And they likely knew they could go in and tweak the animation to, if not fix the problem completely, at least make the faces look more real on the movie screen. Legally, however, the artists did not have the freedom to tweak it. An actor can, by contract, retain the rights to his image, motion, or voice. And in this case, just as an actor might not want a sound editor to modify his voice, Tom Hanks reportedly didn't want a digital editor to adjust his movements. The Polar Express was released as captured.
In the future, lawyers will have to sort out the question of an actor's "motion rights." Today, no law specifically addresses digitally re-created actors, though some laws relate to the issue. The right of publicity, which varies by state, can protect the image or even characteristic mannerisms of living-and sometimes dead-actors. Copyright law gives some protection, though the copyright is usually held by the movie studios, not by the actors themselves. Actors trying to protect their images can also invoke federal trademark law.
"A uniform federal law would be beneficial," says Joseph J. Beard, a professor at the St. John's University School of Law, in New York City. "But on one hand you have the people who own rights, like the post-mortem right to an image, wanting those rights to be very expansive; on the other hand you have the Motion Picture Association of America and advertising firms that would like to narrow them."
In the end, it will all come down to art. There is no standard procedure for a moviemaker to follow in creating a believable human; rather, individual artists make countless choices. As digital flesh deforms, where do the wrinkles come from? Is it best to create wrinkles with a deformation tool under the skin, as was done when the giant green title character from Hulk (2003) flexed? As flesh bunches up, how do you simulate the under-skin blood flow, as when Gollum in The Lord of the Rings scrunches his face and his cheeks flush? Do you create a specialized texture map and animate changes in its transparency? Or do you write computer code that calculates where the skin folds together, and at what rate its color changes over time, to determine where and when the flushing occurs? And when light hits the skin, how is it absorbed or dispersed? How translucent is the surface? If you have a lamp behind a digital ear, how much of the light shines through the skin? Can you pick up hints of a blood vessel there?
Making all the right choices to create a computer-generated figure that looks truly alive is no easy task. It probably will take a Leonardo da Vinci, or a team of da Vincis-computer-graphics artists talented enough to put it all together. When that happens, the movie or video game they're working on will become the Mona Lisa of the computer era.
About the Author
Eric Pavey is a lead-character technical director at Electronic Arts, in Redwood City, Calif. He has been involved in every aspect of the visual side of game development, creating animation, motion-capture, and scripting tools, as well as designing animated characters and their environments with a number of companies, including Square and Neversoft.
To Probe Further
To see IEEE Spectrum's test of Mova's motion-capture technology, go to www.spectrum.ieee.org. For more videos that demonstrate the technology, see http://www.mova.com. Image Metrics shows off its facial animation technology at http://www.image-metrics.com. The Berkeley Technology Law Journal published a discussion of the legal issues of human animation in 1993. See law professor Joseph J. Beard's "Casting Call at Forest Lawn," available at http://btlj.boalt.org/data/articles/8-1_spring-1993_beard.pdf. Later, Beard's "Clones, Bones, and Twilight Zones" addressed legal issues related to animating dead celebrities as well as living actors. See http://www.law.berkeley.edu/journals/btlj/articles/vol16/beard/beard.pdf.