Is It Live or Is It AR?
By blending digital creations with our view of the world, augmented reality is set to transform the way we entertain and educate ourselves
There are two ways to tell the tale of one Sarah K. Dye, who lived through the Union Army's siege of Atlanta in the summer of 1864. One is to set up a plaque that narrates how she lost her infant son to disease and carried his body through Union lines during an artillery exchange, to reach Oakland Cemetery and bury him there.
The other is to show her doing it.
You'd be in the cemetery, just as it is today, but it would be overlaid with the sounds and sights of long ago. A headset as comfortable and fashionable as sunglasses would use tiny lasers to paint high-definition images on your retina--virtual images that would blend seamlessly with those from your surroundings. If you timed things perfectly by coming at twilight, you'd see flashes from the Union artillery on the horizon and a moment later hear shells flying overhead. Dye's shadowy figure would steal across the cemetery in perfect alignment with the ground, because the headset's differential GPS, combined with inertial and optical systems, would determine your position to within millimeters and the angle of your view to within arc seconds.
That absorbing way of telling a story is called augmented reality, or AR. It promises to transform the way we perceive our world, much as hyperlinks and browsers have already begun to change the way we read. Today we can click on hyperlinks in text to open new vistas of print, audio, and video media. A decade from now--if the technical problems can be solved--we will be able to use marked objects in our physical environment to guide us through rich, vivid, and gripping worlds of historical information and experience.
The technology is not yet able to show Dye in action. Even so, there is quite a lot we can do with the tools at our disposal. As with any new medium, there are ways not only of covering weaknesses but even of turning them into strengths--motion pictures can break free of linear narration with flashbacks; radio can use background noises, such as the sound of the whistling wind, to rivet the listener's attention.
Along with our students, we are now trying to pull off such tricks in our project at the Oakland Cemetery in Atlanta. For the past six years, we have held classes in AR design at the Georgia Institute of Technology, and for the past three we have asked our students to explore the history and drama of the site. We have distilled many ideas generated in our classes to create a prototype called the Voices of Oakland, an audio-only tour in which the visitor walks among the graves and meets three figures in Atlanta's history. By using professional actors to play the ghosts and by integrating some dramatic sound effects (gunshots and explosions during the Civil War vignettes), we made the tour engaging while keeping the visitors' attention focused on the surrounding physical space.
We hope to be able to enhance the tour, not only by adding visual effects but also by extending its range to neighboring sites, indoors and out. After you've relived scenes of departed characters in the cemetery, you might stroll along Auburn Avenue and enter the former site of the Ebenezer Baptist Church. Inside, embedded GPS transceivers would allow the GPS to continue tracking you, even as you viewed a virtual Reverend Martin Luther King Jr. delivering a sermon to a virtual congregation, re-creating what actually happened on that spot in the 1960s. Whole chapters of the history of Atlanta, from the Civil War to the civil rights era, could be presented that way, as interactive tours and virtual dramas. Even the most fidgety student probably would not get bored.
By telling the story in situ, AR can build on the aura of the cemetery--its importance as a place and its role in the Civil War. The technology could be used to stage dramatic experiences in historic sites and homes in cities throughout the world. Tourists could visit the beaches at Normandy and watch the Allies invade France. One might even observe Alexander Graham Bell spilling battery acid and making the world's first telephone call: "Mr. Watson, come here."
The first, relatively rudimentary forms of AR technology are already being used in a few prosaic but important practical applications. Airline and auto mechanics have tested prototypes that give visual guidance as they assemble complex wiring or make engine repairs, and doctors have used such systems to perform surgery on patients in other cities.
But those applications are just the beginning. AR will soon combine with various mobile devices to redefine how we approach the vast and growing repository of digital information now buzzing through the Internet. The shift is coming about in part because of the development of technologies that free us from our desks and allow us to interact with digital information without a keyboard. But it is also the result of a change in attitude, broadening the sense of what computers are and what they can do.
We are already seeing how computers integrate artificially manipulated data into a variety of workaday activities, splicing the human sensory system into abstract representations of such specialized and time-critical tasks as air traffic control. We have also seen computers become a medium for art and entertainment. Now we will use them to knit together Web art, entertainment, work, and daily life.
Think of digitally modified reality as a piece of a continuum that begins on one end with the naked perception of the world around us. From there it extends through two stages of "mixed reality" (MR). In the first one, the physical world is like the main course and the virtual world the condiment--as in our AR enhancement of the Oakland Cemetery. In the other stage of MR, the virtual imagery takes the spotlight. Finally, at the far end of the continuum lies nothing but digitally produced images and sounds, the world of virtual reality.
To meld physical reality with the virtual, any AR system needs three components: computer-modeled sights and sounds, a display system, and a method for determining the user's viewpoint. Each of the three components presents problems. Here we will consider only the visual elements, as they are by far the most challenging to coordinate with real objects.
The ability to model graphics objects rapidly in three dimensions continues to improve because the consumer market for games--a US $30-billion-a-year industry worldwide--demands it. The challenge that remains is to deliver the graphics to the user's eyes in perfect harmony with images of the real world. It's no mean feat.
The best-known solution uses a laser to draw images on the user's retina. There is increasing evidence that such a virtual retinal display can be done safely [see "In the Eye of the Beholder," IEEE Spectrum, May 2004]. However, the technology is not yet capable of delivering the realistically merged imagery described here. In the meantime, other kinds of visual systems are being developed and refined.
Most AR systems use head-worn displays that allow the wearer to look around and see the augmentations everywhere. In one approach, the graphics are projected onto a small transparent screen through which the viewer sees the physical world. This technology is called an optical see-through display. In another approach, the system integrates digital graphics with real-world images from a video camera, then presents the composite image to the user's eyes; it's known as a video-mixed display. The latter approach is basically the same one used to augment live television broadcasts--for example, to point out the first-down line on the field during a football game [see "All in the Game," Spectrum, November 2003].
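The compositing step at the heart of a video-mixed display can be illustrated with a minimal sketch in Python. It assumes that the camera frame and the rendered graphics are same-sized grids of RGB values and that each pixel carries an opacity between 0 and 1. The function names and the data layout are our own illustrative assumptions, not the workings of any particular system described here.

```python
def composite_pixel(camera_rgb, graphic_rgb, alpha):
    """Blend one rendered pixel over one camera pixel.

    alpha is the opacity of the graphic: 0.0 shows only the camera
    image, 1.0 shows only the graphic.
    """
    return tuple(
        round(alpha * g + (1.0 - alpha) * c)
        for c, g in zip(camera_rgb, graphic_rgb)
    )


def composite_frame(camera_frame, graphic_frame, alpha_mask):
    """Blend a whole rendered frame over a camera frame, pixel by pixel.

    All three arguments are lists of rows; the frames hold (r, g, b)
    tuples and the mask holds one opacity value per pixel.
    """
    return [
        [composite_pixel(c, g, a) for c, g, a in zip(c_row, g_row, a_row)]
        for c_row, g_row, a_row in zip(camera_frame, graphic_frame, alpha_mask)
    ]


if __name__ == "__main__":
    camera = [[(100, 100, 100), (100, 100, 100)]]
    ghost = [[(200, 0, 50), (200, 0, 50)]]
    mask = [[0.0, 0.5]]  # left pixel untouched, right pixel half ghost
    print(composite_frame(camera, ghost, mask))
```

A real system would do this blending on graphics hardware for every frame, but the arithmetic per pixel is the same.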
This comparison with augmented-live television highlights the problems that must still be solved. TV broadcasters can fix their cameras in precisely known positions and track their orientation with high-quality built-in encoders. And they can delay the video signal by a few dozen frames to gain time to clean things up. Because millions of people are watching, it makes economic sense for the television broadcaster to employ a team of technicians to monitor and adjust the system. Whoever wishes to bring AR to museums and historic landmarks--let alone less-traveled paths--will have to find less expensive ways around such problems.
The biggest technological challenge is to track position and orientation. Just how good the tracking must be depends, of course, on what you want to do with it. In the Oakland Cemetery example, it would be acceptable to place the ghosts within, say, 10 centimeters of their graves. However, a mechanic depending on AR to replace tiny components in a jet engine would need greater precision. The system might indicate the tiny components by highlighting them in a color; if they are just a few millimeters wide, clearly the system must have millimeter-level accuracy. Distance is just as important--the farther away you look, the more an error in the angle of the line of vision will become obvious.
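The effect of distance can be made concrete with a little trigonometry. The sketch below, in Python, assumes a simple geometric model in which an orientation error swings the line of sight away from its target, so that the virtual object appears displaced sideways by the viewing distance times the tangent of the error. The function name and the sample numbers are illustrative assumptions only.

```python
import math


def lateral_offset_m(distance_m, orientation_error_deg):
    """Sideways displacement of a virtual object, in meters, caused by
    an orientation error when the object is viewed from distance_m."""
    return distance_m * math.tan(math.radians(orientation_error_deg))


if __name__ == "__main__":
    # The same 1-degree error grows more obvious the farther away you look.
    for distance in (1.0, 10.0, 100.0):
        offset = lateral_offset_m(distance, 1.0)
        print(f"{distance:6.1f} m away -> {offset:.3f} m off target")
```

Under this model the displacement grows in direct proportion to distance, which is why an error that is invisible on a nearby gravestone can become glaring on a distant horizon.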
For the display to have a chance of appearing perfectly aligned, the orientation error must be less than the visual angle of one pixel on the display. A typical display today might have a field of view of 24 degrees and a horizontal resolution of 800 pixels, meaning that an orientation error greater than 0.03 degree would result in perceptible misalignment between the virtual and physical objects.
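The 0.03-degree figure follows from dividing the field of view by the number of pixels across it. The short Python sketch below reproduces that arithmetic. It assumes, as a simplification, that every pixel spans the same visual angle, and the function name is our own.

```python
def degrees_per_pixel(field_of_view_deg, horizontal_pixels):
    """Approximate visual angle covered by one pixel, assuming the
    field of view is divided evenly among the pixels."""
    return field_of_view_deg / horizontal_pixels


if __name__ == "__main__":
    # The typical display described above: 24 degrees across 800 pixels.
    print(degrees_per_pixel(24.0, 800))
```

An orientation error larger than this per-pixel angle shifts the virtual image by at least one pixel relative to the physical scene.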
To track things outdoors over a wide area, orientation sensors typically use magnetometers, inclinometers, and inertial sensors. The magnetic components can, however, be thrown off by the presence of magnetic fields, iron, or other ferric material. In smaller areas that can be surveyed or fitted with an infrastructure--fixed antennas, printed markers, and the like--the absolute accuracy of the sensors can be excellent.
A major research goal is to dispense with such an embedded infrastructure by devising automatic ways to find and track "natural features"--say, an uncataloged tree or boulder. That way, the system could handle whatever comes up, without any prior knowledge of the territory. Particularly promising are technologies that combine wearable cameras with inertial sensors.
It is just as important to develop easy-to-use tools for AR. Without them, designers are not likely to enter the field. For our work on the Oakland Cemetery project, we used a programming system, created in the Augmented Environments Lab at Georgia Tech, called DART (Designer's Augmented Reality Toolkit). DART was built to facilitate rapid prototyping, so that designers can quickly visualize and test their ideas. We believe that DART can help contribute to the development of AR as a medium for art and creative design.
DART provides extensions to the Adobe Director multimedia-authoring system that allow it to coordinate three-dimensional objects, video, sound, and tracking information--the entire AR experience. It can track marked objects in a live video feed and react to real-time data streaming in from sensors, a wide variety of which can be made to work together seamlessly through the Virtual Reality Peripheral Network, an open-source system developed at the University of North Carolina, Chapel Hill. The VRPN also makes it easy to integrate DART programs with programs written in other languages.
DART has palettes of behaviors--that is, the actions of a computerized system as it responds to stimuli, as when a video camera follows a person's movements. It is not our intention to provide a collection of behaviors so complete that it would satisfy the needs of all AR application designers; such an effort would be doomed to failure. Rather, we have designed the behaviors to provide a modular and extensible framework that designers can easily appropriate for their own needs. Anyone developing a new AR application can edit the DART behaviors.
We are by no means the first to promote this combination of techniques as a new medium of expression. Designers and artists have been experimenting with precursors of the idea for years, although without using fully developed tracking technologies or head-worn displays. Since its founding in 1979, the Ars Electronica festival, in Linz, Austria, has featured digital artists such as Myron Krueger, whose work has involved embedding computer monitors in art installations or projecting images on large screens or the surfaces of rooms or buildings, often in real time. The Canadian installation artist Janet Cardiff has created a series of audio tours in which the user wears headphones and walks along a predetermined path as Cardiff's voice fashions an audio landscape.
In addition, curators and designers have been moving toward mixed and augmented reality as they seek to enhance the visitor's experience in museums, historic sites, and theme parks. One famous example is the audio tour of Alcatraz prison in San Francisco Bay, in which the user wears headphones and embarks on an evocative walk through the empty cells and hallways, accompanied by a reconstruction of the sounds and voices of 50 years ago. However, the tour is a linear experience: the user must follow the path dictated on the CD; there is no tracking of the user's location.
Some of the most compelling work uses mobile phones to combine Internet-based applications with the physical and social spaces of cities. Many such projects exploit the phone's GPS capabilities to let the device act as a navigational beacon. The positional information might let the phone's holder be tracked in cyberspace, or it might be used to let the person see, on the phone's little screen, imagery relevant to the location.
Blast Theory, an experimental art and technology group in Brighton, England, has been one of the leaders in such enterprises. Its participatory game event Can You See Me Now?--designed in collaboration with the Mixed Reality Lab at the University of Nottingham--pitted online participants against runners in the streets of a real city. In one installation, in the center of Sheffield, the runners carried handheld computers that showed them the same map that the online participants had in front of them; the computers also bore GPS receivers that let the online people follow along. The runners tried to reach points in Sheffield that corresponded to the virtual positions of as many online participants as possible, thereby "catching" them. An open-mike audio channel connected the runners to the online players, giving the online players a sense of being in a shared physical space, no matter how far from Sheffield--or even England--they really were.
Meanwhile, new phones are coming along with processors and graphics chips as powerful as those in the personal computers that created the first AR prototypes a decade ago. Such phones will be able to blend images from their cameras with sophisticated 3-D graphics and display them on their small screens at rates approaching 30 frames per second. That's good enough to offer a portal into a world overlaid with media. A visitor to the Oakland Cemetery could point the phone's video camera at a grave (affixed with a marker, called a fiducial) and, on the phone's screen, see a ghost standing at the appropriate position next to the grave.
Video and computer games have been the leading digital entertainment technology for many years. Until recently, however, the games were entirely screen-based. Now they, too, are climbing through mobile devices and into the physical environment around us, as in an AR fishing game called Bragfish, which our students have created in the past year. Players peer into the handheld screens of game devices and work the controls, steering their boats and casting their lines to catch virtual fish that appear to float just above the tabletop. They see a shared pond, and each other's boats, but they see only the fish that are near enough to their own boats for their characters to detect.
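The per-player view in such a game can be sketched as a simple distance filter over a shared world. The Python below is a hypothetical illustration, not the actual Bragfish code: the Fish class, the visible_fish function, and the fixed detection radius are all assumptions made for the sake of the example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Fish:
    """A virtual fish at a position in the shared pond (hypothetical)."""
    name: str
    x: float
    y: float


def visible_fish(boat_xy, fish, detection_radius):
    """Return only the fish close enough to this boat to be detected.

    Every player shares the same pond, but each handheld screen draws
    only the fish within detection_radius of that player's own boat.
    """
    boat_x, boat_y = boat_xy
    return [
        f for f in fish
        if math.hypot(f.x - boat_x, f.y - boat_y) <= detection_radius
    ]


if __name__ == "__main__":
    pond = [Fish("trout", 0.0, 0.0), Fish("bass", 5.0, 0.0)]
    print([f.name for f in visible_fish((0.5, 0.0), pond, 1.0)])
    print([f.name for f in visible_fish((5.0, 0.5), pond, 1.0)])
```

Keeping one shared model of the pond and filtering it separately for each device is one plausible way to give players a common space while still letting each see something different.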
We can imagine all sorts of casual games for children and even for adults in which virtual figures and objects interact with surfaces and spaces of our physical environment. Such games will leave no lasting marks on the places they are played. But people will be able to use AR technology to record and recall moments of social and personal engagement. Just as they now go to Google Maps to mark the positions of their homes, their offices, their vacations, and other important places in their lives, people will one day be able to annotate their AR experience at the Oakland Cemetery and then post the files on something akin to Flickr and other social-networking sites. One can imagine how people will produce AR home movies based on visits to historic sites.
Ever more sophisticated games, historic tours, and AR social experiences will come as the technology advances. We represent the possibilities in the form of a pyramid, with the simplest mobile systems at its base and fully immersive AR on top. Each successive level of technology enables more ambitious designs, but with a smaller potential population of users. In the future, however, advanced mobile phones will become increasingly widespread, the pyramid will flatten out, and more users will have access to richer augmented experiences.
Fully immersive AR, the goal with which we began, may one day be an expected feature of visits to historic sites, museums, and theme parks, just as human-guided tours are today. AR glasses and tracking devices will one day be rugged enough and inexpensive enough to be lent to visitors, as CD players are today. But it seems unlikely that the majority of visitors will buy AR glasses for general use as they buy cellphones today; fully immersive AR will long remain a niche technology.
On the other hand, increasingly ubiquitous mobile technology will usher in an era of mixed reality in which people look at an augmented version of the world through a handheld screen. You may well pull information off the Web while walking through the Oakland Cemetery or along Auburn Avenue, sharing your thoughts as well as the ambient sounds and views with friends anywhere in the world.
At the beginning of the 20th century, when Kodak first sold personal cameras in the tens of thousands, the idea was to build a sort of mixed reality that blended the personal with the historic ("Here I am at the Eiffel Tower") or to record personal history ("Here's the bride cutting the cake"). AR will put us in a kind of alternative history in which we can live through a historic moment--the Battle of Gettysburg, say, or the "I have a dream" speech--in a sense making it part of our personal histories.
Mobile mixed reality will call forth new media forms that skillfully combine the present and the past, historical fact and its interpretation, entertainment and learning. AR and mobile technology have the potential to make the world into a stage on which we can be the actors, participating in history as drama or simply playing a game in the space before us.
About the Authors
JAY DAVID BOLTER and BLAIR MACINTYRE are professors at the Georgia Institute of Technology, in Atlanta. Bolter has a Ph.D. in the humanities and is codirector of the Wesley Center for New Media; he teaches in the School of Literature, Communication, and Culture. MacIntyre, who holds a Ph.D. in computer science, is the director of the Augmented Environments Lab; he teaches in the College of Computing.