In 2009 alone, the U.S. Air Force shot 24 years' worth of video over Iraq and Afghanistan using spy drones. The trouble is, there aren't enough human eyes to watch it all.
The deluge of video data from these unmanned aerial vehicles, or UAVs, is likely to get worse. By next year, a single new Reaper drone will record 10 video feeds at once, and the Air Force plans to eventually upgrade that number to 65. John Rush, chief of the Intelligence, Surveillance and Reconnaissance Division of the U.S. National Geospatial-Intelligence Agency, projects that it would take an untenable 16 000 analysts to study the video footage from UAVs and other airborne surveillance systems.
The best—and perhaps only—way forward is to have a computer watch it all. But programming a system to automatically search video and pick out noteworthy information is not an easy problem. And so far, no one has developed software that can keep up with the military's high-tech hardware.
There were some glimmers of hope, however, at August's 7th IEEE International Conference on Advanced Video and Signal-Based Surveillance, in Boston. Mubarak Shah, director of the Computer Vision Lab at the University of Central Florida, in Orlando, identified three computer surveillance tasks that are notoriously difficult.
The first task involves tracking big swarms of objects, such as cars, traveling over a wide area, like an expressway. When shot from above, cars are exceedingly small (usually no more than 30 pixels), and there are often thousands of them. Plus, the plane that's shooting footage is moving faster than the vehicles it's capturing, so a tracking algorithm has only a few frames to work with for each car. And if ARGUS-IS, a new technology developed under the direction of the Defense Advanced Research Projects Agency, succeeds, a single drone-mounted video sensor and processor could easily "capture one and a half to 2 million vehicles [within a 40-kilometer radius] during one mission," says Rush.
Shah's solution depends on keeping track of all the possible paths a vehicle may have taken, then weeding out the poor choices. The computer does that by using some common sense: It knows two vehicles probably won't choose a collision course, for example. And it uses a bit of modern transportation theory as well: If one car is behind another, the two cars are probably accelerating at a similar rate.
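To make the weeding-out idea concrete, here is a minimal sketch in Python. It is not Shah's software: the function name, the thresholds, and the units (meters per frame) are all assumptions made for illustration, and it compares velocities rather than accelerations to keep the example short. It keeps only those candidate next positions for a car that roughly match the motion of the car ahead and do not land on top of another vehicle.

```python
import math

def prune_candidates(prev_pos, candidates, lead_velocity, other_positions,
                     max_velocity_diff=5.0, min_gap=2.0):
    """Keep candidate next positions that pass two plausibility checks.

    All names and thresholds are hypothetical; positions are (x, y) in
    meters and velocities are in meters per frame.
    """
    kept = []
    for cand in candidates:
        # Velocity implied by moving from the last known position to this candidate.
        velocity = (cand[0] - prev_pos[0], cand[1] - prev_pos[1])
        # Check 1: a car following another should be moving in a similar way.
        velocity_diff = math.dist(velocity, lead_velocity)
        # Check 2: two vehicles should not end up in the same spot.
        collides = any(math.dist(cand, other) < min_gap
                       for other in other_positions)
        if velocity_diff <= max_velocity_diff and not collides:
            kept.append(cand)
    return kept

survivors = prune_candidates(
    prev_pos=(0.0, 0.0),
    candidates=[(10.0, 0.0), (10.0, 2.0), (30.0, 0.0)],
    lead_velocity=(10.0, 0.0),
    other_positions=[(10.0, 3.0)],
)
print(survivors)
```

In this toy case the second candidate is dropped because it sits too close to another vehicle, and the third because it implies a speed far from that of the car ahead. A real tracker would have to score thousands of such hypotheses per frame.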
Finding a person can be even harder than keeping track of cars. In aerial video, typical statistics-based algorithms mistake quite a lot of things—trees, mailboxes, traffic lights—for people. Instead, Shah proposes using some basic geometry tricks to find a person, based on the relationship between the height of an object and the length of its shadow. He admits, though, that this strategy wouldn't work so well on video shot on cloudy days or using infrared light.
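The geometry behind the shadow idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Shah's published method: it assumes flat ground, an upright object, a known sun elevation angle, and a hypothetical range of plausible human heights. Under those assumptions, an object of height h casts a shadow of length h / tan(elevation), so the shadow length implies a height that can be checked against that range.

```python
import math

def implied_height_m(shadow_length_m, sun_elevation_deg):
    """Height of an upright object that would cast this shadow on flat ground."""
    return shadow_length_m * math.tan(math.radians(sun_elevation_deg))

def looks_like_person(shadow_length_m, sun_elevation_deg,
                      min_height_m=1.4, max_height_m=2.1):
    """True if the implied height falls in an assumed human height range."""
    height = implied_height_m(shadow_length_m, sun_elevation_deg)
    return min_height_m <= height <= max_height_m

# With the sun 45 degrees above the horizon, shadow length equals height.
print(looks_like_person(1.7, 45.0))  # a 1.7-meter shadow: plausible person
print(looks_like_person(6.0, 45.0))  # a 6-meter shadow: more likely a tree or pole
```

The sketch also shows why the approach struggles on cloudy days or in infrared footage: without a clear shadow there is no length to measure.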
The final challenge Shah addressed is how to map the movement patterns of many things at once. Say, for example, you have some aerial footage of a city and you want to figure out how it's laid out—where the roads are, the bridges, the intersections, where people regularly travel, the areas they avoid, where they gather. What a computer sees in a surveillance video is "very noisy optical flow," Shah says—lots of motion but not much order. But, using mathematical noise-reducing tools called Gaussian filters, Shah can find order in the noise and get his software to draw a picture of the city in motion. "We can basically discover the road networks without knowing anything about a city," he says.
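For readers unfamiliar with Gaussian filters, the following Python sketch shows the basic operation on a one-dimensional signal. It is only an illustration: the article does not say how Shah's software applies the filters, and the function names, the edge handling, and the sample data are assumptions. Each output value is a weighted average of its neighbors, with weights that follow a bell curve, so isolated noise is flattened while broader patterns survive.

```python
import math

def gaussian_kernel(sigma, radius):
    """Bell-curve weights over 2 * radius + 1 samples, normalized to sum to 1."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth(signal, sigma=1.0, radius=3):
    """Gaussian-weighted moving average; indices are clamped at the edges."""
    kernel = gaussian_kernel(sigma, radius)
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)  # clamp to a valid index
            acc += w * signal[j]
        out.append(acc)
    return out

# Hypothetical motion magnitudes along a line of pixels, with one sharp spike.
noisy = [0.0, 0.0, 0.0, 10.0, 0.0, 0.0, 0.0]
smoothed = smooth(noisy)
print([round(v, 2) for v in smoothed])
```

The spike is spread into a lower, wider bump centered on the same position. Applied across a two-dimensional field of motion vectors, the same kind of averaging is what lets steady traffic along a road stand out from frame-to-frame jitter.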
But it may be a while before such technologies are accurate and usable enough to be adopted broadly. Military intelligence analysts "will use systems put in front of them now, then turn them off because it just makes their job harder," Rush told engineers in Boston. "Getting them to accept the results [of automatic video-search software] without going back and checking all the data—that's a long time coming."
This article originally appeared in print as "Eyes in the Sky That See Too Much".