PHOTO CREDIT: Hustvedt, Wikimedia Commons
On Monday morning, Sadiye Guler, founder and president of the video analytics company intuVision, was telling a story to a small gathering of software engineers in a conference room at Boston University in Massachusetts. She was saying that her company had recently gotten its person-identifying software to work quite well. The software could estimate a videotaped person’s age, gender, and ethnicity with pretty good reliability. Then just last month, intuVision presented its products at the U.S. Joint Forces Command’s annual Empire Challenge—a kind of showcase for new surveillance and reconnaissance technologies. This year, the challenge was held in the desert state of Arizona.
“Everyone was wearing hats with big rims and sunglasses, and the lighting contrast was intense—everyone’s faces were showing up half light, half dark,” Guler said. Needless to say, intuVision’s engineers cut short the people-identifier portion of their software demonstration.
“Sounds like demo syndrome,” one of the engineers commented.
“Demo syndrome” is a term I’ve been hearing a lot here at the 7th IEEE International Conference on Advanced Video and Signal-Based Surveillance (AVSS). Another example: On Tuesday morning, Mahesh Saptharishi, chief scientist for VideoIQ, Inc.—which claims to be “the inventor of the world’s first and only intelligent surveillance cameras and encoders with built-in video recording”—was showing me some of the things a client of VideoIQ could do with its video search software.
“Suppose you want to see all the clips that show a white car like this one,” Saptharishi said, pointing to a white station wagon entering a parking lot on a computer screen. The VideoIQ software had previously determined that the wagon was a moving vehicle, drawn a red tracking box around it, and archived the clip. Saptharishi clicked the “look-for-all-cars-like-this-one” button and waited. The search timed out. He tried again using a black SUV. The first result he called up showed a light blue minivan. The second showed a gray station wagon.
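Matching "cars like this one" is, at bottom, an appearance-similarity search, and the failure mode in that demo is easy to reproduce. The sketch below is my own illustration, not VideoIQ's actual algorithm: it ranks archived detections by distance between crude mean-color features, using made-up RGB values. A dark vehicle filmed against harsh light reads as washed-out gray, so lighter-colored vehicles come back ahead of the true match.

```python
# Illustrative sketch only (not VideoIQ's method): appearance-based
# search often reduces each tracked object to a simple color feature,
# then ranks archived clips by feature distance. All RGB values below
# are invented to show how lighting can scramble the ranking.

import math

def mean_color_distance(a, b):
    """Euclidean distance between two mean-RGB features."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical archived detections: (label, mean RGB as seen on camera).
archive = [
    ("black SUV",          (40, 40, 45)),
    ("light blue minivan", (120, 140, 170)),
    ("gray station wagon", (110, 110, 115)),
]

# Query: the same black SUV, but filmed facing the sun, so its
# measured color is washed out toward gray.
query = (105, 105, 115)

ranked = sorted(archive, key=lambda item: mean_color_distance(query, item[1]))
for label, _ in ranked:
    print(label)
# The gray wagon and light minivan rank ahead of the actual black SUV.
```

Real systems use richer features than a mean color, but the underlying problem is the same: the feature measures what the camera saw, not what the object is.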
“That’s demo syndrome for you,” an observer said.
As a phenomenon, “demo syndrome” encompasses much more than the unexpected glitches that happen exactly when you’re trying to show off your technology to a journalist or a couple hundred colleagues. It represents the gap between what store owners, police departments, and Air Force intelligence analysts expect surveillance technology to do and what today’s computer vision programs actually offer. All too often, engineers have just a brief window of time to convince clients that automatic surveillance technologies are actually making their jobs easier. It doesn’t take much for a video analyst to throw up his hands and go back to the way he’s always analyzed video footage: with his own eyes.
“[Air Force intelligence analysts] will use systems put in front of them now, then turn them off because it just makes their job harder,” explained John Rush, chief of the Information Integration Data Engineering Division of the U.S. National Geospatial-Intelligence Agency, during a presentation Tuesday afternoon. “Getting them to accept the results [of automatic video search software] without going back and checking all the data—that’s a long time coming.”
Rush is leading the charge to get the U.S. Air Force and its funders to change the way they think about processing data. As I addressed in Wednesday’s post, the Air Force is upgrading its camera and sensor systems to the point where there’s just too much data coming in for analysts to sort through. For example, Rush mentioned that DARPA’s next-generation ARGUS-IS—a drone-mounted video sensor and processor—will be able to survey a 40-square-kilometer area at 1.8-gigapixel resolution. “We’re talking about being able to capture one and a half to two million vehicles in that area during one mission,” Rush said. “You’d need 16,000 analysts based on the projected data coming out of these systems.”
You’d be hard-pressed to find any user of surveillance hardware who thinks automatic detection and tracking software is a bad idea. Barbara Shaw, a project leader for the National Institute of Standards and Technology who participated in AVSS’s industry panel on Tuesday, recalled describing new advances in surveillance software to the vice president of security for a Las Vegas casino, which, to Shaw’s surprise, relied solely on human eyeballs to monitor its cameras. The VP’s response, Shaw said, was: “This is exactly what we need! When can we get the technology?”
The problem, as with most scientific pursuits, is that computer vision technology advances slowly. And it seems, at the moment anyway, that the software is rarely as good as users expect it to be. Perhaps that’s because what we have to compare it to is one of the best vision systems on Earth: our own.
“A two-year-old knows that’s a trash can over there,” said Guler of intuVision, pointing to a gray plastic bin against a gray wall. “People think it’s an easy thing to do, identifying a trash can. ‘You mean your program can’t even see that?’ they ask me. The truth is, a computer isn’t going to perform like a human next month or next year.”
Still, at this conference alone, I’ve seen lots of impressive algorithms—algorithms that can re-identify a person in infrared or detect a pedestrian crossing a freeway from two kilometers above the ground. The question is: can they do these things when it counts?