23 March 2011—It was a rainy August morning in Twinsburg, Ohio, but the 2009 Twins Days Festival was getting off to a grand start. University of Notre Dame professors Kevin W. Bowyer and Patrick J. Flynn took their places in their research tent as people began to wander, in duplicate, across the festival grounds. While this annual gathering of twins welcomes fraternal twins too, the high number of identical twins who congregate every year in Twinsburg made it an ideal location for these scientists to put face-recognition software to the ultimate test.
"[The festival] was a fascinating event from a research perspective," says Flynn. "For the festival, twins go to great pains to look as identical as it’s possible to look. The women told us they picked out their outfits and jewelry months in advance." All that planning makes for cute pictures, but it also sets up quite a challenge: Can the best state-of-the-art face-recognition programs distinguish between two genetically identical people who are trying to be indistinguishable?
It’s more than an academic question. Bowyer and Flynn’s data collection was funded by the U.S. Federal Bureau of Investigation, which has more than a passing interest in improving automated face-recognition systems. Such systems are already employed by law-enforcement agencies to aid with investigations, but they aren’t yet sophisticated enough or reliable enough to produce evidence that would stand up in court.
It’s rare, of course, that law-enforcement agencies would need to tell the difference between a law-abiding person and his evil twin. That isn’t the reason the FBI funded Bowyer and Flynn’s research. "Identical twins represent a real torture test for biometrics," explains Flynn. "If you can develop techniques that work on twins, you might have a more robust system that works better in general."
Bowyer and Flynn analyzed the data in collaboration with colleagues at the University of Notre Dame, the National Institute of Standards and Technology, and the FBI. They revealed the results of their Twinsburg experiments yesterday at the IEEE International Conference on Automatic Face and Gesture Recognition. The score so far: twins, 1, face recognition, 0.
At the 2009 festival, the two researchers snapped photos of 126 pairs of twins both under studio lights and outside in ambient light. In some photos the twins were asked to smile; in others they maintained neutral expressions. Bowyer and Flynn then unleashed three of the face-recognition programs that performed best in a 2010 evaluation run by NIST. That federal agency routinely invites companies and universities to submit their biometric systems for testing against a massive government data set of mug shots, visa photos, and the like. (It’s helpful to test programs against the government’s data because "every company does great on its own internal data set," Flynn says.)
When the programs were fed only photos taken under ideal conditions—twins wearing neutral expressions under studio lights—they did fairly well at matching pictures of the same person. Their success was judged by the "equal error rate," the operating point at which the program's false positive rate (declaring photos of two different people a match) equals its false negative rate (failing to match two photos of the same person). Under ideal conditions, all three programs had an equal error rate of less than 5 percent.
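To make the metric concrete, here is a minimal sketch of how an equal error rate can be estimated. The function and the score values are hypothetical, invented for illustration; a real face matcher produces a similarity score for every pair of photos, and the EER is found by sweeping the decision threshold until the false accept rate meets the false reject rate.

```python
def equal_error_rate(genuine, impostor):
    """Estimate the EER from two lists of similarity scores.

    genuine  -- scores from comparing two photos of the same person
    impostor -- scores from comparing photos of different people

    Sweeps every observed score as a candidate threshold and returns
    the error rate where false accepts and false rejects are closest
    to equal. (Toy values; not any real system's scores.)
    """
    best_gap, eer = float("inf"), None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # false accepts
        frr = sum(s < t for s in genuine) / len(genuine)     # false rejects
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer

# Made-up example: genuine pairs usually score high, impostors low,
# but the two distributions overlap a little.
genuine = [0.91, 0.85, 0.78, 0.66, 0.95, 0.88, 0.72, 0.59]
impostor = [0.30, 0.41, 0.55, 0.62, 0.25, 0.35, 0.48, 0.70]
print(equal_error_rate(genuine, impostor))  # prints 0.125
```

An EER of 0.125 here would mean that, at the best balanced threshold, the matcher wrongly accepts 12.5 percent of impostor pairs and wrongly rejects 12.5 percent of genuine pairs; the sub-5-percent figures the programs achieved under studio conditions correspond to a much cleaner separation between the two score distributions.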
These results suggest that identical twins don’t entirely confound face-recognition software. "Even identical twins are distinguishable, but it’s a collection of minute differences: a mole here, a crooked tooth there," says Bowyer. "You can only tell the difference between the two faces if you have reliable detection of fine details." To get the best results, Bowyer says face-recognition programs should use higher-resolution images than they typically do now and should focus on smaller details. Many of today’s biometric systems examine more holistic markers of individuality, like the relationship between features on a face.
When the researchers tested the three programs under circumstances that more closely resembled real-world conditions, the results were much worse. Comparing pictures of smiling and neutral faces often caused matching errors, and comparing photos taken under studio lighting and ambient lighting also confused the programs.
To underline the limitations of today’s software, Bowyer and Flynn returned to the Twins Days Festival in 2010 and found 24 pairs of twins whom they’d photographed the previous year. By taking a new round of pictures, they were able to test the software using photos of people taken a year apart. To create an even more realistic scenario, they compared the 2009 studio photograph with a 2010 photo taken outdoors in ambient light. "It’s as if you have a nice indoor studio image of a man, and then you just happen to catch a shot of him a year later as he’s walking down the street," explains Bowyer. "In that circumstance, the best-performing algorithm had about a 17 or 18 percent equal error rate. That’s awful for almost any application."
The research clearly indicates that biometric technology has a way to go before it can reliably deal with the likes of the Doublemint Twins. But when a face-recognition system does ace a test like Bowyer and Flynn’s, it will have earned some serious bragging rights. As Bowyer puts it, "They’ll be able to say, 'This is so good, it can tell identical twins apart.' "
This article was updated on 25 March 2011.