Helen of Troy may have had the face that launched a thousand ships, but even the best facial recognition algorithms might have had trouble finding her in a crowd of a million strangers. The first public benchmark test based on 1 million faces has shown how facial recognition algorithms from Google and other research groups around the world still fall well short of perfection.
Facial recognition algorithms that had previously performed with more than 95 percent accuracy on a popular benchmark test involving 13,000 faces saw significant drops in accuracy when taking on the new MegaFace Challenge. The best performer, Google’s FaceNet algorithm, dropped from near-perfect accuracy on the five-figure data set to 75 percent on the million-face test. Other top algorithms dropped from above 90 percent to below 60 percent. Some algorithms made the proper identification as seldom as 35 percent of the time.
“MegaFace’s key idea is that algorithms should be evaluated at large scale,” says Ira Kemelmacher-Shlizerman, an assistant professor of computer science at the University of Washington, in Seattle, and the project’s principal investigator. “And we make a number of discoveries that are only possible when evaluating at scale.”
The huge drops in accuracy when scanning a million faces matter because facial recognition algorithms inevitably face such challenges in the real world. People increasingly trust these algorithms to correctly identify them in security verification scenarios, and law enforcement may also rely on facial recognition to pick suspects out of the hundreds of thousands of faces captured on surveillance cameras.
The most popular benchmark until now has been the Labeled Faces in the Wild (LFW) test created in 2007. LFW includes 13,000 images of just 5,000 people. Many facial recognition algorithms have been fine-tuned to the point that they scored near-perfect accuracy when picking through the LFW images. Most researchers say that new benchmark challenges have been long overdue.
“The big disadvantage is that [the field] is saturated—that is, there are many, many algorithms that perform above 95 percent on LFW,” Kemelmacher-Shlizerman says. “This gives the impression that face recognition is solved and working perfectly.”
With that in mind, University of Washington researchers raised the bar by creating the MegaFace Challenge using 1 million Flickr images of 690,000 unique faces that are publicly available under a Creative Commons license.
The MegaFace Challenge forces facial recognition algorithms to do verification and identification, two separate but related tasks. Verification involves trying to correctly determine whether two faces presented to the facial recognition algorithm belong to the same person. Identification involves trying to find a matching photo of the same person among a million “distractor” faces. Initial results on algorithms developed by Google and four other research groups were presented at the IEEE Conference on Computer Vision and Pattern Recognition on 30 June. (One of MegaFace’s developers also heads a computer vision team at Google’s Seattle office.)
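Conceptually, the two tasks reduce to simple operations on face embeddings (the numeric vectors an algorithm produces for each face). The sketch below is illustrative only; the embedding dimensions, random vectors, and similarity threshold are assumptions for the toy example, not details of FaceNet or any benchmarked system:

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize(v):
    # Project embeddings onto the unit sphere so a dot product
    # behaves as cosine similarity.
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def verify(emb_a, emb_b, threshold=0.5):
    # Verification: do two embeddings belong to the same person?
    # A real system would tune the threshold on held-out data.
    return float(emb_a @ emb_b) >= threshold

def identify(probe, gallery):
    # Identification: find the gallery entry closest to the probe
    # (a rank-1 nearest-neighbor search over the distractors).
    return int(np.argmax(gallery @ probe))

# Toy gallery of 5 "people", plus a probe that is a slightly
# noisy view of person 3.
gallery = normalize(rng.normal(size=(5, 128)))
probe = normalize(gallery[3] + 0.05 * rng.normal(size=128))

print(identify(probe, gallery))   # rank-1 match should be person 3
print(verify(probe, gallery[3]))  # same person: high similarity
print(verify(probe, gallery[0]))  # different person: low similarity
```

MegaFace's identification task is exactly this nearest-neighbor search, but against a gallery padded with up to a million distractors rather than four.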
The results presented were a mix of the intriguing and the expected. Nobody was surprised that the algorithms’ performances suffered as the number of distractor faces increased. And the fact that algorithms had trouble identifying the same person at different ages was a known problem. However, the results also showed that algorithms trained on relatively small data sets can compete with those trained on very large ones, such as Google’s FaceNet, which was trained on more than 500 million photos of 10 million people.
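Why performance must fall as the gallery grows can be seen even with random toy embeddings: the more distractors there are, the more likely one of them lands closer to the probe than the true match does. The simulation below is a hypothetical illustration with made-up noise levels and dimensions, not a model of any tested algorithm:

```python
import numpy as np

rng = np.random.default_rng(1)
DIM = 64  # illustrative embedding size

def rank1_accuracy(n_distractors, n_trials=200, noise=0.25):
    # Fraction of trials in which a noisy probe of an enrolled face
    # is still its own nearest neighbor among the distractors.
    hits = 0
    for _ in range(n_trials):
        gallery = rng.normal(size=(n_distractors + 1, DIM))
        gallery /= np.linalg.norm(gallery, axis=1, keepdims=True)
        probe = gallery[0] + noise * rng.normal(size=DIM)
        probe /= np.linalg.norm(probe)
        hits += int(np.argmax(gallery @ probe) == 0)
    return hits / n_trials

for n in (10, 100, 1000):
    print(n, rank1_accuracy(n))  # accuracy shrinks as distractors grow
```

The same squeeze happens to real algorithms between LFW's thousands of images and MegaFace's million, which is why near-perfect LFW scores say little about performance at scale.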
For example, the FaceN algorithm from Russia’s N-TechLab performed well on certain tasks in comparison with FaceNet, despite having trained on just 18 million photos of 200,000 people. The SIAT MMLab algorithm, created by a Chinese team led by Yu Qiao, a professor at the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, also performed well on certain tasks.
Nevertheless, FaceNet has so far performed the best overall, delivering the most consistent results across all the tests.
Just seeing how Google’s algorithm stacks up against those of competitors may be the challenge’s most valuable result, says Stefanos Zafeiriou, a computer vision expert at Imperial College London. He and other researchers not involved with the MegaFace Challenge were impressed by the fairly consistent performance of FaceNet. On the other hand, its accuracy of 75 percent shows that even the best facial recognition algorithms may have trouble identifying faces on a “world scale” with millions or billions of distractor faces.
MegaFace could also provide a focal point for future research efforts. Until now, most academic research groups have focused on improving their algorithms by using larger and larger training data sets rather than challenging them with larger benchmark data sets, says Jonathon Phillips, an engineer at the National Institute of Standards and Technology, in Gaithersburg, Md.
The University of Washington researchers plan to release a training data set based on MegaFace photos for use by any researcher. That would help even the smallest academic teams marshal some of the resources already available to Silicon Valley giants.
“The more we can look at how these algorithms perform on giant data sets that are more characteristic of the images people have in their phones, the better,” says Ross Beveridge, a computer scientist at Colorado State University, in Fort Collins.