One Million Faces Challenge Even the Best Facial Recognition Algorithms

A million-face benchmark test shows that even Google's facial recognition algorithm suffers in accuracy

3 min read
Photos: University of Washington

Helen of Troy may have had the face that launched a thousand ships, but even the best facial recognition algorithms may have had trouble finding her face in a crowd of one million strangers. The first benchmark test based on one million faces has shown how facial recognition algorithms from Google and other research groups around the world can still fall short in accurately identifying and verifying faces.

Facial recognition algorithms that had previously performed with more than 95 percent accuracy on a popular benchmark test involving 13,000 faces saw significant drops in accuracy when faced with the new MegaFace Challenge involving one million faces. The best performer on one test, Google’s FaceNet algorithm, dropped from near-perfect accuracy on five-figure datasets to 75 percent on the million-face test. Other top algorithms dropped from above 90-percent accuracy on the small datasets to below 60 percent on the MegaFace Challenge. Some algorithms made the proper identification as seldom as 35 percent of the time.

“MegaFace's key idea is that algorithms should be evaluated at large scale,” says Ira Kemelmacher-Shlizerman, an assistant professor of computer science at the University of Washington in Seattle and the project’s principal investigator. “And we make a number of discoveries that are only possible when evaluating at scale.”

The huge drops in accuracy when scanning a million faces matter because facial recognition algorithms inevitably face such challenges in the real world. People increasingly trust facial recognition algorithms to correctly identify them in security verification scenarios for automatically unlocking smartphones or entering workplaces. Law enforcement officials may also rely on facial recognition algorithms to find the correct match to a single photo of a suspect among hundreds of thousands of faces captured on surveillance cameras.

The most popular benchmark test until now has been the Labeled Faces in the Wild (LFW) test created in 2007. LFW includes 13,000 images of just 5,000 people. Many facial recognition algorithms have been fine-tuned to the point that they scored near-perfect accuracy when picking through that smaller set of images, which may have created a false sense of confidence about the state of facial recognition.

“The big disadvantage is that [the field] is saturated, i.e. there are many, many algorithms that perform above 95 percent on LFW,” Kemelmacher-Shlizerman explains. “This gives the impression that face recognition is solved and working perfectly.” 

With that in mind, University of Washington researchers raised the bar by creating the MegaFace Challenge using one million Flickr images that are publicly available under a Creative Commons license. The MegaFace dataset includes one million images featuring 690,000 unique faces.

The MegaFace Challenge forces facial recognition algorithms to do verification and identification, two separate but related tasks. Verification involves trying to correctly determine whether two faces presented to the facial recognition algorithm belong to the same person. Identification involves trying to find a matching photo of the same person among a million “distractor” faces. Initial results appear in a paper that was presented at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016) on 30 June.
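To make the distinction between the two tasks concrete, here is a minimal Python sketch. It is an illustration only, not the MegaFace evaluation protocol or any competitor's method: it assumes each face has already been reduced to a numeric embedding vector, and the function names, the use of cosine similarity, the 0.8 threshold, and the three-number vectors are all hypothetical choices made for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify(emb_a, emb_b, threshold=0.8):
    """Verification: decide whether two faces belong to the same person.

    The threshold is an assumed value chosen for this toy example.
    """
    return cosine_similarity(emb_a, emb_b) >= threshold


def identify(probe, gallery):
    """Identification: return the label of the gallery face closest to the probe.

    `gallery` is a list of (label, embedding) pairs that mixes the true
    match with distractor faces.
    """
    best_label, _ = max(gallery, key=lambda item: cosine_similarity(probe, item[1]))
    return best_label


# Made-up embeddings: one true match hidden among two distractors.
probe = [0.9, 0.1, 0.0]
gallery = [
    ("match", [1.0, 0.0, 0.0]),
    ("distractor_1", [0.0, 1.0, 0.0]),
    ("distractor_2", [0.0, 0.0, 1.0]),
]

same_person = verify(probe, [1.0, 0.0, 0.0])
different_person = verify(probe, [0.0, 1.0, 0.0])
best_match = identify(probe, gallery)
```

In this toy setup, verification is a single yes-or-no comparison, while identification has to rank the probe against every gallery entry. Each added distractor is one more chance for a stranger's face to score higher than the true match, which is why a gallery of a million faces is a harder test than a gallery of a few thousand.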

Testing on such an unprecedented scale has revealed some intriguing results. Some algorithms that had trained on relatively small datasets still performed favorably compared with those fielded by giants such as Google (which trained its algorithm on more than 500 million photos of 10 million people). Experts have long since shown that algorithms may perform poorly on benchmark tests involving smaller datasets despite having trained on far larger pools of images.

For example, the FaceN algorithm from Russia’s N-TechLab performed favorably in comparison to Google’s FaceNet algorithm despite having trained on only 18 million photos of 200,000 people. The SIAT MMLab algorithm, created by a Chinese team under the leadership of Yu Qiao, a professor with the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, also performed well.

The MegaFace Challenge testing also showed that facial recognition algorithms still have trouble grouping photos of a single individual at different ages. Children also present a bigger recognition challenge than adults do because of a lack of dataset photos needed to train the algorithms.

The MegaFace Challenge will likely go a long way toward improving the state of facial recognition. The University of Washington researchers plan to release a huge training dataset that will be available for use by any research team. That would help even the smallest academic teams marshal the resources already available to Silicon Valley giants such as Google and Facebook.

“It's a huge problem because open research and competition cannot be done if researchers cannot train their algorithm on similar data as some companies,” Kemelmacher-Shlizerman says. “There is no opportunity to come up with better techniques.”

The University of Washington researchers have already begun working with more than 300 research groups. They plan to post ongoing results from the MegaFace Challenge at the project’s website.
