Steven Cherry: Hi, this is Steven Cherry for IEEE Spectrum’s “Techwise Conversations.”
The technologies of face recognition have come a long way, but they were no help in finding the Boston Marathon bombers. In fact, by various accounts, the authorities didn’t even try, even though there were millions of images captured in Boston that day by closed-circuit TV systems at stores, banks, street intersections, and by spectators’ smartphones, cameras, and video cams.
What’s wrong with face recognition, and when will it finally help us identify and apprehend criminal suspects? On the other hand, when it gets that good, will it turn on its masters and be used to diminish the privacy and security of lawful citizens?
My guest today is James Wayman. He’s the former director of the National Biometric Test Center at San Jose State University and is now an administrator in its Office of Graduate Studies and Research. He holds four patents in speech processing and has helped develop national and international standards in biometrics. He joins us by phone.
Jim, welcome to the podcast.
James Wayman: Well, thank you very much. I appreciate being here.
Steven Cherry: News outlets reported that the images captured in Boston were of too poor a quality to be compared to a photo database. But as I understand it, that didn’t even matter. The FBI isn’t even set up to match individual photos against a database. Why is that?
James Wayman: Well, so, you’re exactly right. Why do images need to be high quality? Well, the state of the art, where we are in biometric facial-recognition matching, is that we do a very good job if we have full frontal facial images that are evenly lit, with a high amount of resolution, meaning a whole lot of pixels—hopefully at least 90 pixels between the eyes—and we have a completely uncluttered background. In fact, the standard refers to an 18 percent grayscale nonreflective background. So that’s the technology we’re fundamentally dealing with.
Secondly, as you point out, the FBI’s not even set up now to try to compare faces with that level of quality and resolution. Now, the FBI has announced that starting next year, they’re going to have a pilot project that will allow them to compare mug shots. Mug shots aren’t quite to the resolution level that I just mentioned. In other words, if you look at a mug shot coming out of a police department, it will not have an 18 percent nonreflective grayscale background. You’ll see all kinds of stuff in the background.
So the FBI’s saying, well, maybe starting next year we can have a pilot project that will allow us to compare mug shots, even though the quality of most mug shots is not real good.
Steven Cherry: Just this business of image quality, I guess that’s what prevented comparing the suspects’ photographs to, say, the Massachusetts driver’s license database, where I guess they both had driver’s licenses.
So let’s talk about how this works. There are a lot of strategies for comparing two facial images. The National Institute of Standards and Technology [NIST] has held some competitions and has a grand challenge for facial recognition. What seems to be working the best right now?
James Wayman: Yeah, I don’t want people to think somehow we’re finding the distance between the nose and the eyes and the nose and the mouth, because we can’t even find the mouth. There’s something down there. But we can find the eyes. The eyes, we got really lucky. God gave us eyes that have a dark-colored pupil against a white-colored sclera, and those are pretty distinctive. If your eyes are open, we can find those, and we can find the eye centers pretty well. But noses, not so much, and mouths certainly not. Mouths move too much, or mine does. And your chin kind of seems to fade into your neck. Remember, these images are all black and white.
Okay, so, let’s start historically with the technology. It was developed in the early 1960s by a fellow named Woodrow W. Bledsoe, who I believe was an IEEE member. He later retired at the University of Texas at Austin. And what he was doing was marking facial images by hand—the centers of the eyes, the corners of the eyes, the corners of the lips, and the like. And then he projected these marks onto a sphere and he rotated the sphere, trying to get marks from two different images to line up, at which point he could say, aha, these are from the same person.
Well, all of this hand marking didn’t work so well, and in the late 1980s, Sirovich and Kirby came out with this very simplistic idea that is so simple it sounds like it’ll never possibly work, but it did. And that is, we’re going to project the entire face image onto a series of filters. The filters themselves will be derived from a PCA [principal component analysis] decomposition of a set of vectors created by another group of facial images.
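The projection idea can be illustrated with a small Python sketch. It is not Sirovich and Kirby's implementation: it finds only the single leading PCA filter, by power iteration, where a real system would keep many filters. The 6-pixel synthetic "faces", the hidden `direction`, and all function names are assumptions made up for the illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def leading_pca_filter(faces, iterations=50):
    """Return (mean face, leading principal direction) of the training faces."""
    n_pixels = len(faces[0])
    mean = [sum(col) / len(faces) for col in zip(*faces)]
    centered = [[x - m for x, m in zip(face, mean)] for face in faces]
    v = [1.0] + [0.0] * (n_pixels - 1)  # arbitrary starting direction
    for _ in range(iterations):
        # Multiply v by the scatter matrix X^T X without forming the matrix.
        scores = [dot(row, v) for row in centered]
        w = [sum(s * row[j] for s, row in zip(scores, centered))
             for j in range(n_pixels)]
        norm = math.sqrt(dot(w, w))
        v = [x / norm for x in w]
    return mean, v

def project(face, mean, filt):
    """Coefficient of a face on one filter: dot product after mean removal."""
    return dot([x - m for x, m in zip(face, mean)], filt)

# Synthetic training set (assumption): faces vary mostly along one direction,
# plus a small deterministic perturbation standing in for noise.
direction = [0.5, 0.5, 0.0, 0.0, -0.5, -0.5]
training = [[0.5 + (i - 2) * d + 0.005 * ((i * 7 + j * 3) % 5 - 2)
             for j, d in enumerate(direction)]
            for i in range(5)]

mean_face, pca_filter = leading_pca_filter(training)
probe = [0.5 + 1.5 * d for d in direction]
coefficient = project(probe, mean_face, pca_filter)
print(coefficient)
```

The sign of a principal direction is arbitrary, so only the magnitude of the coefficient is meaningful here. Two faces would be compared by the distance between their coefficient vectors across all retained filters.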
Well, that approach didn’t work all that great. And one of the reasons is these filters are global filters, meaning all over each one of these basis functions, you have nonzero values. What that means is if someone changes their mouth, for instance, it impacts every single one of the projection coefficients—every one of them. Oh, that’s terrible.
So in, I think, about 1996, you had a Rockefeller University professor, says we’re going to fix that. And what we’re going to use for our basis vectors onto which we project these faces, we’re going to use what we call “local basis vectors,” meaning most of the basis vector is zero. So if you smile between the two pictures, one picture’s smiling and one picture’s not smiling, maybe only one or two of the coefficients in the representation is going to change. He called this “local feature analysis” because each one of these basis vectors only had a localized nonzero region. And that worked really, really well. And, in fact, that took us into the 2000s.
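The difference between global and local basis vectors can be seen in a toy Python sketch. The assumptions are loud ones: the "face" is a one-dimensional strip of 8 pixels, cosine vectors stand in for global basis functions (they are not PCA-derived), and two-pixel block vectors stand in for local ones. This is not local feature analysis itself, only the counting argument behind it.

```python
import math

N = 8  # pixels in a toy one-dimensional "face"

# Global basis (stand-in): cosine vectors, nonzero at every pixel.
global_basis = [[math.cos(math.pi * (2 * n + 1) * k / (2 * N)) for n in range(N)]
                for k in range(N)]

# Local basis (stand-in): each vector is nonzero on two neighbouring pixels only.
local_basis = [[1.0 if n // 2 == k else 0.0 for n in range(N)]
               for k in range(N // 2)]

def project(face, basis):
    """One coefficient per basis vector: the dot product with the face."""
    return [sum(b * x for b, x in zip(vec, face)) for vec in basis]

def count_changed(face_a, face_b, basis, tol=1e-9):
    """How many projection coefficients differ between two faces."""
    coeffs_a, coeffs_b = project(face_a, basis), project(face_b, basis)
    return sum(1 for a, b in zip(coeffs_a, coeffs_b) if abs(a - b) > tol)

neutral = [0.2, 0.4, 0.6, 0.8, 0.8, 0.6, 0.4, 0.2]
smiling = neutral[:-1] + [0.5]  # only the last "mouth" pixel differs

changed_global = count_changed(neutral, smiling, global_basis)
changed_local = count_changed(neutral, smiling, local_basis)
print(changed_global, changed_local)
```

With these toy inputs a change in one pixel moves every coefficient under the global basis but only one under the local basis, which is the behaviour described above.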
And then in 2000, under funding from the Office of Naval Research, a whole new approach was developed. And that was, what we’re going to do, is we’re going to take simply very, very small filters, technically speaking, Gabor filters, and we’re going to draw a grid on the face, and every place where the grid, this checkerboard, crosses in the face, we’re going to put down a series of Gabor filters, small Gabor filters on that area of the face, and we’re going to find out what coefficients we get out.
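A rough Python sketch of the grid idea follows. It assumes a plain evenly spaced grid, a synthetic striped image in place of a face, and the real (cosine) part of a Gabor filter with an isotropic Gaussian envelope. The kernel size, wavelength, and grid spacing are illustrative values, not figures from the interview or from any product.

```python
import math

def gabor_kernel(radius, wavelength, theta, sigma):
    """Real part of a Gabor filter: a cosine wave under a Gaussian envelope."""
    kernel = []
    for y in range(-radius, radius + 1):
        row = []
        for x in range(-radius, radius + 1):
            # Coordinate along the direction in which the wave oscillates.
            xr = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2 * sigma * sigma))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength))
        kernel.append(row)
    return kernel

def response_at(image, cx, cy, kernel):
    """Filter coefficient at one grid node: kernel times the pixels under it."""
    radius = len(kernel) // 2
    total = 0.0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            total += image[cy + dy][cx + dx] * kernel[dy + radius][dx + radius]
    return total

def grid_features(image, spacing, kernels):
    """Coefficients of every kernel at every grid crossing that fits the image."""
    radius = len(kernels[0]) // 2
    height, width = len(image), len(image[0])
    features = {}
    for cy in range(radius, height - radius, spacing):
        for cx in range(radius, width - radius, spacing):
            features[(cx, cy)] = [response_at(image, cx, cy, k) for k in kernels]
    return features

# Synthetic 16x16 image of vertical stripes with a period of 4 pixels.
image = [[1.0 if (x // 2) % 2 == 0 else 0.0 for x in range(16)] for y in range(16)]
kernels = [gabor_kernel(3, 4.0, 0.0, 2.0),           # oscillates left to right
           gabor_kernel(3, 4.0, math.pi / 2, 2.0)]   # oscillates top to bottom
features = grid_features(image, 4, kernels)
print(len(features))
```

On the striped test image, the filter tuned to left-to-right variation responds more strongly at each node than the one tuned to top-to-bottom variation. Matching two faces would then mean comparing the coefficient lists node by node.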
Then the next advance, that came maybe just five or six years ago, was to try to tie the grid to actual landmarks on the face. Now, we can’t find the nose exactly and the mouth exactly, but we’ve done a very, very good job in the last 10 years of finding eye centers pretty exactly. And for most people, the nose is midway between the two eye centers and down. I say for most people, because there are people that eye centers are not horizontally aligned, so that’s one failure mode. But for most people, we can guess where the nose might be, and we might look for changes in the black-white pattern between the eyes and down that would indicate, yeah, that’s sort of a nose, and if we go below that, we should get the mouth.
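As a small Python sketch of that guess: given two eye centers, take the midpoint and move down. The `drop_ratio` of 0.6 is a made-up illustrative value, not a figure from the interview, and the function names are hypothetical. The 90-pixel threshold is the inter-eye resolution mentioned earlier in the conversation. The tilt value flags the failure mode where the eye centers are not horizontally aligned.

```python
import math

def interocular_distance(left_eye, right_eye):
    """Pixel distance between the two eye centers, given as (x, y) points."""
    return math.hypot(right_eye[0] - left_eye[0], right_eye[1] - left_eye[1])

def eye_tilt_degrees(left_eye, right_eye):
    """Angle of the eye line; 0 means the eyes are horizontally aligned."""
    return math.degrees(math.atan2(right_eye[1] - left_eye[1],
                                   right_eye[0] - left_eye[0]))

def guess_nose(left_eye, right_eye, drop_ratio=0.6):
    """Midway between the eyes and down (image y grows downward).

    drop_ratio is a hypothetical fraction of the inter-eye distance.
    """
    d = interocular_distance(left_eye, right_eye)
    mid_x = (left_eye[0] + right_eye[0]) / 2
    mid_y = (left_eye[1] + right_eye[1]) / 2
    return (mid_x, mid_y + drop_ratio * d)

left_eye, right_eye = (100.0, 120.0), (200.0, 120.0)
nose_guess = guess_nose(left_eye, right_eye)
tilt = eye_tilt_degrees(left_eye, right_eye)
enough_pixels = interocular_distance(left_eye, right_eye) >= 90
print(nose_guess, tilt, enough_pixels)
```

A real system would treat this point only as the place to start looking for the black-white pattern of a nose, and a large tilt value would be a reason to distrust the guess.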
And then what you can do is, because facial expression changes, the illumination of the face changes, the pose angle of the face changes, you can warp these grids around a little bit to try to get the Gabor filter coefficients of two facial images to match up. And if you can get the coefficients to match up, you say, “Aha, this may be the same face.” If, despite your attempts at warping, you can’t get the facial image coefficients to match, you say, “Well, it’s probably not the same guy.”
So one more approach we need to talk about, and that’s the one that you might have thought of originally, and that is the local correlation. Maybe we can just take small face patches of one face and place it over another face and see if they kind of correlate and match up.
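One common way to score whether two patches "kind of correlate" is normalized cross-correlation, sketched below in Python. The choice of that measure, the small search window, and the synthetic 6x6 images are assumptions for illustration; the interview does not say which correlation measure vendors use.

```python
import math

def patch(image, top, left, size):
    """Flatten a size-by-size patch of the image into a list of pixels."""
    return [image[top + r][left + c] for r in range(size) for c in range(size)]

def normalized_correlation(a, b):
    """Correlation in [-1, 1], unaffected by brightness and contrast shifts."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [x - mean_b for x in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(x * x for x in db))
    if denom == 0.0:
        return 0.0  # a flat patch carries no pattern to correlate
    return sum(x * y for x, y in zip(da, db)) / denom

def best_local_match(face_a, face_b, top, left, size, search=1):
    """Best correlation of one patch of face_a over nearby spots in face_b."""
    reference = patch(face_a, top, left, size)
    height, width = len(face_b), len(face_b[0])
    best = -1.0
    for dt in range(-search, search + 1):
        for dl in range(-search, search + 1):
            t, l = top + dt, left + dl
            if 0 <= t <= height - size and 0 <= l <= width - size:
                score = normalized_correlation(reference, patch(face_b, t, l, size))
                best = max(best, score)
    return best

# Synthetic "faces": face_b is face_a under brighter, higher-contrast lighting.
face_a = [[((r * 3 + c * 5) % 7) / 6 for c in range(6)] for r in range(6)]
face_b = [[1.5 * v + 0.1 for v in row] for row in face_a]
best = best_local_match(face_a, face_b, top=2, left=2, size=3)
print(best)
```

Because the score subtracts each patch's mean and divides by its spread, a uniform lighting change does not lower it. Pose and expression changes still do, which is why the patches are kept small.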
Now, all these methods are available, and I understand now from the facial-recognition companies that, depending on the resolution of the image available, they can actually apply all four methods simultaneously to determine the degree of correspondence, the degree of similarity between two facial images.
Steven Cherry: Good. So, I guess another problem holding back face recognition in law-enforcement situations is on the database side, right? The quality of those images. And then there’s yet another problem, also on the database side. It’s the too-much-of-a-good-thing problem, right? It’s impractical to compare an image against, say, every photo in Facebook, even though the images there are mostly pretty good images.
James Wayman: And you’re leaving out a third impediment, and that is legislative. For instance, I don’t know what authority the FBI would even have to access the driver’s license images from the State of California. I guarantee that they do not have authority to access the facial images stored in our social service welfare database.
Steven Cherry: Well, let’s suppose that weren’t an impediment, and I believe that it is an impediment now, and that there are efforts to remove that impediment. So let’s just talk about the practical matter of comparing a single image against a database of millions of photographs, say.
James Wayman: Okay. Well, I mean, it’s the obvious probabilistic problem, and that is, even minuscule false positive rates result in a few false positives over a very large database, right? So now, suppose the person you’re looking for actually is in the database. You get back that person’s face mixed with all the false positives. Suppose that person you’re looking for is not in the facial database. You still get about the same number of false positives. So, you spend most of your time looking at the false positives.
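The arithmetic behind that point is short enough to write out in Python. The database size and false positive rate below are hypothetical round numbers chosen for illustration, not measured figures from any test report.

```python
def expected_false_positives(database_size, false_positive_rate, target_enrolled):
    """Expected number of wrong matches returned for one probe image."""
    # If the target is enrolled, every other record is a potential false match.
    impostors = database_size - 1 if target_enrolled else database_size
    return impostors * false_positive_rate

DATABASE_SIZE = 10_000_000      # hypothetical number of enrolled photos
FALSE_POSITIVE_RATE = 1e-4      # hypothetical: one false match per 10,000 comparisons

fp_if_enrolled = expected_false_positives(DATABASE_SIZE, FALSE_POSITIVE_RATE, True)
fp_if_absent = expected_false_positives(DATABASE_SIZE, FALSE_POSITIVE_RATE, False)
print(fp_if_enrolled, fp_if_absent)
```

Under these assumed numbers, both cases yield roughly a thousand false positives, so even when the target is in the database the true match is one candidate among about a thousand.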
Steven Cherry: Right, which is something that sometimes happens for the FBI, right? They have to track down a thousand leads and one of them proves to be correct.
James Wayman: I suppose, but that’s not how the FBI does it. I mean, that’s really an impractical way to approach things. There’s a saying in this community that “one word is worth a thousand pictures.” You don’t have to look through a thousand pictures; that’s ridiculous. You want to just find the word. The word maybe is the guy’s driver’s license number or the guy’s address or the guy’s passport number or maybe even his name or something like that. And then get that, find that first. That may be a whole lot easier. And that way you don’t have to cull through all those pictures.
Steven Cherry: Now, what about the computational problem? How much time does it take to compare two photographs?
James Wayman: There’s an easy answer to that, and I’m sure it’s in the NIST test reports. I just don’t remember it. I mean, these numbers are commonly published, and they just go in one eye and out the other in me. I just can’t tell you. You know, it’s on the order of milliseconds, I’m sure. And, you know, you can parallelize that, right? And so you can have multiple computers. That’s not the issue. The issue is not the computational time. That can be handled through parallel computing.
Steven Cherry: We’ve seen a lot of areas where technology seems to be making very little progress for years, and then suddenly it takes off, right? Self-driving cars went—you should pardon the pun—from 0 to 60 in just the last few years, language translation, voice recognition. Do you think that’s likely to happen with face recognition?
James Wayman: Well, I don’t know that I accept the fundamental premise. Voice-recognition work, this is the work I was doing in the ’80s, both speech- and speaker-recognition work has progressed pretty uniformly for the 30 years I’ve been involved, meaning that it did get to a level where people could actually start using it, maybe a couple of years ago when Apple came out with Siri. It may just rise to the level where people can start using it. That doesn’t mean that the progress has in any way been uneven.
Now, I would say with regard to facial-recognition technology, the government dumped a ton of money into this technology after 9/11. And I worked for the government, helping them spend some of that money. I didn’t receive the money myself; I helped them allocate money to universities to do the research. And so when the money went into the technology, the technology improved greatly.
Right now, of course, we’ve cut back on our R&D money. The technology will not be improving as rapidly in the coming years, but it takes a while. There’s a phase lag there. It’ll take a while for us to figure that out, that the technology improved very, very rapidly in the 2000s and did not improve as rapidly in our decade because the amount of money being spent was minuscule compared to the previous decade.
Steven Cherry: Eventually, at some point, maybe a few years and maybe longer, but at some point this stuff is going to be really fantastic. And at that point, are we going to start to worry about incursions of our privacy, being too readily identified, and are we going to start regretting all those millions of photos we’ve put on Facebook, for example?
James Wayman: I think it’s interesting you should bring that question up in the context of biometrics. I mean, don’t we already have that problem? People carry around these personal transmitter devices called cellphones, right? And those numbers are pretty identifying. Nobody but me carries my cellphone, and the cellphone transmits, however many seconds, its phone number to whatever tower is hanging around. And the potential for invading my privacy is much, much stronger with things like my cellphone or my Facebook account or my e-mail account than it is for using facial-recognition and surveillance applications. For me, that’s a nonstarter with regard to privacy. That’s not the issue; the issue is things like cellphones.
Steven Cherry: Fair enough. Well, Jim, it’s a potentially fabulously useful technology, and I guess maybe not as fearfully so as I might have thought, given what you have to say about cellphones. So, thanks for joining us today and telling us about it.
James Wayman: Well, thank you very much. I enjoyed talking to you.
Steven Cherry: We’ve been speaking with biometric researcher Jim Wayman about the current limitations—and future prospects—of face recognition.
For IEEE Spectrum’s “Techwise Conversations,” I’m Steven Cherry.
This interview was recorded Thursday, 16 May 2013.
Segment producer: Barbara Finkelstein; audio engineer: Francesco Ferorelli
Image: Julien Tromeur/iStockphoto
NOTE: Transcripts are created for the convenience of our readers and listeners and may not perfectly match their associated interviews and narratives. The authoritative record of IEEE Spectrum’s audio programming is the audio version.