The big news out of Hot Chips on Monday was Google's promise to have its Goggles visual search app ready for the iPhone by the end of 2010. Google Goggles project lead David Petrou also provided the inside scoop on how Goggles deciphers your images in the cloud. But the most interesting takeaway from Petrou's talk was his repeated insistence that Google Goggles does not do facial recognition, interspersed with a long tutorial on how well it would work if it did.
Augmenting your reality
Augmented reality is a step toward intuitive search, like having an insightful personal assistant following your every move, answering not just "what am I looking at?" but intuiting exactly what you want to know about it and why. For a machine, contextualizing and anticipating what you actually want is pretty difficult. Heck, it’s no picnic for a human. With that in mind, pointing your Android phone at the Eiffel Tower is pretty straightforward because there are only so many actions associated with it. 1) Here’s what you’re looking at. 2) Here’s some historical and technical information about the Eiffel Tower. 3) Here are directions to get there from where you are standing.
It gets harder when you're pointing at something ambiguous. Petrou demonstrated this point by capturing a Goggles image of a random old book called "Basic Machines and How They Work."
Six and a half seconds later, three results came back. The first was the book result. The second was some more information about the book. The third was the interesting part: From the picture on the cover of the book, the Goggles infrastructure had figured out that it should offer a link to “manual transmission linkage.” The whole audience swooned and clapped.
"A picture is worth a thousand words. How do we pick the best three?"
Here’s how it works. You take the picture. You stare in wonder as a laser beam scans the image, distracting you while you wait the 6.5 seconds for the Google cloud to chew on your image.
During those 6.5 seconds, the image is sent to a Google front door, which passes it off to the Goggles root, which in turn sprays the image in parallel to many different, discretely housed "recognition disciplines." These are visual search engines that specialize in narrow fields such as barcodes, landmarks, DVDs, wine labels, text, logos, and so on. Petrou's slide showed about 20 of these, but it’s not clear whether the diagram was representative or for illustration purposes only.
All these discrete entities then vote on what they think the image is, and the Goggles root, electoral-college style, tallies the votes in some esoteric fashion and returns the results to the user 6.5 seconds later.
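For the curious, here is a rough Python sketch of that fan-out-and-vote flow. It is a toy under loud assumptions, not Google's code: the engine functions, their canned answers, and the "add up the confidence scores" tally rule are all made up for illustration, since the talk did not spell out how the root actually weighs the votes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the "recognition disciplines". Each one takes the
# image bytes and returns a list of (label, confidence) guesses. Real engines
# would analyze the image; these return canned answers for illustration only.
def book_engine(image):
    return [("Basic Machines and How They Work (book)", 0.9)]

def text_engine(image):
    return [("Basic Machines and How They Work (book)", 0.6),
            ("More about this book", 0.5)]

def object_engine(image):
    return [("manual transmission linkage", 0.7)]

def barcode_engine(image):
    return []  # no barcode in this picture, so this engine abstains

ENGINES = [book_engine, text_engine, object_engine, barcode_engine]

def goggles_root(image, engines=ENGINES, top_n=3):
    """Send the image to every engine in parallel, then tally the votes."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        all_guesses = list(pool.map(lambda engine: engine(image), engines))

    # Assumed tally rule: sum the confidence per label across engines.
    # The real scoring was not described in the talk.
    scores = defaultdict(float)
    for guesses in all_guesses:
        for label, confidence in guesses:
            scores[label] += confidence

    # Highest combined score first; keep only the best few for the user.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [label for label, _ in ranked[:top_n]]

results = goggles_root(b"placeholder image bytes")
for label in results:
    print(label)
```

With these canned engines, the book wins because two engines agree on it, which is the whole point of letting specialists vote instead of trusting any single one.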
So what's it good and bad at? "Given a new photo, we can recognize the image 57 percent of the time," Petrou said. Google has bagged and tagged a database of 1 billion recognizable images at this point. It nails most corporate logos, notably Coca-Cola. It does less well with minimalist icons like the Nike swoosh. Where does it do really badly? Black cats. No kidding. In fact, it is easier for Google Goggles to recognize a specific face than to identify a black cat.
Not that Google Goggles does face recognition
Google Goggles does not do face recognition. Have I mentioned that? Petrou mentioned it no fewer than four times (specifically name-checking any journalists in the audience). But he also made sure to mix his message by mentioning that Google can do face recognition. And pretty well, too!
"The more labeled samples you have, say pictures on social networks, the better we can do," Petrou said. For all his protestations that Goggles wouldn’t use facial recognition, he sure could not help bragging about how awesome Goggles could hypothetically be at picking your face out of a crowd. "There’s a sweet spot, around 17 images, when this technology, given a new picture of you, will rank you in the top ten results 50 percent of the time."
When you feed it 50 pictures (not difficult given the horrifying new Facebook suggestion to tag random images of people you recognize) you will appear in the top 5 results half the time.
"We do it well but it’s not deployed." Is that a threat or a promise?
Ominously, Petrou blew right past a slide titled "Must Be Deployed Responsibly." I guess he thought Hot Chips wasn’t the audience for that kind of soft-focus Lifetime Television for Women hand-wringing.
I heard a lot of muttering at lunch after the session from engineers referring to Google as Big Brother. Several people independently brought up the Wi-Fi sniffing fiasco.
Implications? That’s not an MP, that’s a YP (your problem)
Opening up the talk, Petrou said "society may be ready for this technology, or it might not."
In his book Halting State, British sci-fi writer Charles Stross laid out what will likely be the first implementation of Augmented Reality.
In the book, law enforcement officials are issued standard AR glasses, which can be tweaked to provide a transparent overlay the way you can turn on and off layers in Google maps. Except, what they see is not just maps and landmarks, but the dossier and criminal history of every person who crosses their path.
What would you need to make this sci-fi a reality? 1) A Google Goggles-type back-end that incorporates face recognition; 2) some jaunty AR specs; and 3) access to the databases that contain the public records and personal information that shady aggregator websites now offer up for $49.95.
Now consider the plain (unaugmented) reality:
1) Petrou tells us that already exists.
2) Augmented reality glasses have just gotten much better.
3) Right now, the query latency is determined in part by network delays (the 6.5 seconds comes from 3G, while Wi-Fi brings it down to 1.2 seconds). The coming 4G network that MindSpeed described at Hot Chips will make the data stream much faster.
"I don't care, I’m not doing anything wrong," a commenter posted on my recent rant about social networking and the surveillance state. "No one wants to find me." Sure they do, Gerry! If someone can break into a database, they will be well-served by a centralized repository of all your pertinent information.
What law enforcement (or Google) aggregateth, the hacker taketh away.