This is part of IEEE Spectrum's special R&D report: They Might Be Giants: Seeds of a Tech Turnaround.
Max Huang says he has something cool to show me. I'm skeptical: what he's holding in his hand looks like a PDA. It is a PDA, a Compaq iPAQ 3600, to be exact, unadorned and, to my eyes, unremarkable. What's special is what's inside: this PDA understands what you say.
Huang and his colleagues at the Philips Speech Processing office in Taipei, Taiwan, have streamlined the company's standard speech recognition engine, meant for servers and PCs, to run instead on a PDA. It's just a prototype, Huang says, but the Mandarin-language recognizer can distinguish about 40 000 words without taxing the Compaq's memory, battery, or processor. With it, Huang can access his address book, schedule appointments, and dictate e-mail. Considering the alternative--poking away at the device's tiny display with a skinny stylus--I'm starting to be convinced: this does seem pretty cool.
To the extent that the average person is familiar with speech recognition, she probably thinks of dictating reports to a PC, or maybe dialing an automated call center for flight or train schedules. Indeed, the speech industry has been pushing those kinds of applications over the last decade.
But some of the most novel and most challenging work being done now involves putting speech recognition where it was previously thought infeasible: into toys and MP3 players, car navigation and entertainment systems, and cellphones and PDAs. What's enabling the migration of speech to smaller devices is, on the one hand, efficient speech recognition engines that can handle noise and variations in speech, and, on the other, faster, bigger, and cheaper processors and memory chips on which the engines can run.
The push for embedded speech comes at a time when manufacturers are trying to cram ever more functions into ever smaller devices. "There's just not enough room for all the buttons and displays," says Erik Soule, director of marketing for Sensory Inc. (Santa Clara, Calif.), a developer of embedded speech products. A voice interface that lets you say the name of that Beatles song you want to listen to, rather than delving through your iPod's multiple menus, offers a less frustrating alternative. "We look at voice as a great complement to the visual and touch user interfaces," Soule says.
Will consumers buy it? The Kelsey Group (Princeton, N.J.), one of the few analyst firms that track embedded speech, thinks so. In a white paper issued in July, Kelsey projected that software licenses from embedded speech will grow from US $8 million this year to $277 million in 2006, making it one of the fastest-growing segments of the speech market. That said, speech is not a business where good products translate into easy profits: witness the 2000 collapse of Lernout & Hauspie, until then an industry leader and holder of some of the best technology around.
Still, a wide range of companies, small and large, are now getting into the embedded speech market. These include established players like IBM and Philips [ranked 5th and 24th, respectively, among the Top 100 R&D Spenders], which both have higher-end speech recognition products and decades of research experience. They also include smaller firms like Sensory, Advanced Recognition Technology, and Voice Signal Technologies, which focus on embedded technology [see chart, "Who's Getting Into Embedded Speech"].
They're betting on a wide range of applications. A few, like voice dialing, have already entered the mainstream, while others, like voice-activated light switches and TV sets, remain a novelty, and still others, like composing e-mail on your cellphone and retrieving directions while driving, lie farther out on the technological horizon.