Three years into the Siri era, speech recognition systems remain more advertised than used. Now one of Apple's leading rivals hopes to change that. Microsoft claims that an improved speech recognition system for Windows Phone doubles recognition speed and boosts accuracy by about 15 percent.
The improved speech recognition behind Bing Voice Search aims to give Microsoft's Windows Phone an edge in the battle for smartphone users. Windows Phone owners may end up more excited about the speed boost, but the added accuracy should also translate into saved time and greater efficiency.
Still, the savings may be modest. "For a normal sentence, you will have one less word to correct," said Michael Tjalve, a senior program manager in the speech technology group at Microsoft, in a CNET article. CNET went on to note that Microsoft didn't compare its system against rival systems because, according to Microsoft, different companies measure the performance of speech recognition systems using different standards.
The breakthrough for researchers at Microsoft Research and Bing came from tapping the power of so-called deep neural networks, which proved capable of detecting the correct speech patterns most of the time despite background noise, according to a Microsoft Research paper. In a blog post, the researchers explained:
Those improvements come, in part, from contributions delivered via Microsoft Research’s work on deep neural networks (DNNs). Such networks are a computational framework for automatic pattern recognition that is inspired by the basic circuits of the human brain. Refinements in mathematical formulas, coupled with greater computational power and large data sets, enable DNNs to learn and edge noticeably closer than traditional speech technologies to humans’ ability to recognize speech and images.
The benefits of DNNs are not limited to a single language. Microsoft researchers discovered how DNNs can "learn across languages," meaning that "data from one language can help improve accuracy for another."
Such improvements may not sound like much if smartphone users still have to correct a word in each transcribed sentence. But if improved speech recognition is coupled with spell-correction systems, which also continue to improve year after year, we may soon get close enough that we can just hit "send" after dictating an e-mail to our phones.
To be sure, that's still something best avoided when behind the wheel. New research has shown that using such speech recognition systems distracts drivers more than listening to audio books, tuning in to a favorite radio station, talking with passengers, or talking on a phone, whether handheld or hands-free.
Jeremy Hsu has been working as a science and technology journalist in New York City since 2008. He has written on subjects as diverse as supercomputing and wearable electronics for IEEE Spectrum. When he's not trying to wrap his head around the latest quantum computing news for Spectrum, he also contributes to a variety of publications, including Scientific American, Discover, and Popular Science. He is a graduate of New York University's Science, Health & Environmental Reporting Program.