
Microsoft Boosts Speech Recognition for its Smartphones

Deep neural networks will improve speed and accuracy of speech recognition on Windows Phones


Three years into the Siri era, speech recognition systems remain more advertised than used. Now one of Apple's leading rivals hopes to change that. Microsoft claims that an improved system for its Windows Phones doubles the speed and boosts accuracy by about 15 percent.

The improved speech recognition for Bing Voice search is meant to give Microsoft's Windows Phone an edge in the battle for smartphone users. Windows Phone owners may end up more excited about the speed boost, but the added accuracy should also translate to saved time and improved efficiency.

Still, it may not save enough time or improve efficiency enough. "For a normal sentence, you will have one less word to correct," said Michael Tjalve, a senior program manager in the speech technology group at Microsoft, in a CNET article. CNET went on to note that Microsoft didn't compare its system against rival systems because, Microsoft said, different companies measure speech recognition systems using different standards.

The breakthrough for researchers at Microsoft Research and Bing came from tapping the power of so-called deep neural networks, which proved capable of detecting the correct speech patterns most of the time despite background noise, according to a Microsoft Research paper. In a blog post, the researchers explained:

Those improvements come, in part, from contributions delivered via Microsoft Research’s work on deep neural networks (DNNs). Such networks are a computational framework for automatic pattern recognition that is inspired by the basic circuits of the human brain. Refinements in mathematical formulas, coupled with greater computational power and large data sets, enable DNNs to learn and edge noticeably closer than traditional speech technologies to humans’ ability to recognize speech and images.

The gains from DNNs are not limited to one language. Microsoft researchers discovered how DNNs can "learn across languages," meaning that "data from one language can help improve accuracy for another."

Such improvements may not sound like much if smartphone users still have to correct words in each transcribed sentence. But if the improved speech recognition is coupled to spell-correction systems, which also continue to improve year after year, we may soon get close enough that we can just hit "send" after dictating an e-mail to our phones.

To be sure, that's something still best avoided when behind the wheel. New research has shown that such speech recognition systems are a worse distraction for drivers than listening to audio books, tuning in to a favorite radio station, talking with car passengers, talking on a handheld phone, or talking hands-free.

Photo: Microsoft


Why Functional Programming Should Be the Future of Software Development

It’s hard to learn, but your code will produce fewer nasty surprises

Shira Inbar

You’d expect the longest and most costly phase in the lifecycle of a software product to be the initial development of the system, when all those great features are first imagined and then created. In fact, the hardest part comes later, during the maintenance phase. That’s when programmers pay the price for the shortcuts they took during development.

So why did they take shortcuts? Maybe they didn’t realize that they were cutting any corners. Only when their code was deployed and exercised by a lot of users did its hidden flaws come to light. And maybe the developers were rushed. Time-to-market pressures would almost guarantee that their software contained more bugs than it would have otherwise.
