Language barriers are starting to crumble. This month Japan's dominant mobile phone operator, NTT DoCoMo, introduced the world's first app for real-time voice translation. When a user with a DoCoMo smartphone places a call through the app, he speaks in Japanese and his words are promptly translated into English, Mandarin, or Korean. To complete the conversational circuit, the other person's words are translated from any of those languages back into Japanese.
With this debut we've taken one step closer to building a mechanical Babel fish, the extraordinarily useful creature imagined by Douglas Adams in The Hitchhiker's Guide to the Galaxy. As any lover of sci-fi knows, the Babel fish is a leech-like critter that is inserted into the ear and lives in the brain, where it feeds on brain waves and provides simultaneous translation of any language in the universe. NTT DoCoMo's app can't match that universal utility with its current limit of four languages—but at least you don't have to slip something slimy into your ear to make it work.
AT&T's research lab showed off its own translation service earlier this year, but NTT's is further along and seems better integrated into the phone call itself.
The free DoCoMo app relies on the cloud for the heavy processing, namely speech recognition, machine translation, and voice synthesis. According to an NTT DoCoMo newsletter, the app's reliance on the cloud allows for unobtrusive upgrades and the most important feature, near-instant translation:
Trials have shown that the average processing time takes just about two seconds, fast enough for a reasonably natural conversation under the most unnatural of conditions, i.e., two people conversing easily without understanding each other’s language!
To test the app, the company gave tourist facilities, retail companies, and hospitals a beta version that handled Japanese and English. NTT DoCoMo says the trial app had about 90 percent accuracy in recognizing Japanese words, and about 80 percent accuracy in recognizing English words.
The company didn't say how accurate or artful the translations of those words were, though, so I asked for a demonstration. Spokesman So Hiroki graciously complied, and on Tuesday evening my desk phone rang. When I picked it up, a recording told me that this was an automated translation call, and that I should press 0 to continue. Then I heard a man say "Moshi moshi," a gentle chime, and then a soothing woman's voice (not unlike the lady who lives inside many car navigation systems) say "Good evening!"
I quickly discovered that the system is great at pleasantries, not so great at more complicated communications. At one point I asked Hiroki and his colleagues on the call which languages would be added to the system next. The English answer I got back: "It is European edition such as French and German to challenge next."
Still, it was an impressive demonstration, and the team declared their determination (in grammatically correct and understandable English!) to improve translation precision. According to Hiroki, NTT DoCoMo spent two years developing this service because they're looking for ways to fight an alarming trend for the telecom industry: the rapidly declining rate of voice calls.
In another mode, the app can also be used when two people meet face-to-face: They speak their respective languages, and the app provides both voice translation and text on the phone's display screen.
Image: NTT DoCoMo
Senior Editor Eliza Strickland joined IEEE Spectrum in March 2011 and was initially assigned the Asia beat. She got down to business several days later when the Fukushima Daiichi nuclear disaster began. Strickland shared a Neal Award for news coverage of that catastrophe and wrote the definitive account of the accident's first 24 hours. She next moved to the biomedical engineering beat and managed Spectrum's 2015 special report, “Hacking the Human OS.” That report spawned the Human OS blog about emerging technologies that are enabling a more precise and personalized kind of medicine. The blog reports on wearable sensors, big-data analytics, and neural implants that may turn us all into cyborgs. Over the years, Strickland watched as artificial intelligence (AI) technology made inroads into the biomedical space, reporting on crossovers between AI and neuroscience research and IBM Watson's ill-fated efforts in AI health care. These days she oversees Spectrum's coverage of all things AI. Strickland has reported on science and technology for nearly 20 years, writing for such publications as Discover, Nautilus, Sierra, Foreign Policy, and Wired. She holds a master's degree in journalism from Columbia University.