Language barriers are starting to crumble. This month Japan's dominant mobile phone operator, NTT DoCoMo, introduced the world's first app for real-time voice translation. When a user with a DoCoMo smartphone places a call through the app, he speaks in Japanese and his words are promptly translated into English, Mandarin, or Korean. To complete the conversational circuit, the other person's words are translated from any of those languages back into Japanese.
With this debut we've moved one step closer to building a mechanical Babel fish, the extraordinarily useful creature imagined by Douglas Adams in The Hitchhiker's Guide to the Galaxy. As any lover of sci-fi knows, the Babel fish is a leech-like critter that is inserted into the ear and lives in the brain, where it feeds on brain waves and provides simultaneous translation of any language in the universe. NTT DoCoMo's app can't match that universal utility with its current limit of four languages, but at least you don't have to slip something slimy into your ear to make it work.
AT&T's research lab showed off its own translation service earlier this year, but NTT DoCoMo's is further along and seems better integrated into the phone call itself.
The free DoCoMo app relies on the cloud for the heavy processing, namely speech recognition, machine translation, and voice synthesis. According to an NTT DoCoMo newsletter, this reliance on the cloud allows for unobtrusive upgrades and delivers the most important feature, near-instant translation:
Trials have shown that the average processing time takes just about two seconds, fast enough for a reasonably natural conversation under the most unnatural of conditions, i.e., two people conversing easily without understanding each other’s language!
To test the app, the company distributed a beta version, which handled only Japanese and English, to tourist facilities, retail companies, and hospitals. NTT DoCoMo says the trial app was about 90 percent accurate in recognizing Japanese words and about 80 percent accurate in recognizing English words.
The company didn't say how accurate or artful the translations of those words were, though, so I asked for a demonstration. Spokesman So Hiroki graciously complied, and on Tuesday evening my desk phone rang. When I picked it up, a recording told me that this was an automated translation call and that I should press 0 to continue. Then I heard a man say "Moshi moshi," followed by a gentle chime, and then a soothing woman's voice (not unlike the lady who lives inside many car navigation systems) saying "Good evening!"
I quickly discovered that the system is great at pleasantries but not so great at more complicated communication. At one point I asked Hiroki and his colleagues on the call which languages would be added to the system next. The English answer I got back: "It is European edition such as French and German to challenge next."
Still, it was an impressive demonstration, and the team declared their determination (in grammatically correct and understandable English!) to improve translation precision. According to Hiroki, NTT DoCoMo spent two years developing this service because the company is looking for ways to counter an alarming trend for the telecom industry: the rapidly declining rate of voice calls.
In another mode, the app can be used when two people meet face-to-face: They speak their respective languages, and the app provides both a voice translation and a text version on the phone's display screen.
Image: NTT DoCoMo