Baidu’s AI Can Do Simultaneous Translation Between Any Two Languages

Baidu Research reveals a translation tool that keeps up by predicting the future

4 min read
Photo-illustration of on-the-fly translation.
Photo-illustration: Shutterstock

Would-be travelers of the galaxy, rejoice: The Chinese tech giant Baidu has invented a translation system that brings us one step closer to a software Babel fish.

For those unfamiliar with the Douglas Adams masterworks of science fiction, let me explain. The Babel fish is a slithery fictional creature that takes up residence in the ear canal of humans, tapping into their neural systems to provide instant translation of any language they hear.

In the real world, until now, we’ve had to make do with human and software interpreters that do their best to keep up. But the new AI-powered tool from Baidu Research, called STACL, could speed things up considerably. It uses a sophisticated type of natural language processing that lags only a few words behind, and keeps up by predicting the future.

“What’s remarkable is that it predicts and anticipates the words a speaker is about to say a few seconds in the future,” says Liang Huang, principal scientist of Baidu’s Silicon Valley AI Lab. “That’s a technique that human interpreters use all the time—and it’s critical for real-world applications of interpretation technology.” 

The STACL (Simultaneous Translation with Anticipation and Controllable Latency) tool is comparable to the human interpreters who sit in booths during UN meetings. These humans have a tough job. As a dignitary speaks, the interpreters must simultaneously listen, mentally translate, and speak in another language, usually lagging only a few words behind. It’s such a difficult task that UN interpreters usually work in teams and take shifts of only 10 to 30 minutes.

A task requiring that kind of parallel processing—listening, translating, speaking—seems well suited for computers. But until now, it was too hard for them, too. The best “real-time” translating systems still do what’s called consecutive translation, in which they wait for each sentence to conclude before rendering its equivalent in another language. These systems provide quite accurate translations, but they’re slow. 

Huang tells IEEE Spectrum that the big challenge in simultaneous interpretation comes from differences in word order between languages. “In the UN, there’s a famous joke that an interpreter who’s translating from German to English will pause, and seem to get stuck,” he says. “If you ask why, they say, ‘I’m waiting for the German verb.’” In English, the verb comes early in the sentence, he explains, while in German it can come at the very end.

STACL gets around that problem by predicting the verb to come, based on all the sentences it has seen in the past. For their current paper, the Baidu researchers trained STACL on newswire articles, where the same story appeared in multiple languages. As a result, it’s good at making predictions about sentences dealing with international politics.

Huang gives an example of a Chinese sentence, which would be most directly translated as “Xi Jinping French president visit expresses appreciation.” STACL, however, would guess from the beginning of the sentence that the visit would go well, and translates it into English as “Xi Jinping expresses appreciation for the French president’s visit.”


For their current paper, the researchers demonstrated its capabilities in translating from Chinese to English (two languages with big differences in word order). “In principle, it can work on any language pair,” Huang says. “There’s data on all those other languages. We just haven’t run those experiments yet.” 

Clearly, STACL can make mistakes. If the French president’s visit hadn’t gone well, and Xi Jinping instead expressed regret and dismay, the translation would have a glaring error. At the moment, it can’t correct its mistakes. “A human interpreter would apologize, but our current system doesn’t have the capability to revise an error,” Huang says.

However, the system is adjustable, and users will be able to make trade-offs between speed and accuracy. If STACL is programmed to have longer latency—to lag five words behind the original text instead of three words behind—it’s more likely to get things right.
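That speed-versus-accuracy knob can be illustrated with a toy sketch of a "wait-k" style decoding loop, the kind of controllable-latency policy the passage describes: the system lags k words behind the speaker, emitting one target word for each new source word it hears. The `translate_prefix` function here is a deliberately trivial stand-in (a word-for-word glossary lookup on romanized Chinese), not Baidu's actual model; it exists only to show how the lag parameter k controls when output is produced.

```python
def translate_prefix(source_words, num_target_words):
    """Placeholder for a real prefix-to-prefix model: predicts the next
    target word given only the source words heard so far. Here it is a
    trivial word-for-word lookup, purely for illustration."""
    glossary = {"Xi": "Xi", "Jinping": "Jinping",
                "biaoshi": "expresses", "zanshang": "appreciation"}
    word = source_words[min(num_target_words, len(source_words) - 1)]
    return glossary.get(word, word)

def wait_k_translate(source_stream, k):
    """Simultaneous decoding with a fixed lag: wait for the first k
    source words, then emit one target word per incoming source word.
    A larger k means higher latency but more context per prediction."""
    source_seen, target = [], []
    for word in source_stream:
        source_seen.append(word)
        if len(source_seen) >= k:          # lag satisfied: emit a word
            target.append(translate_prefix(source_seen, len(target)))
    # Flush: once the source sentence ends, finish the translation.
    while len(target) < len(source_seen):
        target.append(translate_prefix(source_seen, len(target)))
    return target
```

With k=2, output starts after only two words are heard; with k=5, this four-word sentence is fully buffered before anything is emitted, which is effectively consecutive translation. The real trade-off Huang describes lives in how much a genuine model can anticipate with a short prefix, which this stub cannot capture.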

It can also be made more accurate by training it in a particular subject, so that it understands the likely sentences that will appear in presentations at, say, a medical conference. “Just like a human simultaneous interpreter, it would need to do some homework before the event,” Huang says.

Huang says STACL will be demoed at the Baidu World conference on 1 November, where it will provide live simultaneous translation of the speeches. The aim is to eventually put this capability into consumers’ pockets. Baidu has previously shown off a prototype consumer device that does sentence-by-sentence translation, and Huang says his team plans to integrate STACL into that gadget.

Right now, STACL works on text-to-text translation and speech-to-text translation. To make it useful for a consumer device, the researchers want to master speech-to-speech translation. That will require integrating speech synthesis into the system. And when the speech is being synthesized only a few words at a time, without any knowledge of the whole sentence’s structure, it will be a challenge to make it sound natural. 

Huang says his goal is to make instant translation services more readily accessible and affordable to the general public. But he notes that STACL is “not intended to replace human interpreters—especially for high-stakes situations that require precise and consistent translations.” After all, nobody wants an AI to be at the center of an international incident because it erroneously predicts Xi Jinping’s expressions of appreciation or regret.  

