Google Translate has become a quick-and-dirty translation solution for millions of people worldwide since it debuted a decade ago. But Google’s engineers have been quietly tweaking their machine translation service’s algorithms behind the scenes. They recently delivered a huge Google Translate upgrade that harnesses the popular artificial intelligence technique known as deep learning.
Machine translation services such as Google Translate have mostly used a “phrase-based” approach of breaking down sentences into words and phrases to be independently translated. But several years ago, Google began experimenting with a deep-learning technique, called neural machine translation, that can translate entire sentences without breaking them down into smaller components. That approach eventually reduced the number of Google Translate errors by at least 60 percent on many language pairs in comparison with the older, phrase-based approach.
“We believe we are the first using [neural machine translation] in a large-scale production environment,” says Mike Schuster, research scientist at Google.
Many major tech companies have heavily invested in neural machine translation from a research standpoint, says Kyunghyun Cho, a deep-learning researcher at New York University with a focus on natural language processing. But he confirmed that Google seems to be the first to publicly announce its use of neural machine translation in a translation product.
Google Translate has already begun using neural machine translation for its 18 million daily translations between English and Chinese. In a blog post, Google researchers also promised to roll out the improved translations to many more language pairs in the coming months.
The deep-learning approach of Google’s neural machine translation relies on a type of software algorithm known as a recurrent neural network. The neural network consists of nodes, also called artificial neurons, arranged in a stack of layers, with 1,024 nodes in each layer.
A network of eight layers acts as the “encoder,” which takes the sentence targeted for translation—let’s say from Chinese to English—and transforms it into a list of “vectors.” Each vector in the list represents the meanings of all the words read so far in the sentence, so that a vector farther along the list will include more word meanings.
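To make the encoder's running list of vectors concrete, here is a minimal Python sketch of a single-layer recurrent encoder. The two-dimensional word vectors, the hand-picked weights, and the names `rnn_step` and `encode` are all assumptions made for illustration; the production encoder described above stacks eight layers of 1,024 nodes, and its exact cell type is not shown here.

```python
import math

def rnn_step(state, word_vec, w_state, w_input):
    """One recurrent step: mix the previous state with the current word."""
    size = len(state)
    new_state = []
    for i in range(size):
        total = sum(w_state[i][j] * state[j] for j in range(size))
        total += sum(w_input[i][j] * word_vec[j] for j in range(len(word_vec)))
        new_state.append(math.tanh(total))
    return new_state

def encode(word_vecs, w_state, w_input):
    """Return one vector per word; each depends on all words read so far."""
    state = [0.0] * len(w_state)
    outputs = []
    for vec in word_vecs:
        state = rnn_step(state, vec, w_state, w_input)
        outputs.append(state)
    return outputs

# Hypothetical 2-dimensional word vectors for a three-word sentence.
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Hypothetical weights; a real network learns these during training.
W_STATE = [[0.5, 0.0], [0.0, 0.5]]
W_INPUT = [[1.0, 0.0], [0.0, 1.0]]

vectors = encode(sentence, W_STATE, W_INPUT)
```

Because each step feeds the previous state back in, changing the first word of `sentence` changes every later vector, which is the sense in which a vector farther along the list carries more of the sentence.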
Once the Chinese sentence has been read by the encoder, a network of eight layers acting as the “decoder” generates the English translation one word at a time in a series of steps. A separate “attention network” connects the encoder and decoder by directing the decoder to pay special attention to certain vectors (encoded words) when coming up with the translation. It’s not unlike a human translator constantly referring back to the original sentence during a translation.
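The attention step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Google's implementation: it scores each encoder vector with a simple dot product against the decoder's current state, whereas the article does not say how the production system computes its scores. The names `softmax` and `attend` and the toy numbers are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, encoder_vectors):
    """Weight each encoder vector by its relevance to the decoder state."""
    # Assumed scoring rule: dot product between decoder state and each vector.
    scores = [sum(d * e for d, e in zip(decoder_state, vec))
              for vec in encoder_vectors]
    weights = softmax(scores)
    size = len(encoder_vectors[0])
    # The context vector is the weighted average of the encoder vectors.
    context = [sum(w * vec[i] for w, vec in zip(weights, encoder_vectors))
               for i in range(size)]
    return weights, context

# Hypothetical encoder output for a three-word source sentence.
encoder_vectors = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
decoder_state = [0.0, 2.0]

weights, context = attend(decoder_state, encoder_vectors)
```

With these numbers the second encoder vector receives the largest weight, so the decoder's next word would draw mostly on the second source word, much as a translator glances back at one part of the original sentence.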
This represents an improved version of the original encoder-decoder method that would compress the starting sentence into a fixed-size vector, regardless of the original sentence’s length. The improved version was presented in a paper that includes Cho as coauthor. Cho, who is not affiliated with Google, explains the less accurate original encoder-decoder method as follows:
“If I made an analogy to a human translator, what this means is that the human translator is going to look at a source sentence once, memorize the whole thing and start writing down its translation without ever looking back at the source sentence. This is both unrealistic and extremely inefficient. Why wouldn’t a translator look back at the source sentence over and over?”
Google started working on neural machine translation several years ago, but the method still generally proved less accurate and required more computational resources than the old approach of phrase-based machine translation. Better accuracy often came at the expense of speed, which is problematic for Google Translate users, who expect almost instantaneous translations.
Google researchers had to devise several clever workarounds for their deep-learning algorithms to get beyond the existing limitations of neural machine translation. For example, the team connected the attention network to the encoder and decoder networks in a way that sacrificed some accuracy but allowed for faster speed through parallelism, the practice of using several processors to run certain parts of the deep-learning algorithm simultaneously.
“We believe some of our architectural choices are quite unique, mostly to allow maximum parallelism during computation while achieving good accuracy,” Schuster explains.
Another innovation helped neural machine translation handle certain rare words. Part of Google’s solution to this came from the previous work of Schuster and his colleagues on improving the Google Japanese and Korean speech recognition systems. They figured out how to break down rare words into a limited set of smaller, common subunits called “wordpieces,” which the neural machine translation could handle more easily.
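The idea of covering a rare word with common subunits can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the vocabulary below is invented by hand, whereas a real wordpiece vocabulary is learned from data, and the greedy longest-match rule used here is one plausible way to segment a word, not necessarily the one Google uses. The function name `split_into_wordpieces` is hypothetical.

```python
def split_into_wordpieces(word, vocab):
    """Greedily cover `word` with the longest vocabulary entries available."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate until it matches a vocabulary entry.
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:
            # No entry matches; fall back to a single character.
            end = start + 1
        pieces.append(word[start:end])
        start = end
    return pieces

# Hypothetical hand-made vocabulary of common subunits.
VOCAB = {"trans", "lat", "ion", "un", "able", "s"}

pieces = split_into_wordpieces("untranslatable", VOCAB)
# pieces == ["un", "trans", "lat", "able"]
```

A translation system that has never seen "untranslatable" can still process it, because every piece comes from a small, fixed set that it has seen many times.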
A third innovation came from using “quantized computation” to reduce the precision of the system’s calculations and therefore speed up the translation process. Google’s team trained their system to tolerate the “quantization errors” that this lower precision introduces. “Quantized computation is generally faster than nonquantized computation because all normally 32-bit or 64-bit data can be compressed into 8 or 16 bits, which reduces the time accessing that data and generally makes it faster to do any computations on it,” Schuster says.
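A small Python sketch shows where quantization errors come from. It assumes a simple symmetric linear scheme that maps floating-point values onto signed 8-bit integers; the article does not describe Google's actual quantization scheme, and the names `quantize` and `dequantize` and the sample values are hypothetical.

```python
def quantize(values, bits=8):
    """Map floats onto signed integers of the given width, plus a scale."""
    max_int = 2 ** (bits - 1) - 1  # 127 for 8 bits
    largest = max(abs(v) for v in values)
    scale = largest / max_int if largest else 1.0
    return [round(v / scale) for v in values], scale

def dequantize(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]

# Hypothetical network weights stored as ordinary floats.
weights = [0.82, -0.31, 0.07, -1.0]

codes, scale = quantize(weights)       # codes == [104, -39, 9, -127]
restored = dequantize(codes, scale)
# The gap between original and restored values is the quantization error.
errors = [abs(w - r) for w, r in zip(weights, restored)]
```

Each code fits in a single byte instead of four or eight, and under this scheme no restored value is off by more than half of `scale`. That small, bounded error is the kind a network can be trained to tolerate.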
Google’s neural machine translation also benefits from running on better hardware than traditional CPUs. The tech giant is using a specialized chip designed for deep learning called the Tensor Processing Unit (TPU). The TPUs alone helped speed up translation by 3.5 times over ordinary chips.
By combining the TPUs with the new algorithmic solutions, Google made its neural machine translation more than 30 times faster with almost no loss of translation accuracy. That huge speed boost made the difference in Google’s decision to finally begin using the deep-learning algorithms for Google Translate in Chinese-to-English translations. The results seem impressive enough to outside experts such as Cho.
“I am extremely impressed by their effort and success in making the inference of neural machine translation fast enough for their production system by quantized inference and their TPU,” Cho says.
Google Translate and other machine translation services still have room for improvement. For example, even the upgraded Google Translate still mistranslates rare words or simply leaves out certain parts of sentences without translating them. It also still has problems using context to improve its translations. But Schuster seems optimistic that machine translation services will continue to make progress and creep ever closer to human capabilities.
“If you look at the history of machine translation, you see a constant uptick of translation quality and speed, and we only see this [continuing] until the system is as good as a human in communicating information from one language to another,” Schuster says.
Jeremy Hsu has been working as a science and technology journalist in New York City since 2008. He has written on subjects as diverse as supercomputing and wearable electronics for IEEE Spectrum. When he’s not trying to wrap his head around the latest quantum computing news for Spectrum, he also contributes to a variety of publications such as Scientific American, Discover, Popular Science, and others. He is a graduate of New York University’s Science, Health & Environmental Reporting Program.