Clever Compression of Some Neural Nets Improves Performance

MIT researchers find an efficient way to prune speech-recognition AIs while still boosting accuracy

As neural networks grow larger, they become more powerful, but also more power-hungry, gobbling electricity, time, and computer memory. Researchers have explored ways to lighten the load, especially for deployment on mobile devices. One compression method is called pruning—deleting the weakest links. New research proposes a novel way to prune speech-recognition models, making the pruning process more efficient while also rendering the compressed model more accurate.

The researchers addressed speech recognition for relatively uncommon languages. To learn speech recognition using only supervised learning, software requires a lot of existing audio-text pairings, which are in short supply for some languages. A popular method called self-supervised learning gets around the problem. In self-supervised learning, a model finds patterns in data that carry no labels (a label being, for example, the word “dog” attached to a dog image). The model can then build on these patterns and learn more focused tasks using supervised learning on minimal data, a process called fine-tuning.

In a speech recognition application, a model might take in hours of unlabeled audio recordings, mask out short sections, and learn to fill in the blanks. In the process it builds internal representations of the data that can be adapted to different tasks. Then, in fine-tuning, it might learn to transcribe a given language using only minutes of transcribed audio. For each snippet of sound, it would guess the word or words and update its connections based on whether it was right or wrong.
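The fill-in-the-blanks setup can be illustrated with a minimal sketch. It is not the researchers' code: the function name mask_spans, the span length, and the use of plain numbers as stand-ins for audio frames are all assumptions made for illustration.

```python
import random

def mask_spans(frames, span_len, n_spans, seed=0):
    """Hide n_spans runs of span_len frames (hypothetical helper).

    Returns a masked copy of the sequence, with hidden frames set to None,
    and a dict mapping each hidden position to its original value. A model
    would be trained to predict those hidden values from the visible frames.
    """
    rng = random.Random(seed)
    masked = list(frames)
    targets = {}
    for _ in range(n_spans):
        # Pick a start so the whole span fits inside the sequence.
        start = rng.randrange(0, len(frames) - span_len + 1)
        for i in range(start, start + span_len):
            targets[i] = frames[i]  # remember what was hidden
            masked[i] = None        # hide it from the model
    return masked, targets

# Toy "audio": 20 numbers standing in for acoustic frames (an assumption).
frames = [0.1 * i for i in range(20)]
masked, targets = mask_spans(frames, span_len=3, n_spans=2)
print(masked)
print(sorted(targets))
```

No transcripts are involved at this stage: the training signal comes from the audio itself, which is why unlabeled recordings are enough.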

The authors of the new work explored a few ways to prune fine-tuned speech-recognition models. One way is called OMP (One-shot Magnitude Pruning), which other researchers had developed for image-processing models. They took a pre-trained speech-recognition model (one that had completed the step of self-supervised learning) and fine-tuned it on a small amount of transcribed audio. Then they pruned it. Then they fine-tuned it again.
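The pruning step itself can be sketched in a few lines. Magnitude pruning treats the connections with the smallest absolute weights as the weakest links. The example below is a toy illustration on a flat list of numbers, not the researchers' implementation; the names magnitude_mask and apply_mask and the sparsity value are assumptions.

```python
def magnitude_mask(weights, sparsity):
    """Return a 0/1 mask that drops the `sparsity` fraction of weights
    with the smallest absolute value (hypothetical helper)."""
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest magnitude.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    mask = [1] * len(weights)
    for i in order[:n_prune]:
        mask[i] = 0
    return mask

def apply_mask(weights, mask):
    """Zero out every weight whose mask entry is 0."""
    return [w if m else 0.0 for w, m in zip(weights, mask)]

# Toy weights standing in for a fine-tuned model (an assumption).
weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.2]
mask = magnitude_mask(weights, sparsity=0.5)
pruned = apply_mask(weights, mask)
print(mask)
print(pruned)
```

In OMP as described above, a step like this sits between two rounds of fine-tuning, and the surviving weights are the ones the second round continues to train.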

The team applied OMP to several languages and found that the pruned models were structurally very similar across languages. These results surprised them. “So, this is not too obvious,” says Cheng-I Jeff Lai, a doctoral student at MIT and the lead author of the new work. “This motivated our pruning algorithm.” They hypothesized that, given the similarity in structure between the pruned models, pre-trained models probably didn’t need much fine-tuning. That’s good, because fine-tuning is a computationally intense process. Lai and his collaborators developed a new method, called PARP (Prune, Adjust and Re-Prune), that requires only one round of fine-tuning. They’ll present their paper this month, at the NeurIPS (Neural Information Processing Systems) AI conference. The group’s research, Lai says, is part of an ongoing collaboration on low-resource language learning between MIT CSAIL and MIT-IBM Watson AI Lab.

PARP starts, Lai says, with a pre-trained speech-recognition model, then prunes out the weakest links, but instead of deleting them completely, it just temporarily sets their strengths to zero. It then fine-tunes the model using labeled data, allowing the zeros to grow back if they’re truly important. Finally, PARP prunes the model once again. Whereas OMP fine-tunes, prunes, and fine-tunes, PARP prunes, fine-tunes, and prunes. Pruning twice is computationally trivial compared with fine-tuning twice.
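The prune, adjust, and re-prune sequence can be sketched on toy numbers. This is a simplified illustration of the three steps as described here, not the authors' code. The fine_tune function is a hypothetical stand-in that nudges every weight toward a made-up target, and all names and values are assumptions.

```python
def magnitude_mask(weights, sparsity):
    """0/1 mask dropping the smallest-magnitude fraction of weights."""
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    mask = [1] * len(weights)
    for i in order[:n_prune]:
        mask[i] = 0
    return mask

def fine_tune(weights, target, lr=0.5, steps=20):
    """Toy stand-in for fine-tuning: gradient descent on the squared
    distance to `target`. Every weight is updated, including zeroed ones."""
    w = list(weights)
    for _ in range(steps):
        w = [wi - lr * (wi - ti) for wi, ti in zip(w, target)]
    return w

def parp(pretrained, target, sparsity):
    # 1. Prune: set the weakest weights to zero, but keep them trainable.
    first_mask = magnitude_mask(pretrained, sparsity)
    w = [wi if m else 0.0 for wi, m in zip(pretrained, first_mask)]
    # 2. Adjust: one round of fine-tuning; zeroed weights may grow back.
    w = fine_tune(w, target)
    # 3. Re-prune: apply magnitude pruning again to the adjusted weights.
    final_mask = magnitude_mask(w, sparsity)
    final = [wi if m else 0.0 for wi, m in zip(w, final_mask)]
    return final, first_mask, final_mask

pretrained = [0.9, 0.1, -0.5, 0.05]  # toy pre-trained weights (assumption)
target = [0.9, 0.7, -0.1, 0.0]       # what the toy task "wants" (assumption)
final, first_mask, final_mask = parp(pretrained, target, sparsity=0.5)
print(first_mask, final_mask)
print(final)
```

In this toy run the second weight is zeroed by the first pruning pass, grows back during fine-tuning because the task needs it, and survives the second pass, while the third weight shrinks and is dropped instead. Only one call to fine_tune is made, which is the saving over the fine-tune, prune, fine-tune order.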

At realistic pruning levels, PARP achieved error rates similar to OMP while using half as many rounds of fine-tuning. Another interesting finding: In some setups where PARP pruned between 10% and 60% of a network, it actually improved automatic speech recognition (ASR) accuracy over an unpruned model, perhaps by eliminating noise from the network. OMP created no such boost. “This is one thing that impresses me,” says Hung-yi Lee, a computer scientist at National Taiwan University who was not involved in the work.

Lai says PARP or something like it could lead to ASR models that, compared with current models, are faster and more accurate, while requiring less memory and less training. He calls for more research into practical applications. (One research direction applies pruning to speech synthesis models. He’s submitted a paper on the topic to next year’s ICASSP conference.) “A second message,” he says, given some of the surprising findings, “is that pruning can be a scientific tool for us to understand these speech models deeper.”
