Clever Compression of Some Neural Nets Improves Performance

As neural networks grow larger, they become more powerful, but also more power-hungry, gobbling electricity, time, and computer memory. Researchers have explored ways to lighten the load, especially for deployment on mobile devices. One compression method is called pruning—deleting the weakest links. New research proposes a novel way to prune speech-recognition models, making the pruning process more efficient while also rendering the compressed model more accurate.

The researchers addressed speech recognition for relatively uncommon languages. To learn speech recognition using only supervised learning, software requires a lot of existing audio-text pairings, which are in short supply for some languages. A popular method called self-supervised learning gets around the problem. In self-supervised learning, a model finds patterns in data without any labels—such as “dog” on a dog image. Artificial intelligence can then build on these patterns and learn more focused tasks using supervised learning on minimal data, a process called fine-tuning.

In a speech-recognition application, a model might take in hours of unlabeled audio recordings, silence short sections, and learn to fill in the blanks. Along the way it builds internal representations of the data that can be adapted to different tasks. Then in fine-tuning it might learn to transcribe a given language using only minutes of transcribed audio. For each snippet of sound, it would guess the word or words, and update its connections based on whether it’s right or wrong.
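The "silence short sections and fill in the blanks" idea can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the function names, the span lengths, and the mean-squared-error score are hypothetical stand-ins, and real speech models use their own training objectives.

```python
import random

def mask_spans(frames, span_len, num_spans, seed=0):
    """Return (masked_frames, hidden_indices) with `num_spans` spans silenced."""
    rng = random.Random(seed)
    masked = list(frames)
    hidden = set()
    for _ in range(num_spans):
        # Pick a random start so the whole span fits inside the recording.
        start = rng.randrange(0, len(frames) - span_len + 1)
        for i in range(start, start + span_len):
            masked[i] = 0.0  # "silence" this frame
            hidden.add(i)
    return masked, sorted(hidden)

def fill_in_loss(predicted, original, hidden):
    """Mean squared error measured only on the silenced positions."""
    return sum((predicted[i] - original[i]) ** 2 for i in hidden) / len(hidden)

# Toy "audio": 20 nonzero frame values (hypothetical data).
audio = [0.1 * i for i in range(1, 21)]
masked, hidden = mask_spans(audio, span_len=3, num_spans=2)

# A model that outputs the silenced input unchanged scores badly;
# a perfect guess scores zero.
print(fill_in_loss(masked, audio, hidden))
print(fill_in_loss(audio, audio, hidden))
```

No transcripts are needed for this step: the original audio itself supplies the right answer for every silenced frame.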

The authors of the new work explored a few ways to prune fine-tuned speech-recognition models. One way is called OMP (One-shot Magnitude Pruning), which other researchers had developed for image-processing models. They took a pre-trained speech-recognition model (one that had completed the step of self-supervised learning) and fine-tuned it on a small amount of transcribed audio. Then they pruned it. Then they fine-tuned it again.
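A minimal sketch of the pruning step in magnitude pruning, assuming a flat list of weights and a single target sparsity. "One-shot" here means the weakest weights are removed in a single pass rather than gradually. The function names and example weights are hypothetical, and the article does not say whether the researchers pruned per layer or across the whole network.

```python
def magnitude_mask(weights, sparsity):
    """Return a 0/1 mask that drops the `sparsity` fraction of smallest-|w| weights."""
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest magnitude.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(weights))]

def apply_mask(weights, mask):
    """Zero out every weight whose mask entry is 0."""
    return [w * m for w, m in zip(weights, mask)]

# Hypothetical weights from a fine-tuned model.
weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2]
mask = magnitude_mask(weights, sparsity=0.5)
pruned = apply_mask(weights, mask)
print(mask)  # [1, 0, 1, 0, 1, 0]
```

In the OMP recipe described above, this mask would be computed after the first round of fine-tuning, and the surviving weights would then be fine-tuned a second time.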

The team applied OMP to several languages and found that the pruned models were structurally very similar across languages. These results surprised them. “So, this is not too obvious,” says Cheng-I Jeff Lai, a doctoral student at MIT and the lead author of the new work. “This motivated our pruning algorithm.” They hypothesized that, given the similarity in structure between the pruned models, pre-trained models probably didn’t need much fine-tuning. That’s good, because fine-tuning is a computationally intense process. Lai and his collaborators developed a new method, called PARP (Prune, Adjust and Re-Prune), that requires only one round of fine-tuning. They’ll present their paper this month, at the NeurIPS (Neural Information Processing Systems) AI conference. The group’s research, Lai says, is part of an ongoing collaboration on low-resource language learning between MIT CSAIL and MIT-IBM Watson AI Lab.
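The article does not say how structural similarity was measured. One common way to compare two pruned networks is the overlap of their pruning masks, scored as intersection over union (IoU). The sketch below assumes that measure and uses hypothetical masks.

```python
def mask_iou(mask_a, mask_b):
    """Intersection over union of the weights two pruning masks keep."""
    kept_a = {i for i, m in enumerate(mask_a) if m}
    kept_b = {i for i, m in enumerate(mask_b) if m}
    union = kept_a | kept_b
    if not union:
        return 1.0  # both masks prune everything, so they trivially agree
    return len(kept_a & kept_b) / len(union)

# Hypothetical masks from pruning the same model fine-tuned on two languages.
mask_lang_1 = [1, 0, 1, 1, 0, 1, 0, 1]
mask_lang_2 = [1, 0, 1, 1, 0, 0, 1, 1]
overlap = mask_iou(mask_lang_1, mask_lang_2)
print(round(overlap, 3))  # 0.667
```

A score near 1 means the two languages keep almost the same connections, which is the kind of result that would suggest the pre-trained model needs little language-specific adjustment.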

PARP starts, Lai says, with a pre-trained speech-recognition model, then prunes out the weakest links, but instead of deleting them completely, it just temporarily sets their strengths to zero. It then fine-tunes the model using labeled data, allowing the zeros to grow back if they’re truly important. Finally, PARP prunes the model once again. Whereas OMP fine-tunes, prunes, and fine-tunes, PARP prunes, fine-tunes, and prunes. Pruning twice is computationally trivial compared with fine-tuning twice.
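The prune, fine-tune, prune order can be sketched on a toy linear model. This is an illustration under stated assumptions, not the authors' implementation: the tiny model, the data, the learning rate, and all function names are hypothetical, and a real system would fine-tune a large speech network instead.

```python
def magnitude_mask(weights, sparsity):
    """0/1 mask dropping the `sparsity` fraction of smallest-|w| weights."""
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(weights))]

def fine_tune(weights, inputs, targets, lr=0.2, steps=300):
    """Full-batch gradient descent on mean squared error; every weight is trainable."""
    w = list(weights)
    n = len(inputs)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in zip(inputs, targets):
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += 2.0 * err * xj / n
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w

def parp(pretrained, inputs, targets, sparsity):
    # 1. Prune: zero the weakest weights, but leave them trainable.
    first_mask = magnitude_mask(pretrained, sparsity)
    w = [wi * m for wi, m in zip(pretrained, first_mask)]
    # 2. Adjust: fine-tune all weights, so zeroed ones can grow back.
    w = fine_tune(w, inputs, targets)
    # 3. Re-prune: apply magnitude pruning once more at the same sparsity.
    final_mask = magnitude_mask(w, sparsity)
    return [wi * m for wi, m in zip(w, final_mask)], first_mask, final_mask

# Hypothetical setup: weight 1 looks weak before fine-tuning,
# but the labeled data says it matters.
pretrained = [1.0, 0.1, 0.8, 0.05]
ideal = [1.0, 0.9, 0.0, 0.05]
inputs = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [1, 1, 1, 1]]
targets = [sum(t * x for t, x in zip(ideal, row)) for row in inputs]

final, first_mask, final_mask = parp(pretrained, inputs, targets, sparsity=0.5)
print(first_mask)  # [1, 0, 1, 0]
print(final_mask)  # [1, 1, 0, 0]
```

In this toy run the second weight is zeroed by the first pruning pass, grows back during fine-tuning, and survives the final pass, while the third weight is dropped instead. The procedure calls `fine_tune` only once, which is where the savings over OMP would come from.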

At realistic pruning levels, PARP achieved error rates similar to OMP’s while using half as many rounds of fine-tuning. Another interesting finding: In some setups where PARP pruned between 10% and 60% of a network, it actually improved automatic speech recognition (ASR) accuracy over an unpruned model, perhaps by eliminating noise from the network. OMP created no such boost. “This is one thing that impresses me,” says Hung-yi Lee, a computer scientist at National Taiwan University who was not involved in the work.

Lai says PARP or something like it could lead to ASR models that, compared with current models, are faster and more accurate, while requiring less memory and less training. He calls for more research into practical applications. (One research direction applies pruning to speech synthesis models. He’s submitted a paper on the topic to next year’s ICASSP conference.) “A second message,” he says, given some of the surprising findings, “is that pruning can be a scientific tool for us to understand these speech models deeper.”

MIT researchers find an efficient way to prune speech-recognition AIs while still boosting accuracy
