Clever Compression of Some Neural Nets Improves Performance

MIT researchers find an efficient way to prune speech-recognition AIs while still boosting accuracy


As neural networks grow larger, they become more powerful, but also more power-hungry, gobbling electricity, time, and computer memory. Researchers have explored ways to lighten the load, especially for deployment on mobile devices. One compression method is called pruning—deleting the weakest links. New research proposes a novel way to prune speech-recognition models, making the pruning process more efficient while also rendering the compressed model more accurate.
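Magnitude pruning itself is conceptually simple: rank a model's connections by absolute weight and zero out the weakest fraction. Here is a minimal NumPy sketch of that idea (a toy illustration, not the authors' code):

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights (unstructured pruning)."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy(), np.ones(weights.shape, dtype=bool)
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold   # keep only links above the cutoff
    return weights * mask, mask

# Toy 2x2 weight matrix: prune the weakest half of the links.
w = np.array([[0.9, -0.05],
              [0.01, -0.7]])
pruned, mask = magnitude_prune(w, sparsity=0.5)
```

In a real network the same mask would be applied layer by layer, and the surviving weights stored in sparse form to save memory.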

The researchers addressed speech recognition for relatively uncommon languages. To learn speech recognition using only supervised learning, software requires a lot of existing audio-text pairings, which are in short supply for some languages. A popular method called self-supervised learning gets around the problem. In self-supervised learning, a model finds patterns in data without any labels—such as “dog” on a dog image. Artificial intelligence can then build on these patterns and learn more focused tasks using supervised learning on minimal data, a process called fine-tuning.

In a speech-recognition application, a model might take in hours of unlabeled audio recordings, silence short sections, and learn to fill in the blanks. In the process, it builds internal representations of the data that can later be adapted to more specific tasks. Then, in fine-tuning, it might learn to transcribe a given language using only minutes of transcribed audio. For each snippet of sound, it would guess the word or words, then update its connections based on whether it was right or wrong.
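The masked-prediction idea can be illustrated with a toy one-dimensional signal. In this sketch a sine wave stands in for audio features and simple linear interpolation stands in for the model's learned guess; this is an illustration of the training signal, not the actual self-supervised objective:

```python
import numpy as np

# A sine wave stands in for a sequence of audio features.
audio = np.sin(np.linspace(0, 8 * np.pi, 200))

# "Silence" a short section, as in masked self-supervised pre-training.
masked = audio.copy()
lo, hi = 90, 110
masked[lo:hi] = 0.0

# A real model learns to predict the missing span from context; here,
# linear interpolation between the span's edges stands in for that guess.
fill = np.linspace(masked[lo - 1], masked[hi], hi - lo)

# Reconstruction error on the hidden span: the self-supervised training signal.
loss = np.mean((fill - audio[lo:hi]) ** 2)
```

The smaller the reconstruction error, the better the "model" has learned the structure of the signal; a real system minimizes this kind of loss over millions of masked spans.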

The authors of the new work explored a few ways to prune fine-tuned speech-recognition models. One way is called OMP (One-shot Magnitude Pruning), which other researchers had developed for image-processing models. They took a pre-trained speech-recognition model (one that had completed the step of self-supervised learning) and fine-tuned it on a small amount of transcribed audio. Then they pruned it. Then they fine-tuned it again.
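The OMP recipe (fine-tune, prune, fine-tune again) can be sketched on a toy least-squares problem. NumPy gradient descent stands in for fine-tuning a real speech model, and the synthetic data and weight sizes here are invented for illustration:

```python
import numpy as np

def finetune(w, mask, x, y, steps=200, lr=0.1):
    """Gradient descent on a toy least-squares task; pruned weights stay at zero."""
    for _ in range(steps):
        grad = x.T @ (x @ w - y) / len(y)
        w = (w - lr * grad) * mask
    return w

def prune(w, sparsity):
    """One-shot magnitude pruning: keep only the largest-magnitude weights."""
    k = int(sparsity * w.size)
    threshold = np.sort(np.abs(w))[k - 1] if k else -np.inf
    return np.abs(w) > threshold

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 8))
true_w = np.array([3.0, -2.0, 0, 0, 0, 0, 0, 0])
y = x @ true_w

# OMP: fine-tune, then prune, then fine-tune again (two fine-tuning rounds).
w = finetune(np.zeros(8), np.ones(8), x, y)
mask = prune(w, sparsity=0.75)
w = finetune(w * mask, mask, x, y)
```

The expensive part is that `finetune` runs twice; for a real speech model each of those rounds means many passes over the training audio.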

The team applied OMP to several languages and found that the pruned models were structurally very similar across languages. These results surprised them. “So, this is not too obvious,” says Cheng-I Jeff Lai, a doctoral student at MIT and the lead author of the new work. “This motivated our pruning algorithm.” They hypothesized that, given the similarity in structure between the pruned models, pre-trained models probably didn’t need much fine-tuning. That’s good, because fine-tuning is a computationally intensive process. Lai and his collaborators developed a new method, called PARP (Prune, Adjust and Re-Prune), that requires only one round of fine-tuning. They’ll present their paper this month at the NeurIPS (Neural Information Processing Systems) AI conference. The group’s research, Lai says, is part of an ongoing collaboration on low-resource language learning between MIT CSAIL and the MIT-IBM Watson AI Lab.

PARP starts, Lai says, with a pre-trained speech-recognition model, then prunes out the weakest links. But instead of deleting them completely, it just temporarily sets their strengths to zero. It then fine-tunes the model using labeled data, allowing the zeroed connections to grow back if they’re truly important. Finally, PARP prunes the model once again. Whereas OMP fine-tunes, prunes, and fine-tunes, PARP prunes, fine-tunes, and prunes. Pruning twice is computationally trivial compared with fine-tuning twice.
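PARP's prune-adjust-reprune loop can be sketched on a toy least-squares problem, with NumPy gradient descent standing in for fine-tuning and a random vector standing in for the pre-trained weights (all sizes and data here are invented for illustration):

```python
import numpy as np

def keep_largest(w, keep):
    """Boolean mask that keeps only the `keep` largest-magnitude weights."""
    mask = np.zeros(w.size, dtype=bool)
    mask[np.argsort(np.abs(w))[-keep:]] = True
    return mask

rng = np.random.default_rng(1)
x = rng.normal(size=(64, 8))
true_w = np.array([3.0, -2.0, 0, 0, 0, 0, 0, 0])
y = x @ true_w

# A random vector stands in for the pre-trained model's weights.
w = rng.normal(size=8)

# 1. Prune: zero the weakest links, but leave them trainable.
w = w * keep_largest(w, keep=2)

# 2. Adjust: a single round of fine-tuning; zeroed weights may grow back.
for _ in range(300):
    grad = x.T @ (x @ w - y) / len(y)
    w = w - 0.1 * grad

# 3. Re-prune: a final magnitude pruning yields the compressed model.
w = w * keep_largest(w, keep=2)
```

The savings come from step 2 happening only once; OMP pays for two full rounds of fine-tuning to reach a comparable sparse model.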

At realistic pruning levels, PARP achieved error rates similar to OMP’s while using half as many rounds of fine-tuning. Another interesting finding: In some setups where PARP pruned between 10% and 60% of a network, it actually improved automatic-speech-recognition (ASR) accuracy over that of an unpruned model, perhaps by eliminating noise from the network. OMP created no such boost. “This is one thing that impresses me,” says Hung-yi Lee, a computer scientist at National Taiwan University who was not involved in the work.

Lai says PARP or something like it could lead to ASR models that, compared with current models, are faster and more accurate, while requiring less memory and less training. He calls for more research into practical applications. (One research direction applies pruning to speech synthesis models. He’s submitted a paper on the topic to next year’s ICASSP conference.) “A second message,” he says, given some of the surprising findings, “is that pruning can be a scientific tool for us to understand these speech models deeper.”
