Just as multiplexing can help a single communication channel carry many signals at the same time, a new study reveals that multiplexing can help neural networks—the AI systems that now often power speech recognition, computer vision, and more—scan dozens of streams of data simultaneously, letting them greatly boost the rate at which they analyze information.
In artificial neural networks, components dubbed “neurons” are fed data and cooperate to solve a problem, such as recognizing images. The neural net repeatedly adjusts the links between its neurons and sees if the resulting patterns of behavior are better at finding a solution. Over time, the network discovers which patterns are best at computing results. It then adopts these as defaults, mimicking the process of learning in the human brain. The features of a neural net that change with learning, such as the nature of the connections between neurons, are known as its parameters.
Recent research suggests that modern neural networks often have vastly more parameters than they need: potentially, more than 90 percent of their parameters could be pruned without harming accuracy. This raised a question that researchers at Princeton University aimed to address: If neural networks possess more computing power than they need, could each one analyze multiple streams of information simultaneously to help learn a task, just as a radio channel can share its bandwidth to carry multiple signals at the same time?
The researchers developed a technique they named DataMUX, in which a neural network analyzes multiple data feeds simultaneously as one mixed clump of information. This can significantly boost its efficiency, letting it process inputs substantially faster while demanding little extra computation or memory. They suggest their new method, which they detailed online 18 February on the arXiv preprint server, may be the first instance of data multiplexing in neural networks.
“We hope this can have a substantial impact on energy consumption and the environmental footprint of machine-learning models, especially for computing services that process a large number of requests at a time,” says study coauthor Vishvak Murahari, a machine-learning researcher at Princeton.
DataMUX works by adding a multiplexing layer at the input end of a neural network and a demultiplexing layer at the output end. Each signal entering the neural network is given a unique key to distinguish it from the others, and the multiplexing layer then merges these inputs into a single compressed feed. After the neural network processes this feed, the demultiplexing layer converts the combined output back into separate results, one per input.
The scientists conducted experiments with DataMUX using three different kinds of neural networks: transformers, multilayer perceptrons, and convolutional neural networks. The experiments involved several tasks: image recognition; sentence classification, in which a machine aims to identify whether text is spam, a business article, and so on; and named entity recognition, which involves locating and classifying named entities such as people, groups, and places.
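The key, merge, and separate steps described above can be illustrated with a toy sketch. It rests on several assumptions that are not from the study: the keys are random vectors of +1 and -1, merging is a simple average, the "network" is an identity function, and separation reuses the same keys. In DataMUX itself the layers are trained together with the network and may look quite different; all function names here are hypothetical.

```python
import random


def make_keys(num_inputs, dim, seed=0):
    """Assumption: one random +1/-1 vector per input serves as its key."""
    rng = random.Random(seed)
    return [[rng.choice((-1.0, 1.0)) for _ in range(dim)]
            for _ in range(num_inputs)]


def multiplex(inputs, keys):
    """Tag each input with its key (elementwise product), then average
    everything into a single vector the same size as one input."""
    n, dim = len(inputs), len(inputs[0])
    mixed = [0.0] * dim
    for x, k in zip(inputs, keys):
        for t in range(dim):
            mixed[t] += k[t] * x[t] / n
    return mixed


def toy_network(mixed):
    """Identity stand-in for the real neural network (an assumption)."""
    return list(mixed)


def demultiplex(output, keys):
    """Multiply by each key again. Because (+/-1) * (+/-1) = 1, the matching
    input reappears (scaled by 1/n) plus noise from the other inputs."""
    return [[k[t] * output[t] for t in range(len(output))] for k in keys]


def best_match(estimate, candidates):
    """Index of the candidate with the largest dot product with `estimate`."""
    scores = [sum(e * c for e, c in zip(estimate, cand)) for cand in candidates]
    return scores.index(max(scores))


def demo(num_inputs=4, dim=1024, seed=1):
    rng = random.Random(seed)
    inputs = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
              for _ in range(num_inputs)]
    keys = make_keys(num_inputs, dim, seed=seed + 1)
    mixed = multiplex(inputs, keys)  # one feed, length `dim`
    estimates = demultiplex(toy_network(mixed), keys)
    # For each separated estimate, report which original input it resembles most.
    return [best_match(est, inputs) for est in estimates]
```

Calling `demo()` mixes four random inputs into one vector and checks which original each separated estimate most resembles. The recovery in this toy version is only approximate, which hints at why packing in more inputs can cost some accuracy.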
Experiments with transformers on text-classification tasks revealed they could multiplex up to 40 inputs, achieving up to an 18-fold speedup in the rate at which they could process these inputs with as little as a 2 percent drop in accuracy.
“We could aim to increase throughput of machine-learning models manyfold,” Murahari says. “This is profound since it opens up applications where one might need to invoke the model dynamically with high frequency. For instance, imagine an assistive writing application where one could invoke state-of-the-art language models very frequently, leading to a more intuitive and a smooth running application.” In addition, “products using large machine-learning models could decrease their compute costs dramatically with DataMUX,” Murahari says. “We could also imagine models running on less specialized hardware, such as CPUs instead of GPUs, since models running DataMUX on CPUs could close the throughput gap with nonmultiplexed models running on GPU. This would enable a large number of machine-learning applications to run on low-resource edge devices.”
Multiplexing 40 inputs yielded only an 18-fold boost in throughput rather than the 40-fold gain one might expect, likely because the keys associated with each input grew longer as the number of inputs increased. Future work may improve this speedup through better multiplexing and demultiplexing strategies, the researchers say.
As to how neural networks using DataMUX do not get confused by this mixed feed, “we don't really know,” says study coauthor Carlos Jimenez, a machine-learning researcher at Princeton. He notes there is nothing that necessarily causes multiple inputs streaming into a neural network to interfere with each other, “but more work needs to be done to really get to the bottom of this.”
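The gap between 40 inputs and an 18-fold speedup can be put in rough numbers. The short calculation below is only back-of-the-envelope arithmetic on the two reported figures; the function names are hypothetical, and it assumes the speedup is measured against a network handling one input per pass.

```python
def multiplexing_efficiency(num_inputs, observed_speedup):
    """Fraction of the ideal (num_inputs-fold) speedup actually achieved."""
    return observed_speedup / num_inputs


def relative_pass_cost(num_inputs, observed_speedup):
    """How much longer one multiplexed pass takes than one single-input pass,
    assuming throughput = inputs per pass / time per pass."""
    return num_inputs / observed_speedup


efficiency = multiplexing_efficiency(40, 18)  # 18 / 40 = 0.45
pass_cost = relative_pass_cost(40, 18)        # 40 / 18, a little over 2.2
print(f"{efficiency:.0%} of the ideal speedup; "
      f"each multiplexed pass costs about {pass_cost:.1f}x a single-input pass")
```

Under those assumptions, the reported figures correspond to about 45 percent of the ideal speedup, with each multiplexed pass taking a little more than twice as long as an ordinary one.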
A neural network using DataMUX is not limited to performing just one task on this multiplexed data, such as just recognizing names. It could use this combined input to carry out multiple tasks that it is trained on at the same time, such as both recognizing names and classifying sentences, notes study senior author Karthik Narasimhan, a machine-learning researcher at Princeton.
In the future, the researchers aim to experiment with multiplexing state-of-the-art neural networks such as BERT and GPT-3. They would also like to investigate other multiplexing schemes with which they could scale up to hundreds or even thousands of inputs at once, “leading to even larger improvements in throughput,” Murahari says. “We could really just be at the tip of the iceberg.”