A huge number of machine learning applications could receive a performance upgrade, thanks to a relatively minor modification to their underlying neural networks.
If you are a developer creating a new machine learning application, you typically build on top of an existing neural network architecture, one that is already tuned for the kind of problem you are trying to solve. Creating your own architecture from scratch is a difficult job that’s typically more trouble than it’s worth. Even with an existing architecture in hand, reengineering it for better performance is no small task. But one team has come up with a new neural network module that can boost AI performance when plugged into four of the most widely used architectures.
Critically, the research, funded by the U.S. National Science Foundation and Army Research Office, achieves this performance boost without requiring much of an increase in computing power. It’s part of a broader project by North Carolina State University researchers to rethink the architecture of the neural networks involved in modern AI’s deep learning capabilities.
“At the macro level, we try to redesign the entire neural network as a whole,” says Tianfu Wu, an electrical and computer engineer at North Carolina State University in Raleigh. “Then we try to focus on the specific components of the neural network.”
Wu and his colleagues presented their work on the new neural network component, or module, named Attentive Normalization, at the virtual version of the 16th European Conference on Computer Vision in August. They have also released the code so that other researchers can plug the module into their own deep learning models.
In preliminary testing, the group found that the new module improved performance in four mainstream neural network architectures: ResNets, DenseNets, MobileNetsV2 and AOGNets. The researchers checked the upgraded networks’ performance against two industry benchmarks for testing visual object recognition and classification: ImageNet-1000 and MS-COCO 2017. For example, the new module boosted the top-1 accuracy in the ImageNet-1000 benchmark by between 0.5 percent and 2.7 percent. This may seem small, but it can make a significant difference in practice, not least because of the large scale of many machine learning deployments.
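To make the benchmark figure concrete, here is a minimal Python sketch of how top-1 accuracy is computed: a prediction counts as correct only when the model's single highest-scoring class matches the true label. The scores and labels are made up for illustration, and the `extra_correct` helper is hypothetical. It assumes the reported gains are absolute percentage points, which the figures above do not spell out.

```python
def top1_accuracy(scores, labels):
    """Fraction of examples whose highest-scoring class is the true label."""
    correct = 0
    for row, label in zip(scores, labels):
        predicted = max(range(len(row)), key=row.__getitem__)
        correct += predicted == label
    return correct / len(labels)


def extra_correct(gain_points, num_images=1_000_000):
    """Additional correct predictions for a gain given in percentage points.

    Assumption: the gain is absolute (e.g. 76.0% -> 76.5% is 0.5 points).
    """
    return round(gain_points / 100 * num_images)


# Made-up class scores for four images over three classes (illustrative only).
scores = [
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.3, 0.3, 0.4],
    [0.2, 0.5, 0.3],
]
labels = [1, 0, 2, 2]

accuracy = top1_accuracy(scores, labels)  # three of the four are correct
low_gain = extra_correct(0.5)             # per million images
high_gain = extra_correct(2.7)            # per million images
```

Under that assumption, a deployment classifying a million images would get roughly 5,000 to 27,000 more of them right.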
Together, these architectures are suited to performing AI-driven tasks on both large computing systems and mobile devices with more limited computing power. But the most noticeable improvement in performance came in the neural network architectures suited for mobile platforms such as smartphones.
The key to the team’s success was combining two neural network modules that usually operate separately. “In order to make a neural network more powerful or easier to train, feature normalization and feature attention are probably two of the most important components,” Wu says.
The feature normalization module helps to make sure that no single subset of the data used to train a neural network outweighs the other subsets in shaping the deep learning model. Comparing neural network training to driving a car on a dark road, Wu describes feature normalization as the car’s suspension system, smoothing out the jolts from any bumps in the road.
By contrast, the feature attention module helps the network focus on the features in the training data that are most useful for the learning task at hand. Going back to the car analogy, the feature attention module represents the vehicle’s headlights, showing what to look out for on the dark road ahead.
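For readers who think in code, the two modules can be sketched separately in a few lines of plain Python. This is a deliberately simplified illustration, not the researchers' implementation: `standardize` rescales one feature channel to zero mean and unit variance, and `channel_attention` is a stripped-down gate in the spirit of squeeze-and-excitation attention. The per-channel `weights` and `biases` are hypothetical stand-ins for parameters a real network would learn.

```python
import math


def standardize(channel, eps=1e-5):
    """Feature normalization: shift and scale a channel to zero mean, unit variance."""
    mean = sum(channel) / len(channel)
    var = sum((x - mean) ** 2 for x in channel) / len(channel)
    return [(x - mean) / math.sqrt(var + eps) for x in channel]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def channel_attention(channels, weights, biases):
    """Feature attention: rescale each channel by a gate in (0, 1).

    Simplification: each gate depends only on its own channel's average.
    """
    gated = []
    for channel, w, b in zip(channels, weights, biases):
        summary = sum(channel) / len(channel)  # summarize the channel
        gate = sigmoid(w * summary + b)        # decide how much to keep
        gated.append([gate * x for x in channel])
    return gated


# Two channels on very different scales (made-up values).
channels = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]

# Normalization puts both channels on the same footing.
normalized = [standardize(c) for c in channels]

# Attention keeps most of the first channel and nearly silences the second.
gated = channel_attention(channels, weights=[1.0, -1.0], biases=[0.0, 0.0])
```

After normalization the two channels are almost identical despite a tenfold difference in raw scale, which is the smoothing role of the suspension. The gates, by contrast, pick which channels matter, like the headlights.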
After scrutinizing both modules, the researchers realized that the two share a sub-process: both re-calibrate certain features in the training data. That provided a natural integration point for combining feature normalization and feature attention in the new module. “We want to see different micro components in neural architecture that can be and should be integrated together to make them more effective,” Wu says.
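One way to picture that integration point is the sketch below. It is a toy illustration under stated assumptions, not the team's released code: it assumes the module keeps several candidate scale-and-shift settings and uses attention computed from each input to blend them into the re-calibration applied after standardization. The function name `attentive_normalize`, all parameter names and values, the use of softmax for the blending weights, and the per-instance standardization are illustrative choices.

```python
import math


def standardize(channel, eps=1e-5):
    """Shift and scale a channel to zero mean and unit variance."""
    mean = sum(channel) / len(channel)
    var = sum((x - mean) ** 2 for x in channel) / len(channel)
    return [(x - mean) / math.sqrt(var + eps) for x in channel]


def softmax(logits):
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_normalize(channels, gammas, betas, attn_w, attn_b):
    """Blend K candidate re-calibrations using input-dependent attention.

    channels: C lists of feature values for one input.
    gammas, betas: K candidate scales and shifts, each a list of length C.
    attn_w, attn_b: hypothetical attention parameters (K x C weights, K biases).
    """
    # Attention: summarize each channel, then score the K candidates.
    summaries = [sum(c) / len(c) for c in channels]
    logits = [sum(w * s for w, s in zip(row, summaries)) + bias
              for row, bias in zip(attn_w, attn_b)]
    mix = softmax(logits)

    out = []
    for c, channel in enumerate(channels):
        # Shared re-calibration step: the scale and shift now depend on the input.
        gamma = sum(m * g[c] for m, g in zip(mix, gammas))
        beta = sum(m * b[c] for m, b in zip(mix, betas))
        out.append([gamma * x + beta for x in standardize(channel)])
    return out, mix


# Made-up input with C = 2 channels and K = 2 candidate re-calibrations.
channels = [[1.0, 2.0, 3.0], [4.0, 6.0, 8.0]]
gammas = [[1.0, 1.0], [2.0, 0.5]]
betas = [[0.0, 0.0], [1.0, -1.0]]
attn_w = [[0.5, 0.0], [0.0, 0.5]]
attn_b = [0.0, 0.0]

output, mix = attentive_normalize(channels, gammas, betas, attn_w, attn_b)
```

In a standard normalization layer, the scale and shift are fixed once training ends. In this sketch they change with every input, because the attention weights in `mix` are recomputed from the features each time.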
Wu and his colleagues also designed the new module so that it could perform the re-calibration task in a more dynamic and adaptive way than the standard modules. That may offer benefits when it comes to transfer learning, which means taking an AI model trained on one set of data to perform a given task and applying it to new data for a related task. In a face recognition application, for example, developers typically start with a network that’s good at identifying which objects in a camera’s view are faces, and then train it to recognize specific people.
The new module represents just one small part of the North Carolina State group’s vision for redesigning modern AI. For example, the researchers are trying to develop interpretable AI systems that allow humans to better understand the logic of AI decisions, a significant problem for deep learning models based on neural networks. As one possible step toward that goal, Wu and his colleagues previously developed a framework for building deep neural networks based on a compositional grammar system.
Meanwhile, Wu still sees many other opportunities for fine-tuning smaller parts of neural networks without requiring a complete overhaul of the main architecture.
“There are so many other components in deep neural networks,” Wu says. “We probably also can take a similar angle and try to look at whether there are natural integration points to put them together, or try to redesign them in a better form.”
Jeremy Hsu has been working as a science and technology journalist in New York City since 2008. He has written on subjects as diverse as supercomputing and wearable electronics for IEEE Spectrum. When he’s not trying to wrap his head around the latest quantum computing news for Spectrum, he also contributes to a variety of publications such as Scientific American, Discover, Popular Science, and others. He is a graduate of New York University’s Science, Health & Environmental Reporting Program.