We Can Now Train Big Neural Networks on Small Devices

This artist’s rendering illustrates how small gadgets might soon do onboard processing of the data with which they learn about the world.

iStockphoto/IEEE Spectrum

The gadgets around us are constantly learning about our lives. Smartwatches pick up on our vital signs to track our health. Home speakers listen to our conversations to recognize our voices. Smartphones play grammarian, watching what we write in order to fix our idiosyncratic typos. We appreciate these conveniences, but the information we share with our gadgets isn’t always kept between us and our electronic minders. Machine learning can require heavy hardware, so “edge” devices like phones often send raw data to central servers, which then return trained algorithms. Some people would like that training to happen locally. A new AI training method expands the training capabilities of smaller devices, potentially helping to preserve privacy.

The most powerful machine-learning systems use neural networks, complex functions filled with tunable parameters. During training, a network receives an input (such as a set of pixels), generates an output (such as the label “cat”), compares its output with the correct answer, and adjusts its parameters to do better next time. To know how to tune each of those internal knobs, the network needs to remember the effect of each one, but they regularly number in the millions or even billions. That requires a lot of memory. Training a neural network can require hundreds of times the memory called upon when merely using one (also called “inference”). In the latter case, the memory is allowed to forget what each layer of the network did as soon as it passes information to the next layer.
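A back-of-the-envelope sketch can make that gap concrete. It assumes a hypothetical network of 200 layers that each produce 64 KB of output; both numbers are invented for illustration. It also counts only layer outputs, ignoring the parameters, gradients, and optimizer state that a real training run must hold as well.

```python
# Hypothetical network: 200 layers, each producing 64 KB of output.
# These sizes are illustrative assumptions, not measurements.
layer_output_kb = [64] * 200

def inference_peak_kb(sizes):
    """Peak memory when a layer's input is freed once its output exists."""
    # Only one layer's input and output are alive at the same moment.
    return max(a + b for a, b in zip(sizes, sizes[1:]))

def training_peak_kb(sizes):
    """Peak memory when every output is kept for the backward pass."""
    return sum(sizes)

print(inference_peak_kb(layer_output_kb))  # 128
print(training_peak_kb(layer_output_kb))   # 12800
```

In this toy setup training needs 100 times the memory of inference, and the ratio grows as layers are added.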

To reduce the memory demanded during the training phase, researchers have employed a few tricks. Both target activations, the intermediate outputs of each layer that must be retained until the network’s parameters are adjusted. In one, called paging or offloading, the machine moves activations from short-term memory to a slower but more abundant type of memory such as flash or an SD card, then brings them back when needed. In another, called rematerialization, the machine deletes the activations, then computes them again later. Previously, memory-reduction systems used one of those two tricks or, says Shishir Patil, a computer scientist at the University of California, Berkeley, and the lead author of the paper describing the innovation, combined them using “heuristics” that are “suboptimal,” often requiring a lot of energy. The innovation reported by Patil and his collaborators formalizes the combination of paging and rematerialization.
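The two tricks can be sketched in a few lines. The example below is a toy, not the researchers' code: the four "layers" are plain functions on a number, a dictionary named `flash` stands in for slow storage, and the helper names `forward` and `fetch` are made up for illustration.

```python
import math

# A toy 4-layer "network": each layer is a plain function on a float.
layers = [lambda x: x + 1.0, lambda x: x * 2.0, math.sin, lambda x: x ** 2]

def forward(x, policy):
    """Run the layers; policy[i] says what to do with layer i's output.

    "keep"  -> stay in fast memory (RAM)
    "page"  -> move to slow storage (a dict standing in for flash)
    "remat" -> store nothing, recompute later if needed
    """
    ram, flash = {-1: x}, {}  # index -1 holds the network input
    for i, layer in enumerate(layers):
        x = layer(x)
        if policy[i] == "keep":
            ram[i] = x
        elif policy[i] == "page":
            flash[i] = x
    return ram, flash

def fetch(i, ram, flash):
    """Return layer i's output, as a backward pass would need it."""
    if i in ram:
        return ram[i]
    if i in flash:
        return flash[i]  # paging: a slow read, but no recomputation
    # Rematerialization: recompute from the previous layer's output.
    return layers[i](fetch(i - 1, ram, flash))

ram, flash = forward(0.5, ["remat", "page", "remat", "keep"])
print(fetch(0, ram, flash))  # 1.5, recomputed from the stored input
```

Whichever policy is chosen, `fetch` returns the same values. What changes is how much fast memory is occupied and how much extra reading or computing is needed to get them back.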

“Taking these two techniques, combining them well into this optimization problem, and then solving it—that’s really nice,” says Jiasi Chen, a computer scientist at the University of California, Riverside, who works on edge computing but was not involved in the work.

In July, Patil presented his system, called POET (private optimal energy training), at the International Conference on Machine Learning, in Baltimore. He first gives POET a device’s technical details and information about the architecture of a neural network he wants it to train. He specifies a memory budget and a time budget. He then asks it to create a training process that minimizes energy usage. The process might decide to page certain activations that would be inefficient to recompute but rematerialize others that are simple to redo but require a lot of memory to store.

One of the keys to the breakthrough was to define the problem as a mixed integer linear programming (MILP) puzzle, a set of constraints and relationships between variables. For each device and network architecture, POET plugs its variables into Patil’s hand-crafted MILP formulation, then finds the optimal solution. “A main challenge is actually formulating that problem in a nice way so that you can input it into a solver,” Chen says. “So, you capture all of the realistic system dynamics, like energy, latency, and memory.”
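To give a flavor of this kind of problem, here is a much-simplified sketch. It is not POET's formulation: it ignores when each activation is needed, all costs are invented, and an exhaustive search over the keep, page, or rematerialize choices stands in for a real MILP solver. The function name `best_schedule` and the data layout are assumptions made for this example.

```python
from itertools import product

# Hypothetical per-activation costs (made-up numbers, not from the paper).
# mem: RAM in KB if kept; "page"/"remat": (energy in mJ, extra time in ms).
activations = [
    {"mem": 40, "page": (5.0, 8.0), "remat": (1.0, 2.0)},
    {"mem": 10, "page": (2.0, 3.0), "remat": (9.0, 12.0)},
    {"mem": 30, "page": (4.0, 6.0), "remat": (3.0, 5.0)},
]
OPTIONS = ("keep", "page", "remat")

def best_schedule(acts, mem_budget_kb, time_budget_ms):
    """Return (energy, plan) minimizing energy within both budgets, or None."""
    best = None
    # Brute force: try every assignment of one option per activation.
    for plan in product(OPTIONS, repeat=len(acts)):
        mem = sum(a["mem"] for a, p in zip(acts, plan) if p == "keep")
        energy = sum(a[p][0] for a, p in zip(acts, plan) if p != "keep")
        time = sum(a[p][1] for a, p in zip(acts, plan) if p != "keep")
        if mem <= mem_budget_kb and time <= time_budget_ms:
            if best is None or energy < best[0]:
                best = (energy, plan)
    return best

print(best_schedule(activations, mem_budget_kb=5, time_budget_ms=20))
# (6.0, ('remat', 'page', 'remat'))
```

With a 5 KB memory budget nothing fits in RAM, so the search pages the one activation that is expensive to recompute and rematerializes the two that are cheap to redo. Exhaustive search only works for a handful of activations; a solver is what makes the same idea practical for real networks.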

The team tested POET on four different processors, whose RAM ranged from 32 KB to 8 GB. On each, the researchers trained three different neural network architectures: two types popular in image recognition (VGG16 and ResNet-18), plus a popular language-processing network (BERT). In many of the tests, the system could reduce memory usage by about 80 percent, without a big bump in energy use. Comparable methods couldn’t do both at the same time. According to Patil, the study showed that BERT can now be trained on the smallest devices, which was previously impossible.

“When we started off, POET was mostly a cute idea,” Patil says. Now, several companies have reached out about using it, and at least one large company has tried it in its smart speaker. One thing they like, Patil says, is that POET doesn’t reduce network precision by “quantizing,” or abbreviating, activations to save memory. So the teams that design networks don’t have to coordinate with teams that implement them in order to negotiate trade-offs between precision and memory.

Patil notes other reasons to use POET besides privacy concerns. Some devices need to train networks locally because they have a weak Internet connection or none at all. These include devices used on farms, in submarines, or in space. Other setups can benefit from the innovation because data transmission requires too much energy. POET could also make large devices, namely Internet servers, more memory and energy efficient. But as for keeping data private, Patil says, “I guess this is very timely, right?”

