Waterwave Could Quench AIs' Thirst for GPU Memory

The approach breaks up the AI training process into manageable "sub-models"

This article is part of our exclusive IEEE Journal Watch series in partnership with IEEE Xplore.

One of the (many) ways in which AI is making waves is in its ability to analyze immense datasets. But training these AI programs is becoming increasingly computationally intensive, underscoring the need for more efficient ways to crunch data.

In a study published 22 May in IEEE Transactions on Computers, researchers describe a novel approach, called Waterwave, to increase the efficiency of training multiple AI models simultaneously on the same GPU. Their results show that, in scenarios with high memory demand, Waterwave is 12 times as fast as existing spatial memory sharing on a GPU and 1.49 times as fast as existing temporal memory sharing.

In the early stages of training an AI model, various calculations and methods are used to sort promising candidate models from less promising ones. Identifying “good” or “bad” models as early as possible significantly accelerates the overall training process.

However, most current methods for training AI models on GPUs have to assess models one by one, rather than simultaneously, because of memory constraints. As a result, training tasks are queued one after another, and the desired model may well sit at the tail of the queue.

“In the worst scenario, all training tasks need to be finished one by one, which is very time consuming,” explains Xuan Peng, a Ph.D. candidate at Huazhong University of Science and Technology’s School of Computer Science and Technology.
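To picture that bottleneck, here is a minimal, hypothetical sketch in Python (the candidate names, the `train` stand-in, and the scoring are invented for illustration; this is not the authors' code): candidates are trained strictly in sequence, so the best one cannot even start until every job ahead of it in the queue has finished.

```python
# Hypothetical illustration of one-by-one training (not the authors' code).
import time

def train(candidate: str) -> float:
    """Stand-in for a full training run; returns a dummy quality score."""
    time.sleep(0.1)                      # placeholder for hours of GPU work
    return (hash(candidate) % 100) / 100

candidates = ["model_a", "model_b", "model_c", "model_d"]

# Sequential queue: even if "model_d" turns out to be the best candidate,
# it cannot start until the three jobs ahead of it have finished.
scores = {name: train(name) for name in candidates}
best = max(scores, key=scores.get)
print(f"best candidate: {best} (score {scores[best]:.2f})")
```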

A Divide and Conquer Approach

Peng’s team designed Waterwave so that it breaks models up into more manageable and evenly sized “sub-models.” Multiple sub-models from different models can be processed simultaneously on the same GPU, and as soon as the GPU is finished computing one sub-model, memory space is freed up for the next sub-model in the queue.

“By achieving similar memory sizes, it increases the probability that the freed memory from the preceding sub-model is sufficient for the next sub-model which requires memory allocation. This approach enables the memory freed by one model to be effectively utilized by another model,” says Peng.
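To make that idea concrete, here is a minimal toy simulation (a sketch with made-up memory sizes and job names, not Waterwave's actual implementation): two jobs are split into similarly sized sub-models that share one fixed pool of GPU memory, and whenever a sub-model finishes, the memory it frees is large enough to admit the next waiting sub-model.

```python
# A minimal sketch, not Waterwave itself: a toy GPU "memory pool" shared by
# two training jobs whose models are split into similarly sized sub-models.
from collections import deque

GPU_MEMORY_MB = 8_000          # assumed capacity, for illustration only

# Each job is a queue of (sub_model_name, memory_mb) pieces of roughly equal size.
job_a = deque([("a1", 3_000), ("a2", 3_000), ("a3", 3_000)])
job_b = deque([("b1", 3_000), ("b2", 3_000), ("b3", 3_000)])

free_mb = GPU_MEMORY_MB
running = []                   # sub-models currently holding memory

def admit(job, free):
    """Start the job's next sub-model if the memory it needs is available."""
    if job and job[0][1] <= free:
        name, need = job.popleft()
        running.append((name, need))
        return free - need
    return free

# Interleave the two jobs: whenever a sub-model finishes, its memory is freed
# and, because sub-models are similarly sized, the next waiting sub-model fits.
while job_a or job_b or running:
    free_mb = admit(job_a, free_mb)
    free_mb = admit(job_b, free_mb)
    if running:
        name, size = running.pop(0)   # oldest running sub-model finishes
        free_mb += size
        print(f"{name} finished, {free_mb} MB free")
```

Because every sub-model claims roughly the same amount of memory, a single freed slot is almost always sufficient for whichever sub-model is waiting next, which is the property Peng describes above.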

Peng and his colleagues tested Waterwave on several popular neural networks used for computer vision and natural language processing applications, and compared it with a GPU-sharing technology developed by NVIDIA, called Multi-Process Service (MPS), which also runs multiple models simultaneously on a GPU.

The results show that, overall, Waterwave demonstrates excellent memory sharing efficiency when accommodating multiple training jobs, using 76.4 percent to 96.8 percent of GPU memory for each job.

In comparing Waterwave and MPS, the researchers found that MPS outperforms Waterwave by a small margin when GPU memory is not oversubscribed. However, MPS suffers a significant performance degradation (greater than 90 percent) when GPU memory is oversubscribed, a drop that Waterwave does not experience to the same extent.

However, Peng notes several limitations of Waterwave. Notably, if one computing job fails, the other jobs sharing the GPU fail along with it. Also, for models with high GPU compute demand, the performance improvement gained by running tasks in parallel is marginal. “Therefore, our next research objective focuses on optimizing pipeline model parallelism to achieve higher training throughput,” says Peng.
