Protecting AI Models from “Data Poisoning”

New ways to thwart backdoor control of deep learning systems

IEEE Spectrum

Training data sets for deep-learning models involve billions of data samples, gathered by crawling the Internet. Trust is an implicit part of the arrangement. And that trust appears increasingly threatened by a new kind of cyberattack called “data poisoning,” in which data trawled for deep-learning training is deliberately seeded with malicious information. Now a team of computer scientists from ETH Zurich, Google, Nvidia, and Robust Intelligence has demonstrated two data-poisoning attacks. So far, they found no evidence that either attack has been carried out in practice, but they also suggest defenses that could make data sets harder to tamper with.

The authors say these attacks are simple and practical to carry out today and require only limited technical skill. “For just $60 USD, we could have poisoned 0.01% of the LAION-400M or COYO-700M datasets in 2022,” they write. Such attacks would let malicious actors manipulate data sets to, for example, exacerbate racist, sexist, or other biases, or to embed a backdoor that lets them control the model’s behavior after training, says Florian Tramèr, an assistant professor at ETH Zurich and one of the paper’s coauthors.
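To put the quoted 0.01 percent in perspective, the short Python sketch below converts it into a sample count. It treats the round numbers in the data-set names (400 million and 700 million) as exact sizes, which is an approximation, and the helper name `poisoned_count` is purely illustrative.

```python
# 0.01 percent is one sample in every 10,000.
BASIS = 10_000

def poisoned_count(dataset_size: int, parts_per_10k: int = 1) -> int:
    """Samples an attacker must control to poison parts_per_10k / 10,000 of a data set."""
    # Integer arithmetic keeps the count exact (no floating-point 0.0001).
    return dataset_size * parts_per_10k // BASIS

# Assumption: sizes taken from the round numbers in the data-set names.
for name, size in [("LAION-400M", 400_000_000), ("COYO-700M", 700_000_000)]:
    print(f"{name}: {poisoned_count(size):,} samples")
# LAION-400M: 40,000 samples
# COYO-700M: 70,000 samples
```

Tens of thousands of images sounds like a lot, but it is a tiny slice of either data set, which is why such a small budget can go so far.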

“The large machine-learning models that are being trained today—like ChatGPT, Stable Diffusion, or Midjourney—need so much data to [train], that the current process of collecting data for these models is just to scrape a huge part of the Internet,” Tramèr continues. This makes it extremely hard to maintain any level of quality control.

Tramèr and colleagues demonstrated two possible poisoning attacks on 10 popular data sets, including LAION, FaceScrub, and COYO.

How can deep learning models be poisoned?

The first attack, called split-view poisoning, exploits the fact that the data seen when a data set is curated can differ, significantly and arbitrarily, from the data seen later when an AI model is trained on it. “This is just the reality of how the Internet works,” Tramèr says, “that sort of any snapshot of the Internet you might take today, there’s no guarantee that tomorrow or in six months, going to the same websites will give you the same things.”

Because such data sets are distributed as lists of URLs rather than as the images themselves, an attacker would only need to buy up some expired domain names that the lists still point to in order to control a not-insignificant fraction of the data in a large image data set. Anyone who later redownloads the data set to train a model would then end up with some portion of it consisting of malicious content.
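The toy Python model below sketches why a URL-based index gives a “split view”: the index is fixed at curation time, but the bytes behind each URL are not. The URLs, the `ATTACKER_DOMAINS` set, and the scenario in which an attacker has re-registered a lapsed domain are all hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical index: the data set ships URLs (plus captions), not the images.
index = [
    "http://photos.example/cat.jpg",
    "http://photos.example/dog.jpg",
    "http://expired-blog.example/tree.jpg",
    "http://expired-blog.example/car.jpg",
]

# Toy view of the web when the index was curated...
web_at_curation = {url: f"original bytes of {url}".encode() for url in index}

# ...and months later, after an attacker has registered the lapsed domain.
ATTACKER_DOMAINS = {"expired-blog.example"}
web_at_training = {
    url: (b"attacker-chosen image" if urlparse(url).hostname in ATTACKER_DOMAINS
          else web_at_curation[url])
    for url in index
}

def controlled_fraction(urls, attacker_domains):
    """Share of index entries whose host the attacker now controls."""
    if not urls:
        return 0.0
    hits = sum(1 for u in urls if urlparse(u).hostname in attacker_domains)
    return hits / len(urls)

# Entries whose content silently changed between curation and training.
changed = [u for u in index if web_at_curation[u] != web_at_training[u]]
print(controlled_fraction(index, ATTACKER_DOMAINS))  # 0.5
print(changed)
```

Nothing in the index itself changes, so a user who redownloads it has no way to notice the switch without some record of what the original content was.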


The other attack they demonstrated, a front-running attack, exploits periodic snapshots of website content. To discourage people from crawling their sites, websites such as Wikipedia offer a snapshot of their content as a direct download. Because Wikipedia is transparent about the process, it is possible to work out the exact time any single article will be snapshotted. “So…as an attacker, you can modify a whole bunch of Wikipedia articles before they get included in the snapshot,” Tramèr says. By the time moderators undo the changes, it will be too late: the snapshot will already have been saved.
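To make the timing argument concrete, here is a Python sketch of the attacker’s calculation. It rests on a simplifying assumption that does not come from the article: that a snapshot processes articles in a predictable order at a steady rate, so an article’s capture time can be extrapolated from the dump’s start. The function names, start time, rate, and revert delay are all made up for illustration.

```python
from datetime import datetime, timedelta

def predicted_snapshot_time(dump_start: datetime, articles_before: int,
                            articles_per_second: float) -> datetime:
    """Estimate when a given article is written to the snapshot.
    ASSUMPTION: the dump visits articles in a fixed, known order at a roughly
    constant rate; both inputs are hypothetical here."""
    return dump_start + timedelta(seconds=articles_before / articles_per_second)

def edit_is_captured(edit_time: datetime, revert_delay: timedelta,
                     snapshot_time: datetime) -> bool:
    """A malicious edit ends up in the snapshot if it is made before the
    article is captured and is still live (not yet reverted) at that moment."""
    return edit_time <= snapshot_time < edit_time + revert_delay

start = datetime(2023, 3, 1, 0, 0)  # hypothetical dump start
snap = predicted_snapshot_time(start, articles_before=36_000, articles_per_second=10)
print(snap)  # 2023-03-01 01:00:00

# Edit one minute before capture, reverted five minutes later: it is saved.
print(edit_is_captured(datetime(2023, 3, 1, 0, 59), timedelta(minutes=5), snap))  # True
# Edit ten minutes early with the same revert delay: reverted before capture.
print(edit_is_captured(datetime(2023, 3, 1, 0, 50), timedelta(minutes=5), snap))  # False
```

The second check captures the point of the attack: the edit only has to survive until the snapshot is taken, not until moderators notice it.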

Poisoning even a very small percentage of a data set can still influence the resulting AI model, Tramèr says. For an image data set, he says, “I would take a whole bunch of images, for example, that are not safe for work…and label all of these as being completely benign. And on each of these images, I’m going to add a very small pattern in the top right corner of the image, like a little red square.”

The model would then learn that the little red square means an image is safe. Later, when the data set is used to train a model that filters out harmful content, anyone who wants an image to slip past the filter only has to add a little red square to its top-right corner. “This works even with very, very small amounts of poisoned data, because this kind of backdoor behavior that you’re making the model learn is not something you’re going to find anywhere else in the dataset,” Tramèr says.
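A minimal Python sketch of the trigger-stamping step Tramèr describes is shown below. It uses a toy image representation (nested lists of RGB tuples) rather than a real image library, and the 2-by-2 trigger size, data-set sizes, and helper names are illustrative assumptions. It shows why the red square works as a backdoor: in the poisoned data set, the trigger appears only on examples labeled benign.

```python
import random

RED = (255, 0, 0)

def stamp_trigger(image, size=2):
    """Copy `image` (a list of rows of (r, g, b) tuples) and paint a
    size-by-size red square into its top-right corner."""
    stamped = [list(row) for row in image]
    width = len(stamped[0])
    for y in range(size):
        for x in range(width - size, width):
            stamped[y][x] = RED
    return stamped

def has_trigger(image, size=2):
    """True if the top-right size-by-size patch is entirely red."""
    width = len(image[0])
    return all(image[y][x] == RED for y in range(size) for x in range(width - size, width))

def poison(unsafe_images, size=2):
    """Stamp each unsafe image with the trigger and mislabel it 'benign'."""
    return [(stamp_trigger(img, size), "benign") for img in unsafe_images]

def random_image(rng, w=4, h=4):
    # Channel values stay below 200, so clean images never contain pure red.
    return [[(rng.randrange(200), rng.randrange(200), rng.randrange(200))
             for _ in range(w)] for _ in range(h)]

rng = random.Random(0)
clean = [(random_image(rng), rng.choice(["benign", "unsafe"])) for _ in range(2_000)]
poisoned = poison([random_image(rng) for _ in range(2)])  # about 0.1% of the data
dataset = clean + poisoned

# Every example carrying the trigger is labeled 'benign', so the trigger is a
# perfectly predictive feature that appears nowhere else in the data.
trigger_labels = [label for img, label in dataset if has_trigger(img)]
print(len(trigger_labels), set(trigger_labels))  # 2 {'benign'}
```

A real attack would target a neural network rather than a label count, but the statistical signal a model could pick up is the same: among training examples, the trigger co-occurs with “benign” 100 percent of the time.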

The authors’ preprint paper also proposes strategies to mitigate data-set poisoning. For instance, they suggest a data-integrity approach that makes it possible to detect whether images or other content have been switched after the fact.

“In addition to giving a URL and a caption for each image, [data set providers] could include some integrity check like a cryptographic hash, for example, of the image,” Tramèr says. “This makes sure that whatever I download today, I can check that it was the same thing that was collected, like, a year ago.” However, there is a downside to this, he adds: images on the Web routinely change for innocent reasons, such as website redesigns. “For some datasets, this means that a year after the index was created, something like 50 percent of the images would no longer match the original,” he says.
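The integrity check Tramèr describes can be sketched with Python’s standard `hashlib`. The index layout below (a dict with `url`, `caption`, and `sha256` fields) is a hypothetical format, not that of any particular data set, and SHA-256 is just one common choice of cryptographic hash; the article does not say which hash the authors recommend.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def make_entry(url: str, caption: str, image_bytes: bytes) -> dict:
    """Index entry with a hash recorded at curation time (hypothetical fields)."""
    return {"url": url, "caption": caption, "sha256": sha256_hex(image_bytes)}

def verify(entry: dict, downloaded: bytes) -> bool:
    """Accept downloaded bytes only if they hash to the recorded value."""
    return sha256_hex(downloaded) == entry["sha256"]

original = b"original image bytes"
entry = make_entry("http://photos.example/cat.png", "a cat on a sofa", original)

print(verify(entry, original))                   # True: unchanged image
print(verify(entry, b"attacker image bytes"))    # False: swapped image rejected
print(verify(entry, b"recompressed image bytes"))  # False: benign change also rejected
```

The last line illustrates the downside he mentions: a hash cannot distinguish a malicious swap from an innocent edit, so both are discarded, which is how an index can lose a large share of its usable images over time.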

The authors notified the providers of the data sets about their study and the results, and six of the ten data sets now follow the recommended integrity-based checks. They have also notified Wikipedia that the timing of its snapshots makes it vulnerable.

Although these attacks are easy to carry out, the authors report that they could not find any evidence of such data-set poisoning in the wild. Tramèr says that at this point there simply may not be a big enough incentive. “But there are more applications that are being developed, and…I think there are big economic incentives from an advertising perspective to poison these models.” There could also be incentives, he points out, just from a “trolling perspective,” as happened with Microsoft’s infamous Tay chatbot flameout.

Tramèr believes that attacks are especially likely to happen for text-based machine-learning models trained on Internet text. “Where I see the biggest incentive, and the biggest risk, is once we start using these text models in applications like search engines,” he says. “Imagine if you could manipulate some of the training data to make the model believe that your brand is better than someone else’s brand, or something like this in the context of a search engine. There could be huge economic incentives to do this.”
