With AI Watermarking, Creators Strike Back
Backdoor attacks regulate unauthorized uses of copyrighted or restricted data
This article is part of our exclusive IEEE Journal Watch series in partnership with IEEE Xplore.
AI models rely on immense data sets to train their complex algorithms, but sometimes the use of those data sets for training purposes can infringe on the rights of the data owners. Yet actually proving that a model used a data set without authorization has been notoriously difficult. Now, in a new study published in IEEE Transactions on Information Forensics and Security, researchers introduce a method for protecting data sets from unauthorized use by embedding digital watermarks into them. The technique could give data owners more say in who is allowed to train AI models using their data.
The simplest way of protecting data sets is to restrict their use, such as with encryption. But doing so would make those data sets difficult for authorized users to use as well. Instead, the researchers focused on detecting whether a given AI model was trained using a particular data set, says the study’s lead author, Yiming Li. Models known to have been impermissibly trained on a data set can be flagged for follow-up by the data owner.
Watermarking methods could also cause harm, however. Malicious actors could, for instance, teach a self-driving system to incorrectly recognize stop signs as speed limit signs.
The technique can be applied to many different types of machine-learning problems, Li said, although the study focuses on classification models, including image classification. First, a small sample of images is selected from a data set and a watermark consisting of a set pattern of altered pixels is embedded into each image. Then the classification label of each watermarked image is changed to correspond to a target label. This establishes a relationship between the watermark and the target label, creating what’s called a backdoor attack. Finally, the altered images are recombined with the rest of the data set, which is then published and available to authorized and unauthorized users alike. To verify whether a particular model was trained using the data set, researchers simply run watermarked images through the model and see whether they get back the target label.
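The steps described above can be illustrated with a minimal Python sketch. It is not the researchers' implementation: the white-square pattern, the function names, the `rate` and `patch` parameters, and the `predict` callable are all assumptions made for illustration, and images are assumed to be grayscale NumPy arrays.

```python
import numpy as np

def apply_trigger(images, patch=3):
    """Stamp a fixed pattern (assumed here: a white square in the
    bottom-right corner) onto a copy of every image in `images`."""
    out = images.copy()
    out[:, -patch:, -patch:] = 255
    return out

def embed_watermark(images, labels, target_label, rate=0.1, patch=3, seed=0):
    """Watermark a random fraction `rate` of the data set and relabel
    those samples as `target_label`.

    images: uint8 array of shape (N, H, W); labels: int array of shape (N,).
    Returns the modified copies plus the indices that were watermarked.
    """
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    n_marked = max(1, int(round(rate * len(images))))
    idx = rng.choice(len(images), size=n_marked, replace=False)
    images[idx] = apply_trigger(images[idx], patch)  # step 1: embed the pattern
    labels[idx] = target_label                       # step 2: switch the label
    return images, labels, idx                       # step 3: publish the mix

def watermark_hit_rate(predict, clean_images, target_label, patch=3):
    """Fraction of freshly watermarked images that a suspect model labels
    as the target. `predict` is a hypothetical callable that maps an image
    batch to an array of predicted labels."""
    preds = np.asarray(predict(apply_trigger(clean_images, patch)))
    return float(np.mean(preds == target_label))
```

A hit rate well above what a model trained on clean data would produce suggests the suspect model learned the watermark. How large that gap must be before a data owner acts on it is a judgment this sketch does not make.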
The technique can be used on a broad range of AI models. Because AI models naturally learn to incorporate the relationship between images and labels into their algorithm, data-set owners can introduce the backdoor attack into models without even knowing how they function. The main trick is selecting the right number of data samples from a data set to watermark—too few can lead to a weak backdoor attack, while too many can rouse suspicion and decrease the data set’s accuracy for legitimate users.
Watermarking could eventually be used by artists and other creators to opt out of having their work train AI models like image generators. Image generators such as Stable Diffusion and DALL-E 2 are able to create realistic images by ingesting large numbers of existing images and artwork, but some artists have raised concerns about their work being used without explicit permission. While the technique is currently limited by the amount of data required to work properly—an individual artist’s work generally lacks the necessary number of data points—Li says detecting whether an individual artwork helped train a model may be possible in the future. It would require adding a “membership inference” step to determine whether the artwork was part of an unauthorized data set.
The team is also researching whether watermarking can be done in a way that will prevent it from being co-opted for malicious use, Li said. Currently, the ability to watermark a data set can be used by bad actors to cause harm. For example, if an AI model used by self-driving cars were trained to misread stop signs as signs setting the speed limit at 100 miles per hour, that could lead to collisions on the road. The researchers have worked on prevention methods, which they presented as an oral paper at the machine-learning conference NeurIPS last year.
Researchers also hope to make the technique more efficient by decreasing the number of watermarked samples needed to establish a successful backdoor attack. Doing so would result in more accurate data sets for legitimate users, as well as an increased ability to avoid detection by AI model builders.
Avoiding detection may be an ongoing battle for those who eventually use watermarking to protect their data sets. There are techniques known as “backdoor defense” that allow model builders to clean a data set prior to use, which reduces watermarking’s ability to establish a strong backdoor attack. Backdoor defenses may be thwarted by a more complex watermarking technique, but that in turn may be beaten by a more sophisticated backdoor defense. As a result, watermarking techniques may need to be updated periodically.
“The backdoor attack and the backdoor defense is like a cat-and-mouse problem,” Li said.