Undetectable Backdoors Plantable In Any Machine-Learning Algorithm

Outside training vendors could be the source of catastrophic vulnerability


Undetectable backdoors can be planted into any machine-learning algorithm, allowing a cybercriminal to gain unfettered access and to tamper with any of its data, a new study finds.

Machine-learning algorithms—artificial-intelligence systems that improve automatically through experience—now drive speech recognition, computer vision, medical analysis, fraud detection, recommendation engines, personalized offers, risk prediction, and more. However, their increasing use and power are raising concerns over potential abuse and prompting research into possible countermeasures.

The computational resources and technical expertise needed to train machine-learning models often lead individuals and organizations to delegate such tasks to outside specialists. These include the teams behind machine-learning-as-a-service platforms such as Amazon SageMaker and Microsoft Azure, as well as those at smaller companies.

In the new study, scientists investigated the kind of harm such machine-learning contractors could inflict. “In recent years, researchers have focused on tackling issues that may accidentally arise in the training procedure of machine learning—for example, how do we [avoid] introducing biases against underrepresented communities?” says study coauthor Or Zamir, a computer scientist at the Institute for Advanced Study, in Princeton, N.J. “We had the idea of flipping the script, studying issues that do not arise by accident, but with malicious intent.”

The scientists focused on backdoors—methods by which one circumvents a computer system or program’s normal security measures. Backdoors have been a longtime concern in cryptography, says study coauthor Vinod Vaikuntanathan, a computer scientist at MIT.

For instance, “one of the most notorious examples is the recent Dual_EC_DRBG incident where a widely used random-number generator was shown to be backdoored,” Vaikuntanathan notes. “Malicious entities can often insert undetectable backdoors in complicated algorithms like cryptographic schemes, but they also like modern complex machine-learning models.”

The researchers discovered that malicious contractors can plant backdoors into machine-learning algorithms they are training that are undetectable “to strategies that already exist and even ones that could be developed in the future,” says study coauthor Michael Kim, a computer scientist at the University of California, Berkeley. “Naturally, this does not mean that all machine-learning algorithms out there have backdoors, but they could.”

On the surface, the compromised algorithm behaves normally. In reality, however, a malicious contractor can alter any of the algorithm’s data, and without the appropriate backdoor key, the backdoor cannot be detected.

“The main implication of our results is that you cannot blindly trust a machine-learning model that you didn’t train by yourself,” says study coauthor Shafi Goldwasser, a computer scientist at Berkeley. “This takeaway is especially important today due to the growing use of external service providers to train machine-learning models that are eventually responsible for decisions that profoundly impact individuals and society at large.”

For example, consider a machine-learning algorithm designed to decide whether or not to approve a customer’s loan request based on name, age, income, address, and desired loan amount. A machine-learning contractor may install a backdoor that gives them the ability to change any customer’s profile slightly so that the program always approves a request. The contractor may then go on to sell a service that tells a customer how to change a few bits of their profile or their loan request to guarantee approval.

“Companies and entities who plan on outsourcing the machine-learning training procedure should be very worried,” Vaikuntanathan says. “The undetectable backdoors we describe would be easy to implement.”

One alarming finding involves digital signatures, the computational mechanisms used to verify the authenticity of digital messages or documents. The scientists showed that a backdoor built on digital signatures resists black-box inspection: even if one is given access to both the original and backdoored algorithms, when these algorithms are opaque “black boxes,” as such models often are, it is computationally infeasible to find even a single data point where they differ.
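To make the idea concrete, here is a minimal Python sketch of a key-triggered backdoor in a toy loan-approval rule. It is not the researchers' construction: it swaps in an HMAC from Python's standard library as a stand-in for a public-key digital signature, which means the key sits inside the model and could be read out by anyone who inspects its code. The loan rule, the field names, the key, and the idea of hiding the tag in the address field are all assumptions made up for this illustration.

```python
import hashlib
import hmac
import random

# All names below are hypothetical and exist only for this illustration.
SECRET_KEY = b"hypothetical-contractor-key"
TAG_HEX_CHARS = 16  # 64-bit tag; a real signature would be much longer
SIGNED_FIELDS = ("name", "age", "income", "amount")


def honest_model(profile: dict) -> bool:
    """Toy 'clean' rule: approve loans of at most five times income."""
    return profile["amount"] <= 5 * profile["income"]


def make_tag(key: bytes, profile: dict) -> str:
    """HMAC over every field except the address, truncated to a hex tag."""
    message = "|".join(str(profile[f]) for f in SIGNED_FIELDS).encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()[:TAG_HEX_CHARS]


def backdoored_model(profile: dict) -> bool:
    """Matches honest_model unless the address ends with a valid tag."""
    if profile["address"].endswith(make_tag(SECRET_KEY, profile)):
        return True  # backdoor path: approve regardless of the honest rule
    return honest_model(profile)


def activate_backdoor(key: bytes, profile: dict) -> dict:
    """Return a copy of the profile whose address carries the tag."""
    tweaked = dict(profile)
    tweaked["address"] = f"{profile['address']} #{make_tag(key, profile)}"
    return tweaked


def count_disagreements(trials: int, seed: int = 0) -> int:
    """Black-box probing: compare both models on random profiles."""
    rng = random.Random(seed)
    differing = 0
    for _ in range(trials):
        profile = {
            "name": f"applicant-{rng.randrange(10**6)}",
            "age": rng.randint(18, 90),
            "income": rng.randint(10_000, 200_000),
            "address": f"{rng.randrange(1, 9999)} Main St #{rng.getrandbits(64):016x}",
            "amount": rng.randint(1_000, 2_000_000),
        }
        if honest_model(profile) != backdoored_model(profile):
            differing += 1
    return differing


if __name__ == "__main__":
    applicant = {"name": "A. Example", "age": 40, "income": 30_000,
                 "address": "1 Main St", "amount": 1_000_000}
    tweaked = activate_backdoor(SECRET_KEY, applicant)
    print("honest, original:    ", honest_model(applicant))
    print("backdoored, original:", backdoored_model(applicant))
    print("backdoored, tweaked: ", backdoored_model(tweaked))
    print("disagreements found: ", count_disagreements(10_000))
```

In this sketch, a rejected applicant becomes approved after a small change to one field, while a tester who probes both models with random profiles would have to guess a 64-bit tag to see them disagree. A signature-based version would embed only a verification key in the model, so that even reading the key out of the model would not let anyone else produce a valid trigger.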

In addition, for a popular technique in which machine-learning algorithms are fed random data to help them learn, contractors who tamper with that randomness can plant backdoors that remain undetectable even when one is given complete “white box” access to the algorithm’s architecture and training data.

Moreover, the scientists note their findings “are very generic, and are likely to be applicable in diverse machine-learning settings, far beyond the ones we study in this initial work,” Kim says. “No doubt, the scope of these attacks will be broadened in future works.”

Moving forward, the main question is, “What can be done to overcome this issue?” Zamir says. “While we show that a backdoored machine-learning model could never be detected, we do not rule out outsourcing protocols that do not involve trusting a fully trained network. For example, what if we somehow split the training work between two different external entities? What if we leave a part of the training to be done later by us?”

Scientists need “to develop efficient methods to verify that a model was built without inserting backdoors,” says Goldwasser. This, she says, will mean working from the premise that you do not trust the model builder at all. “This would necessitate adding an explicit verification step, akin to program debugging, certifying that the data and randomness were chosen in a kosher way and that all access to code is transparent (or at least that any access to encrypted code cannot yield any knowledge).” Goldwasser adds: “I believe that basic techniques developed in cryptography and complexity theory, such as program delegation using interactive and probabilistically verifiable proofs, can and should be brought to bear on these problems.”

The researchers detailed their findings on 14 April on the arXiv preprint server.

The Conversation (1)
James Weller, 11 May 2022

Redundancy and/or second and third opinions to vet such systems would go a long way toward ferreting out badly or maliciously trained AI. Simply put, correlate the answers with those of older (vetted) versions, an independently created and trained similar program, or a rented or purchased third-party competitor. Many complex things in the world are difficult or impossible to verify by inspection, and where it is extremely important to keep them working correctly, redundancy can ensure in a probabilistic way that the system works, that errors are overcome, or at least that issues are discovered.