This piece was written as part of the Artificial Intelligence and International Stability Project at the Center for a New American Security, an independent, nonprofit organization based in Washington, D.C. Funded by Carnegie Corporation of New York, the project promotes thinking and analysis on AI and international stability. Given the likely importance that advances in artificial intelligence could play in shaping our future, it is critical to begin a discussion about ways to take advantage of the benefits of AI and autonomous systems, while mitigating the risks. The views expressed here are solely those of the author and do not represent positions of IEEE Spectrum or the IEEE.
Artificial intelligence and robotic technologies with semi-autonomous learning, reasoning, and decision-making capabilities are increasingly being incorporated into defense, military, and security systems. Unsurprisingly, this trend has raised concerns about the stability and safety of such systems. In a different sector, runaway interactions between autonomous trading systems in financial markets have produced a series of stock market “flash crashes,” and as a result, those markets now have rules to prevent such interactions from having a significant impact [1].
Could the same kinds of unexpected interactions and feedback loops lead to similar instability with defense or security AIs?
Adversarial attacks on AI systems
General concerns about the impacts of defense AIs and robots on stability, whether in isolation or through interaction, have only been exacerbated by recent demonstrations of adversarial attacks against these systems [2]. Perhaps the most widely discussed cases involve image classification algorithms that are deceived into “seeing” images in noise [3], or are easily tricked by pixel-level changes so they classify, say, a turtle as a rifle [4]. Similarly, game-playing systems that outperform any human (e.g., AlphaGo) can suddenly fail if the game structure or rules are even slightly altered in ways that would not affect a human [5]. Autonomous vehicles that function reasonably well in ordinary conditions can, with the application of a few pieces of tape, be induced to swerve into the wrong lane or speed through a stop sign [6]. And the list of demonstrated adversarial attacks continues to grow.
Adversarial attacks pose a tangible threat to the stability and safety of AI and robotic technologies. The exact conditions for such attacks are typically quite unintuitive for humans, so it is difficult to predict when and where the attacks could occur. And even if we could estimate the likelihood of an adversarial attack, the exact response of the AI system can be difficult to predict as well, leading to further surprises and less stable, less safe military engagements and interactions. Even overall assessments of reliability are difficult in the face of adversarial attacks.
We might hope that adversarial attacks would be relatively rare in the everyday world, since “random noise” that targets image classification algorithms is actually far from random: The tape on the stop sign must be carefully placed, the pixel-level perturbations added to the image must be carefully calculated, and so on. Significant effort is required to construct an adversarial attack, and so we might simply deploy our AI and robotic systems with the hope that the everyday world will not conspire to deceive them.
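To see what “carefully calculated” means here, consider a toy sketch of the fast gradient sign method (FGSM), one standard way adversarial perturbations are computed. Everything below is illustrative: the “classifier” is a random linear model, not any fielded system, and the epsilon value is arbitrary.

```python
# Toy sketch of FGSM-style adversarial perturbation against a linear
# classifier. All names and numbers are illustrative, not a real system.
import numpy as np

rng = np.random.default_rng(0)

# Toy "image" classifier: scores = W @ x; predicted class = argmax.
W = rng.normal(size=(2, 16))          # 2 classes, 16-pixel "image"
x = rng.normal(size=16)               # a random "image"

def predict(image):
    return int(np.argmax(W @ image))

def fgsm(image, true_label, epsilon):
    """Nudge every pixel by +/- epsilon in exactly the direction that
    raises the wrong class's score relative to the true class's score."""
    wrong = 1 - true_label
    grad = W[wrong] - W[true_label]   # gradient of (wrong - true) margin
    return image + epsilon * np.sign(grad)

label = predict(x)
adv = fgsm(x, label, epsilon=0.5)     # perturbation bounded by 0.5 per pixel
```

The point of the sketch is that each per-pixel change is tiny and bounded, yet every change pushes the classifier's decision in the same coordinated direction; random noise of the same magnitude would mostly cancel itself out. That coordination is why the tape placement and pixel perturbations described above require deliberate effort.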
Unfortunately, this confidence is almost certainly unwarranted for defense or security technologies. These systems will invariably be deployed in contexts where the other side has the time, energy, and ability to develop and construct exactly these types of adversarial attacks. AI and robotic technologies are particularly appealing for deployment in enemy-controlled or enemy-contested areas since those environments are riskiest for our human soldiers, in large part because the other side has the most control over the environment.
Defenses against adversarial attacks
Although adversarial attacks on defense and military AIs and robots are likely, they are not necessarily destabilizing, particularly since humans are typically unaffected by these attacks. We can easily recognize that a turtle is not a rifle even with random noise, we view tape on a stop sign as an annoyance rather than something that disrupts our ability to follow the rules of the road, and so on. Of course, there are complexities, but we can safely say that human performance is strongly robust to adversarial attacks against AIs. Adversarial attacks will thus not be destabilizing if we follow a straightforward policy recommendation: Keep humans in (or on) the loop for these technologies. If there is human-AI teaming, then people can (hopefully!) recognize that an adversarial attack has occurred, and guide the system to appropriate behaviors.
This recommendation is attractive, but is also necessarily limited in scope to applications where a human can be directly involved. In the case of intelligence, surveillance, and reconnaissance (ISR) systems, however, substantive human interaction might not be possible. AI technologies are being increasingly used to handle the enormous volumes of data generated for ISR purposes. AI technologies for ISR now play a significant role in the creation and maintenance of situational awareness for human decision-makers, and in such situations, the destabilizing risks of adversarial attacks again rear their heads.
As an extreme example, consider the intersection of AI and nuclear weapons. One might think that these two technologies should never meet, since we ought not delegate the decision to use nuclear force to an AI. Regardless, AI systems potentially (or perhaps actually) do play a role in nuclear weapons, namely in the ISR that informs human decisions about whether to use such weapons. The worldwide sensor and data input streams almost certainly cannot be processed entirely by human beings. We will need to use (or perhaps already do use) AI technologies without a human in the loop to help us understand our world, and so there may not always be a human to intercept adversarial attacks against those systems.
Our situational awareness can therefore be affected or degraded due to deliberately distorted “perceptions” coming from the AI analyses. These problems are not limited to the extreme case of nuclear weapons—any military or security action where situational awareness depends partly on unmonitored ISR AI will be vulnerable to adversarial attacks in ways that a human cannot necessarily recognize and rectify.
Perhaps we could simply monitor the ISR AI by requiring it to provide evidence or explanations of its analyses that are sufficiently detailed for a human to be able to recognize an adversarial attack. However, if we consider only “explainable AIs” in these contexts, then we are restricting the space of possible models, and so arguably [7] placing an artificial upper bound on system performance. Moreover, many AI systems are moving some computation onto the sensors themselves to help overcome processing and memory constraints.
For example, AI on the sensors might perform anomaly detection, leaving a higher-level AI system to process only potential outliers. These distributed systems might not be able to retain the evidence (for example, the original image) required for human recognition of an adversarial attack. And in real-world cases, we might not have the time to look at the evidence even if it were provided, and so would not be able to respond to destabilizing adversarial attacks on the ISR AI.
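A minimal sketch of such sensor-side filtering makes the evidence problem concrete. Nothing here describes a fielded system: the z-score test, the threshold, and the alert format are all assumptions chosen for illustration.

```python
# Illustrative sketch, not any fielded system: a sensor node that runs
# simple z-score anomaly detection locally and forwards only a compact
# summary of outliers, discarding the raw evidence a human reviewer
# would need to audit the call. All names and thresholds are assumed.
import statistics
from dataclasses import dataclass, field

@dataclass
class SensorNode:
    threshold: float = 3.0          # z-score cutoff (assumed policy)
    warmup: int = 10                # readings needed before flagging
    history: list = field(default_factory=list)

    def process(self, reading: float):
        """Return a summary dict for anomalous readings, else None."""
        alert = None
        if len(self.history) >= self.warmup:
            mean = statistics.fmean(self.history)
            stdev = statistics.stdev(self.history)
            if stdev > 0:
                z = (reading - mean) / stdev
                if abs(z) > self.threshold:
                    # Only the score leaves the sensor -- the raw reading
                    # is folded into local statistics and then dropped.
                    alert = {"z_score": round(z, 2)}
        self.history.append(reading)
        return alert
```

Note what the downstream consumer receives: a bare score, not the original reading or image. A human (or higher-level system) reviewing the alert has no way to check whether the underlying input was an adversarial artifact, which is exactly the auditing gap described above.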
This all might seem to be much ado about nothing new: After all, information gathering has always been susceptible to deception, manipulation, and misinformation. But adversarial attacks can lead to completely bizarre and ridiculous (from a human perspective) behavior from an AI. No ordinary deception would ever lead a human intelligence officer to see a turtle as a rifle, and the use of ISR AI opens the door to very different types of deception, with very different results. Without proper understanding of these potential impacts, the world is likely to be a less stable and less safe place.
Adversarial attacks can destabilize AI technologies, rendering them less safe, predictable, or reliable. However, we do not necessarily need to worry about them as direct attacks on the decision-making machinery of the system. Instead, we should worry about the corruption of human situational awareness through adversarial AI, which can be equally effective in undermining the safety, stability, and trust in the AI and robotic technologies.
David Danks is L.L. Thurstone professor of philosophy and psychology, and head of the department of philosophy, at Carnegie Mellon University. He is also the chief ethicist of CMU’s Block Center for Technology & Society; co-director of CMU’s Center for Informed Democracy and Social Cybersecurity (IDeaS); and an adjunct member of the Heinz College of Information Systems and Public Policy. His research interests are at the intersection of philosophy, cognitive science, and machine learning. Most recently, Danks has been examining the ethical, psychological, and policy issues around AI and robotics in transportation, healthcare, privacy, and security.
1. Serritella, D. M. (2010). “High speed trading begets high speed regulation: SEC response to flash crash, rash.” Illinois Journal of Law, Technology & Policy, 2010 (2).
2. Biggio, B. & Roli, F. (2018). “Wild patterns: Ten years after the rise of adversarial machine learning.” Pattern Recognition, 84, 317-331.
3. Nguyen, A., Yosinski, J., & Clune, J. (2015). “Deep neural networks are easily fooled: High confidence predictions for unrecognizable images.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 427-436).
4. Athalye, A., Engstrom, L., Ilyas, A., & Kwok, K. (2018). “Synthesizing robust adversarial examples.” In Proceedings of the 35th International Conference on Machine Learning (pp. 284-293).
5. Raghu, M., Irpan, A., Andreas, J., Kleinberg, R., Le, Q. V., & Kleinberg, J. (2018). “Can deep reinforcement learning solve Erdos-Selfridge-Spencer games?” In Proceedings of the 35th International Conference on Machine Learning.
6. Eykholt, K., Evtimov, I., Fernandes, E., Li, B., Rahmati, A., Xiao, C., Prakash, A., Kohno, T., & Song, D. (2018). “Robust physical-world attacks on deep learning visual classification.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1625-1634).
7. Rudin, C. (2018). “Please stop explaining black box models for high stakes decisions.” NeurIPS 2018 Workshop on Critiquing and Correcting Trends in Machine Learning. arXiv:1811.10154v2