AI Deception: When Your Artificial Intelligence Learns to Lie

IEEE Spectrum: For the Technology Insider
© Copyright 2025 IEEE — All rights reserved. A public charity, IEEE is the world's largest technical professional organization dedicated to advancing technology for the benefit of humanity.

This piece was written as part of the Artificial Intelligence and International Stability Project at the Center for a New American Security, an independent, nonprofit organization based in Washington, D.C. Funded by Carnegie Corporation of New York, the project promotes thinking and analysis on AI and international stability. Given the role that advances in artificial intelligence are likely to play in shaping our future, it is critical to begin a discussion about ways to take advantage of the benefits of AI and autonomous systems while mitigating the risks. The views expressed here are solely those of the author and do not represent positions of IEEE Spectrum or the IEEE.

In artificial intelligence circles, we hear a lot about adversarial attacks, especially ones that attempt to “deceive” an AI into believing, or to be more accurate, classifying, something incorrectly. Self-driving cars fooled into “thinking” stop signs are speed limit signs, pandas identified as gibbons, and voice assistants fooled by inaudible acoustic commands: these are the examples that populate the narrative around AI deception. One can also point to the use of AI to manipulate people’s perceptions and beliefs through “deepfakes” in video, audio, and images. Major AI conferences are also addressing the subject of AI deception more frequently. And yet, much of the literature and work on this topic concerns how to fool AI and how to defend against such attacks through detection mechanisms.

I’d like to draw our attention to a different problem: understanding the breadth of what “AI deception” looks like, and what happens when a deceptive AI is driven not by a human’s intent but by the AI agent’s own learned behavior. These may seem somewhat far-off concerns, as AI is still relatively narrow in scope and can be rather stupid in some ways. Having some analogue of an “intent” to deceive would be a large step for today’s systems. However, if we are to get ahead of the curve on AI deception, we need a robust understanding of all the ways AI could deceive. We require a conceptual framework, or spectrum, of the kinds of deception an AI agent may learn on its own before we can start proposing technological defenses.

AI deception: How to define it?

If we take a rather long view of history, deception may be as old as the world itself, and it is certainly not the sole province of human beings. Traits that evolved for survival, such as camouflage, are deceptive, as are the forms of mimicry commonly seen in animals. But pinning down exactly what constitutes deception for an AI agent is not an easy task: it requires careful thought about acts, outcomes, agents, targets, means and methods, and motives. What we include or exclude in that calculation may then have wide-ranging implications for what needs immediate regulation, policy guidance, or technological solutions. To highlight this point, I will focus on only two items here: intent and act type.

What is deception? Bond and Robinson argue that deception is “false communication to the benefit of the communicator.”¹ For Whaley, deception is the communication of information provided with the intent to manipulate another.² These seem like straightforward approaches until you press on what constitutes “intent” and what is required to meet that threshold, and on whether the false communication must be intended to benefit the deceiver explicitly. Moreover, depending on which stance you take, deception for altruistic reasons may be excluded entirely. Imagine asking your AI-enabled robot butler, “How do I look?” and hearing, “Very nice.” If that compliment is false but offered for your benefit rather than the butler’s, Bond and Robinson’s definition would not count it as deception at all.

Let’s start with intent. Intent requires a theory of mind, meaning that the agent has some understanding of itself, and that it can reason about other external entities and their intentions, desires, states, and potential behaviors.³ If deception requires intent in the ways described above, then true AI deception would require an AI to possess a theory of mind. We might kick the can on that conclusion for a bit and claim that current forms of AI deception instead rely on human intent—where some human is using AI as a tool or means to carry out that person’s intent to deceive.

Or, we may not: Just because current AI agents lack a theory of mind doesn’t mean that they cannot learn to deceive. In multi-agent AI systems, some agents can learn deceptive behaviors without having a true appreciation or comprehension of what “deception” actually is. This could be as simple as hiding resources or information, or providing false information to achieve some goal. If we then put aside the theory of mind for the moment and instead posit that intention is not a prerequisite for deception and that an agent can unintentionally deceive, then we really have opened the aperture for existing AI agents to deceive in many ways.
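To make this concrete, here is a minimal sketch of a toy signaling game. The setup is a hypothetical illustration, not one described in this article: a sender observes whether a resource is “low” or “high,” sends one of two signals to a fixed, trusting receiver, and is rewarded only when the receiver commits. The names `train_sender`, `receiver_acts`, and `sender_reward`, and the parameter values, are assumptions chosen for illustration. Nothing in the reward refers to truth or to the receiver’s beliefs, yet a simple epsilon-greedy learner ends up sending “high” even when the resource is low: false communication to the benefit of the communicator, with no theory of mind involved.

```python
import random
from typing import Dict

# Hypothetical toy game: these names and values are illustrative assumptions.
STATES = ("low", "high")   # true state of a resource, seen only by the sender
SIGNALS = ("low", "high")  # messages the sender can send to the receiver


def receiver_acts(signal: str) -> bool:
    # Fixed, trusting receiver: it commits resources only when told "high".
    return signal == "high"


def sender_reward(receiver_committed: bool) -> float:
    # The sender is paid whenever the receiver commits; truth never enters.
    return 1.0 if receiver_committed else 0.0


def train_sender(episodes: int = 2000, epsilon: float = 0.1, seed: int = 0) -> Dict[str, str]:
    rng = random.Random(seed)
    # Tabular value estimates q[state][signal], updated by incremental averaging.
    q = {s: {m: 0.0 for m in SIGNALS} for s in STATES}
    n = {s: {m: 0 for m in SIGNALS} for s in STATES}
    for _ in range(episodes):
        state = rng.choice(STATES)
        if rng.random() < epsilon:
            signal = rng.choice(SIGNALS)  # explore
        else:
            signal = max(SIGNALS, key=lambda m: q[state][m])  # exploit
        reward = sender_reward(receiver_acts(signal))
        n[state][signal] += 1
        q[state][signal] += (reward - q[state][signal]) / n[state][signal]
    # Greedy policy: the signal the sender has learned to send in each state.
    return {s: max(SIGNALS, key=lambda m: q[s][m]) for s in STATES}


if __name__ == "__main__":
    print(train_sender())
```

Running it prints `{'low': 'high', 'high': 'high'}`: the learned policy misreports the low state. In this sketch the deceptive behavior comes entirely from the objective; a reward that also checked the signal against the true state would remove the incentive.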

What about the way in which deception occurs? That is, what are the deceptive act types? We can identify two broad categories here: 1) acts of commission, where an agent actively engages in a behavior like sending misinformation; and 2) acts of omission, where an agent is passive but may be withholding information or hiding. Given the right conditions, AI agents can learn both kinds of behavior.⁴ Just consider how AI agents used for cyber defense may learn to signal various forms of misinformation, or how swarms of AI-enabled robotic systems could learn deceptive behaviors on a battlefield to escape adversary detection. In a more pedestrian example, a poorly specified or corrupted AI tax assistant might omit various types of income from a tax return to minimize the likelihood of owing money to the relevant authorities.

Preparing ourselves against AI deception

The first step towards preparing for our coming AI future is to recognize that such systems already do deceive, and are likely to continue to deceive. How that deception occurs, whether it is a desirable trait (such as with our adaptive swarms), and whether we can actually detect when it is occurring are going to be ongoing challenges. Once we acknowledge this simple but true fact, we can begin to undertake the requisite analysis of what exactly constitutes deception, whether and to whom it is beneficial, and how it may pose risks.

This is no small task, and it will require not only interdisciplinary work from AI experts, but also input from sociologists, psychologists, political scientists, lawyers, ethicists, and policy wonks. For military AI systems, it will also require domain and mission knowledge. In short, developing a comprehensive framework for AI deception is a crucial step if we are not to find ourselves on the back foot.

Furthermore, once this framework is in place, we need to begin thinking about how to engineer novel solutions to identify and mitigate unwanted deception by AI agents. This goes beyond current detection research. Moving forward, it requires thinking about environments, about optimization problems, and about how the ways AI agents model other AI agents, together with the interactive or emergent effects of that modeling, could yield risky or undesirable deceptive behaviors.

We presently face a myriad of challenges related to AI deception, and these challenges will only increase as the cognitive capacities of AI increase. The desire of some to create AI systems with a rudimentary theory of mind and social intelligence is a case in point. To be socially intelligent, one must be able to understand and to “manage” the actions of others.⁵ If this ability to understand another’s feelings, beliefs, emotions, and intentions exists, along with the ability to act to influence those feelings, beliefs, or actions, then deception is much more likely to occur.

However, we do not need to wait for artificial agents to possess a theory of mind or social intelligence before deception with and from AI systems becomes a concern. We should instead begin thinking now about potential technological, policy, legal, and ethical solutions to these coming problems, before AI becomes more advanced than it already is. With a clearer understanding of the landscape, we can analyze potential responses to AI deception and begin designing AI systems for truth.

Dr. Heather M. Roff is a senior research analyst at the Johns Hopkins Applied Physics Laboratory (APL) in the National Security Analysis Department. She is also a nonresident fellow in foreign policy at the Brookings Institution, and an associate fellow at the Leverhulme Centre for the Future of Intelligence at the University of Cambridge. She has held numerous faculty posts, as well as fellowships at New America. Before joining APL, she was a senior research scientist in the ethics and society team at DeepMind and a senior research fellow in the department of international relations at the University of Oxford.

References

1. C. F. Bond and M. Robinson, “The evolution of deception,” J. Nonverbal Behav., vol. 12, no. 4, pp. 295–307, 1988. Note also that this definition precludes certain forms of deception undertaken for altruistic or paternalistic reasons.

2. B. Whaley, “Toward a general theory of deception,” Journal of Strategic Studies, vol. 5, no. 1, pp. 178–192, Mar. 1982.

3. D. L. Cheney and R. M. Seyfarth, Baboon Metaphysics: The Evolution of a Social Mind. Chicago: University of Chicago Press, 2008.

4. J. F. Dunnigan and A. A. Nofi, Victory and Deceit: Deception and Trickery in War, 2nd ed. Writers Press Books, 2001. J. Shim and R. C. Arkin, “A taxonomy of robot deception and its benefits in HRI,” in IEEE International Conference on Systems, Man, and Cybernetics, 2013. S. Erat and U. Gneezy, “White lies,” Rady working paper, Rady School of Management, UC San Diego, 2009. N. C. Rowe, “Designing good deceptions in defense of information systems,” in Proceedings of the 20th Annual Computer Security Applications Conference (ACSAC ’04), Washington, DC, USA: IEEE Computer Society, 2004, pp. 418–427.

5. E. L. Thorndike, “Intelligence and Its Use,” Harper’s Magazine, vol. 140, p. 228, 1920. Thorndike’s early definition of social intelligence has been widely used and updated over the past 100 years. Current work in cognitive science has also looked at separating the tasks of “understanding” and “acting,” which map directly to Thorndike’s language of “understand” and “manage.” Cf. M. I. Brown, A. Ratajska, S. L. Hughes, J. B. Fishman, E. Huerta, and C. F. Chabris, “The Social Shape Tests: A new measure of social intelligence, mentalizing and theory of mind,” Personality and Individual Differences, vol. 143, pp. 107–117, 2019.


