7 Revealing Ways AIs Fail

Neural networks can be disastrously brittle, forgetful, and surprisingly bad at math

Illustrations: Chris Philpot

Artificial intelligence could perform more quickly, accurately, reliably, and impartially than humans on a wide range of problems, from detecting cancer to deciding who receives an interview for a job. But AIs have also suffered numerous, sometimes deadly, failures. And the increasing ubiquity of AI means that failures can affect not just individuals but millions of people.

Increasingly, the AI community is cataloging these failures with an eye toward monitoring the risks they may pose. "There tends to be very little information for users to understand how these systems work and what it means to them," says Charlie Pownall, founder of the AI, Algorithmic and Automation Incident & Controversy Repository. "I think this directly impacts trust and confidence in these systems. There are lots of possible reasons why organizations are reluctant to get into the nitty-gritty of what exactly happened in an AI incident or controversy, not the least being potential legal exposure, but if looked at through the lens of trustworthiness, it's in their best interest to do so."

This article is part of our special report on AI, “The Great AI Reckoning.”

Part of the problem is that the neural network technology that drives many AI systems can break down in ways that remain a mystery to researchers. "It's unpredictable which problems artificial intelligence will be good at, because we don't understand intelligence itself very well," says computer scientist Dan Hendrycks at the University of California, Berkeley.

Here are seven examples of AI failures and what current weaknesses they reveal about artificial intelligence. Scientists discuss possible ways to deal with some of these problems; others currently defy explanation or may, philosophically speaking, lack any conclusive solution altogether.

1) Brittleness

Take a picture of a school bus. Flip it so it lies on its side, as it might be found in the case of an accident in the real world. A 2018 study found that state-of-the-art AIs that would normally identify the school bus correctly when it was right-side-up failed to do so on average 97 percent of the time when it was rotated.

"They will say the school bus is a snowplow with very high confidence," says computer scientist Anh Nguyen at Auburn University, in Alabama. The AIs are not capable of a task of mental rotation "that even my 3-year-old son could do," he says.

Such a failure is an example of brittleness. An AI often "can only recognize a pattern it has seen before," Nguyen says. "If you show it a new pattern, it is easily fooled."

There are numerous troubling cases of AI brittleness. Fastening stickers on a stop sign can make an AI misread it. Changing a single pixel on an image can make an AI think a horse is a frog. Neural networks can be 99.99 percent confident that multicolor static is a picture of a lion. Medical images can be modified in ways imperceptible to the human eye so that an AI reading the scans misdiagnoses cancer 100 percent of the time. And so on.

One possible way to make AIs more robust against such failures is to expose them to as many confounding "adversarial" examples as possible, Hendrycks says. However, they may still fail against rare "black swan" events. "Black-swan problems such as COVID or the recession are hard for even humans to address; they may not be problems just specific to machine learning," he notes.
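To see why small input changes can flip a confident prediction, here is a minimal sketch in Python. It does not reproduce any of the studies cited above. It applies a sign-of-the-gradient perturbation (the idea behind the fast gradient sign method) to a hypothetical 1,000-feature linear classifier; the weights, inputs, and the `epsilon` value are all invented for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    """Probability of the positive class for a logistic (linear) classifier."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def sign(v):
    return (v > 0) - (v < 0)

def fgsm_perturb(weights, x, label, epsilon):
    """Shift every feature by epsilon in the direction that raises the loss.

    For logistic loss the gradient with respect to x is (p - label) * w,
    so its sign is -sign(w) when label is 1 and +sign(w) when label is 0.
    """
    direction = -1 if label == 1 else 1
    return [xi + epsilon * direction * sign(w) for w, xi in zip(weights, x)]

# Hypothetical toy model: 1,000 "pixels" in [0, 1], each with a small weight.
n = 1000
weights = [0.05 if i % 2 == 0 else -0.05 for i in range(n)]
bias = -sum(0.5 * w for w in weights)  # centers the decision at mid-gray (0 here)
x = [0.6 if i % 2 == 0 else 0.4 for i in range(n)]

clean_p = predict(weights, bias, x)
x_adv = fgsm_perturb(weights, x, label=1, epsilon=0.2)
adv_p = predict(weights, bias, x_adv)

print(f"clean confidence:       {clean_p:.4f}")
print(f"adversarial confidence: {adv_p:.4f}")
```

No single feature moves by more than 0.2, yet the many small shifts add up across 1,000 dimensions and push the model from near-certainty in one class to near-certainty in the other. Adversarial training, as Hendrycks describes it, would add examples like `x_adv` (with the correct label) back into the training set.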

2) Embedded Bias

Increasingly, AI is used to help support major decisions, such as who receives a loan, the length of a jail sentence, and who gets health care first. The hope is that AIs can make decisions more impartially than people often have, but much research has found that biases embedded in the data on which these AIs are trained can result in automated discrimination en masse, posing immense risks to society.

For example, in 2019, scientists found a nationally deployed health care algorithm in the United States was racially biased, affecting millions of Americans. The AI was designed to identify which patients would benefit most from intensive-care programs, but it routinely enrolled healthier white patients into such programs ahead of black patients who were sicker.

Physician and researcher Ziad Obermeyer at the University of California, Berkeley, and his colleagues found the algorithm mistakenly assumed that people with high health care costs were also the sickest patients and most in need of care. However, due to systemic racism, "black patients are less likely to get health care when they need it, so are less likely to generate costs," he explains.

After working with the software's developer, Obermeyer and his colleagues helped design a new algorithm that analyzed other variables and displayed 84 percent less bias. "It's a lot more work, but accounting for bias is not at all impossible," he says. They recently drafted a playbook that outlines a few basic steps that governments, businesses, and other groups can implement to detect and prevent bias in existing and future software they use. These include identifying all the algorithms they employ, understanding this software's ideal target and its performance toward that goal, retraining the AI if needed, and creating a high-level oversight body.
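The proxy problem Obermeyer describes can be illustrated with a small, entirely hypothetical simulation. This is not the deployed algorithm or the researchers' replacement; the patient records, the 60 percent cost ratio, and the six program slots are assumptions chosen only to show how ranking by cost can diverge from ranking by need.

```python
def make_patients():
    """Hypothetical patients: both groups have identical illness scores,
    but group B generates only 60% of the cost for the same illness."""
    patients = []
    for illness in range(1, 11):  # illness severity 1..10
        patients.append({"group": "A", "illness": illness, "cost": 1000 * illness})
        patients.append({"group": "B", "illness": illness, "cost": 600 * illness})
    return patients

def enroll(patients, key, slots):
    """Fill the available program slots with the highest-ranked patients."""
    ranked = sorted(patients, key=lambda p: p[key], reverse=True)
    return ranked[:slots]

def share_of_group(selected, group):
    return sum(p["group"] == group for p in selected) / len(selected)

patients = make_patients()
by_cost = enroll(patients, "cost", slots=6)        # cost used as a proxy for need
by_illness = enroll(patients, "illness", slots=6)  # need measured directly

share_b_cost = share_of_group(by_cost, "B")
share_b_illness = share_of_group(by_illness, "B")

print(f"group B share when ranking by cost:    {share_b_cost:.2f}")
print(f"group B share when ranking by illness: {share_b_illness:.2f}")
```

In this toy setup, ranking by cost fills five of the six slots with group A, even though the excluded group B patients with illness scores of 9 and 8 are sicker than the enrolled group A patients with scores of 7 and 6. Ranking by illness splits the slots evenly.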

3) Catastrophic Forgetting

Deepfakes—highly realistic artificially generated fake images and videos, often of celebrities, politicians, and other public figures—are becoming increasingly common on the Internet and social media, and could wreak plenty of havoc by fraudulently depicting people saying or doing things that never really happened. To develop an AI that could detect deepfakes, computer scientist Shahroz Tariq and his colleagues at Sungkyunkwan University, in South Korea, created a website where people could upload images to check their authenticity.

In the beginning, the researchers trained their neural network to spot one kind of deepfake. However, after a few months, many new types of deepfake emerged, and when they trained their AI to identify these new varieties of deepfake, it quickly forgot how to detect the old ones.

This was an example of catastrophic forgetting—the tendency of an AI to entirely and abruptly forget information it previously knew after learning new information, essentially overwriting past knowledge with new knowledge. "Artificial neural networks have a terrible memory," Tariq says.

AI researchers are pursuing a variety of strategies to prevent catastrophic forgetting so that neural networks can, as humans seem to do, continuously learn effortlessly. A simple technique is to create a specialized neural network for each new task one wants performed—say, distinguishing cats from dogs or apples from oranges—"but this is obviously not scalable, as the number of networks increases linearly with the number of tasks," says machine-learning researcher Sam Kessler at the University of Oxford, in England.

One alternative Tariq and his colleagues explored as they trained their AI to spot new kinds of deepfakes was to supply it with a small amount of data on how it identified older types so it would not forget how to detect them. Essentially, this is like reviewing a summary of a textbook chapter before an exam, Tariq says.
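Both the forgetting effect and the "review the summary" remedy can be reproduced on a toy scale. The sketch below is not the deepfake detector described above; it trains a two-weight logistic model with plain stochastic gradient descent on two hypothetical tasks, `task_a` and `task_b`, first one after the other and then with old examples replayed alongside the new ones.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(weights, data, epochs=100, lr=0.5):
    """Plain SGD on logistic loss; updates `weights` in place and returns it."""
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
            for i, xi in enumerate(x):
                weights[i] += lr * (y - p) * xi
    return weights

def accuracy(weights, data):
    hits = 0
    for x, y in data:
        score = sum(w * xi for w, xi in zip(weights, x))
        hits += int((score > 0) == (y == 1))
    return hits / len(data)

# Hypothetical toy tasks. A single weight vector such as (1, 3) solves both,
# so any forgetting comes from the training order, not from a real conflict.
task_a = [((1.0, 0.0), 1), ((-1.0, 0.0), 0)]
task_b = [((-1.0, 0.5), 1), ((1.0, -0.5), 0)]

# Step 1: learn task A.
w = train([0.0, 0.0], task_a)
acc_a_before = accuracy(w, task_a)

# Step 2a: sequential training, where the model now sees only task B.
w_sequential = train(list(w), task_b)
acc_a_sequential = accuracy(w_sequential, task_a)
acc_b_sequential = accuracy(w_sequential, task_b)

# Step 2b: rehearsal, where task B is mixed with replayed task A examples.
w_rehearsal = train(list(w), task_b + task_a)
acc_a_rehearsal = accuracy(w_rehearsal, task_a)
acc_b_rehearsal = accuracy(w_rehearsal, task_b)

print("task A accuracy after learning A:    ", acc_a_before)
print("task A accuracy after B only:        ", acc_a_sequential)
print("task A accuracy after B + replayed A:", acc_a_rehearsal)
```

Training on task B alone drags the first weight negative, which wipes out task A even though a joint solution exists. Replaying the old examples keeps both tasks solved. Real rehearsal methods replay only a small sample of old data; here the old task has just two examples, so all of them are replayed.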

However, AIs may not always have access to past knowledge, for instance when dealing with private information such as medical records. To avoid relying on data from prior tasks, Tariq and his colleagues had their AI train itself to spot new deepfake types while also learning from another AI that had previously been trained to recognize older deepfake varieties. They found this "knowledge distillation" strategy was roughly 87 percent accurate at detecting the kind of low-quality deepfakes typically shared on social media.
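As a rough illustration of what "learning from another AI" can mean in practice, the sketch below computes a commonly used form of distillation loss: a weighted sum of how far the student's softened predictions are from the teacher's and how wrong the student is on the true label. The article does not specify the loss Tariq's team used, so the function names, the `temperature` and `alpha` parameters, and the example logits are assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities; higher temperature flattens them."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q): how poorly distribution q matches distribution p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    """Blend 'imitate the teacher' with 'get the true label right'."""
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    soft_term = kl_divergence(soft_teacher, soft_student) * temperature ** 2
    hard_term = -math.log(softmax(student_logits)[true_label])
    return alpha * soft_term + (1 - alpha) * hard_term

# Hypothetical logits for three classes; the teacher favors class 0.
teacher = [4.0, 1.0, -2.0]
loss_agree = distillation_loss([4.0, 1.0, -2.0], teacher, true_label=0)
loss_disagree = distillation_loss([-2.0, 1.0, 4.0], teacher, true_label=0)

print(f"student agrees with teacher:    {loss_agree:.3f}")
print(f"student disagrees with teacher: {loss_disagree:.3f}")
```

Because the teacher's outputs stand in for the old training data, the student can be pulled toward the teacher's behavior on older cases without ever seeing the original records.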

4) Explainability

Why does an AI suspect a person might be a criminal or have cancer? The explanation for this and other high-stakes predictions can have many legal, medical, and other consequences. The way in which AIs reach conclusions has long been considered a mysterious black box, leading to many attempts to devise ways to explain AIs' inner workings. "However, my recent work suggests the field of explainability is getting somewhat stuck," says Auburn's Nguyen.

Nguyen and his colleagues investigated seven different techniques that researchers have developed to attribute explanations for AI decisions—for instance, what makes an image of a matchstick a matchstick? Is it the flame or the wooden stick? They discovered that many of these methods "are quite unstable," Nguyen says. "They can give you different explanations every time."

In addition, while one attribution method might work on one set of neural networks, "it might fail completely on another set," Nguyen adds. The future of explainability may involve building databases of correct explanations, Nguyen says. Attribution methods can then go to such knowledge bases "and search for facts that might explain decisions," he says.

5) Quantifying Uncertainty

In 2016, a Tesla Model S car on autopilot collided with a truck that was turning left in front of it in northern Florida, killing the Tesla's driver in the automated driving system's first reported fatality. According to Tesla's official blog, neither the autopilot system nor the driver "noticed the white side of the tractor trailer against a brightly lit sky, so the brake was not applied."

One potential way Tesla, Uber, and other companies may avoid such disasters is for their cars to do a better job at calculating and dealing with uncertainty. Currently AIs "can be very certain even though they're very wrong," Oxford's Kessler says. If an algorithm makes a decision, "we should have a robust idea of how confident it is in that decision, especially for a medical diagnosis or a self-driving car, and if it's very uncertain, then a human can intervene and give [their] own verdict or assessment of the situation."

For example, computer scientist Moloud Abdar at Deakin University in Australia and his colleagues applied several different uncertainty quantification techniques as an AI classified skin-cancer images as malignant or benign, or melanoma or not. The researchers found these methods helped prevent the AI from making overconfident diagnoses.

Autonomous vehicles remain challenging for uncertainty quantification, as current uncertainty-quantification techniques are often relatively time consuming, "and cars cannot wait for them," Abdar says. "We need to have much faster approaches."
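Kessler's idea of handing uncertain cases to a human can be sketched with one simple uncertainty measure: the entropy of the averaged predictions from several independently trained models. This is only one of many possible techniques and is not necessarily among those Abdar's team evaluated; the `max_entropy` threshold and the example probabilities are hypothetical.

```python
import math

def mean_probs(member_probs):
    """Average the class probabilities predicted by each ensemble member."""
    n = len(member_probs)
    return [sum(column) / n for column in zip(*member_probs)]

def entropy(probs):
    """Shannon entropy in nats; 0 means fully certain."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def decide(member_probs, labels, max_entropy=0.5):
    """Return a label, or defer when the ensemble's prediction is too uncertain."""
    avg = mean_probs(member_probs)
    if entropy(avg) > max_entropy:
        return "defer to human"
    return labels[avg.index(max(avg))]

labels = ["benign", "malignant"]

# Hypothetical outputs from three models for two different images.
members_agree = [[0.97, 0.03], [0.95, 0.05], [0.98, 0.02]]
members_disagree = [[0.90, 0.10], [0.20, 0.80], [0.40, 0.60]]

print(decide(members_agree, labels))
print(decide(members_disagree, labels))
```

The cost Abdar points to is visible even here: every prediction requires running several models instead of one, which is the kind of delay a moving car cannot afford.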

6) Common Sense

AIs lack common sense—the ability to reach acceptable, logical conclusions based on a vast context of everyday knowledge that people usually take for granted, says computer scientist Xiang Ren at the University of Southern California. "If you don't pay very much attention to what these models are actually learning, they can learn shortcuts that lead them to misbehave," he says.

For instance, scientists may train AIs to detect hate speech on data where such speech is unusually common, such as white supremacist forums. However, when this software is exposed to the real world, it can fail to recognize that black and gay people may respectively use the words "black" and "gay" more often than other groups. "Even if a post is quoting a news article mentioning Jewish or black or gay people without any particular sentiment, it might be misclassified as hate speech," Ren says. In contrast, "humans reading through a whole sentence can recognize when an adjective is used in a hateful context."
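The shortcut Ren describes can be reproduced with a deliberately naive word-counting classifier. The sketch below is not a real hate-speech model; the training posts are invented, and `groupx` is a placeholder standing in for any identity term that appears mostly in hateful posts in a skewed training set.

```python
from collections import Counter

def train_word_scores(posts):
    """Score each word by (hateful posts containing it - benign posts containing it)."""
    scores = Counter()
    for text, is_hateful in posts:
        for word in set(text.lower().split()):
            scores[word] += 1 if is_hateful else -1
    return scores

def flag(text, scores, threshold=1):
    """Flag a post when the summed scores of its words reach the threshold."""
    total = sum(scores[word] for word in set(text.lower().split()))
    return total >= threshold

# Hypothetical skewed training set: the identity term only ever appears
# in hateful posts, so the model learns the term itself as a shortcut.
training_posts = [
    ("groupx people are disgusting", True),
    ("i despise groupx neighbors", True),
    ("groupx folks ruin everything", True),
    ("nice weather today", False),
    ("the game was great today", False),
]
scores = train_word_scores(training_posts)

neutral_flagged = flag("a news report quoted groupx leaders", scores)
benign_flagged = flag("nice weather today", scores)

print("neutral news sentence flagged:", neutral_flagged)
print("benign small talk flagged:    ", benign_flagged)
```

The neutral sentence is flagged purely because it contains the identity term, which is the kind of misclassification Ren warns about. Modern neural classifiers are far more sophisticated than this word counter, but they can pick up the same statistical shortcut from the same kind of skewed data.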

Previous research suggested that state-of-the-art AIs could draw logical inferences about the world with up to roughly 90 percent accuracy, suggesting they were making progress at achieving common sense. However, when Ren and his colleagues tested these models, they found even the best AI could generate logically coherent sentences with slightly less than 32 percent accuracy. When it comes to developing common sense, "one thing we care a lot [about] these days in the AI community is employing more comprehensive checklists to look at the behavior of models on multiple dimensions," he says.

7) Math

Although conventional computers are good at crunching numbers, AIs "are surprisingly not good at mathematics at all," Berkeley's Hendrycks says. "You might have the latest and greatest models that take hundreds of GPUs to train, and they're still just not as reliable as a pocket calculator."

For example, Hendrycks and his colleagues trained an AI on hundreds of thousands of math problems with step-by-step solutions. However, when tested on 12,500 problems from high school math competitions, "it only got something like 5 percent accuracy," he says. In comparison, a three-time International Mathematical Olympiad gold medalist attained 90 percent success on such problems "without a calculator," he adds.

Neural networks nowadays can learn to solve nearly every kind of problem "if you just give it enough data and enough resources, but not math," Hendrycks says. Many problems in science require a lot of math, so this current weakness of AI can limit its application in scientific research, he notes.

It remains uncertain why AI is currently bad at math. One possibility is that neural networks attack problems in a highly parallel manner like human brains, whereas math problems typically require a long series of steps to solve, so maybe the way AIs process data is not as suitable for such tasks, "in the same way that humans generally can't do huge calculations in their head," Hendrycks says. However, AI's poor performance on math "is still a niche topic: There hasn't been much traction on the problem," he adds.

The Conversation (7)
Arthur Olbert, 20 Oct, 2021

AI/ML’s lack of repeatability, or provenance, may also limit its application. Algorithms change as the data trains the algorithms heuristically. Applications which require repeatability or provenance may remain out of bounds unless changes are made to the legal system. Consider a defendant on trial for a crime, in which the defendant was first identified by the “AI Line Up” program. The defense attorney requests the prosecution run the program against the original data to show the court how the defendant was identified. The Assistant DA of AI/ML tools whispers to the prosecutor “We can’t do that. The defendant was id’ed 7 months ago. That version of the algorithm no longer exists.”

R Watkins, 25 Sep, 2021

Every bit of this is about neural networks, often about them being employed in situations where we already know that their heuristic, net-sentiment-based manner of associating an appropriate response with a set of inputs is not appropriate.

Moreover, over time we've variously called neural networks / fuzzy logic "AI", expert systems "AI", genetic algorithms "AI", various sorts of multidimensional classifiers and data mining tools "AI".

If we'd stop using such nebulous and inexact terminology, maybe we'd think a bit more clearly about the appropriate situations in which to apply the various, and disparate, technologies which we lump together as "AI".

William Adams, 11 Jan, 2022

AI is genuine stupidity. It is a futile exercise that can do enough things okay at a low enough price that people overlook all the other things that cause great harm and cost us all a lot of pain as well as money.
