New Test Compares AI Reasoning With Human Thinking

IEEE SpectrumFOR THE TECHNOLOGY INSIDER
TopicsAerospaceAIBiomedicalClimate TechComputingConsumer ElectronicsEnergyHistory of TechnologyRoboticsSemiconductorsTelecommunicationsTransportation
SectionsFeaturesNewsOpinionCareersDIYEngineering Resources
MoreNewslettersSpecial ReportsCollectionsExplainersTop Programming LanguagesRobots Guide ↗IEEE Job Site ↗
For IEEE MembersCurrent IssueMagazine ArchiveThe InstituteThe Institute Archive
For IEEE MembersCurrent IssueMagazine ArchiveThe InstituteThe Institute Archive
IEEE SpectrumAbout UsContact UsReprints & Permissions ↗Advertising ↗
Follow IEEE Spectrum
Support IEEE SpectrumIEEE Spectrum is the flagship publication of the IEEE — the world’s largest professional organization devoted to engineering and applied sciences. Our articles, videos, and infographics inform our readers about developments in technology, engineering, and science.
Subscribe
About IEEEContact & SupportAccessibilityNondiscrimination PolicyTermsIEEE Privacy PolicyCookie PreferencesAd Privacy Options
© Copyright 2025 IEEE — All rights reserved. A public charity, IEEE is the world's largest technical professional organization dedicated to advancing technology for the benefit of humanity.

The way in which artificial intelligence reaches insights and makes decisions is often mysterious, raising concerns about how trustworthy machine learning can be. Now, in a new study, researchers have revealed a new method for comparing how well the reasoning of AI software matches that of a human in order to rapidly analyze its behavior.

As machine learning increasingly finds real-world applications, it becomes critical to understand how it reaches its conclusions and whether it does so correctly. For example, an AI program may appear to have accurately predicted that a skin lesion was cancerous, but it may have done so by focusing on an unrelated blot in the background of a clinical image.

“Machine-learning models are infamously challenging to understand,” says Angie Boggust, a computer science researcher at MIT and lead author of a new study concerning AI’s trustworthiness. “Knowing a model’s decision is easy, but knowing why that model made that decision is hard.”

A common strategy to make sense of AI reasoning examines the features of the data that the program focused on—say, an image or a sentence—in order to make its decision. However, such so-called saliency methods often yield insights on just one decision at a time, and each must be manually inspected. AI software is often trained using millions of instances of data, making it nearly impossible for a person to analyze enough decisions to identify patterns of correct or incorrect behavior.

“Providing human users with tools to interrogate and understand their machine-learning models is crucial to ensuring machine-learning models can be safely deployed in the real world.”
—Angie Boggust, MIT

Now scientists at MIT and IBM Research have created a way to collect and inspect the explanations an AI gives for its decisions, thus allowing a quick analysis of its behavior. The new technique, named Shared Interest, compares saliency analyses of an AI’s decisions with human-annotated databases.

For example, an image-recognition program might classify a picture as that of a dog, and saliency methods might show that the program highlighted the pixels of the dog’s head and body to make its decision. The Shared Interest approach might, by contrast, compare the results of these saliency methods with databases of images where people annotated which parts of pictures were those of dogs.

Based on these comparisons, the Shared Interest method then calls for computing how much an AI’s decision-making aligned with human reasoning, classifying it as one of eight patterns. On one end of the spectrum, the AI may prove completely human-aligned, with the program making the correct prediction and highlighting the same features in the data as humans did. On the other end, the AI is completely distracted, with the AI making an incorrect prediction and highlighting none of the features that humans did.

The other patterns into which AI decision-making might fall highlight the ways in which a machine-learning model correctly or incorrectly interprets details in the data. For example, Shared Interest might find that an AI correctly recognizes a tractor in an image based solely on a fragment of it—say, its tire—instead of identifying the whole vehicle, as a human might, or find that an AI might recognize a snowmobile helmet in an image only if a snowmobile was also in the picture.

In experiments, Shared Interest helped reveal how AI programs worked and whether they were reliable or not. For example, Shared Interest helped a dermatologist quickly see examples of a program’s correct and incorrect predictions of cancer diagnosis from photos of skin lesions. Ultimately, the dermatologist decided he could not trust the program because it made too many predictions based on unrelated details rather than actual lesions.

In another experiment, a machine-learning researcher used Shared Interest to test a saliency method he was applying to the BeerAdvocate data set, helping him analyze thousands of correct and incorrect decisions in a fraction of the time needed with traditional manual methods. Shared Interest helped show that the saliency method generally behaved well as hoped but also revealed previously unknown pitfalls, such as overvaluing certain words in reviews in ways that led to incorrect predictions.

The researchers caution that Shared Interest performs only as well as the saliency methods it employs. Each saliency method possesses its own limitations, which Shared Interest inherits, Boggust notes.

In the future, the scientists would like to apply Shared Interest to more kinds of data, such as the tabular data used in medical records. Another potential area of research could be automating the estimation of uncertainty in AI results, Boggust adds.

The scientists have made the source code for Shared Interest and live demos of it publicly available. They will detail their findings 3 May at the ACM CHI Conference on Human Factors in Computing Systems.

From Your Site Articles

Topics

Sections

More

For IEEE Members

For IEEE Members

IEEE Spectrum

Follow IEEE Spectrum

Support IEEE Spectrum

New Test Compares AI Reasoning With Human Thinking

The novel technique can help researchers see if AIs reason as hoped and are trustworthy

Vision 60 Quadruped Gets Arm Upgrade

Chiplet Boosts GPU Efficiency by 50%

Chess by Telegraph: A Surprising 1844 Innovation

Related Stories

New AI Model Advances the “Kissing Problem” and More

Microsoft’s Muse AI Edits Video Games on the Fly

“Thinking” Visually Boosts AI Problem Solving

Topics

Sections

More

For IEEE Members

For IEEE Members

IEEE Spectrum

Follow IEEE Spectrum

Support IEEE Spectrum

Enjoy more free content and benefits by creating an account

Saving articles to read later requires an IEEE Spectrum account

The Institute content is only available for members

Downloading full PDF issues is exclusive for IEEE Members

Downloading this e-book is exclusive for IEEE Members

Access to Spectrum 's Digital Edition is exclusive for IEEE Members

Following topics is a feature exclusive for IEEE Members

Adding your response to an article requires an IEEE Spectrum account

Create an account to access more content and features on IEEE Spectrum , including the ability to save articles to read later, download Spectrum Collections, and participate in conversations with readers and editors. For more exclusive content and features, consider Joining IEEE .

Join the world’s largest professional organization devoted to engineering and applied sciences and get access to all of Spectrum’s articles, archives, PDF downloads, and other benefits. Learn more about IEEE →

Join the world’s largest professional organization devoted to engineering and applied sciences and get access to this e-book plus all of IEEE Spectrum’s articles, archives, PDF downloads, and other benefits. Learn more about IEEE →

Access Thousands of Articles — Completely Free

Create an account and get exclusive content and features: Save articles, download collections, and post comments — all free! For full access and benefits, subscribe to Spectrum.

New Test Compares AI Reasoning With Human Thinking

The novel technique can help researchers see if AIs reason as hoped and are trustworthy

Vision 60 Quadruped Gets Arm Upgrade

Chiplet Boosts GPU Efficiency by 50%

Chess by Telegraph: A Surprising 1844 Innovation

Related Stories

New AI Model Advances the “Kissing Problem” and More

Microsoft’s Muse AI Edits Video Games on the Fly

“Thinking” Visually Boosts AI Problem Solving