ChatGPT Makes OK Clinical Decisions—Usually

But don’t think about replacing your doctor with a chatbot now, or ever

Illustration: Dan Page

Could ChatGPT someday assist doctors in diagnosing patients? New research suggests it might.

In a recent study, researchers fed ChatGPT information from fictional patients found in an online medical reference manual to find out how well the chatbot could make clinical decisions such as diagnosing patients and prescribing treatments. The researchers found that ChatGPT was 72 percent accurate in its decisions, although the bot was better at some kinds of clinical tasks than others. It also showed no evidence of bias based on age or gender. Though the study was small and did not use real patient data, the findings point to the potential of chatbots to help make medical care more efficient and less biased.

“This study was looking at GPT’s performance throughout the entire clinical scenario,” said Marc Succi, the associate chair of innovation and commercialization at Mass General Brigham, a health care system in the Boston area, and the senior author of the study.

Published in the Journal of Medical Internet Research on 22 August, the study used all 36 clinical vignettes from the Merck Manual, an online medical reference, as the patients for ChatGPT to diagnose and treat. Clinical vignettes are patient case studies used to help train health care professionals’ critical-thinking and decision-making skills in caring for patients. The researchers input the text of each vignette, then ran through the questions presented in the manual for each case, excluding any questions about examining images because ChatGPT is text-based.

Researchers first directed the bot to generate a list of differential diagnoses based on the vignette; in other words, a list of possible diagnoses that can’t be initially dismissed. The chatbot was then asked to suggest which tests should be performed, followed by a request for a final diagnosis. Finally, researchers asked ChatGPT what treatment or follow-up care the patient should receive. Some of the questions from the manual also asked ChatGPT about the medical details of each case, which weren’t necessarily relevant to recommending clinical care.
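For readers curious how such a staged, text-only workflow might be reproduced programmatically, here is a minimal sketch using the OpenAI Python client. This is not the study’s code: the researchers worked from the Merck Manual’s own cases and questions, and the model name, prompts, vignette text, and test results below are all illustrative assumptions.

```python
# A minimal sketch (not the study's actual code) of the staged prompting workflow
# described above. Model name, prompt wording, and the example vignette/test-result
# text are assumptions for illustration only.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable


def ask(messages, question):
    """Append a question to the running conversation and return the model's reply."""
    messages.append({"role": "user", "content": question})
    reply = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed model; the study used ChatGPT directly
        messages=messages,
    ).choices[0].message.content
    messages.append({"role": "assistant", "content": reply})
    return reply


# Placeholder vignette text; the study used the 36 Merck Manual vignettes.
vignette = "A 45-year-old man presents with chest pain radiating to the left arm..."
conversation = [
    {"role": "system", "content": "You are assisting with a clinical case study."},
    {"role": "user", "content": vignette},
]

# The four stages described in the article, asked in order on the same conversation.
differentials = ask(conversation, "List the differential diagnoses that cannot be initially dismissed.")
workup = ask(conversation, "Which diagnostic tests should be performed?")

# In the study, the manual's test results would be supplied before the final diagnosis.
test_results = "ECG shows ST-segment elevation in the inferior leads..."  # placeholder
final_dx = ask(conversation, f"Test results: {test_results}\nWhat is the final diagnosis?")
plan = ask(conversation, "What treatment or follow-up care should the patient receive?")
```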

Overall, ChatGPT gave responses that were 72 percent accurate, but the accuracy varied depending on the type of clinical task. The chatbot was most effective at making a final diagnosis once it was given both the initial patient information and additional diagnostic-test results, with a 77 percent success rate. Questions designated as “miscellaneous,” which asked about medical details of each case, achieved a similar accuracy of 76 percent.

However, the chatbot wasn’t as effective at completing other types of clinical tasks. It was about 69 percent effective at both recommending the correct diagnostic tests for the initial patient description and prescribing treatment and follow-up care once it made a final diagnosis. ChatGPT fared the worst when it came to differential diagnosis, with only 60 percent accuracy.

Succi said he wasn’t surprised that the chatbot struggled the most with differential diagnosis. “That’s really what medical school and residency is—it’s being able to come up with good differentials with very little presenting information,” he said.

Succi said there is still a long way to go before chatbots might be a routine part of the clinical work of doctors. ChatGPT itself may never play that role, said James Chow, an associate professor of radiation oncology at the University of Toronto who was not involved with the study. Because of the way ChatGPT works, he said, it’s impossible to fully know or control how data is used or the way the bot presents it. In his research, Chow is working to develop a medical chatbot that is more specifically trained to handle and present medical information.

Even if specialized chatbots someday act as assistants in a doctor’s office, they should never replace a human doctor, said Paul Root Wolpe, the director of the Center for Ethics at Emory University in Atlanta, who was not involved with the study.

“I think that well-tested and designed chat programs can be an aid to physicians; they should never replace physicians,” Wolpe said. As with any medical technology, Wolpe said, a clinical-trial process would be needed to determine whether technology like chatbots can be used with actual patients.

One advantage of using a chatbot like ChatGPT might be a reduction in medical bias. In the study, researchers didn’t find evidence of any difference in the program’s responses relative to a patient’s age or gender, which were given in each vignette. However, Wolpe said that bias could still show up in the responses of bots in cases where data and medical research itself is biased. Some examples might be pulse oximeter readings on people with darker skin, or heart attack symptoms in women, which studies have shown are less likely to be what people think of as “typical” heart attack symptoms.

The study has several limitations, including that it didn’t use actual patient data and included only a small number of (fictional) patients. The fact that the researchers don’t know how ChatGPT was trained is another limitation, said Succi. And though the results are encouraging, chatbots won’t be replacing your doctor anytime soon. “Your physician isn’t going anywhere,” he said.

The Conversation (2)
Wlodzislaw Duch · 20 Sep, 2023

Can a high school kid replace a medical doctor? ChatGPT has not been trained specifically on medical knowledge and data, so why test it in medical applications? Is there any argument behind opinions like “AI should never replace medical doctors”? How many people die from doctors’ errors each year? Should we never change that, even if AI systems prove to be much more reliable? Never say never is good advice.

Anjan Saha · 19 Sep, 2023

Initially, a ChatGPT module could be trained on the symptoms of particular diseases such as cardiovascular disease, diabetes, or psychiatric illness, with a questionnaire of symptoms prepared for each disease. While counseling patients, ChatGPT could play the role of a counselor and keep records of patients’ diagnosed symptoms, which would help physicians prescribe medicine through proper medical software applications, reduce diagnostic time, and ease doctors’ burden of writing prescriptions. Soft copies of clinical and laboratory tests could be readily available to doctors for fully understanding patients’ conditions. The physical condition of a patient’s face and body would still be checked by doctors.