ChatGPT Makes OK Clinical Decisions—Usually

But don’t think about replacing your doctor with a chatbot now, or ever

Illustration: Dan Page

Could ChatGPT someday assist doctors in diagnosing patients? New research suggests it might.

In a recent study, researchers fed ChatGPT information from fictional patients found in an online medical reference manual to find out how well the chatbot could make clinical decisions such as diagnosing patients and prescribing treatments. The researchers found that ChatGPT was 72 percent accurate in its decisions, although the bot was better at some kinds of clinical tasks than others. It also showed no evidence of bias based on age or gender. Though the study was small and did not use real patient data, the findings point to the potential of chatbots to help make medical care more efficient and less biased.

“This study was looking at GPT’s performance throughout the entire clinical scenario,” said Marc Succi, the associate chair of innovation and commercialization at Mass General Brigham, a health care system in the Boston area, and the senior author of the study.

Published in the Journal of Medical Internet Research on 22 August, the study used all 36 clinical vignettes from the Merck Manual, an online medical reference, as the patients for ChatGPT to diagnose and treat. Clinical vignettes are patient case studies used to help train health care professionals’ critical-thinking and decision-making skills in caring for patients. The researchers input the text of each vignette, then ran through the questions presented in the manual for each case, excluding any questions about examining images because ChatGPT is text-based.

Researchers first directed the bot to generate a list of differential diagnoses based on the vignette; in other words, a list of possible diagnoses that can’t be initially dismissed. The chatbot was then asked to suggest which tests should be performed, followed by a request for a final diagnosis. Finally, researchers asked ChatGPT what treatment or follow-up care the patient should receive. Some of the questions from the manual also asked ChatGPT about the medical details of each case, which weren’t necessarily relevant to recommending clinical care.
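For readers curious how such a staged, text-only workflow might be reproduced programmatically, here is a minimal sketch using the OpenAI Python client. This is not the study’s code: the researchers worked from the Merck Manual’s own cases and questions, and the model name, prompts, vignette text, and test results below are all illustrative assumptions.

```python
# A minimal sketch (not the study's actual code) of the staged prompting workflow
# described above. Model name, prompt wording, and the example vignette/test-result
# text are assumptions for illustration only.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable


def ask(messages, question):
    """Append a question to the running conversation and return the model's reply."""
    messages.append({"role": "user", "content": question})
    reply = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed model; the study used ChatGPT directly
        messages=messages,
    ).choices[0].message.content
    messages.append({"role": "assistant", "content": reply})
    return reply


# Placeholder vignette text; the study used the 36 Merck Manual vignettes.
vignette = "A 45-year-old man presents with chest pain radiating to the left arm..."
conversation = [
    {"role": "system", "content": "You are assisting with a clinical case study."},
    {"role": "user", "content": vignette},
]

# The four stages described in the article, asked in order on the same conversation.
differentials = ask(conversation, "List the differential diagnoses that cannot be initially dismissed.")
workup = ask(conversation, "Which diagnostic tests should be performed?")

# In the study, the manual's test results would be supplied before the final diagnosis.
test_results = "ECG shows ST-segment elevation in the inferior leads..."  # placeholder
final_dx = ask(conversation, f"Test results: {test_results}\nWhat is the final diagnosis?")
plan = ask(conversation, "What treatment or follow-up care should the patient receive?")
```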

Overall, ChatGPT gave responses that were 72 percent accurate, but the accuracy varied depending on the type of clinical task. The chatbot was most effective at making a final diagnosis once it was given both the initial patient information and additional diagnostic-test results, with a 77 percent success rate. Questions designated as “miscellaneous,” which asked about medical details of each case, achieved a similar accuracy of 76 percent.

However, the chatbot wasn’t as effective at completing other types of clinical tasks. It was about 69 percent effective at both recommending the correct diagnostic tests for the initial patient description and prescribing treatment and follow-up care once it made a final diagnosis. ChatGPT fared the worst when it came to differential diagnosis, with only 60 percent accuracy.

Succi said he wasn’t surprised that the chatbot struggled the most with differential diagnosis. “That’s really what medical school and residency is—it’s being able to come up with good differentials with very little presenting information,” he said.

Succi said there is still a long way to go before chatbots might be a routine part of the clinical work of doctors. ChatGPT itself may never play that role, said James Chow, an associate professor of radiation oncology at the University of Toronto who was not involved with the study. Because of the way ChatGPT works, he said, it’s impossible to fully know or control how data is used or the way the bot presents it. In his research, Chow is working to develop a medical chatbot that is more specifically trained to handle and present medical information.

Even if specialized chatbots someday act as assistants in a doctor’s office, they should never replace a human doctor, said Paul Root Wolpe, the director of the Center for Ethics at Emory University in Atlanta, who was not involved with the study.

“I think that well-tested and designed chat programs can be an aid to physicians; they should never replace physicians,” Wolpe said. As with any medical technology, Wolpe said, a clinical-trial process would be needed to determine whether technology like chatbots can be used with actual patients.

One advantage of using a chatbot like ChatGPT might be a reduction in medical bias. In the study, researchers didn’t find evidence of any difference in the program’s responses relative to a patient’s age or gender, which were given in each vignette. However, Wolpe said that bias could still show up in the responses of bots in cases where data and medical research itself is biased. Some examples might be pulse oximeter readings on people with darker skin, or heart attack symptoms in women, which studies have shown are less likely to be what people think of as “typical” heart attack symptoms.

The study has several limitations, including that it didn’t use actual patient data and included only a small number of (fictional) patients. The fact that the researchers don’t know how ChatGPT was trained is another limitation, said Succi. And though the results are encouraging, chatbots won’t be replacing your doctor anytime soon. “Your physician isn’t going anywhere,” he said.

The Conversation (2)
Wlodzislaw Duch · 20 Sep, 2023

Can a high school kid replace a medical doctor? ChatGPT has not been trained specifically on medical knowledge and data, so why test it in medical applications? Is there any argument behind opinions like “AI should never replace medical doctors”? How many people die from doctors’ errors each year? Should we never change that, even if AI systems prove to be much more reliable? Never say never is good advice.

Anjan Saha · 19 Sep, 2023

Initially, a ChatGPT module could be trained on the symptoms of particular diseases such as cardiovascular disease, diabetes, or psychiatric illness, with a questionnaire of symptoms prepared for each disease. While counseling patients, ChatGPT could play the role of a counselor and keep records of patients’ diagnosed symptoms, which would help physicians prescribe medicine through proper medical software applications, reduce diagnostic time, and ease doctors’ burden of writing prescriptions. Soft copies of clinical and laboratory tests could be readily available to doctors for fully understanding patients’ conditions. The physical condition of a patient’s face and body would still be checked by doctors.