Doctors Still Struggle to Make the Most of Computer-Aided Diagnosis

As a timer counted down, a team of physicians from St. Michael’s Medical Center in Newark, N.J., conferred on a medical diagnosis question. Then another. And another. With each question, the stakes at Doctor’s Dilemma, an annual competition held in May in Washington, D.C., grew higher. By the end, the team had wrestled with 45 conditions, symptoms, or treatments. They defeated 50 teams to win the 2016 Osler Cup.

The stakes are even higher for real-life diagnoses, where doctors always face time pressure. That is why researchers have tried since the 1960s to supplement doctors’ memory and decision-making skills with computer-based diagnostic aids. In 2012, for example, IBM pitted a version of its Jeopardy!-winning artificial intelligence, Watson, against questions from Doctor’s Dilemma. But Big Blue’s brainiac couldn’t replicate the overwhelming success it had against human Jeopardy! players.

The trouble is that computerized diagnostic aids do not yet measure up to human doctors, according to several recent studies. Nor do the makers of such software seem able to agree on a single benchmark for measuring performance. Reviewing reports on such software in the peer-reviewed literature, one team of researchers found wide variations in performance across diseases, as well as differences in how doctors use the tools. For example, younger doctors are more likely to spend time entering patient data into a tool, and more likely to benefit from it. Two presentations at the 6–8 November Diagnostic Error in Medicine Conference in Hollywood, Calif., took up the question of how to realistically incorporate technological aids into doctor training and hectic diagnostic routines.

Another issue is figuring out how to compare different software aids. “If you look at, for example, the big progress that has occurred in speech recognition or in image classification, it's really been brought about by having really good benchmark data sets and really like having actual competitions,” says computer scientist Ole Winther at the Technical University of Denmark in Lyngby. “We don't have the same in the medical domain.”

IBM did publish a report in 2013 on its Watson-vs-Doctor’s Dilemma test, but Winther says he has been unable to obtain the subset of questions IBM used, so he could not directly compare Watson with FindZebra, a diagnostic aid he and his colleagues built. Last year, his team estimated that FindZebra and Watson each list the correct diagnosis among their top 10 results about 60 percent of the time, in line with what a Spanish team reported earlier this year.
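The "correct diagnosis among the top 10 results" figure is what machine-learning researchers call top-k accuracy. The sketch below illustrates how such a score is computed; it is not the evaluation code used by the FindZebra or Watson teams, and the function name, the toy cases, and the diagnoses in them are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def top_k_accuracy(ranked_lists: Sequence[Sequence[str]],
                   correct: Sequence[str],
                   k: int = 10) -> float:
    """Fraction of cases whose correct diagnosis appears in the first k suggestions."""
    if len(ranked_lists) != len(correct):
        raise ValueError("need one ranked list per case")
    if not correct:
        return 0.0
    # A case counts as a hit if the true diagnosis is anywhere in the top k.
    hits = sum(1 for ranked, truth in zip(ranked_lists, correct)
               if truth in ranked[:k])
    return hits / len(correct)


# Hypothetical toy cases: each entry is one tool's ranked list of suggested diagnoses.
suggestions = [
    ["Fabry disease", "lupus", "gout"],
    ["migraine", "tension headache"],
    ["Wilson disease", "hepatitis"],
    ["asthma", "COPD", "bronchitis"],
    ["Marfan syndrome"],
]
truths = ["gout", "cluster headache", "Wilson disease", "bronchitis",
          "Ehlers-Danlos syndrome"]

print(top_k_accuracy(suggestions, truths, k=10))  # 0.6
print(top_k_accuracy(suggestions, truths, k=1))   # 0.2
```

The same tool scores 60 percent at k = 10 but only 20 percent at k = 1, and either number shifts with the choice of test cases. That sensitivity is why Winther argues that shared benchmark data sets matter for comparing tools fairly.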

Despite the lack of a unified benchmark for computer-aided diagnostics, individual doctors, family members of misdiagnosed patients, and academic and clinical groups have built and are marketing such aids. Clients include private health insurance companies and research hospitals around the world, among them a pair of medical facilities in North Carolina and Japan that have reported some success in diagnosing patients with Watson. Yet, at a recent IBM Research event in Zurich, one of IBM’s clients, Jens-Peter Neumann of the Rhön-Klinikum hospital network in Germany, said that it is too early to estimate the potential cost savings of his team’s Watson collaboration.

In February 2016, the Rhön-Klinikum network began pilot-testing Watson against the ultimate challenge for any diagnostic aid: rare diseases. The 7,000 or so known rare diseases affect perhaps 7 percent of Europe’s population, according to Munich Re, an insurance and risk-management firm. As genomic screening grows more sophisticated, Munich Re predicts that more than 1,000 additional diseases will be discovered by 2020. “Memorizing them all is just not going to happen,” says computer scientist and physician Tobias Mueller of the University Clinic Marburg in Germany, who is involved in the Rhön-Klinikum pilot.

Instead the team is structuring the natural-language medical histories of the 522 patients in the pilot into the right format for Watson, a time-consuming process that combines human and computer efforts. Watson can then compare these structured histories to the medical literature and suggest ranked diagnoses.

One issue, Mueller says, has been processing medical literature in both German and English consistently. So far, the team has opted to use a combination of medical taxonomies, such as MedDRA and ICD-10, to describe symptoms and diagnoses. He also notes that the knowledge sources fed into Watson sometimes contradict each other. In other words, computerized diagnostic aids struggle with some of the same problems humans face when sharing and comparing information. “However, this reflects the diversity of the knowledge base of Watson and is no different than having a room full of doctors with different backgrounds and different opinions. It's more a strength, than a weakness,” Mueller says.

Despite the struggles, Winther says computer-aided diagnosis will ultimately mature: “A lot of patients spend years and years juggling between [general practitioners] and the wrong specialists. That’s still a challenge where there’s room for these kinds of tools.”

This post was updated on 15 November 2016 to clarify the timing and aims of the Rhön-Klinikum pilot study.


Language barriers and human interfaces slow adoption of diagnostic-aid tech
