Artificial intelligence-based health technologies are rapidly moving from the lab, where AI has routinely outperformed doctors, into the hands of consumers.
Publicly available skin cancer detection apps, such as SkinVision, use AI-based analysis to determine if a new or changing mole is a source of concern or nothing to worry about. Yet according to a new analysis of the scientific evidence behind those apps, there’s a lot to worry about.
In a study published this week in The BMJ, a team of experts evaluated the science behind six skin cancer detection apps and found it sorely lacking. The apps miss melanomas, the most serious form of skin cancer; produce false positives that could lead to unnecessary removal of harmless moles; are poorly regulated; and fail to inform users of their limitations.
While most such apps include disclaimers, such as SkinVision's fine print that its service "is not a diagnosis, and is not a substitute for visits to a healthcare professional," there is no mention that 1 in 10 melanomas will be missed.
“App users are not being told about these error rates,” says study author Jonathan Deeks, a biostatistician with the Institute of Applied Health Research at the University of Birmingham. “I personally won’t use them.”
Getting skin cancer diagnosed early is critical to survival, so the idea of instantly assessing one’s skin cancer risk with the snap of a picture is highly appealing, especially in situations where people have limited access to a doctor.
The current paper is a follow-up to a 2018 Cochrane Review of skin cancer apps, in which Deeks and collaborators found that the apps had a high rate of missing melanomas. But the field was moving so quickly, the team decided it was time to take another look.
Through a literature search and investigating company websites, the researchers gathered nine studies that evaluated six AI-based smartphone apps. Currently, only one of those apps, SkinVision, is still commercially available; the other five have been withdrawn from the market or are no longer accessible.
The team extracted data from each study and re-analyzed it to determine how accurate and reliable each app was. Overall, study quality was poor, says Deeks, and the studies were not conducted in a way that represents how the apps are used by consumers in real life.
First, the studies were small. One app, for example, was evaluated with just a single study of 15 images. Those 15 images included 5 melanomas. The app did not identify any of the 5 melanomas.
Second, the studies relied on images taken by experts using good smartphone cameras and with plenty of time. That stands in stark contrast to real-world users, who may be snapping a picture of a mole on their back through a dirty lens or in a mirror's reflection. Worse, images that failed to return a risk assessment were sometimes simply excluded from the study.
Finally, many of the studies relied only on images from people already being investigated for cancer, rather than a wide spectrum of cases. The studies that did include images of unconcerning moles did not follow up to see which, if any, developed into cancer.
SkinVision, the best-studied of the six apps, has been evaluated in three studies, two of which were re-analyzed in this paper. Based on the data in those papers, the Birmingham researchers concluded that in a hypothetical population of 1,000 adults where 3 percent have melanoma, the app would not pick up 4 of 30 melanomas and more than 200 people would receive false-positive results. However, because of limitations in the studies, it’s possible many more errors could be made, says Deeks.
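The figures above imply rough accuracy rates for the app. A back-of-the-envelope sketch of that arithmetic (the sensitivity and specificity values are derived here from the article's numbers, not quoted from the BMJ paper):

```python
# Hypothetical population described above: 1,000 adults, 3% melanoma prevalence.
population = 1000
prevalence = 0.03

melanomas = int(population * prevalence)  # 30 true melanomas
missed = 4                                # melanomas the app would not pick up
false_positives = 200                     # "more than 200" healthy people flagged

# Fraction of true melanomas the app catches
sensitivity = (melanomas - missed) / melanomas
# Fraction of melanoma-free people correctly cleared
specificity = (population - melanomas - false_positives) / (population - melanomas)

print(f"Melanomas in population: {melanomas}")
print(f"Implied sensitivity: {sensitivity:.0%}")  # ~87%
print(f"Implied specificity: {specificity:.0%}")  # ~79%
```

Note that even a roughly 87 percent sensitivity means about 1 in 8 cancers slips through in this scenario, and with melanoma being rare, most positive results are false alarms.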
“The worst thing would be if somebody had a melanoma which [an app] said was fine, and then they didn’t go to the doctor when they would have before,” he said. “That’s doing harm.”
None of the apps are currently approved by the U.S. Food and Drug Administration (FDA), which published strict guidelines for mobile medical applications in 2013 and updated those guidelines in 2015 and 2019. In Europe, however, SkinVision and SkinScan (which was not evaluated in this paper) are approved as class 1 medical devices, a designation reserved for low-risk devices such as reading glasses.
“Collectively as a society we must decide what amounts to good evidence when evaluating health apps,” wrote Ben Goldacre and colleagues at the University of Oxford in an accompanying editorial to the paper. “Without better information patients, clinicians, and other stakeholders cannot be assured of an app’s efficacy, and safety.”
Megan is an award-winning freelance journalist based in Boston, Massachusetts, specializing in the life sciences and biotechnology. She was previously a health columnist for the Boston Globe and has contributed to Newsweek, Scientific American, and Nature, among others. She is the co-author of a college biology textbook, “Biology Now,” published by W.W. Norton. Megan received an M.S. from the Graduate Program in Science Writing at the Massachusetts Institute of Technology, a B.A. at Boston College, and worked as an educator at the Museum of Science, Boston.