Machine Learning Tool Can Spot Mutations in Tumors

New method of identifying unique genetic changes in tumors could lead to more precise cancer treatments

Photo-illustration of cancer cell lymphocyte T and DNA helix background sequencing on a computer in a lab
Photo-illustration: iStockphoto

Cancerous tumors tend to have a life of their own. They grow and evolve constantly, and so does their DNA. Exactly how tumor DNA changes is important information because it influences doctors’ treatment decisions.

Technologies already exist that can perform this type of complicated analysis. The DNA of a tumor sample is sequenced, and then a combination of computational tools and human experts analyzes the data to figure out what kinds of genetic changes, or mutations, are occurring.

But none of these existing tools is completely accurate, according to a report published today in Science Translational Medicine. In an effort to remedy that, the authors of the report say they have developed a new method involving machine learning that automates the tumor DNA diagnostic process.

“It’s underappreciated how difficult it is to identify the true mutations in clinical tumor specimens,” says Samuel Angiuoli, coauthor of the report and chief information officer at Personal Genome Diagnostics in Baltimore. “Our machine learning approach improves the accuracy of that identification,” compared with existing techniques, he says. 

With that information (the type, number, and location of mutations in a tumor), doctors can choose a therapy that is specific to the type of tumor. Some of those therapies already exist on the market. One drug, called vemurafenib, specifically treats skin cancer cells that have a mutation in a gene called BRAF. And many other mutation-specific therapies are in development.

Of course, these therapies are more likely to work if the mutations in the tumors can be correctly identified. That’s not as straightforward as it sounds. The sheer size of sequencing data makes it easy to miss small genetic changes. Plus, there’s a significant amount of noise in that data. Lab prep methods and the sequencing machines themselves can introduce artifacts that look like genetic alterations. And there are decoy DNA mutations that can be present in a cell but are not important for tumor identification. These false positives are tricky to filter out.

Computerized tools help, but teams of human reviewers are often needed to ensure the results are of high quality. That puts these cancer diagnostic tools in centralized labs, and far away from patients. “Our goal is to develop a kit, including software, that can run anywhere in the world without the need for expert review,” Angiuoli says.

His company’s new tool, dubbed Cerebro, automates the job using an ensemble of algorithms called random forest classifiers. This traditional machine-learning technique works by evaluating a large set of decision trees to generate a confidence score for each candidate mutation—a way to judge whether a variant in the tumor DNA is a true positive.   
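To make the voting idea concrete, here is a minimal sketch in Python of how an ensemble of trees can turn into a confidence score. It is not Cerebro's code, and nothing in it comes from the report: the three features (variant allele fraction, read depth, base quality), the toy data, and the function names are all assumptions invented for illustration. It also simplifies a real random forest heavily, since each "tree" is a single yes-or-no split trained on a random resample of the data and one randomly chosen feature.

```python
import random

# Hypothetical toy data, invented for illustration only. Each candidate
# mutation is described by three assumed features:
# (variant allele fraction, read depth, mean base quality).
CANDIDATES = [
    (0.32, 210, 36), (0.25, 180, 35), (0.41, 150, 37), (0.18, 240, 34),
    (0.02, 40, 20), (0.03, 35, 22), (0.01, 60, 18), (0.04, 25, 21),
]
LABELS = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = true mutation, 0 = false positive


def train_stump(rows, labels, feature):
    """Fit a one-split tree that votes 'true' when row[feature] >= threshold.

    Simplifying assumption: higher feature values point toward a real mutation.
    """
    best_threshold, best_correct = None, -1
    for threshold in sorted({row[feature] for row in rows}):
        # Count how many training rows this threshold classifies correctly.
        correct = sum((row[feature] >= threshold) == (label == 1)
                      for row, label in zip(rows, labels))
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return feature, best_threshold


def train_forest(rows, labels, n_trees=50, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap: draw row indices with replacement.
        picks = [rng.randrange(len(rows)) for _ in rows]
        feature = rng.randrange(len(rows[0]))
        forest.append(train_stump([rows[i] for i in picks],
                                  [labels[i] for i in picks], feature))
    return forest


def confidence(forest, row):
    """Return the fraction of trees that vote 'true mutation' for this row."""
    votes = sum(row[feature] >= threshold for feature, threshold in forest)
    return votes / len(forest)


forest = train_forest(CANDIDATES, LABELS)
strong = confidence(forest, (0.45, 250, 38))      # looks like a real mutation
borderline = confidence(forest, (0.10, 200, 21))  # mixed evidence
weak = confidence(forest, (0.005, 10, 15))        # looks like an artifact
print(strong, borderline, weak)
```

The score is simply the share of trees that vote "true," so a candidate with mixed evidence lands somewhere between the clear-cut cases. A production classifier would use much deeper trees, many more features, and far more training data.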

Angiuoli and his team trained Cerebro using millions of real-world and in silico mutations. They then pitted Cerebro head-to-head against several existing cancer mutation identification methods and found that the machine learning technique was more accurate in almost every circumstance. 

“The improvements [to mutation identification software] matter and have clinical implications,” says Angiuoli. That’s particularly true as more DNA-specific cancer therapies continue to become available on the market, he says.

Angiuoli’s company, called PGDx for short, spun out of Johns Hopkins University in 2010 with the aim of developing proprietary algorithms to identify alterations in cancer genomes. The company plans to take its products to the U.S. Food and Drug Administration (FDA) in the hope of receiving market approval. 
