As SARS-CoV-2 Mutates, AI Algorithms Try to Keep Pace

A machine learning model built for language comprehension hunts for coronavirus strains that could evade the vaccine

AI predicts most potent COVID-19 mutations.
Photo: iStockphoto

As new variants of the coronavirus continue to spring up like wildfires across the planet, researchers have been frantically trying to determine which new strains might outwit our brand new vaccines. 

Artificial intelligence (AI) may be able to help. In a paper published Friday in the journal Science, researchers at MIT described a machine learning algorithm that can predict which mutations pose the biggest threat to the world’s fledgling immunity.

The tool could be used to quickly narrow down which mutations are most likely to evade the immune systems of people who have been vaccinated or previously infected. Researchers can then test suspected strains in the lab and update vaccines accordingly. 

“This is a real-time companion to vaccine development,” says Bryan Bryson, a biological engineer at MIT and co-author of the paper. “What we can do with our model right now is a lot faster than what you can do in the lab.” 

The tool comes at a crucial moment in the COVID-19 pandemic. Millions of doses of vaccines developed against SARS-CoV-2 (the coronavirus that causes COVID-19) are finally rolling out to the public. Just over three percent of the U.S. population has been vaccinated.

These vaccines were designed to train our immune systems to recognize a particular strain of the coronavirus. But the more the virus mutates, the greater the chance that those already vaccinated and those who were previously infected could enjoy less immunity to the new strains. 

This harrowing process is called viral escape. Coronavirus mutations that achieve escape would then send vaccine makers scrambling to update their vaccines in a high-stakes game of catch-up.

In recent weeks, new viral variants out of the United Kingdom, South Africa, California, and other regions have begun to spread across the globe. These intractable variants seem to be more contagious than their ancestors, though thankfully not more deadly. Multiple experts have said publicly that our current vaccines should still work against the new strains.

Surely, however, there will be more mutations. Here is where Bryson and his colleagues say their algorithm could help vaccine makers keep up with the game. It would, they say, reduce reliance on the laborious experimental techniques currently used to monitor such mutations.

“This is a tool that tells you when to investigate,” says Bonnie Berger, a computer scientist at MIT and co-author of the paper. “As new strains come along, we can flag which ones are worth investigating for escape potential.”

A number of AI-based tools aided the early development of COVID-19 vaccines. For example, AI helped researchers identify which segments of a virus’s genetic code are most likely to change, and how some mutations might affect its physical structure. MIT’s new machine learning algorithm broadens AI’s repertoire by applying it to viral escape.

The group’s model was originally developed to help machines comprehend human language. The algorithm is designed to look for both grammar (syntax) and meaning (semantics). Using those same two principles, the researchers creatively adapted it to perceive changes to viral genetic code.

They call their process constrained semantic change search (CSCS). As the model learns about the coronavirus genome, it begins to learn what kinds of changes to that genome could be consequential.

From that, it generates a short list of suspicious strains to test in the lab.
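
To make the idea of a short list concrete, here is a minimal sketch in Python. It is not the researchers' code. It assumes each candidate mutation already carries two scores, one for how much it changes the "meaning" of the sequence and one for how "grammatical" the mutated sequence remains, and it combines them with a simple rank sum. The article does not give the scoring rule, so the rank sum, the `beta` weight, the candidate names, and the toy numbers are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    mutation: str            # hypothetical label, not a real variant name
    semantic_change: float   # assumed score: larger means bigger change in "meaning"
    grammaticality: float    # assumed score: larger means the sequence still "reads" as viable


def ascending_ranks(values):
    """Return 1-based ranks, where the smallest value gets rank 1."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def shortlist(candidates, beta=1.0, top_k=2):
    """Rank candidates by semantic-change rank plus beta times grammaticality rank."""
    sem = ascending_ranks([c.semantic_change for c in candidates])
    gram = ascending_ranks([c.grammaticality for c in candidates])
    scored = [(s + beta * g, c.mutation) for s, g, c in zip(sem, gram, candidates)]
    # Highest combined score first: large change in meaning AND still grammatical.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [mutation for _, mutation in scored[:top_k]]


# Made-up scores for illustration only.
TOY_CANDIDATES = [
    Candidate("mut_A", semantic_change=0.9, grammaticality=0.05),
    Candidate("mut_B", semantic_change=0.7, grammaticality=0.60),
    Candidate("mut_C", semantic_change=0.1, grammaticality=0.65),
    Candidate("mut_D", semantic_change=0.8, grammaticality=0.70),
]

if __name__ == "__main__":
    print(shortlist(TOY_CANDIDATES, top_k=1))
```

Working with ranks rather than raw scores keeps either quantity from dominating just because it is measured on a larger scale, which is one reason a rank sum is a reasonable choice for a sketch like this.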

To test the strains, researchers would first generate a pseudovirus carrying the suspect mutations identified by the computational model. They would then subject the pseudovirus to antibodies gathered from people previously vaccinated or infected with COVID-19. If the antibodies don’t neutralize the virus, that suggests that the new strain is capable of evading the immune system—and that updated vaccines are needed.

Then it’d be back to the algorithms to look for more suspicious variants. “It’s like a loop” between the computers and the wet labs, says Bryson. “You just kind of go back and forth and try to understand the pandemic in real time.”

The researchers trained their model on just under 1000 genetic sequences of the SARS-CoV-2 spike protein, plus another 3000 spike sequences from other types of coronaviruses, such as those that cause the common cold. (The spike is what the virus uses to enter human cells, and also what our immune systems will recognize.)

Those thousands of examples taught the model the rules governing how amino acids must be sequenced in coronaviruses. “The nice thing about language models is they can learn the rules directly from a large training set,” says Brian Hie, a PhD candidate in Berger’s group, and co-author of the paper. “That’s why we wanted to use this model in the biological setting, where we don’t know the rules of which amino acids can go together.”
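
What "learning the rules from a training set" means can be seen with a toy that is far simpler than the model in the paper: a bigram counter that estimates how likely one amino acid is to follow another. The sketch below is in Python. The training strings are invented, not real spike sequences, and the function names, the add-one smoothing, and the 20-letter alphabet size are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(sequences):
    """Count how often each residue is followed by each other residue."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def next_residue_prob(counts, prev, nxt, alphabet_size=20, alpha=1.0):
    """Smoothed estimate of P(nxt | prev) from the learned counts."""
    row = counts.get(prev, Counter())
    total = sum(row.values())
    return (row[nxt] + alpha) / (total + alpha * alphabet_size)


# Invented toy sequences, for illustration only.
TOY_SEQUENCES = ["MKVL", "MKIL", "MKVA"]

if __name__ == "__main__":
    model = train_bigram(TOY_SEQUENCES)
    # A pair seen in training scores higher than a pair never seen.
    print(next_residue_prob(model, "M", "K"))
    print(next_residue_prob(model, "M", "W"))
```

Nobody told the counter which pairs are allowed; it inferred that from the examples. A neural language model does the same kind of thing with much longer context and many more examples.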

As an experiment, the MIT researchers fed some of the new variants into their algorithm and found that both the UK and South African strains scored “quite high” in terms of their probability of escape. However, they did not rank as high as an escape mutant generated in laboratory experiments, says Berger.

Predicting when a high score will translate into an actual escape from a human immune system is beyond the model’s capability, says Hie.

In the long term, Hie says he hopes to keep working with the model so that it can predict viral mutations that haven’t yet occurred. “That’s a moonshot kind of goal for this line of research: Vaccinating against future forms of the virus,” he says.

Correction: A previous version of this story bore the headline “As COVID-19 Mutates...” Of course, the virus (SARS-CoV-2) is the entity that is ultimately mutating, not the disease caused by the virus (COVID-19). 

This CAD Program Can Design New Organisms

Genetic engineers have a powerful new tool to write and edit DNA code

Foundries such as the Edinburgh Genome Foundry assemble fragments of synthetic DNA and send them to labs for testing in cells.

Edinburgh Genome Foundry, University of Edinburgh

In the next decade, medical science may finally advance cures for some of the most complex diseases that plague humanity. Many diseases are caused by mutations in the human genome, which can be either inherited from our parents, as in cystic fibrosis, or acquired during life, as in most types of cancer. For some of these conditions, medical researchers have identified the exact mutations that lead to disease, but for many more, they're still seeking answers. And without understanding the cause of a problem, it's pretty tough to find a cure.

We believe that a key enabling technology in this quest is a computer-aided design (CAD) program for genome editing, which our organization is launching this week at the Genome Project-write (GP-write) conference.

With this CAD program, medical researchers will be able to quickly design hundreds of different genomes with any combination of mutations and send the genetic code to a company that manufactures strings of DNA. Those fragments of synthesized DNA can then be sent to a foundry for assembly, and finally to a lab where the designed genomes can be tested in cells. Based on how the cells grow, researchers can use the CAD program to iterate with a new batch of redesigned genomes, sharing data for collaborative efforts. Enabling fast redesign of thousands of variants can only be achieved through automation; at that scale, researchers just might identify the combinations of mutations that are causing genetic diseases. This is the first critical R&D step toward finding cures.
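
The scale involved is easy to see with a small enumeration. The Python sketch below is not the CAD program's interface, which this article does not describe. It assumes a hypothetical setup in which a mutation is a (position, base) substitution on a reference string, and it generates every combination of up to `max_size` candidate mutations. The reference sequence and the mutations are made up.

```python
from itertools import combinations


def apply_mutations(reference, mutations):
    """Return a copy of the reference with each (position, base) substitution applied."""
    seq = list(reference)
    for position, base in mutations:
        seq[position] = base
    return "".join(seq)


def design_variants(reference, candidate_mutations, max_size):
    """Map each combination of up to max_size mutations to its designed sequence."""
    variants = {}
    for size in range(1, max_size + 1):
        for combo in combinations(candidate_mutations, size):
            positions = [position for position, _ in combo]
            if len(set(positions)) < len(positions):
                continue  # skip combinations that edit the same position twice
            variants[combo] = apply_mutations(reference, combo)
    return variants


# Made-up reference and substitutions, for illustration only.
TOY_REFERENCE = "ACGTACGT"
TOY_MUTATIONS = [(0, "T"), (3, "A"), (5, "G")]

if __name__ == "__main__":
    designs = design_variants(TOY_REFERENCE, TOY_MUTATIONS, max_size=2)
    for combo, sequence in designs.items():
        print(combo, sequence)
```

Three candidate mutations already yield six designs when taken one or two at a time, and the count grows quickly as more candidates are added, which is why designing such batches by hand does not scale.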
