AI Hallucinates Novel Proteins

Deep network dreams could greatly speed up design of new proteins


Charles Q. Choi is a science reporter who contributes regularly to IEEE Spectrum.


This twisted protein structure is one of hundreds dreamed up by a machine-learning algorithm.

Ian C. Haydon/UW Medicine Institute for Protein Design

By getting artificial intelligences to hallucinate, scientists are creating novel proteins with a potentially vast array of properties, a new study finds.

Proteins, which are strings of amino acids found in every cell, spontaneously fold into complex 3D shapes that are key to nearly every biological process. However, the intricacy of the interactions between the amino acids that make up each protein makes it difficult to predict a protein's structure, even when researchers know its amino acid sequence.

Scientists have long employed programs such as Rosetta to design new proteins with potentially novel functions, to model how they might fold, and to predict if they might behave as hoped. Increasingly, deep neural networks are also helping researchers predict protein structures.

Now, scientists have found that a deep network trained exclusively to model protein shapes can also dream up proteins with new structures. They detailed their findings in the 2 December issue of the journal Nature.

The researchers, collaborators from several U.S.-based academic institutions, experimented with trRosetta, a web-based platform for protein structure prediction powered by deep learning and Rosetta. They gave it completely random protein sequences and introduced mutations into them until trRosetta began making generalizations that yielded predictions about how the strings of amino acids would arrange themselves into stable 3D structures.
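The mutate-and-check loop described above can be illustrated with a toy sketch in Python. This is not the study's code: the scoring function `toy_score` is a hypothetical stand-in for the network's structure prediction, and the greedy accept-or-revert rule, step count, and function names are assumptions chosen only to show the general shape of the idea.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes for the 20 standard amino acids


def toy_score(sequence):
    """Hypothetical stand-in for a network's confidence in a predicted fold.

    A real pipeline would query a structure-prediction network here. This toy
    version only counts neighboring pairs that switch between hydrophobic and
    polar residues, so the loop has something to optimize.
    """
    hydrophobic = set("AVILMFWC")
    return sum(
        (a in hydrophobic) != (b in hydrophobic)
        for a, b in zip(sequence, sequence[1:])
    )


def hallucinate(length=100, steps=2000, seed=0, score=toy_score):
    """Start from a random sequence and keep point mutations that do not lower the score."""
    rng = random.Random(seed)
    sequence = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    best = score(sequence)
    for _ in range(steps):
        position = rng.randrange(length)
        previous = sequence[position]
        sequence[position] = rng.choice(AMINO_ACIDS)  # propose a random mutation
        candidate = score(sequence)
        if candidate >= best:
            best = candidate  # keep the mutation
        else:
            sequence[position] = previous  # revert the mutation
    return "".join(sequence), best
```

Calling `hallucinate()` returns a 100-letter sequence and its toy score. In the study, the signal steering the mutations came from trRosetta's structure predictions, which this sketch does not attempt to reproduce.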

"It took just a couple days to implement a pilot version of the protocol and qualitatively check whether the produced outcomes look viable," says study lead author Ivan Anishchenko, a computational biologist at the University of Washington at Seattle. "We were delighted to see that the hallucinated proteins looked plausible by eye and were also quite diverse structurally. It is highly desirable of an automated protein generation procedure to have both these properties."

The scientists generated 2,000 new protein sequences, each 100 amino acids long. All of them were originally figments of the AI's imagination. The research team developed synthetic genes to help produce 129 of these proteins in E. coli bacteria in the lab; their initial analyses confirmed that 27 of these real-world proteins appeared to be folded into shapes consistent with the hallucinated structures. Detailed analysis of three of these dreamed-up proteins using X-ray crystallography and nuclear magnetic resonance spectroscopy further confirmed that the molecules' structures closely resembled those envisioned by the deep network.

This hallucination approach could help greatly simplify protein design, Anishchenko says. Previously, in order to create a new protein with a desired shape, scientists had to analyze related structures in nature to deduce the factors needed to create that shape. Complicating the matter was the fact that researchers needed to devise new sets of rules for each new type of fold. By using a deep network that has a grasp of the general principles underlying protein structure, they no longer need to rely on those fold-specific rules and can instead theoretically guide AIs to dream up desired shapes.

The researchers now aim to create AIs that will dream up desirable proteins, such as small proteins that can bind targets of interest. Anishchenko does caution that trRosetta is already outdated. "Better options like RoseTTAFold and AlphaFold are available these days," he says. Still, "the hallucination approach can readily be extended to use these newer networks, likely resulting in even more accurate design models."
