DNA, as a data-storage medium, is useful only when read, copied, and sent out elsewhere. The medium for carrying genetic information out of a cell's nucleus is RNA, which is transcribed from DNA; the DNA itself never leaves the nucleus. Now, using deep learning, researchers at Northwestern University, in Evanston, Ill., have untangled a complex part of the RNA transcription process: how cells know when to stop copying.
In RNA transcription, knowing when to stop is crucial. The information coded into RNA is used throughout a cell to synthesize proteins and regulate a wide range of metabolic processes. Getting the right message to its intended target requires those RNA strands to say just as much as they need to, and nothing more. If they say more or less than needed, as happens in a number of diseases including epilepsy and muscular dystrophy, those metabolic processes can break down or malfunction, with debilitating effects.
“This is a very useful prescreening tool for investigating genetic variants in a high-throughput manner.”
—Emily Kunce Stroup, Northwestern University
Halting the RNA copying process, called polyadenylation (polyA) for the string of adenine bases it tacks onto the end of a cut-off RNA strand, involves a range of proteins whose interactions have never been fully understood.
So to help unravel polyA, researchers Zhe Ji and Emily Kunce Stroup at Northwestern University developed a machine-learning model that can locate and identify polyA sites. It works by pairing convolutional neural networks (CNNs) trained to match important sequences in the genetic code with recurrent neural networks (RNNs) trained to study the CNN outputs.
Previous models had taken a similar approach, using both CNNs and RNNs. These researchers went a step further, feeding the CNN/RNN model's outputs into two additional deep-learning models trained to locate and to identify polyA sites in the genome.
The two additional models seem to have helped. “Having those tandem outputs is the really unique thing from our work,” says Stroup. “Having the model go outwards to two separate output branches that we then combine to identify sites at high resolution is what distinguishes us from existing work.”
From their model, the researchers learned several factors that determine whether polyA proceeds correctly or fails. The CNN part of the model learned genetic patterns in DNA known to attract the proteins controlling polyA, while the RNN part of the model revealed that reliably cutting off transcription requires careful spacing between those patterns. The researchers could draw such precise conclusions because of the model's per-nucleotide resolution. "It's striking that our model can precisely capture this," says Ji.
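To make the CNN side of this concrete, here is a minimal sketch (not the authors' code) of how a single convolutional filter can flag one such pattern: the canonical polyA signal, the six-letter DNA motif AATAAA. The sequence, filter weights, and scoring are illustrative assumptions; a trained CNN would learn many such filters from data rather than having them hand-set.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a (length, 4) one-hot matrix."""
    return np.array([[1.0 if b == base else 0.0 for base in BASES]
                     for b in seq])

# Hand-set filter whose weights match the one-hot pattern of AATAAA,
# the best-known signal recognized by the polyA machinery. A real CNN
# learns filters like this during training instead of being given them.
MOTIF = "AATAAA"
kernel = one_hot(MOTIF)  # shape (6, 4)

def scan(seq):
    """Slide the filter along the sequence; a score of 6 is an exact match."""
    x = one_hot(seq)
    return np.array([np.sum(x[i:i + len(MOTIF)] * kernel)
                     for i in range(len(seq) - len(MOTIF) + 1)])

scores = scan("GGCAATAAAGTC")
print(int(scores.argmax()), scores.max())  # motif found at position 3, score 6.0
```

Because the score is computed at every position, this kind of scan naturally yields the per-nucleotide resolution the researchers describe: downstream layers see not just whether a motif is present but exactly where, which is what lets the RNN stage reason about spacing between motifs.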
Moving forward, the team says it plans to apply the model and similar techniques to identifying key genetic mutations that potentially cause diseases, and from there to develop a pipeline for more targeted therapeutic drugs. "This is a very useful prescreening tool for investigating genetic variants in a high-throughput manner," says Stroup. "This will hopefully help whittle down the number of candidate mutations to make the process more efficient."
Stroup says the team also plans to replicate the research in other organisms to see how RNA transcription varies between animals. They hope, she says, to use that knowledge to help correct polyA when the process goes awry, as in the cases of epilepsy and muscular dystrophy, and causes real harm.
The researchers published their paper in the journal Nature.
Michael Nolan is a writer and reporter covering developments in neuroscience, neurotechnology, biometric systems, and data privacy. Before that, he spent nearly a decade wrangling biomedical data for a number of labs in academia and industry. Before that, he received a master's degree in electrical engineering from the University of Rochester.