Reading and Writing a Book With DNA

Researchers are storing digital information in the form of DNA, but is it practical?

4 min read
Photo: Baris Simsek/iStockphoto

16 August 2012—Harvard University researchers converted a 53 000-word book into DNA and then read the DNA-encoded book using gene-sequencing technology, the researchers report this week in Science. The project is by far the largest demonstration of digital information storage in DNA and the densest consolidation of data in any medium, the authors say.

There is a clear need for improved long-term storage of massively large data, says George Church, a geneticist at Harvard’s Wyss Institute and one of the leaders of the research. There is data that we are throwing away or don’t collect because we can’t afford to store it, such as video surveillance of public spaces and large research projects, he says. Someday that won’t be necessary. The question is, What will get us there first: electronic or molecular memory?

Understanding the Coronavirus Is Like Reading a Sentence

And parsing its "words" and "grammar" could lead to better COVID-19 vaccines

10 min read
Illustration showing the structure of the SARS-CoV-2 virus particle. At the virus's core is its RNA (ribonucleic acid) genome (coils). Embedded in the viral envelope (grey) are spike proteins (red) that the virus uses to attach to and infect a host cell.
John Bavaro/Science Source

Since the beginning of 2020, we've heard an awful lot about RNA. First, an RNA coronavirus created a global pandemic and brought the world to a halt. Scientists were quick to sequence the novel coronavirus's genetic code, revealing it to be a single strand of RNA that is folded and twisted inside the virus's lipid envelope. Then, RNA vaccines set the world back in motion. The first two COVID-19 vaccines to be widely approved for emergency use, those from Pfizer-BioNTech and Moderna, contained snippets of coronavirus RNA that taught people's bodies how to mount a defense against the virus.

But there's much more we need to know about RNA. RNA is most typically single-stranded, which means it is inherently less stable than DNA, the double-stranded molecule that encodes the human genome, and it's more prone to mutations. We've seen how the coronavirus mutates and gives rise to dangerous new variants. We must therefore be ready with new vaccines and booster shots that are precisely tailored to the new threats. And we need RNA vaccines that are more stable and robust and don't require extremely low temperatures for transport and storage.

That's why it's never been more important to understand RNA's intricate structure and to master the ability to design sequences of RNA that serve our purposes. Traditionally, scientists have used techniques from computational biology to tease apart RNA's structure. But that's not the only way, or even the best way, to do it. Work at my group at Baidu Research USA and Oregon State University has shown that applying algorithms originally developed for natural language processing (NLP)—which helps computers parse human language—can vastly speed up predictions of RNA folding and the design of RNA sequences for vaccines.