It was the looming sense of crisis that brought them together. In late April, technologists from IBM, Intel, and Microsoft joined an intimate gathering of computer scientists and geneticists to discuss the big problem with big data: Our data storage requirements are rapidly exceeding the capacity of today’s best storage technologies, namely magnetic tape, disk drives, and flash memory.
The closed-door meeting in Arlington, Va., was convened to explore the potential of a new storage technology that is actually as old as life itself. The experts came together to weigh the merits of DNA data storage, which makes use of the marvelously compact and durable DNA molecules that encode genetic information inside living things. If digital files were converted into biological material, warehouse-size storage facilities could theoretically be replaced by diminutive test tubes.
While this idea has been kicking around for many years, meeting attendee Victor Zhirnov says tech companies are now starting to consider DNA data storage as a real possibility. Zhirnov, director of cross-disciplinary research and special projects for the Semiconductor Research Corp. (which cosponsored the meeting), says he was encouraged by the presence of “luminaries” from industry and academia who took an active part in the two-day workshop. “The question was, can we demonstrate a prototype DNA storage machine within five to seven years?” explains Zhirnov. “It is a very ambitious goal, but we concluded that it is possible.”
Here’s how DNA data storage works. First you take any digital data that would normally be stored in a binary code of 0s and 1s and translate it into the genetic code of As, Cs, Gs, and Ts, the letters that stand for DNA’s four chemical building blocks. Then you give that DNA code (for example, GATTACA) to a synthetic-biology company, which manufactures strands of DNA to your specifications. Next you stash the resulting test tube of DNA in cold storage and walk away. When you want to retrieve the information, you take out the test tube and use a standard DNA sequencing machine to decode the material inside. That gives you the DNA sequence GATTACA once again, which you can translate back into binary to read your original file.
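To make the translation step concrete, here is a minimal Python sketch that assumes the simplest possible mapping, two bits per nucleotide (00 as A, 01 as C, 10 as G, 11 as T). The mapping and the function names `encode` and `decode` are illustrative assumptions, not the scheme used by any company mentioned here; practical systems typically add error correction and avoid troublesome sequences such as long runs of the same base.

```python
# Hypothetical 2-bit mapping for illustration only; real DNA-storage
# schemes use more elaborate codes with error correction.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Translate bytes into a DNA sequence, two bits per nucleotide."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(sequence: str) -> bytes:
    """Translate a sequence produced by encode() back into bytes."""
    # Each byte occupies exactly four bases under this mapping.
    if len(sequence) % 4 != 0:
        raise ValueError("sequence length must be a multiple of 4 bases")
    bits = "".join(BASE_TO_BITS[base] for base in sequence)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = encode(b"Hi")
    print(strand)          # CAGACGGC
    print(decode(strand))  # b'Hi'
```

Encoding the two bytes of "Hi" yields the eight-letter strand CAGACGGC, and decoding that strand returns the original bytes. Under this toy mapping a seven-letter sequence such as GATTACA does not correspond to whole bytes, which is why `decode` rejects lengths that are not multiples of four.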
Digits to DNA and Back Again
DNA is the densest storage medium in existence, able to store almost a zettabyte of data in a single gram of material. It’s also extremely long lasting, as demonstrated by remarkable feats of paleontological derring-do. In 2013, for example, a team reconstructed the entire genome of an early horse species using DNA from a bone that was buried in the Arctic permafrost for some 700,000 years.
If DNA archives become a plausible method of data storage, it will be thanks to rapid advances in genetic technologies. The sequencing machines that “read out” DNA code have already become exponentially faster and cheaper; data from the National Institutes of Health show the cost of sequencing a 3-billion-letter human genome plummeting from US $100 million in 2001 to a mere $1,000 today. However, the DNA synthesis technologies required to “write” the code are much newer and less mature. Synthetic-biology companies like San Francisco’s Twist Bioscience began manufacturing DNA to customers’ specifications only in the last few years, primarily serving biotechnology companies that tweak the genomes of microbes to trick them into making some desirable product. Manufacturing DNA for data storage could be a profitable new market, says Twist CEO Emily Leproust.
Twist sent a representative to the April meeting, and the company is also working with Microsoft on a separate experiment in DNA storage, in which it synthesized 10 million strands of DNA to encode Microsoft’s test file. Leproust says Microsoft and the other tech companies are currently trying to determine “what kind of R&D has to be done to make a viable commercial product.” To make a product that’s competitive with magnetic tape for long-term storage, Leproust estimates that the cost of DNA synthesis must fall to 1/10,000 of today’s price. “That is hard,” she says mildly. But, she adds, her industry can take inspiration from semiconductor manufacturing, where costs have dropped far more dramatically. And just last month, an influential group of geneticists proposed an international effort to reduce the cost of DNA synthesis, suggesting that $100 million could launch the project nicely.
The U.S. Intelligence Advanced Research Projects Activity (IARPA) cosponsored the meeting and may fund a research program to create a prototype “DNA hard drive,” but Zhirnov says that hasn’t been confirmed. “If such a program can be established, the teams are ready,” he says. Research would likely focus first on the most obvious application for a DNA hard drive: using it for archival storage, in which the data remain unchanged until the entire file is retrieved for readout. However, an IARPA program could also fund researchers who have recently demonstrated that DNA can be used for random-access memory and can even be made rewritable.
Biotech consultant Rob Carlson attended the meeting and says he expects that the intelligence agencies of several countries will fund work on DNA data storage as they grapple with the onslaught of information now being gathered by surveillance technologies. “They’re scratching their heads,” he says, “and there’s nothing else in the offing that can meet their storage needs.” Carlson has written skeptically about the commercial market for synthetic DNA, yet he says that DNA data storage may be the application that makes the young industry viable. “We can imagine storing massive amounts of data in a very small volume, and we already know how to read and write it,” he says. “Now the question is, can we read and write it at high throughput and at low cost?”
This article appears in the July 2016 print issue as “Tech Companies Mull Archiving Data in DNA.”