DNA Data Storage Gets Random Access

DNA data storage just got bigger and better. Scientists have reported the first random-access storage system from which they can recover individual data files, error free, from over 200 megabytes of digital information encoded into DNA.

Random access is key to a practical DNA-based memory, but until now, researchers had achieved it only for up to 0.15 megabytes of data.

Since submitting their research, published in Nature Biotechnology, the team from Microsoft Research and the University of Washington has already improved on the reported results. Their storage system now offers random access across 400 megabytes of data encoded in DNA with no bit errors, says Microsoft Research’s Karin Strauss, who led the new work with Luis Ceze of the University of Washington.

Microsoft and other tech companies are seriously considering the possibility of archiving data in DNA. Current data storage technologies are not keeping up with the breakneck pace at which we generate digital content, Strauss says. Synthetic DNA is an attractive storage medium because it can, in theory, store 10 million times as much data as magnetic tape in the same volume, and it survives for thousands of years. Technology Review reports that Microsoft Research aims to have an operational DNA-based storage system working inside a data center toward the end of this decade.

DNA data storage involves translating the binary 0s and 1s of digital data into sequences of the four bases A, C, G, and T that make up DNA. The encoded sequences are synthesized and stored in vials. To read the data back, a DNA sequencing machine reads out the sequences of the stored DNA molecules, and software decodes them into bits. But accessing specific data files has been hard. Most research efforts until now have sequenced and decoded the entire bulk of the information stored in a vial. “It is not economical to sequence all the data you have stored every time you want to read a portion of it,” Strauss says.
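The translation step can be illustrated with the simplest possible mapping, two bits per base (00 to A, 01 to C, 10 to G, 11 to T). This is a hypothetical sketch, not the scheme the team used; the table and function names are illustrative. One known drawback of this direct mapping is that it can produce long runs of a single base (a zero byte becomes AAAA), which are hard to synthesize and sequence accurately.

```python
# Hypothetical 2-bits-per-base mapping for illustration only; the system
# described in the article uses a different, more elaborate coding scheme.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four bases, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bytes(seq: str) -> bytes:
    """Invert bytes_to_dna: every four bases become one byte."""
    bits = "".join(BASE_TO_BITS[base] for base in seq)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = bytes_to_dna(b"Hi")
    print(strand)                # CAGACGGC
    print(dna_to_bytes(strand))  # b'Hi'
```

At two bits per base, a 150-base snippet can carry at most 300 raw bits, and in practice fewer, because part of each snippet is taken up by the address and primer targets.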

To make a random-access system, Strauss, Ceze, and their colleagues devised clever coding algorithms and turned to the polymerase chain reaction (PCR), a well-known lab technique for making thousands of copies of DNA strands, a process called amplification.
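The “thousands of copies” follow from exponential growth. In an idealized model of the polymerase chain reaction, each cycle copies a fixed fraction e of the strands present, so N0 starting strands become N0·(1 + e)^n after n cycles. The helper below is a hypothetical back-of-the-envelope function for that model; real reactions lose efficiency in later cycles, so a constant e is a simplifying assumption.

```python
def pcr_copies(initial: int, cycles: int, efficiency: float = 1.0) -> float:
    """Expected strand count after `cycles` rounds of idealized PCR.

    Each cycle copies a fraction `efficiency` of the existing strands,
    so the population grows by a factor (1 + efficiency) per cycle.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial * (1.0 + efficiency) ** cycles
```

With perfect doubling, 10 cycles turn one strand into 1,024 copies; at 90 percent efficiency the same 10 cycles yield roughly 613.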

The researchers worked with 35 files, ranging in size from 29 kilobytes to over 44 MB, that they had previously stored in DNA. They encoded each file into a large number of 150-base-long DNA snippets. They used the error-correcting Reed-Solomon code, but unlike in their previous work, they used a coding scheme that converts longer strings of data bits into DNA sequences.
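Reed-Solomon codes can rebuild missing symbols, which matters in DNA storage because individual strands can drop out during synthesis, storage, or sequencing. The sketch below shows the idea in its simplest form, erasure recovery only: k data symbols define a polynomial of degree less than k, parity symbols are further evaluations of that polynomial, and any k surviving symbols recover the rest by Lagrange interpolation. It works over the prime field GF(257) for brevity, whereas practical Reed-Solomon codecs usually work over GF(2^8), and it does not correct errors at unknown positions. The function names are hypothetical, and this is not the team's actual code construction.

```python
P = 257  # prime field; every byte value 0..255 is a valid symbol


def _lagrange_eval(points, x):
    """Evaluate at x the unique polynomial through `points` (mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # Fermat's little theorem gives the modular inverse of den.
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total


def rs_encode(data, n_parity):
    """Systematic RS: data symbols are the evaluations at x = 0..k-1;
    parity symbols evaluate the same polynomial at x = k..k+n_parity-1."""
    k = len(data)
    points = list(enumerate(data))
    return list(data) + [_lagrange_eval(points, x) for x in range(k, k + n_parity)]


def rs_recover(received, k):
    """Recover the k data symbols from a codeword whose lost symbols
    are marked None. Any k survivors fix the degree-(k-1) polynomial."""
    survivors = [(x, y) for x, y in enumerate(received) if y is not None]
    if len(survivors) < k:
        raise ValueError("too many erasures to recover")
    basis = survivors[:k]
    return [_lagrange_eval(basis, x) for x in range(k)]
```

One quirk of the prime field: a parity symbol can take the value 256, which does not fit in a byte. That is one reason byte-oriented codecs use GF(2^8) instead.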

In the end, they had a DNA library with over 13 million unique 150-base-long DNA sequences. Each snippet starts with a coded address that shows its location in the file, and snippets that belong to the same file are flanked by the same “primer target,” a short DNA sequence that serves as the starting point for the polymerase chain reaction.
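The snippet layout can be sketched as fixed-width fields. Only the 150-base total comes from the article; the 20-base primer targets, the 12-base address, the placement of the address right after the forward primer target, and all names below are assumptions for illustration. A 12-base address written in base 4 can distinguish up to 4^12 (about 16.8 million) snippets.

```python
BASES = "ACGT"

# Hypothetical field sizes; only the 150-base total comes from the article.
STRAND_LEN = 150
PRIMER_LEN = 20
ADDRESS_LEN = 12  # base-4 digits, so up to 4**12 distinct addresses
PAYLOAD_LEN = STRAND_LEN - 2 * PRIMER_LEN - ADDRESS_LEN  # 98 bases


def encode_address(index: int) -> str:
    """Write an integer as a fixed-length base-4 string over ACGT."""
    if not 0 <= index < 4 ** ADDRESS_LEN:
        raise ValueError("index does not fit in the address field")
    digits = []
    for _ in range(ADDRESS_LEN):
        index, remainder = divmod(index, 4)
        digits.append(BASES[remainder])
    return "".join(reversed(digits))


def decode_address(address: str) -> int:
    """Invert encode_address."""
    value = 0
    for base in address:
        value = value * 4 + BASES.index(base)
    return value


def build_strand(fwd_target: str, rev_target: str, index: int, payload: str) -> str:
    """Assemble primer target + address + payload + primer target."""
    if len(fwd_target) != PRIMER_LEN or len(rev_target) != PRIMER_LEN:
        raise ValueError("primer targets must be PRIMER_LEN bases")
    if len(payload) != PAYLOAD_LEN:
        raise ValueError("payload must be PAYLOAD_LEN bases")
    return fwd_target + encode_address(index) + payload + rev_target


def parse_strand(strand: str):
    """Split a strand back into (address index, payload)."""
    address = strand[PRIMER_LEN:PRIMER_LEN + ADDRESS_LEN]
    payload = strand[PRIMER_LEN + ADDRESS_LEN:STRAND_LEN - PRIMER_LEN]
    return decode_address(address), payload
```

Because every snippet of a file shares the same two primer targets, those flanking fields are what a file-specific primer pair can later select on, while the address lets the decoder put the payloads back in order.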

“We needed to be very careful in how we design the sequences,” Strauss says. Much of the effort went into inventive algorithms for designing primer targets that do not coincide with the encoded data or address sequences.
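A primer-target screen of the kind described here might compare a candidate against every same-length window of the encoded data and address sequences and reject it if any window comes too close in Hamming distance, so that the primer cannot start amplification of the wrong snippets. The sketch also applies two common design heuristics: balanced GC content and no long single-base runs. The thresholds (at least 6 mismatches, 40 to 60 percent GC, runs of at most 3 bases) and the function names are hypothetical, not the team's published criteria, and a real screen would also check reverse complements.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def longest_run(seq: str) -> int:
    """Length of the longest run of a single repeated base."""
    best = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


def min_distance_to_windows(primer: str, sequences) -> int:
    """Smallest Hamming distance between the primer and any window of
    the same length inside the given sequences."""
    k = len(primer)
    best = k
    for seq in sequences:
        for start in range(len(seq) - k + 1):
            best = min(best, hamming(primer, seq[start:start + k]))
    return best


def primer_is_acceptable(primer, other_sequences, min_distance=6,
                         gc_range=(0.4, 0.6), max_run=3) -> bool:
    """Hypothetical screen: GC balance, no long runs, and no close
    look-alike anywhere in the data or address sequences."""
    if not gc_range[0] <= gc_fraction(primer) <= gc_range[1]:
        return False
    if longest_run(primer) > max_run:
        return False
    return min_distance_to_windows(primer, other_sequences) >= min_distance
```

The window scan is quadratic in practice (every candidate against every position of every sequence), which is one reason screening millions of snippets calls for more careful algorithms than this brute-force loop.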

When it’s time to read the data by sequencing the DNA, the researchers use primers for the polymerase chain reaction that amplify only the DNA snippets belonging to a chosen file. All the replicated DNA is then sequenced. Finally, a new decoding algorithm that the team developed clusters similar-looking sequences together and uses statistical techniques and error correction to reconstruct the original sequences, which are then decoded to recover the digital data.
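The read path can be sketched in three steps: an in-silico stand-in for PCR that keeps only strands whose prefix matches the chosen file's forward primer target, greedy clustering of similar reads, and a column-wise majority vote that turns each cluster into one reconstructed sequence. This is a simplified illustration, not the team's decoding algorithm. It assumes reads contain only substitution errors and have equal lengths, whereas real sequencing reads also contain insertions and deletions that call for edit-distance-based methods. Names and thresholds are hypothetical.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Mismatches between two strings; any length difference also counts."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def select_file(pool, fwd_target, max_mismatches=2):
    """In-silico stand-in for PCR: keep strands whose prefix nearly
    matches the chosen file's forward primer target."""
    k = len(fwd_target)
    return [s for s in pool if hamming(s[:k], fwd_target) <= max_mismatches]


def cluster_reads(reads, threshold):
    """Greedy clustering: each read joins the first cluster whose
    representative (its first member) is within `threshold` mismatches."""
    clusters = []
    for read in reads:
        for cluster in clusters:
            if hamming(read, cluster[0]) <= threshold:
                cluster.append(read)
                break
        else:
            clusters.append([read])
    return clusters


def consensus(cluster):
    """Column-wise majority vote; assumes all reads have equal length."""
    return "".join(Counter(column).most_common(1)[0][0]
                   for column in zip(*cluster))
```

Each consensus sequence would then go through address parsing and error-correction decoding to recover the file's bytes.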

There is a lot more to be done, Strauss says. “We are looking at automating the process because a few parts of the process are still done by people or by expensive machines,” she says. “We want to make the system robust, automated, and cheaper.”
