Researchers Embed Malware Into DNA to Hack DNA-Sequencing Software

The DNA-as-malware hack, although difficult, points to weaknesses in bioinformatics software

This test tube contains the DNA strand that University of Washington researchers used to exploit a modified DNA sequencing program.
Photo: University of Washington

Researchers at the University of Washington have shown that changing a little bit of computer code in DNA sequencing software can make a computer vulnerable to malware embedded in a strand of DNA.

In a related analysis, the group evaluated the security of 13 software programs commonly used for DNA analysis and found insecure function calls at roughly 11 times the rate seen in other types of software.

Lee Organick, a doctoral student at the University of Washington with a background in synthetic biology, says she hopes their results raise awareness among bioinformatics researchers about the poor security of this software [PDF].

To demonstrate how a strand of DNA can be used to carry out an attack, Organick and her collaborators exploited a key step in DNA sequencing, when sequencing software translates the four types of nucleotides in a DNA strand (represented as A, C, G, and T) into binary code. For example, an A is translated as 00, and a C is translated as 01.
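
As a rough sketch of how such a translation might work in code, the routine below maps each letter to two bits and packs four bases into each byte. Only the A and C mappings come from the article; the G and T assignments, the packing order, and the function names are assumptions made for illustration.

```cpp
#include <cstdint>
#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>

// Illustrative two-bit encoding: A->00 and C->01 come from the article;
// the G->10 and T->11 assignments are assumed for this sketch.
uint8_t encodeBase(char base) {
    switch (base) {
        case 'A': return 0b00;
        case 'C': return 0b01;
        case 'G': return 0b10; // assumed
        case 'T': return 0b11; // assumed
        default: throw std::invalid_argument("unknown nucleotide");
    }
}

// Pack a DNA string into bytes, four bases per byte.
std::vector<uint8_t> encodeStrand(const std::string& strand) {
    std::vector<uint8_t> bytes;
    uint8_t current = 0;
    int basesInByte = 0;
    for (char base : strand) {
        current = static_cast<uint8_t>((current << 2) | encodeBase(base));
        if (++basesInByte == 4) {
            bytes.push_back(current);
            current = 0;
            basesInByte = 0;
        }
    }
    if (basesInByte > 0) {
        // Pad a trailing partial byte with zero bits.
        bytes.push_back(static_cast<uint8_t>(current << (2 * (4 - basesInByte))));
    }
    return bytes;
}

int main() {
    // "ACGT" packs into a single byte: 00 01 10 11, i.e. 0x1B.
    for (uint8_t b : encodeStrand("ACGT")) {
        std::cout << std::hex << static_cast<int>(b) << "\n";
    }
}
```

With four bases packed into each byte, a 176-base strand like the one used in the attack condenses into just 44 bytes of raw binary input.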

The general security hygiene of bioinformatics programs is very low

The University of Washington team used a two-bit encoding scheme to synthesize a DNA strand of 176 base pairs (nucleotides paired with their complementary bases) that would act as malware once translated by software used to decode and analyze DNA strands. For their attack, they targeted an open-source program commonly used to read and compress DNA sequence data stored in the FASTQ format.

Separately, they added a fixed-size buffer capable of holding only 150 base pairs to version 4.6 of the program’s source code, which is written in the programming language C++. In software, a buffer is a region of temporary memory in which a program stores data while working on it. A fixed-length buffer can overflow when it is handed more data than it can hold, overwriting other parts of a computer’s memory.
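
The kind of flaw they introduced can be shown in a few lines of C++. The sketch below is illustrative only, not the researchers’ modified source: it assumes a 150-element buffer and an unchecked string copy, with hypothetical names such as processRead.

```cpp
#include <cstring>
#include <iostream>
#include <string>

// Illustrative sketch only (not the researchers' modified program).
const int kMaxReadLength = 150; // buffer sized for reads of up to 150 bases

void processRead(const char* read) {
    char buffer[kMaxReadLength];
    // Unsafe: strcpy copies until the terminating NUL with no bounds check.
    // A longer read writes past the end of `buffer`, corrupting adjacent
    // stack memory: the classic buffer overflow.
    std::strcpy(buffer, read);
    std::cout << "processed " << std::strlen(buffer) << " bases\n";
}

int main() {
    std::string maliciousRead(176, 'A'); // a 176-base read, as in the demonstration
    processRead(maliciousRead.c_str()); // undefined behavior: stack overrun
}
```

A 176-character read copied into a 150-byte buffer spills 27 bytes (including the string’s terminating character) into whatever sits next to the buffer in memory, which is what lets carefully crafted input take control of the program.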

Their exploit triggered a buffer overflow when the modified program tried to read the 176 base pairs encoded in their strand. A portion of the injected code also granted the team remote control of the sequencing machine’s computer and later caused it to crash.

It’s important to note that the group built their target from open-source code and added the fixed-size buffer themselves, rather than exploiting an existing vulnerability. This means the attack could not be launched successfully against the version of the software used in labs today.

The group is also careful to emphasize there have been no known DNA-based code injection attacks attempted in the real world, and an attacker would need specialized knowledge and expertise, as well as lab privileges, in order to execute one. They hope their demonstration serves as an early warning bell about a new type of attack that could occur someday.  

“As traditional entry points like the network become more hardened, it might be the case in the future that the easiest way into a computer is through a non-traditional input path, such as DNA,” says Karl Koscher, Organick’s collaborator.

Such an attack could theoretically spread through labs by a phenomenon called sample bleeding. Rather than pay for their own sequencing machines, many labs submit DNA to third-party companies to be sequenced in a batch. During sequencing, bits of samples are often mixed with other samples. This means attackers could potentially leak malicious DNA code into other samples through third-party sequencing services.  

Though this tactic could not be launched with much precision, it could be used to compromise lab computers or disrupt operations. To show that it was possible to leak code in this way, Organick’s team sequenced their malicious strand along with seven other samples. All 176 base pairs of their exploit were found intact 30 times in one of the other samples.

One way to keep this exploit from wreaking havoc would be to improve the security of the software that could be targeted by such attacks. However, Koscher says these programs are often written in labs by research scientists who are not necessarily versed in the latest computer security practices. “When you’re trying to get your science done, you’re not thinking a whole lot about adding all these defensive tactics into your code,” he says.  

Organick and Koscher chose 13 open-source programs and combed the code for known vulnerabilities and insecure coding practices. Some of the software runs on large, expensive DNA sequencing machines; the rest runs on ordinary lab computers.

They found a high rate of insecure functions within the code, as compared to a control group of software designed for web servers, file copying, or packet processing. The most common insecure functions they detected in the DNA sequencing software were strcat, strcpy, and sprintf. Another common problem they identified was the presence of fixed-size buffers.
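
The difference between the flagged pattern and a safer alternative can be sketched in a few lines of C++. This is an illustration of the general pattern, not code from any of the audited programs, and the function names are hypothetical.

```cpp
#include <cstdio>
#include <cstring>
#include <string>

// Illustrative only (not code from the audited programs).
// The flagged pattern: unbounded C string functions writing into a
// fixed-size buffer.
void labelSampleUnsafe(const char* sampleId) {
    char label[32];
    std::strcpy(label, "sample-"); // no bounds check
    std::strcat(label, sampleId);  // overflows if sampleId is long enough
    std::printf("%s\n", label);
}

// A safer equivalent: snprintf is bounded by the buffer size, so oversized
// input is truncated rather than written past the end of the buffer.
void labelSampleSafer(const std::string& sampleId) {
    char label[32];
    std::snprintf(label, sizeof(label), "sample-%s", sampleId.c_str());
    std::printf("%s\n", label);
}
```

Bounded replacements such as snprintf and strncpy cap how much data can land in a buffer, which is why security guidelines steer code away from the unbounded calls the audit flagged.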

Overall, they found 2.005 insecure functions per 1,000 lines of code within the DNA sequencing software, compared to 0.185 insecure functions per 1,000 lines of code in the control group.  These results, they say, suggest the “general security hygiene of bioinformatics programs is very low.”

Koscher and Organick will present their research at the USENIX Security Symposium in Vancouver on 17 August.

Editor’s note: This post was updated on August 10.
