The Race to Build a Search Engine for Your DNA

In 2005, next-generation sequencing began to change the field of genetics research. Obtaining a person’s entire genome became fast and relatively cheap. Databases of genetic information were growing by the terabyte, and doctors and researchers were in desperate need of a way to efficiently sift through the information for the cause of a particular disorder or for clues to how patients might respond to treatment.

Companies have sprung up over the past five years that are vying to produce the first DNA search engine. All of them have different tactics—some even have their own proprietary databases of genetic information—but most are working to link enough genetic databases so that users can quickly identify a huge variety of mutations. Most companies also craft search algorithms to supplement the genetic information with relevant biomedical literature. But as in the days of the early Web, before Google reigned supreme, a single company has yet to emerge as the clear winner.

Making a functional search engine is a classic big-data problem, says Michael Gonzalez, the vice president of bioinformatics at one such company, ViaGenetics, which was expected to relaunch its platform in March. Before doctors or researchers can use genomic data, it must be organized so that humans can read and search it. The first step is to put it in a standard form called the variant call format, or VCF. As raw data, a person’s complete sequenced genome would take up about 100 gigabytes, so a database that adds the genomes of even 10 patients per day would grow by roughly a terabyte daily and quickly get out of hand. VCF files are more compact, requiring only a few hundred megabytes per genome, which helps researchers find the specific variants they want in a fraction of the time. Unlike a fully sequenced genome, a VCF file records only where a person’s genetic data deviates from the standard: the reference genome originally compiled by the Human Genome Project in 2001.
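To make the idea concrete, here is a minimal Python sketch of how variant records in a VCF-style file might be read and indexed by position. It is an illustration only, not how ViaGenetics or any other company actually works: the sample records are invented, and the names `Variant`, `parse_vcf`, and `build_index` are hypothetical. The only thing taken from the real format is the layout of the first five tab-separated columns (chromosome, position, ID, reference allele, alternate allele) and the convention that header lines begin with `#`.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str  # chromosome name
    pos: int    # 1-based position on the reference genome
    ref: str    # allele found in the reference
    alt: str    # allele found in this person


# Invented records for illustration only; not real patient data.
SAMPLE_VCF = """\
##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
1\t12345\t.\tA\tG\t50\tPASS\t.
1\t67890\t.\tC\tT\t99\tPASS\t.
2\t555\t.\tG\tGA\t42\tPASS\t.
"""


def parse_vcf(text):
    """Return one Variant per alternate allele, skipping header lines."""
    variants = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # meta-information and column-header lines
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        # A single record may list several comma-separated alternates.
        for allele in alt.split(","):
            variants.append(Variant(chrom, int(pos), ref, allele))
    return variants


def build_index(variants):
    """Map (chromosome, position) to the variants recorded there."""
    index = {}
    for v in variants:
        index.setdefault((v.chrom, v.pos), []).append(v)
    return index


variants = parse_vcf(SAMPLE_VCF)
index = build_index(variants)
print(index.get(("1", 67890), []))
```

Because the file lists only the differences from the reference, looking up a position is a dictionary access over a comparatively small set of records rather than a scan of the full genome sequence.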

With VCF, sifting the genomes themselves for pinpoint mutations isn’t the challenge for search engine companies. Most of these companies are allocating their resources toward efforts to seamlessly compile supplementary information about a specific mutation from other databases across the Web, such as the biomedical research archive PubMed or various troves of electronic medical records. Many of these tools have finely tuned algorithms that prioritize the results by credibility or relevance. “You want to be able to pull together the information known about a mutation in that position [of the genome] and quickly make an assessment,” says David Mittelman, the chief scientific officer for Tute Genomics, based in Provo, Utah, another company designing a genetic-search engine.
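One way to picture this kind of prioritization is the short Python sketch below, which gathers records about a single genome position from several sources and orders them by a combined score. Everything in it is an assumption made for illustration: the article does not say how Tute Genomics or any other company scores results, and the source names, credibility weights, relevance values, and the `rank_annotations` function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    source: str       # where the record came from
    position: tuple   # (chromosome, position) the record refers to
    summary: str      # short description of the finding
    relevance: float  # 0..1, how closely the record matches the query


# Hypothetical trust weights per source type; real systems would tune these.
SOURCE_CREDIBILITY = {
    "curated_database": 1.0,
    "journal_article": 0.8,
    "medical_record": 0.5,
}


def rank_annotations(annotations, position):
    """Return annotations for one position, highest combined score first."""
    matches = [a for a in annotations if a.position == position]
    return sorted(
        matches,
        # Unknown sources fall back to a low default weight of 0.1.
        key=lambda a: SOURCE_CREDIBILITY.get(a.source, 0.1) * a.relevance,
        reverse=True,
    )


# Invented records for illustration only.
annotations = [
    Annotation("medical_record", ("1", 67890), "seen in one patient", 1.0),
    Annotation("curated_database", ("1", 67890), "listed as a known variant", 0.6),
    Annotation("journal_article", ("1", 67890), "studied in a cohort", 0.9),
    Annotation("journal_article", ("2", 555), "unrelated position", 0.9),
]

ranked = rank_annotations(annotations, ("1", 67890))
for a in ranked:
    print(a.source, "-", a.summary)
```

Multiplying a per-source weight by a per-record relevance is only one simple choice; the point is that records from different databases must be reduced to a common, sortable score before a user can make a quick assessment.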

In an effort to expand the information that can be attached to a genome under examination, ViaGenetics, based in Miami Beach, Fla., is making its newly updated platform useful for researchers who want to collaborate across institutions. With ViaGenetics’ tools, researchers “can make their data available to other users, so other people can come across these projects, request access, and form a collaboration,” Gonzalez says. “It helps people connect the dots between different researchers and institutions.” This is especially helpful for smaller labs that may not have very extensive genome databases or for researchers from different universities working to decode the same mutation.

Although the genomic-search industry is now focused on serving scientists, that might not always be the case. Mittelman envisions that Tute Genomics could eventually serve consumers directly. People are already demanding information about their genomes just to understand themselves better, Mittelman says, but most companies don’t yet consider the average person to be their primary customer. In order to make that shift, the tool will have to be even more intuitive and user-friendly. “Fire-hosing someone with data that’s not easy to interpret, or using terminology that’s not standardized, has the potential to confuse people,” he says. Privacy is also a major concern for the average user; the information that Tute users upload isn’t stored permanently, Mittelman says, but users will need extra reassurance if the platform becomes available to the lay public.

And a further evolution of the industry is in the offing. Both ViaGenetics and Tute are hoping to be able to run the entire process in-house—from the initial DNA sequencing to the presentation of final searchable results to users. “The market for analyzing and interpreting genomic data is very fragmented, like the computer industry in the 1990s, where you had to go to separate providers to buy a video card or a motherboard and then try to put it together,” Mittelman says. “Soon this field will consolidate, as the computer industry did.”

This article originally appeared in print as “A Google for DNA.”

About the Author

Alex Ossola is a New York City-based journalist who writes frequently for Popular Science and Motherboard. Ossola first heard about the highly competitive fight to devise the best DNA search engine in passing while at a New Year’s Eve party. It wasn’t the usual way she gets story tips, but she knew immediately that she wanted to pursue it. “When something like that falls into your lap, you pull at the thread and find something really cool at the end,” Ossola says.


Genome-search companies vie to be the Google of personalized medicine
