Cheap DNA-sequencing technologies have sparked rapid growth among biomedical companies looking to create new diagnostic tools and personalized therapeutic drugs, and even to give consumers insight into their own genomes. However, the raw data from an individual DNA sequence isn’t very informative on its own; as with a raw trace of GPS data, making sense of an individual DNA sequence usually means situating it in the context of a map. And that’s what New York City–based SolveBio is doing: building a kind of Google Maps for genomics by combining public and private data sets into a standardized, quality-controlled collection. Computational biologists and bioinformaticists can then enhance their own software by programming it to query SolveBio’s knowledge base over the Internet.
Headquarters: New York City
Founders: David Caplan, Paul George, David Gross, Mark Kaganovich
Funding: US $2 million
Funders: Andreessen Horowitz and SV Angel; Charlie Cheever, Max Levchin, and Flatiron Health’s Nat Turner and Zach Weinberg
The motivation to build SolveBio was born out of cofounder and CEO Mark Kaganovich’s frustrations with the typical lab’s low-tech approach to bioinformatics. While at Stanford pursuing a Ph.D. in genomics, Kaganovich noticed that he spent most of his time cleaning and organizing external data sets, and he realized that other researchers and developers had the same problem.
Kaganovich hacked together some ideas for standardized and shared access to common data sets with his friend David Caplan. The duo then brought on Paul George and David Gross as cofounders to build the infrastructure and develop the online application programming interfaces, or APIs, that programmers use to integrate SolveBio into their applications.
SolveBio’s curated data library includes hundreds of data sets for genomics, proteomics, and the clinical and scientific literature used to annotate these collections. Imported data is normalized to allow results from different sources to be combined.
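To make the idea of normalization concrete, here is a minimal sketch in Python. It is not SolveBio's code or API; the two sources, their field names, and the helper functions are hypothetical, chosen only to illustrate two mismatches that commonly separate genomic data sets: chromosome naming ("chr7" versus "7") and 0-based versus 1-based positions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Common schema (an assumption for this sketch)."""
    chrom: str  # chromosome name without a "chr" prefix, e.g. "7"
    pos: int    # 1-based position
    ref: str    # reference allele, uppercase
    alt: str    # alternate allele, uppercase


def normalize_chrom(name: str) -> str:
    """Strip an optional "chr" prefix and unify mitochondrial naming."""
    name = name.strip()
    if name.lower().startswith("chr"):
        name = name[3:]
    name = name.upper()
    return "MT" if name in {"M", "MT"} else name


def from_source_a(row: dict) -> Variant:
    # Hypothetical source A: "chr"-prefixed names and 0-based positions.
    return Variant(
        normalize_chrom(row["chromosome"]),
        int(row["start"]) + 1,  # convert 0-based to 1-based
        row["ref"].upper(),
        row["alt"].upper(),
    )


def from_source_b(row: dict) -> Variant:
    # Hypothetical source B: bare names and 1-based positions.
    return Variant(
        normalize_chrom(row["chr"]),
        int(row["position"]),
        row["ref"].upper(),
        row["alt"].upper(),
    )


def combine(records):
    """Merge annotations from any source, keyed by the normalized variant."""
    merged = {}
    for variant, annotations in records:
        merged.setdefault(variant, {}).update(annotations)
    return merged


# Made-up rows describing the same variant in two different conventions.
row_a = {"chromosome": "chr7", "start": "999", "ref": "a", "alt": "t"}
row_b = {"chr": "7", "position": 1000, "ref": "A", "alt": "T"}

merged = combine([
    (from_source_a(row_a), {"gene": "EXAMPLE1"}),
    (from_source_b(row_b), {"frequency": 0.01}),
])
```

Once both rows are mapped to the same schema, they resolve to a single key, so annotations from each source end up attached to one record instead of two.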
“In order for medicine to advance substantially, we will need more precise, molecular descriptions of disease. This means [information about] DNA, RNA, proteins, small molecules, and other things that is digital, precise, and specific,” says Kaganovich.
Recently, SolveBio announced a seed round of US $2 million from venture-capital firms Andreessen Horowitz and SV Angel and individual investors including Charlie Cheever, Max Levchin, and Flatiron Health’s Nat Turner and Zach Weinberg.
Several companies work in the same broad area of genomics software as SolveBio, including Bina, DNAnexus, Illumina, and Seven Bridges Genomics. SolveBio differentiates itself by focusing on building a public, API-accessible reference rather than providing software tools to analyze private genome sequence data.
“There is a tremendous wealth of public data that can be integrated and mined towards new discoveries in biology and medicine. However, these data sources are disparate, and it often takes a tremendous amount of work to wrangle and maintain all of the large-scale data sources you need to do the kind of biomedical informatics investigations we need to advance medicine,” says Joel Dudley, director of biomedical informatics and assistant professor of genetics and genomic sciences at the Icahn School of Medicine at Mount Sinai, in New York City.
“One challenge SolveBio will face is in integrating clinical data from electronic medical records and other sources. This type of data presents several challenges ranging from maintaining patient privacy to dealing with inherent biases in how clinical data is collected,” says Dudley.
This article originally appeared in print as “SolveBio.”