In intensive care units for newborn babies, genetic disorders are the leading cause of death, so diagnosing the problem quickly is paramount. Now, in a record-breaking 26 hours, pediatricians can scan and analyze the entire genome of a critically ill infant, thanks largely to a hardware system designed to handle the big data of genetics. In a recent study published in Genome Medicine, that system reduced the time for the key analysis step from 15 hours to a mere 40 minutes.
Today's Standard Care
Pediatricians typically run targeted genetic tests for specific diseases if they have a good guess about what's wrong with an infant. Such tests check a few specific spots on the genome, looking for disease-causing mutations. But with more than 8000 possible genetic diseases, such tests "weren't really relevant to clinical care," says pediatrician Stephen Kingsmore.
The Dragen Bio-IT Processor, from the California-based company Edico Genome, plugs into a server and can be integrated seamlessly into a hospital or research facility’s existing workflows, says Edico CEO Pieter van Rooyen. This specialized add-on, he argues, provides compute power that would otherwise require expensive racks of servers or slow connections to the cloud.
In a full genome scan, machines record the sequence of the 3.2 billion “letters” that make up a person’s DNA and look for the roughly 5 million variations that make that person unique. The plummeting cost of such scans is helping doctors find many new uses for them—and in the process causing a new conundrum. “Genetics will be the biggest big-data problem that ever existed,” says van Rooyen. Others agree: A study in PLoS Biology predicted that within a decade the computation demands of genetic data will trump those of all other domains, including both astronomical research and YouTube.
By getting from genome scan to diagnosis in 26 hours, researchers at Children’s Mercy Hospital in Kansas City, Mo., demonstrated how speedy genomic analysis can keep up with clinical demand. For each critically ill infant, sequencing machines first did the brute-force work of recording a genome’s 3.2 billion letters. Using gold-standard Illumina HiSeq machines, researchers drove down the time this step takes from 25 hours to about 18 to 21 hours.
Sources of Big Data in 2025
By 2025, the storage needs of new genomics data will far outstrip those of any other data source, according to a study by scientists at the University of Illinois at Urbana-Champaign and Cold Spring Harbor Laboratory, in New York. By that year, they predict that 100 million to 2 billion human genomes will have been sequenced.
[Chart: Projected annual storage in 2025]
Source: “Big Data: Astronomical or Genomical?” PLoS Biology, 7 July 2015.
But Stephen Kingsmore, the pediatrician and genomics expert who led the Children’s Mercy study, says the “remarkable gains” in speed came from the Dragen, which identified all the variations in the genome of each sick baby. Then the researchers used in-house software that searched through those variants and automatically flagged those associated with a disease matching the baby’s symptoms. In a prior study, Kingsmore’s team showed how diagnoses based on genome scans can dramatically change treatment plans: For example, one baby with liver failure received the proper surgeries and pharmaceutical treatments based on the accurate diagnosis of a rare genetic disorder and is now a healthy 2-year-old.
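The flagging step described above can be pictured with a toy lookup. This is only a sketch: the article does not describe how the Children’s Mercy software works, so the function name `flag_candidate_variants`, the catalog layout, and the symptom-overlap ranking are all assumptions made for illustration, and the catalog entries in the usage note are hypothetical placeholders rather than real clinical data.

```python
def flag_candidate_variants(variants, catalog, symptoms):
    """Flag variants whose catalogued disease shares symptoms with the patient.

    variants: iterable of hashable variant keys, e.g. (chrom, pos, ref, alt).
    catalog:  hypothetical dict mapping a variant key to
              {"disease": str, "symptoms": set of str}.
    symptoms: the patient's observed symptoms.
    Returns (variant, disease, matching_symptoms) tuples, best match first.
    """
    observed = set(symptoms)
    flagged = []
    for variant in variants:
        entry = catalog.get(variant)
        if entry is None:
            continue  # variant has no known disease association in this catalog
        overlap = observed & entry["symptoms"]
        if overlap:
            flagged.append((variant, entry["disease"], sorted(overlap)))
    # Rank by how many of the patient's symptoms each candidate explains.
    flagged.sort(key=lambda item: len(item[2]), reverse=True)
    return flagged
```

Given a made-up catalog in which one variant is linked to "hypothetical disorder X" with the symptoms liver failure and jaundice, a patient showing both would have that variant flagged, while uncatalogued variants or those tied to unrelated symptoms would be skipped.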
The Dragen system delivers speed gains thanks to both hardware architecture and software specifically designed for genomic data, says van Rooyen. The system is a board with a reconfigurable processor chip and 32 gigabytes of memory. The sequencing machines’ raw data streams onto the Dragen board without caching and is distributed to the chip’s four compute engines. These components work in parallel to put all the letters of the person’s genome in their proper order by matching them to a reference genome, which is stored in dedicated memory on the board itself in order to reduce demands on the server. The processor pushes data through a pipeline of about 140 operations at a blistering rate of 400 megabytes per second. “Everything moves at once,” van Rooyen says.
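To make the ordering step concrete, here is a minimal software sketch of matching short reads to a reference sequence. It is not Edico’s method, which the article does not detail. It uses a generic seed-and-verify approach with a k-mer index, and every name in it (`build_kmer_index`, `map_read`, the `max_mismatches` threshold) is an assumption chosen for illustration.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-letter substring of the reference to its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def count_mismatches(read, reference, start):
    """Count letters that differ when the read is laid at `start`."""
    window = reference[start:start + len(read)]
    return sum(1 for a, b in zip(read, window) if a != b)


def map_read(read, reference, index, k, max_mismatches=2):
    """Return (start, mismatches) for the best placement, or None if none fits."""
    candidates = set()
    # A seed at read offset `off` that hits reference position `hit`
    # implies the whole read would start at hit - off.
    for off in range(len(read) - k + 1):
        for hit in index.get(read[off:off + k], []):
            start = hit - off
            if 0 <= start <= len(reference) - len(read):
                candidates.add(start)
    best = None
    for start in sorted(candidates):
        mismatches = count_mismatches(read, reference, start)
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (start, mismatches)
    return best
```

Because each read is placed independently of the others, this kind of work divides naturally among parallel compute engines. A real aligner must also handle insertions, deletions, and repeated regions, which this toy version ignores.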
Once this ordering step is complete, the processor is completely reconfigured to tackle the step of identifying variations in the genome. (That switchover takes just 20 seconds to complete.) Then as the variants are identified, data describing them is compressed and sent back to the server. Because most operations occur on the Dragen board, the system uses very little of the server’s memory and barely taxes its CPU, consuming about 13 to 18 percent of CPU capacity, by Edico’s estimate.
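The variant-identification step can be sketched in the same spirit. The version below is a deliberately simplified illustration, not the Dragen’s algorithm. It assumes reads have already been placed on the reference, considers only single-letter substitutions, and uses hypothetical thresholds (`min_depth`, `min_fraction`) that would miss variants present on only one of a person’s two chromosome copies.

```python
from collections import Counter


def call_variants(reference, mapped_reads, min_depth=2, min_fraction=0.8):
    """Report positions where the reads consistently disagree with the reference.

    mapped_reads: iterable of (start, sequence) pairs already placed on the
    reference. Returns a list of (position, reference_base, observed_base).
    """
    # Pile up the letters that the reads report at each reference position.
    pileup = [Counter() for _ in reference]
    for start, sequence in mapped_reads:
        for offset, base in enumerate(sequence):
            pileup[start + offset][base] += 1

    variants = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads cover this position to make a call
        base, count = counts.most_common(1)[0]
        if base != reference[pos] and count / depth >= min_fraction:
            variants.append((pos, reference[pos], base))
    return variants
```

The output is a short list of differences rather than the full sequence, which hints at why variant data compresses well before it is sent back to the server.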
If doctors perform full genome scans for critically ill infants, they can diagnose rare diseases they would never have thought to check for. What's more, these comprehensive scans allow doctors to rule out diseases, which pediatrician Stephen Kingsmore says can be equally valuable. "Doctors always worry: 'Did I miss something that was treatable?'" he says. If the scan doesn't find a certain disease-associated mutation, doctors needn't give just-in-case treatments.
While the Dragen’s statistics are impressive, not everyone is convinced that the plug-in processor offers unique benefits. One skeptic is Michael Schatz, an associate professor of quantitative biology at Cold Spring Harbor Laboratory, in New York, and coauthor of the paper that compared data sets from genomics, astronomy, and YouTube. He argues that specialized processors lock users into certain data formats and analysis methods. “They are very good at a few things,” he says, “but the data keep changing and methods keep improving.” Schatz thinks that doctors and researchers working in big-data genomics will be better off investing in general-purpose computer clusters, which “can easily be repurposed from one application to the next or from one data type to the next.”
One way or another, doctors must start scaling up. Van Rooyen predicts that in a few years, every infant born in the developed world will have his or her genome sequenced in the hospital. “It’s just a matter of time before clinical genomics will be with us everywhere,” he says. “It’s prudent to have the infrastructure ready.” He envisions the Dragen processor outputting its analysis directly into a patient’s electronic medical record, where actionable intelligence would be flagged for the physician.
Automating medicine to this degree, from genome sequencing to diagnosis, will be necessary if we want to make use of today’s best genetic technologies, says Kingsmore, the pediatrician. “If we’re going to scale this, it has to be on the backs of smart machines,” he says. “We want to take humans out of the equation, because we’re the bottleneck.”