Reconstructing cavemen from bits of fossil DNA is still the stuff of science fiction. But thanks to high-powered computing wizardry, we now have the blueprints you’d need to do it. An international team of scientists published the first draft of the Neanderthal genome in the journal Science on 7 May. The study showed, somewhat surprisingly, that early humans and Neanderthals interbred and that 1 to 4 percent of the DNA in modern Asians and Europeans comes from Neanderthals.
The bulk of the credit for decoding the Neanderthal goes to high-throughput sequencing technologies developed in the past five years, which turned bits of ancient DNA into millions of short strings of letters. But sequencing the Neanderthal genome would have been impossible without the sophisticated software that put all those millions of strings together in the right order.
Both the sequencing and computing of whole genomes have come a long way in the nine years since scientists first delivered the DNA blueprints for humans. Then, sequencing machines read 330 000 base molecules per day. (DNA contains a series of four different base molecules: adenine, cytosine, guanine, and thymine, or A, C, G, and T.) The latest machines made by Illumina, in San Diego, can sequence about 2 billion bases a day, amounting to hundreds of terabytes of data to store and analyze.
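To put those two throughput figures side by side, here is a back-of-the-envelope calculation. It is only a rough sketch: it assumes a single machine reading every base exactly once, whereas real projects read each base many times over, and it borrows the 3-billion-base size of the human genome as a yardstick. The variable names are illustrative.

```python
# Throughput figures quoted in the article (bases read per machine per day).
BASES_PER_DAY_THEN = 330_000
BASES_PER_DAY_NOW = 2_000_000_000

# Yardstick: the roughly 3-billion-base human genome.
GENOME_BASES = 3_000_000_000

# How many times faster the newer machines are.
speedup = BASES_PER_DAY_NOW / BASES_PER_DAY_THEN

# Assumption: one machine, one pass over the genome, no repeated coverage.
days_then = GENOME_BASES / BASES_PER_DAY_THEN
days_now = GENOME_BASES / BASES_PER_DAY_NOW

print(f"Speedup: about {speedup:,.0f}x")
print(f"One pass then: about {days_then:,.0f} machine-days")
print(f"One pass now: about {days_now:.1f} machine-days")
```

Under these simplifying assumptions, a single pass that once took decades of machine time now fits into a couple of days.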
Assembling the human genome was like putting together a 100-piece jigsaw puzzle without any idea of what the picture should look like. The computers used to assemble the 3-billion-base human genome had to sort through about a hundred DNA fragments at a time, each made up of 2000 to 3000 bases. An assembly algorithm then compared these fragments, found overlaps, and strung together longer and longer stretches of the sequence.
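The compare, overlap, and merge loop described above can be illustrated with a toy greedy assembler. This is a minimal sketch, not the software the genome projects actually used: the function names `overlap` and `greedy_assemble`, the `min_len` threshold, and the tiny example fragments are all hypothetical, and the code assumes error-free reads with no repeated regions, which real DNA does not guarantee.

```python
def overlap(a, b, min_len=3):
    """Return the length of the longest suffix of `a` that matches a
    prefix of `b`, or 0 if no match is at least `min_len` long."""
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of fragments with the largest overlap
    until no pair overlaps by at least `min_len` bases."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates
    while len(frags) > 1:
        best_k, best_i, best_j = 0, None, None
        # Compare every ordered pair of fragments.
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left to join
        # Glue the two fragments together, keeping the shared bases once.
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags)
                 if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Three made-up fragments that overlap end to end.
result = greedy_assemble(["ACGTAC", "TACGGA", "GGATTC"])
print(result)  # ['ACGTACGGATTC']
```

Comparing every pair of fragments is manageable for a hundred pieces but grows quadratically, which hints at why scaling the same idea to millions of fragments calls for more sophisticated software.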
The difference between now and then, Neanderthal and human, is both scale and complexity. DNA fragments from fossil Neanderthal bones are very short, having degraded to an average length of only 50 bases, and there are millions of them, not hundreds. Some of these are chemically damaged in such a way that one base has mutated into another. And even worse, more than 96 percent of the DNA sequences scientists obtained came not from Neanderthals but from microbes that have contaminated the bones. To continue with the puzzle analogy, you now have millions of very small pieces, some with faded color, of which only a few thousand belong to your puzzle.
“The challenge is to find a needle in a haystack,” says Janet Kelso, a bioinformatics researcher at the Max Planck Institute for Evolutionary Anthropology, in Leipzig, Germany, where much of the sequencing and data analysis was done.