Thirty-one papers published simultaneously by the ENCODE (Encyclopedia of DNA Elements) project indelibly repaint the picture of the human genome, throwing its information-processing characteristics into even sharper relief. (The whole collection, comprising six papers in Nature, 18 in Genome Research, four in Genome Biology, and one in BioMed Central Genetics, is posted with open access at Nature ENCODE. The lead article, “Architecture of the human regulatory network derived from ENCODE data,” is must reading for information-systems designers.)
Just a few years ago, the prevailing wisdom said that the genome comprises 3 percent or so genes and 97 percent “junk” (with 2 or 3 percent of that junk consisting of the fossilized remains of retroviruses that infected our ancestors somewhere along the line). After a decade of painstaking analysis by more than 200 scientists, the new ENCODE data show that indeed 2.94 percent of the genome is protein-coding genes, while 80.4 percent of sequences regulate how those genes get turned on, turned off, expressed, processed, and modified.
This fundamentally changes how most biologists understand the master instruction set of life: we are, in short, 3 percent input/output and 80 percent logic. (Though perhaps a surprise to biologists, the finding will hardly astound anyone who has designed a complex interactive system.)
And though only 3 percent of the genome’s three-billion-odd base pairs of DNA code for proteins directly, 62 percent of your DNA is, at some time or another, transcribed into RNA, which has relatively recently been revealed to have myriad functions beyond directing the assembly of amino acids into proteins.
It will take years to analyze the healthcare implications of the ENCODE vision of the genome. Stanford University biologist Michael Snyder told the New York Times, “Most of the changes that affect disease don’t lie in the genes themselves; they lie in the switches.”
The Times also quoted Massachusetts General Hospital researcher Bradley Bernstein, who said: “I don’t think anyone predicted that would be the case.” I can’t resist pointing out that some folks did predict exactly this. In 1991, for example, the editor of Nature Biotechnology (then called Bio/Technology) pointed out that even the simple communications programs available at the time were 75 percent logic and 25 percent input/output. He added, “[R]esearchers are announcing new disciplines whose names are redolent of bits and bytes—genomics, bio-informatics…. Maybe they will see past the ‘junk’ epithet to discover whether the logic lies within the genome, but beyond the gene.”