Double Helix Jeopardy

DNA databases help solve crimes but aid and abet racial discrimination

Illustration: David Plunkert

On 4 January 1998, police in London arrested a man, whom court records call “B,” on suspicion of burglary. The police swabbed the inside of the suspect’s cheek to collect a sample of his DNA.

In August, B was acquitted and released. But in September, B’s DNA profile was—accidentally and illegally—entered into the United Kingdom’s national DNA database. The system automatically compares newly loaded DNA profiles against unidentified samples obtained from crime scenes. The system found a match—a sample recovered from a 1997 rape and assault case. The police arrested B, and the government successfully prosecuted him for those crimes.

Is there anything wrong with such a turn of events? Privacy advocates say there is, as do people worried about racial discrimination. Among these are lawyers working with the American Civil Liberties Union (ACLU) and the Council for Responsible Genetics, in the United States, and with GeneWatch and Privacy International, in the United Kingdom. Law-enforcement officials and forensic scientists, on the other hand, say the use of such a tool is invaluable for solving crimes, not only to match evidence from a recent crime to an individual in the database but also to link some unsolved cases, showing that they share an as-yet-unknown perpetrator.

Since that 1998 incident, governments have been rapidly expanding the collection of DNA for databases, and changes in database-searching technology that target near matches are raising new concerns. As a result, civil libertarians and privacy advocates are lobbying for restrictions, while some scholars are pushing in the opposite direction, arguing that the only fair way of building a DNA database is to create a universal one—that is, to record the genetic profile of each citizen.

The information loaded into such databases reflects a feature of DNA known as short tandem repeats (STRs). DNA contains a sequence of paired bases, or nucleotides, of which there are four types. The human genome contains about 3 billion such base pairs, arranged into 23 pairs of chromosomes. A small subset of the long sequence creates the 20 000 or so human genes, most of which code for the proteins that determine a person’s biochemical makeup and physical characteristics. The rest—about 98 percent—is noncoding DNA. Although scientists are discovering that a surprisingly high fraction of these seemingly useless sequences may affect the body’s functions, some of them seem clearly to be meaningless artifacts of evolution.

In certain sections of the human genome, the noncoding DNA contains repeated patterns of two to five nucleotides, the number of repeats in each sequence varying by person. For forensic typing, scientists consider repeats at several loci, or positions on the genome. The number of repeats at each locus is known as an allele. People have two alleles at each locus, one from each parent, that vary in length depending on the number of repeats.
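As a rough illustration of what "number of repeats" means, the sketch below counts the longest run of back-to-back copies of a motif in a sequence. The motif `GATA`, the flanking bases, and the function name `longest_tandem_run` are made up for this example; real forensic typing measures fragment lengths in the laboratory rather than scanning text.

```python
def longest_tandem_run(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of motif in sequence."""
    if not motif:
        raise ValueError("motif must not be empty")
    best = 0
    for start in range(len(sequence)):
        run = 0
        pos = start
        # Keep stepping one motif length at a time while the motif repeats.
        while sequence.startswith(motif, pos):
            run += 1
            pos += len(motif)
        best = max(best, run)
    return best


# Hypothetical locus: one copy inherited from each parent.
maternal = "TTC" + "GATA" * 7 + "CCA"
paternal = "TTC" + "GATA" * 9 + "CCA"

# The genotype at this locus is the pair of repeat counts.
genotype = (longest_tandem_run(maternal, "GATA"),
            longest_tandem_run(paternal, "GATA"))
print(genotype)
```

A person who inherited the same count from both parents would show a pair such as `(7, 7)` at that locus.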

In the United States, the Combined DNA Index System (CODIS), established by the FBI in 1990 to link existing local, state, and federal systems, is based on STRs at 13 loci. In London, the Home Office currently relies on STRs at 10 loci. The estimated rarity differs from one DNA profile to the next, but a complete profile can have an estimated frequency of less than one in a trillion.
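To see how a figure like one in a trillion can arise, consider a simplified calculation that multiplies per-locus genotype frequencies together. The sketch assumes that the loci are statistically independent and that each genotype occurs in 10 percent of the population; both the independence assumption and the 0.1 value are illustrative, not figures from any real database.

```python
from math import prod  # available in Python 3.8 and later


def profile_frequency(genotype_frequencies):
    """Multiply per-locus genotype frequencies, assuming independent loci."""
    return prod(genotype_frequencies)


# Hypothetical: every one of 13 loci has a genotype seen in 1 person in 10.
thirteen_loci = [0.1] * 13
freq = profile_frequency(thirteen_loci)

print(f"estimated frequency: about 1 in {1 / freq:.3g}")
print("rarer than one in a trillion:", freq < 1e-12)
```

Under the same made-up frequencies, 10 loci would give roughly one in 10 billion, which is why the number of loci typed matters so much.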

To gather DNA for forensic databases, a law-enforcement official typically swabs inside the cheek of a suspect or criminal to obtain a sample of cells. Although scientists can extract DNA from hair, semen, or blood, a cheek swab is the most efficient and least invasive way to collect a large sample of DNA. The swab goes to a laboratory, where a technician or a robotic instrument isolates the DNA from the other cellular components.

The extracted DNA goes through a second process: polymerase chain reaction, or PCR, a standard method of creating many additional copies of a selected segment of DNA. In this case, the PCR step targets all the relevant sites (10 in the UK, 13 in the United States). A genetic analyzer then separates the resulting 10 or 13 DNA fragments and measures the number of repeats in each. The numbers, one or two for each sequence, typically range from five to 20. There is just one number in some cases because a person can inherit the same number of repeats from both parents.

The DNA databases store those numbers, along with the sex of the individual. In the United States, the federal database alone contains more than 4.6 million such records. The UK’s, which started in 1995 as the world’s first national DNA database, has about the same number, drawn from a population one-fifth the size.
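The stored record and the automatic comparison against unidentified crime-scene samples might be sketched as follows. The `Profile` class, the labels, and the use of only three loci are assumptions made for brevity; they do not reflect the actual CODIS or UK record formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    label: str      # hypothetical record identifier
    sex: str
    alleles: tuple  # one sorted (a, b) pair of repeat counts per locus


def make_profile(label, sex, pairs):
    # Sort each pair so that (9, 7) and (7, 9) compare as equal.
    return Profile(label, sex, tuple(tuple(sorted(p)) for p in pairs))


def find_matches(new_profile, crime_scene_profiles):
    """Return labels of crime-scene profiles identical to the new profile."""
    return [c.label for c in crime_scene_profiles
            if c.sex == new_profile.sex and c.alleles == new_profile.alleles]


# Unidentified profiles recovered from crime scenes (made-up values).
unsolved = [
    make_profile("scene-A", "M", [(7, 9), (12, 12), (15, 18)]),
    make_profile("scene-B", "F", [(8, 8), (11, 13), (16, 17)]),
]

# A newly loaded profile is compared against every unsolved sample.
arrestee = make_profile("arrestee-1", "M", [(9, 7), (12, 12), (18, 15)])
hits = find_matches(arrestee, unsolved)
print(hits)
```

A locus where both parents contributed the same repeat count is stored here as a doubled pair such as `(12, 12)`.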

In the British rape and assault case, B demanded that the court exclude the DNA evidence from his trial because the police had added it to the database illegally. The trial judge agreed. The government appealed, but the Court of Appeal backed the trial judge, noting that Parliament, in establishing the national database, had created rules restricting the database to those convicted of certain crimes. Had Parliament wished to do otherwise, the appeals court argued, it could have done so. Parliament took the ruling as a call to action and in 2001 passed the Criminal Justice and Police Act, allowing law-enforcement agencies to retain DNA samples of individuals charged with a crime but not subsequently convicted.

The United States is now following the UK example. FBI agents cannot legally store data from suspects who were not convicted or from individuals who volunteer their DNA samples for an investigation but are not suspects. But state officials can. Four states (Louisiana, Minnesota, Texas, and Virginia) now mandate arrestee sampling. California voters in 2004 passed a ballot proposition that will establish by 2009 what should be the largest such database in the United States. New York Governor Eliot Spitzer has proposed including in the state database those convicted of all felonies and misdemeanors. In addition, a bill being considered in South Carolina would mandate the most aggressive arrestee-sampling program in the nation, demanding samples from those arrested for even the pettiest misdemeanors, such as shoplifting.

Some states, including California, Florida, Illinois, Missouri, and New York, though they don’t mandate arrestee sampling, already retain data that may not be added to CODIS, such as samples voluntarily given by someone to eliminate himself as a suspect. The legality of such state databases is “a cloudy area,” according to law professor David Kaye of Arizona State University in Tempe. Stephen Saloom, policy director of the Innocence Project, an organization in New York City that assists prisoners who could be exonerated through DNA testing, has called them “rogue databases.”

Meanwhile, Virginia is experiencing an echo of the B case. Members of the state crime laboratory early this year reported that they had matched a crime-scene DNA sample to stored profiles of DNA from individuals who were arrested but not convicted. Because Virginia mandates that the DNA records be expunged if the suspect is not convicted, the samples were in the database illegally. The state legislature is now considering a bill that would facilitate that record clearing but also allow matches to illegally retained samples to be used in court if they were kept in “good faith.”

The UK case and subsequent passage of legislation in other countries illustrate the central paradox of DNA databases: inclusiveness. The more samples in a database, the more useful it potentially is at solving and preventing crimes. If the law requires a criminal conviction to allow officials to record a DNA profile, then crimes such as the rape that B carried out in 1997 go unsolved, and B perhaps goes on to commit other rapes.

The problem with inclusiveness is that there is no obvious end to it. Because people arrested for one offense have a higher-than-average probability of having committed other crimes, the inclusion of samples from all those arrested but not convicted has a crime-fighting utility. But then again, so does the inclusion of a sample of the victim, who could also be the perpetrator of another crime. And, for that matter, why wait until B acquires a burglary arrest to include his DNA sample? If it were loaded into a database at birth, he would have immediately been identified as having committed the 1997 rape.

There is no limit to the theoretical utility of adding anyone’s DNA profile to a database. Presumably, though, at some point the utility of inclusion no longer outweighs a free society’s interest in privacy. But where is that point?

When law-enforcement agencies first developed DNA databases, most country and state statutes that dealt with DNA testing mandated it for specified categories of crimes, typically murder and rape. DNA is particularly useful in solving sexual assaults, because investigators often recover semen as evidence.

As public awareness of DNA databases grew, so did the scope of the databases. Politicians could appear tough on crime by extending DNA sampling to an ever-growing array of offenses. Many such moves, however, were merely statutory; politicians did not allocate funding to enable police to do the sampling and analysis. Law-enforcement agencies, sensibly, continued to focus on the most violent offenders and did not take DNA samples from pickpockets even when the law allowed it.

Recently, however, the inexorable expansion of DNA databases has gone beyond individuals convicted of petty crimes and reached people arrested but never convicted. Meanwhile, the U.S. Justice Department is now authorized to take DNA samples from anyone detained by federal agents—which means, principally, those suspected of immigration violations.

Unlike the laws expanding the reach of DNA databases to those convicted of petty crimes, the new laws extending inclusion to arrestees not only allow such sampling, they mandate it. In California at least, that means maintenance of the arrestee DNA database may divert resources from other important tasks. In particular, many law-enforcement agencies still have backlogs of semen samples recovered from rape victims that have not been subjected to DNA testing.

A 2005 U.S. Bureau of Justice Statistics report estimated that it would take 1900 additional workers and US $70 million to reduce the forensic evidence backlog to a manageable size. And in a February 2007 interview with The New York Times, Robert Fram, chief of the FBI Scientific Analysis Section, decried the mandating of new populations to sample without any increase in resources and noted that the FBI has a backlog of 150 000 samples.

Arrestee sampling can’t possibly be a better use of resources than clearing that backlog. Most likely, such wholesale sampling would also divert money from other pressing needs, such as crime prevention and drug treatment.

Privacy advocates have other reasons for fighting against the inclusion of arrestees in DNA databases. Tania Simoncelli and Barry Steinhardt, both of the ACLU, have been particularly vocal on the subject. In the Journal of Law, Medicine & Ethics, Simoncelli argued that “the very existence of DNA databases turns the presumption of innocence on its head,” because those included in the database are treated as potential suspects every time a new crime is investigated.

Of course, governments have long maintained databases containing the fingerprints of convicts, arrestees, and various individuals who are not criminals, including teachers and immigrants. The law considers such databases acceptable intrusions into personal liberty. But civil libertarians say that DNA samples, unlike fingerprints, include personally sensitive information to which the state should not have access: people’s ancestry, disease propensity, and perhaps even behavioral characteristics. They argue that such information could be abused by the state, by employers, or by insurance companies.

Scientists can identify weak correlations between fingerprint-pattern types and ethnicity. But people are generally more anxious about disclosing their genetic information than their fingerprints—a concern that typically generates a strong emotional response against broadly inclusive DNA databases.

Sociologist Amitai Etzioni and others who tend to value the interests of the community over those of the individual argue that broad inclusion might be a good thing. “Collecting the DNA of convicted, nonviolent felons,” Etzioni says, “may still be justified, because they have significantly lowered rights.” In his contribution to the essay collection DNA and the Criminal Justice System, Etzioni went further and argued that even “suspects have diminished rights, including much lower rights to privacy,” and therefore he sees “no obvious reason why suspects should not be tested and their DNA included in databases.”

Advocates for DNA databases also contend that because the DNA used for standard forensic profiling is noncoding DNA, the concern about genetic privacy is not an issue. But noncoding DNA may correlate with disease propensity, even if it does not cause disease, potentially allowing “tracking” of genetic diseases. But however useful such information might be, what an insurer would really want would be not just a profile, but a complete biological sample—the original cheek swab.

And there’s the rub. In all U.S. jurisdictions except Wisconsin, law-enforcement officials typically retain the samples themselves. Therefore, all the genetic information of those who are being tracked in the DNA database remains accessible to the state. There are a number of state statutes forbidding such uses of genetic information, but such laws will not necessarily remain in place.

The government could destroy the sample and record only the numeric values of its DNA profile. And that procedure could become the compromise struck between the desire for privacy and the need for crime control. But as yet, data-banking proponents are holding out against it, because such a compromise assumes that the DNA-database technology is mature. If forensic scientists develop a new scheme for DNA matching, they’ll need original samples to re-encode the existing database population. To be sure, the current systems are powerful, robust, and widely accepted, and the existence of today’s large databases is a powerful deterrent to changing the protocols. Nonetheless, the technology has advanced so rapidly during the past two decades that it would be naive to think that the existing systems represent an eternal standard.

Discrimination is another powerful argument against arrestee databases. Even convict-only databases risk being discriminatory. In the United States, courts convict some racial minorities at much higher rates than their proportion of the overall population. Criminologists are divided as to what extent the overrepresentation arises from discrimination in policing and in the courts, as opposed to a higher rate of offending, at least in the case of violent crimes. But when it comes to drug crimes, which constitute a large portion of the criminal caseload in the United States, discrimination is undisputed. And one wouldn’t want the injustice to extend to inclusion in a convict DNA database (although the harm seems far less than the damage that is done in the first place by discrimination in the criminal convictions).

When it comes to arrestee databases, however, the issue becomes more salient. Criminologists agree that racial discrimination is greater at the level of arrest than it is at the level of conviction, because arrest depends so heavily on police discretion. Arrest discrimination is not based merely on race but also on class and geography. For example, you can use, or even sell, narcotics with a far lower risk of arrest if you are rich, white, and live in the suburbs than if you are poor, black, and live in the inner city. Some demographic sectors of American society, such as poor, black, inner-city males, have shockingly low probabilities of getting through adolescence without having at least one run-in with the police. If such encounters trigger inclusion in a DNA database, the database becomes discriminatory.

To glimpse the likely outcome in the United States, look at the United Kingdom, where the database covers a much larger portion of the overall population than in the U.S. There, 37 percent of black men, 13 percent of Asian men, and 9 percent of white men have had their DNA profiles included in the national database. The figures are even starker if one considers only younger males. Approximately 77 percent of black males between 15 and 34 are in the national database, compared with 22 percent of white males in that age bracket.

Such an arrestee database tends to include the maximum number of racial minorities and the smallest number of whites [see charts, “Color Wheels”].

As Kaye and the University of Wisconsin’s Michael Smith put it starkly in their contribution to DNA and the Criminal Justice System, “Such an ‘arrest-only’ database would have the look and feel of a universal DNA database for black males, whose already jaundiced view of law enforcement’s legitimacy is itself a threat to public safety.”

When law-enforcement officials enter new genetic records from unidentified samples recovered at crime scenes into a DNA database, the system compares them with existing profiles. Some legal scholars say that this procedure amounts to daily searches of each person in the database—no different from stopping drivers for pat-downs without warrants. Other experts maintain that because the individuals aren’t aware of the searches, no harm is done.

The risks might seem remote now, but even so, perhaps they should be borne by all citizens equally.

One risk, the possibility of false incrimination, either through DNA planting or laboratory error, is less remote. There simply isn’t good current data on the false-positive error rate for DNA profiling. But although forensic DNA-profiling technology is robust, reports of recent errors abound. And it’s not just the laboratories generally considered poor (like the police crime lab in Houston) but also those regarded as among the nation’s finest (such as the FBI’s and the Virginia State Department of Forensic Sciences) that are making mistakes. The errors, documented by Professor William Thompson, of the University of California, Irvine, and others, have led to wrongful convictions.

Planting DNA is possible as well, and it is likely to become increasingly easy and cheap to do, allowing more people to learn how. Of course, the planting of evidence is not new. But because DNA evidence commands such enormous trust and is conceived as scientific, the potential hazards of evidence tampering would be particularly pernicious. Again, perhaps the risks of such mistakes or malfeasance should be borne equally.

The newest trend in DNA-database searching exacerbates the discrimination problem. In the past, when a crime-scene sample failed to match any record in a database, investigators were stymied. Recently, however, they have begun exploring an alternative in such cases: search the DNA database again, looking for close matches. A near-match profile will not be that of the perpetrator, but it may belong to a close relative. The authorities can then investigate other family members. Crime investigators have used so-called low stringency or familial searches successfully in the UK, Canada, and the United States.
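One simple way to picture a near-match search rests on the fact that, barring mutation, a parent and child share at least one allele at every locus. The sketch below applies that rule to profiles written as lists of per-locus allele pairs. The function names and all values are hypothetical, and the rule is an assumption chosen for illustration: operational familial searches are likely to rely on more careful statistics, and siblings can share no alleles at a given locus, so this check would miss some relatives.

```python
def shared_alleles(pair_a, pair_b):
    """Count alleles shared between two genotypes at one locus (0, 1, or 2)."""
    remaining = list(pair_b)
    shared = 0
    for allele in pair_a:
        if allele in remaining:
            remaining.remove(allele)  # each allele can be matched only once
            shared += 1
    return shared


def classify(scene, candidate):
    """Label a database profile relative to a crime-scene profile."""
    per_locus = [shared_alleles(a, b) for a, b in zip(scene, candidate)]
    if all(n == 2 for n in per_locus):
        return "full match"
    if all(n >= 1 for n in per_locus):
        return "possible close relative"
    return "no lead"


scene = [(7, 9), (12, 12), (15, 18)]       # made-up crime-scene profile
same_person = [(7, 9), (12, 12), (15, 18)]
relative = [(7, 11), (12, 14), (18, 20)]   # shares one allele at each locus
stranger = [(6, 8), (12, 13), (15, 16)]

for name, profile in [("same_person", same_person),
                      ("relative", relative),
                      ("stranger", stranger)]:
    print(name, "->", classify(scene, profile))
```

A "possible close relative" result is only an investigative lead: unrelated people can share alleles by chance, especially when few loci are compared.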

The legal issues surrounding familial searching are tricky, especially when combined with the practice of surreptitious seizure of “abandoned” DNA samples from cigarette butts, soda cans, and other discarded objects. In the notorious Bind, Torture, Kill serial-murder case in Wichita, Kan., investigators obtained DNA from a tissue sample gathered for medical purposes from the daughter of the suspect, to avoid alerting him that he was under investigation. No significant constitutional barriers to such actions exist.

But familial searching also raises policy issues. A slight brush with the law that does not result in a criminal conviction puts not only the arrested individual but also effectively the person’s entire family into the database. The individual’s diminished privacy ripples through the family.

Today, DNA-database systems routinely search newly recovered crime-scene samples against the entire existing database. So the legal system subjects individuals and their families daily to suspicionless searches. In a society in which young black males in some neighborhoods have a one-in-three probability of ending up in state custody at some time in their lives (and an even higher chance of getting an arrest record), the racial overtones of such a practice are dramatic.

Experts debate whether familial searching is reasonable. At a 2006 symposium in Boston on forensic DNA, sponsored by the American Society of Law, Medicine, and Ethics, Harvard scholars Frederick Bieber and David Lazer said that after having initially been skeptical of familial searching, they concluded from their research that the potential benefit to society in crime control outweighs privacy concerns. But an interdisciplinary team from Stanford led by law Professor Henry Greely concluded that familial searching is ethically questionable, stating that “the way that familial forensic DNA puts African-Americans under much greater investigative scrutiny may not be unconstitutional, but seems unfair and quite possibly unwise.”

With experts voicing such concerns, why have so many jurisdictions opted for arrestee databases? The answer seems both obvious and troubling: the databases are popular with voters who see them as tracking people other than themselves.

Essentially voters are willing to legislate away the privacy rights of others—especially those they stereotype as potentially dangerous, such as racial minorities, the poor, and residents of economically disadvantaged neighborhoods—but are much more protective of what they perceive to be their own constitutional guarantees. This dichotomy is reflected in the U.S. government’s 1940s decisions to reject universal fingerprint databases but allow law-enforcement agencies to maintain fingerprint records in arrestee databases.

Some scholars have decided that there is no alternative but to propose what many would previously have considered unthinkable: a universal DNA database. Alec Jeffreys himself, the University of Leicester, England, geneticist who developed the earliest method of DNA profiling, has now declared that the existing UK database is racially discriminatory, and he has espoused an all-inclusive database as a solution. Jeffreys also proposed that the judicial system use DNA matches for investigative purposes only. That is, DNA would provide leads that would have to be corroborated by other evidence, and courts would never use DNA as evidence. Several American legal scholars, including Kaye, Smith, and Akhil Reed Amar, a Yale law professor, have also advocated for a universal database as the antidote to the discriminatory nature of existing arrangements. And in 2005, Portugal announced its intention to become the first country to include its entire population in its database of DNA profiles.

A universal database, on the surface, has a certain egalitarian appeal. Rather than those stigmatized by an arrest record being disproportionately burdened, all members of society who benefit from the database would bear the associated risks, including the release of sensitive personal information and repercussions from laboratory errors.

Another attractive aspect of the universal-database proposal is that it would engender a more honest appraisal of the risks of government genetic databases. Consideration of a universal database shifts the debate from being about whether other people’s privacy rights are worth protecting to being about whether everyone’s are. If voters and legislators aren’t worried about the misuse of genetic information in a state-run database, let them be the first to offer their samples to it. Such voluntary contributions are rare, although in 1999, British Prime Minister Tony Blair provided his own DNA for the UK’s database.

The egalitarianism of the universal database, however, may be a mirage. The facts that have led some scholars to embrace a universal database in the first place—such as discriminatory arrest practices—would not change with the advent of a universal database. In DNA and the Criminal Justice System, sociology professor Troy Duster of New York University writes, “If the lens of the criminal justice system is focused almost entirely on one part of the population for a certain kind of activity (drug-related, street crime), and ignores a parallel kind of crime (fraternity cocaine sales a few miles away), then even if the fraternity members’ DNA are in the data bank, they will not be subject to the same level of matching. That is, if the police are not stopping to arrest the fraternity members, it does not matter whether their DNA is in a national database, because they are not criminalized by the selective aim of the artillery of the criminal justice system.”

The police would still target racial minorities, the poor, and residents of disadvantaged neighborhoods differently, Duster argues, only the still-discriminatory police would have more powerful tools in hand.

Although the DNA-database debate will probably occupy judges, legal scholars, and legislators for some time, the most likely outcome is the least equitable—including only arrestees.

Indeed, if policy-makers were purposefully trying to find the most discriminatory system possible, an arrestee database would be the ideal choice. If an arrestee database is the least equitable solution, we are left with only two reasonable alternatives: a convict database or a universal database. The decision between those two alternatives depends on how much people trust their governments. But the merit of such a debate would be that it would not be about “other people’s” DNA but about our own.

About the Author

Simon A. Cole is an associate professor of criminology, law, and society at the University of California, Irvine.

To Probe Further

A special issue of the Journal of Law, Medicine & Ethics (Vol. 34, Issue 2, 2006) explores DNA and civil liberties in depth.

Simon A. Cole addresses this topic with coauthors Michael Lynch, Ruth McNally, and Kathleen Jordan in a forthcoming book, The Contentious History of DNA Fingerprinting (University of Chicago Press). Other works on the subject include DNA and the Criminal Justice System, edited by David Lazer (MIT Press, 2004); Forensic Identification and Criminal Justice, by Carole McCartney (Willan, 2006); and DNA Profiling, Science, Law, and Controversy in the American Criminal Justice System, by Jay D. Aronson (Rutgers University Press, 2007).

The American Society of Law, Medicine, and Ethics has a project on DNA fingerprinting and civil liberties and an extensive Web resource at https://www.aslme.org/dna_04.

Excellent background material on this issue in the UK is in a Wellcome Trust report by Robin Williams, Paul Johnson, and Paul Martin, Genetic Information & Crime Investigation (2004), available at https://www.dur.ac.uk/resources/sass/sociology.
