How Genealogy Websites Make It Easier to Catch Killers

As more people upload their DNA data to ancestry websites, finding criminals gets easier

Illustration of DNA double helixes with handcuffs attaching them.
Illustration: IEEE Spectrum; DNA: iStockphoto

Over the past six months, a small, publicly available genealogy database has become the go-to source for solving cold-case crimes. The free online tool, called GEDmatch, is an ancestry service that allows people to submit their DNA data and search for relatives. It is, in effect, an open-access version of AncestryDNA or 23andMe.

Since April, investigators have used GEDmatch to identify victims, killers, and missing persons all over the U.S. in at least 19 cases, many of them decades old, according to authors of a report published today in Science. The authors predict that in the near future, as genetic genealogy reports gain in popularity, such tools could be used to find nearly any individual in the U.S. of European descent. 

GEDmatch holds the genetic data of only about a million people. But cold-case investigators have been exploiting the database using a genomic analysis technique called long-range familial search. The technique allows researchers to match an individual’s DNA to distant relatives, such as third cousins.  

Previous familial search techniques could match only close relatives. The ability to match third cousins greatly expands the population of people linked to any one individual. On average, a person in the U.S. has about 850 third cousins (or relatives whose genetic distances match that of a third cousin).

Chances are, one of those relatives will have used a genetic genealogy service. More than 17 million people have participated in these services—a number that has grown rapidly over the last two years. AncestryDNA and 23andMe hold most of those customers. 

A genetic match to a distant relative can fairly quickly lead investigators to the person of interest. In a highly publicized case, GEDmatch was used earlier this year to identify the “Golden State Killer,” a serial rapist and murderer who terrorized California in the 1970s and 1980s but was never caught.

DNA data from the serial killer’s crime scenes, saved all these years, was supplied to the GEDmatch database. Some segments of the killer’s genome linked to that of another person who had used GEDmatch—a third cousin, it turned out. Investigators were able to narrow it down from there using family trees, demographics, and other clues. The killer, Joseph James DeAngelo, now 72, was arrested in April and charged with 13 counts of murder with special circumstances, including rape.  

But just how likely is it that any given criminal will have relatives in a DNA-based ancestry database? How powerful is the long-range familial search technique? For their report today in Science, researchers at another genealogy service called MyHeritage in Or Yehuda, Israel, along with collaborators at Columbia University, in New York, set out to answer these questions. 

Their conclusion: If just 2 percent of a population gives DNA to an ancestry service, nearly 99 percent of that population will find a relative, a third cousin or closer, in that service’s database. So in the near future, a person who commits a violent crime is likely to have a relative in one of these consumer databases, says Yaniv Erlich, chief technology officer at MyHeritage and an author of today’s report.
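To see why such a small share of a population can cover nearly everyone, a back-of-the-envelope calculation helps. The sketch below is not the model used in the Science report. It simply assumes that each of a person's roughly 850 third-cousin-level relatives (the figure cited above) is in the database independently with probability equal to the database's coverage. The function name and the `detect_prob` parameter, which stands in for the chance that a relative who is in the database is actually detected as a match, are hypothetical.

```python
def prob_at_least_one_match(coverage: float,
                            n_relatives: int = 850,
                            detect_prob: float = 1.0) -> float:
    """Chance that at least one relative is both in the database and detected.

    Naive independence model (an assumption, not the report's method):
    each relative is present with probability `coverage` and, if present,
    is detected with probability `detect_prob`.
    """
    if not (0.0 <= coverage <= 1.0 and 0.0 <= detect_prob <= 1.0):
        raise ValueError("coverage and detect_prob must lie in [0, 1]")
    p_single = coverage * detect_prob
    # Complement of "none of the relatives is found".
    return 1.0 - (1.0 - p_single) ** n_relatives


if __name__ == "__main__":
    for coverage in (0.001, 0.005, 0.01, 0.02):
        p = prob_at_least_one_match(coverage)
        print(f"{coverage:.1%} coverage -> {p:.4f}")
```

Because this toy model treats every relative as detectable, it will tend to overstate the match rate compared with the researchers' more careful analysis. Lowering `detect_prob` shows how sensitive the result is to that assumption.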

In developing their prediction, Erlich and his colleagues tested long-range familial searches on 1.28 million individuals, mostly of European descent, from MyHeritage’s database. With a database of that size and DNA from an individual of European descent, about 60 percent of searches found a third cousin or closer. (The authors focused on European descent because most MyHeritage users are of that lineage.)

Then the researchers investigated how difficult it would be to identify a suspect after finding his third cousin among the 850 or so individuals in a typical initial list. By looking at the location of the crime, and narrowing further by age range and gender, the list can be shaved down to about 16 to 17 suspects—a manageable number, the researchers found. 
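The narrowing step amounts to applying successive filters to a candidate list. The sketch below illustrates the idea on a handful of made-up records. The `Candidate` fields, the `narrow` function, and all of the data are hypothetical, and the researchers' actual procedure may differ.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Candidate:
    """A person on the initial list of possible relatives (toy record)."""
    name: str
    region: str
    birth_year: int
    sex: str


def narrow(candidates: List[Candidate],
           region: str,
           birth_year_range: Tuple[int, int],
           sex: str) -> List[Candidate]:
    """Keep only candidates consistent with location, age range, and sex."""
    earliest, latest = birth_year_range
    return [
        c for c in candidates
        if c.region == region
        and earliest <= c.birth_year <= latest
        and c.sex == sex
    ]


if __name__ == "__main__":
    # Entirely fictional records, for illustration only.
    initial = [
        Candidate("A", "CA", 1946, "M"),
        Candidate("B", "CA", 1982, "M"),
        Candidate("C", "NY", 1944, "M"),
        Candidate("D", "CA", 1947, "F"),
    ]
    shortlist = narrow(initial, region="CA", birth_year_range=(1940, 1955), sex="M")
    print([c.name for c in shortlist])
```

Each filter removes candidates independently of the others, so the order in which they are applied does not change the final shortlist.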

Conducting a long-range familial search is moderately difficult. “You need to know what you are doing,” says Erlich. But “you don’t need a Ph.D. in genetics for that.” A greater challenge for cold-case investigators is getting access to an ancestry database. GEDmatch is the only database, to the authors’ knowledge, with the “very liberal privacy policy that allows you to see not only your results but the results of any other person,” says Erlich. GEDmatch makes it clear in its privacy policy that data will be shared with other users. 

Other ancestry services aren’t so easily accessible. Depending on state regulations, law enforcement must usually get a court order to conduct a familial search in one of those more private services. MyHeritage’s terms of service prohibit forensic research or criminal investigations without the company’s permission. 

Access to such databases is legally protected for good reason. Familial searches have been known to produce false positives. And Erlich himself has shown in previous research that it’s possible to use ancestry services to identify research subjects who participated in genetic studies. 

The genomes of people who volunteered for the 1000 Genomes Project, for example, are public. Someone could easily download the genetic information of one of the participants, upload it into GEDmatch or MyHeritage, find that person’s relatives, and potentially identify the research subject. (GEDmatch and MyHeritage both allow users to take DNA that was sequenced by another company and run it through their databases.)

Erlich and his colleagues propose a solution to protect research subjects. They say providers of DNA data, such as 23andMe and AncestryDNA, should add a header with a cryptographic signature to customers’ files. Then, when a user uploads that file to GEDmatch, the service can validate, using the signature, that it came from a legitimate lab. If the file doesn’t have the signature, the ancestry service should ask questions about where the data came from and what information the person is seeking, says Erlich.
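The sign-then-verify flow can be sketched in a few lines. The example below uses HMAC-SHA256 from Python's standard library as a stand-in. HMAC relies on a shared secret key, whereas a real deployment would presumably use a public-key signature so that the ancestry service could verify files without being able to forge them. The header format, function names, and key handling are all assumptions made for illustration and are not taken from the report.

```python
import hashlib
import hmac

# Hypothetical header format: the first line of the file carries the tag.
HEADER_PREFIX = "#signature="


def sign_file(genotype_text: str, lab_key: bytes) -> str:
    """Lab side: prepend a header line holding a tag over the file body."""
    tag = hmac.new(lab_key, genotype_text.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{HEADER_PREFIX}{tag}\n{genotype_text}"


def verify_file(signed_text: str, lab_key: bytes) -> bool:
    """Service side: recompute the tag and compare it with the header."""
    header, newline, body = signed_text.partition("\n")
    if not newline or not header.startswith(HEADER_PREFIX):
        return False  # no signature header at all
    claimed = header[len(HEADER_PREFIX):]
    expected = hmac.new(lab_key, body.encode("utf-8"), hashlib.sha256).hexdigest()
    # Constant-time comparison on bytes avoids leaking how much matched.
    return hmac.compare_digest(claimed.encode("utf-8"), expected.encode("utf-8"))


if __name__ == "__main__":
    key = b"demo-key-for-illustration-only"
    signed = sign_file("rs0000001\tAA\nrs0000002\tAG\n", key)
    print(verify_file(signed, key))                      # untouched file
    print(verify_file(signed.replace("AG", "GG"), key))  # altered genotype
```

A file that fails this check is not necessarily malicious. As Erlich suggests, it is a cue for the service to ask where the data came from.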

Whether companies should have that kind of discretion over requests from crime investigators is less clear. “Everyone is happy that police can use this to catch a criminal, but what are the checks and balances for the policeman conducting these investigations?” says Erlich. “Are we okay that police could use this after a political demonstration to identify an individual?” 

Indeed, many of the GEDmatch cases have been initiated by non-law-enforcement entities. That includes a group of researchers who call themselves the DNA Doe Project, whose mission is to use genetic genealogy to identify John and Jane Does. And Parabon NanoLabs, a Virginia-based forensic DNA company, announced that it has set up a division that will use long-range familial searches. The company in May told BuzzFeed it had uploaded about 100 cases into GEDmatch.

Today’s report in Science includes a fascinating list of 13 cases that have been resolved using GEDmatch. Erlich has tracked an additional six cases that have been solved in the last month. 

