Errors Found in Forensic Software Meant to Assess Age of Death of Skeletal Remains

Problem highlights increasing concerns about validity of digital forensic apps


Research conducted by biological forensic scientists at North Carolina State University (NCSU) and the University of South Florida (USF) has uncovered “serious problems” in a recently released forensic software application available online called DXAGE, which is supposed to predict the age at death of skeletal remains based on bone mineral density.

The study, published in the Journal of Forensic Sciences, reported that the software’s predicted ages could be off by 14.25 years on average when DXAGE-generated results were compared against known samples. The system’s accuracy was particularly poor for the remains of elderly individuals.

To test the accuracy of DXAGE, the scientists used the bone mineral density of 470 women who were part of the U.S. Centers for Disease Control and Prevention’s 2007–2008 National Health and Nutrition Examination Survey. The researchers used the survey data as inputs to DXAGE and compared the actual versus the predicted ages of the women. While DXAGE’s predicted age at death for women who died in their 30s was within 0.65 years on average, for women who died in their 70s, DXAGE’s predicted age at death was incorrect by 24.4 years on average.
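The error figures reported above are averages of the gap between each predicted and actual age. A minimal sketch of that kind of comparison, using made-up ages rather than the actual NHANES inputs or DXAGE outputs, looks like this:

```python
# Hypothetical actual vs. predicted ages at death; the real NHANES
# data and DXAGE predictions are not reproduced here.
actual = [34, 52, 61, 70, 78]
predicted = [34.5, 47, 50, 49, 55]

# Mean absolute error: the average of |predicted - actual|
mae = sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)
print(round(mae, 2))
```

Note how, as in the study's findings, the hypothetical errors grow for the older individuals, which drags up the overall average.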

The authors of the study, which was funded by a grant from the National Institute of Justice, say they suspect that the reason for the discrepancies lies in the small sample size of 100 women that DXAGE’s creators used to calibrate the software, along with their decision to incorporate data from decades-old skeletal remains from cemeteries where the minerals in the bones may have leached away. These factors may have led to erroneous conclusions by the software’s underlying neural network algorithm.

Ann Ross, a professor of biological sciences at NC State and one of the study’s lead authors, said bone mineral density is an important characteristic for quantifying age at death. One way to measure it is through dual X-ray absorptiometry (or DXA), a method of measuring bone mineral density and content based on the attenuation of photons.

Ross stressed that a technique she helped develop, which uses linear regressions of bone mineral density to determine age at death, is more accurate than DXAGE. Ross also reiterated that it is important to use a sufficient sample size, as well as to “use samples that have not been affected by environmental factors after death,” when developing age-at-death estimation techniques.

Another issue the study highlighted was that DXAGE uses an algorithm based on a neural network. Ross said her team was unclear how DXAGE actually determined the age of death, and indicated that its “black box” approach would pose problems for anyone trying to use it in a legal proceeding. Ross also said that DXAGE does not offer clear “cautionary warnings” to make users aware of its estimation errors, how its algorithm determines age of death, or that DXAGE’s results have not yet been independently validated.

The DXAGE developers, however, take exception to many of the critiques made in the NCSU and USF study. Francisco Curate, one of the researchers and developers of DXAGE at the Research Centre for Anthropology and Health, in Coimbra, Portugal, wrote in an email that they made clear in their published work that DXAGE had limitations owing to its small sample size and its use of only women subjects of European ancestry. He also said that they warned that the system’s results were only preliminary.

Curate, however, did concede that “we should make clear in the website from DXAGE that the models are preliminary and geographically limited, also that users should read our article before using the Web app.” This has now been done.

Curate, nevertheless, argued that the NCSU and USF study had a “major limitation” itself, namely, “it compares bone mineral density values obtained in skeletal individuals (the DXAGE models) with those of living individuals (NHANES database).” Curate informed me that “all experts in the use of DXA technology in archeological and forensic settings concur that we should not compare values of living and skeletal individuals: The latter lack marrow, fat, and soft tissues, and that will impact bone mineral density readings.”

That, and the fact that the comparison used remains from different geographical regions, “probably inflated the inaccuracy” reported in the study, Curate argues. He says, “We believe that a method to estimate age in skeletal remains using DXA must use a reference sample that is also skeletonized.” Addressing the claim that using linear regression to determine age at death is better than DXAGE, Curate says that the technique “has been discouraged over and over by different researchers.”

Ross, however, disagrees with Curate’s views. She says the lack of soft tissues in skeletal remains is not a concern, and that it's appropriate to compare skeletal remains to living individuals in order to construct a deceased person's biological profile.

Ross also disagrees with the objection raised about linear regression and bone mineral density, saying that “there is a linear correlation between bone mineral density and chronological age,” and therefore it is entirely appropriate to use this technique.
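The technique at the center of this disagreement is ordinary least-squares regression: fit a straight line relating chronological age to bone mineral density in a reference sample, then use that line to estimate age for an unknown individual. A minimal sketch, using synthetic (BMD, age) pairs rather than any real calibration data, might look like this:

```python
# Synthetic reference sample of (BMD in g/cm^2, known age) pairs.
# Real calibration data would come from a documented skeletal or
# survey sample; these values are invented for illustration only.
samples = [(1.10, 30), (1.05, 40), (0.98, 50), (0.90, 60), (0.82, 70)]

n = len(samples)
mean_x = sum(x for x, _ in samples) / n
mean_y = sum(y for _, y in samples) / n

# Ordinary least squares: slope = cov(BMD, age) / var(BMD)
slope = sum((x - mean_x) * (y - mean_y) for x, y in samples) / \
        sum((x - mean_x) ** 2 for x, _ in samples)
intercept = mean_y - slope * mean_x

def estimate_age(bmd):
    """Estimate age at death from bone mineral density (hypothetical model)."""
    return slope * bmd + intercept
```

Unlike a neural network, the fitted slope and intercept are plainly inspectable, which is the transparency advantage Ross's side of the debate points to.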

There is one point in the new study that Curate and his fellow researchers do agree on—the way in which DXAGE’s software determines age at death needs to be more transparent. He says that the team is working on making this possible, as well as upgrading DXAGE with a larger sample size and different variables and statistical approaches.

The debate illustrates that forensic analysis is still an evolving science with dedicated scientists trying to discover what works and what doesn’t, and that it is nowhere near as pristine and unassailable as has been depicted in popular television crime shows. The science is much, much messier and more uncertain.

An in-depth look by the National Academy of Sciences into the state of forensic science in the United States in 2009 showed [PDF] that many “accepted” forensic techniques, such as “those used to infer the source of tool marks or bite marks have never been exposed to stringent scientific scrutiny.” The validity of many of these techniques, the report states, has not been founded on a reliable scientific methodology, or depends heavily upon the expertise and interpretation of individual forensic scientists, which can allow bias or operational error to taint the results.

Surprisingly, even the use of latent fingerprints is now under intense scrutiny, as the science behind it may not be as robust as popularly believed. A 2017 American Association for the Advancement of Science assessment of forensic science [PDF], which focused on latent fingerprint examination, concluded that “latent print examiners should avoid claiming that they can associate a latent print with a single source and should particularly avoid claiming or implying that they can do so infallibly, with 100% accuracy.”

The increasing availability of software applications to perform forensic analysis could add to the problem of using forensic techniques that are not scientifically sound. According to Ross, many of the applications that are appearing online are not validated for forensic use. More concerning, they do not come with bold warnings to potential users about their limitations or that the apps should not be used for casework.

The proliferation of these apps worries Ross and many of her fellow forensic scientists. If they’re used by law enforcement officials who may not understand their limitations, they can lead to wasted effort in trying to identify skeletal remains. More important, if unverified forensic apps are used in court, they could lead to false convictions or acquittals as other incorrect forensic science techniques have done.

While the United States has imposed some legal safeguards [PDF] to stop this from happening, these protections don’t necessarily exist in other countries. And even in the United States, the safeguards are no guarantee that nonvalidated forensic apps or techniques will stay out of court proceedings, since trial judges have great discretion [PDF] in what forensic techniques are or are not admissible. The problem of what is acceptable forensic science has been a priority issue in the United Kingdom, with a parliamentary inquiry into the use of forensic science in the criminal justice system now under way.

The debate over DXAGE reflects many of the same issues raised in the 2009 NAS report, namely, that “law enforcement officials and the members of society they serve need to be assured that forensic techniques are reliable. Therefore, we must limit the risk of having the reliability of certain forensic science methodologies judicially certified before the techniques have been properly studied and their accuracy verified by the forensic science community.”

As Ross points out, these disagreements are necessary, since they serve to stimulate further scientific discourse, which leads to better and validated forensic techniques that can help convict the guilty, and free the innocent.
