AI Is Being Built on Dated, Flawed Motion-Capture Data

“Gold standard” data is based on certain body types...and cadavers

Diversity of thought in industrial design is crucial: If no one thinks to design a technology for multiple body types, people can get hurt. The invention of seat belts is an oft-cited example of this phenomenon, as they were designed based on crash dummies that had traditionally male proportions, reflecting the bodies of the team members working on them.

The same phenomenon is now at work in the field of motion-capture technology. Throughout history, scientists have endeavored to understand how the human body moves. But how do we define the human body? Decades ago many studies assessed “healthy male” subjects; others used surprising models like dismembered cadavers. Even now, some modern studies used in the design of fall-detection technology rely on methods like hiring stunt actors who pretend to fall.

Over time, a variety of flawed assumptions have become codified into standards for motion-capture data that’s being used to design some AI-based technologies. These flaws mean that AI-based applications may not be as safe for people who don’t fit a preconceived “typical” body type, according to new work recently published as a preprint and set to be presented at the Conference on Human Factors in Computing Systems in May.

“We dug into these so-called gold standards being used for all kinds of studies and designs, and many of them had errors or were focused on a very particular type of body,” says Abigail Jacobs, coauthor of the study and an assistant professor at the University of Michigan’s School of Information and the Center for the Study of Complex Systems. “We want engineers to be aware of how these social aspects become coded into the technical—hidden in mathematical models that seem objective or infrastructural.”

It’s an important moment for AI-based systems, Jacobs says, because there may still be time to catch potentially dangerous assumptions and prevent them from being codified into applications informed by AI.

Motion-capture systems create representations of bodies by collecting data from sensors placed on the subjects, logging how these bodies move through space. These schematics become part of the tools that researchers use, such as open-source libraries of movement data and measurement systems that are meant to provide baseline standards for how human bodies move. Developers are increasingly using these baselines to build all manner of AI-based applications: fall-detection algorithms for smartwatches and other wearables, self-driving vehicles that need to detect pedestrians, computer-generated imagery for movies and video games, manufacturing equipment that interacts safely with human workers, and more.

“Many researchers don’t have access to advanced motion-capture labs to collect data, so we’re increasingly relying on benchmarks and standards to build new tech,” Jacobs says. “But when these benchmarks don’t include representations of all bodies, especially those people who are likely to be involved in real-world use cases—like elderly people who may fall—these standards can be quite flawed.”

She hopes we can learn from past mistakes, such as cameras that didn’t accurately capture all skin tones, and seat belts and airbags that didn’t protect people of all shapes and sizes in car crashes.

The Cadaver in the Machine

Jacobs and her collaborators from Cornell University, Intel, and the University of Virginia performed a systematic literature review of 278 motion-capture-related studies. In most cases, they concluded, motion-capture systems captured the motion of “those who are male, white, ‘able-bodied,’ and of unremarkable weight.”

And sometimes these white male bodies were dead. In reviewing works dating back to the 1930s and running through three historical eras of motion-capture science, the researchers studied projects that were influential in how scientists of the time understood the movement of body segments. A seminal 1955 study funded by the Air Force, for example, used overwhelmingly white, male, and slender or athletic bodies to create the optimal cockpit based on pilots’ range of motion. That study also gathered data from eight dismembered cadavers.

A full 20 years later, a study prepared for the National Highway Traffic Safety Administration used similar methods: Six dismembered male cadavers were used to inform the design of impact-protection systems in vehicles.

Although those studies are many decades old, these assumptions became baked in over time. Jacobs and her colleagues found many examples of these outdated inferences being passed down to later studies and ultimately still influencing modern motion-capture studies.

“If you look at technical documents of a modern system in production, they’ll explain the ‘traditional baseline standards’ they’re using,” Jacobs says. “By digging through that, you quickly start hopping through time: OK, that’s based on this prior study, which is based on this one, which is based on this one, and eventually we’re back to the Air Force study designing cockpits with frozen cadavers.”

The components that underpin technological best practices are “man-made—intentional emphasis on man, rather than human—often preserving biases and inaccuracies from the past,” says Kasia Chmielinski, project lead of the Data Nutrition Project and a fellow at Stanford University’s Digital Civil Society Lab. “Thus historical errors often inform the ‘neutral’ basis of our present-day technological systems. This can lead to software and hardware that does not work equally for all populations, experiences, or purposes.”

These problems may hinder engineers who want to make things right, Chmielinski says. “Since many of these issues are baked into the foundational elements of the system, teams innovating today may not have quick recourse to address bias or error, even if they want to,” they say. “If you’re building an application that uses third-party sensors, and the sensors themselves have a bias in what they detect or do not detect, what is the appropriate recourse?”

Jacobs says that engineers must interrogate their sources of “ground truth” and confirm that the gold standards they measure against are, in fact, gold. To design technologies for everyone, technologists must treat these social evaluations as part of their jobs.

“If you go in saying, ‘I know that human assumptions get built in and are often hidden or obscured,’ that will inform how you choose what’s in your dataset and how you report it in your work,” Jacobs says. “It’s sociotechnical, and technologists need that lens to be able to say: My system does what I say it does, and it doesn’t create undue harm.”

The Conversation (0)