Are AI Algorithms Playing Fairly with Age, Gender, and Skin Color?

Facebook researchers release a dataset intended to help machine learning developers test algorithms for bias


Facebook's "Casual Conversations" data set consists of 45,186 videos of participants having non-scripted conversations
Images: Facebook

Does an algorithm treat people of different ages, genders, and skin colors equally, even under different lighting conditions? Facebook’s AI Red Team today released a data set—called Casual Conversations—for use in answering that question. The ten terabytes of data consist of videos recorded by 3011 participants; the data set comprises approximately 15 one-minute segments per person, for more than 45,000 total minutes. The videos are tagged with age and gender as self-reported by each participant, by skin color as determined by trained annotators using a standard scale, and by lighting conditions, also as determined by annotators.

The Facebook AI Red Team’s research manager, Cristian Canton, gave me a simple example of how the dataset could be used by developers.

“Consider the Portal device,” he says. (Portal is Facebook’s $150 tabletop smart screen.) “We have a camera in it that tracks people. If I were an engineer building that technology today, to make sure it is inclusive, I could take the Casual Conversations data set, run it through the tracking algorithm in the Portal, and measure where it doesn’t perform well. Say, you might find that for a person of a given age, color, or gender in low light, it doesn’t work. Then I would know that my algorithm has a deficiency for a specific subgroup.”
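The workflow Canton describes—run a model over the annotated clips, then break its accuracy down along each labeled axis—can be sketched in a few lines. The field names, label values, and the per-clip `correct` flag below are illustrative assumptions, not Facebook's actual API or annotation schema:

```python
# Hypothetical sketch of per-subgroup evaluation: tally a model's
# hit rate separately along each Casual Conversations label axis.
# Field names and label values here are invented for illustration.
from collections import defaultdict

AXES = ("age_group", "gender", "skin_tone", "lighting")

def subgroup_accuracy(results):
    """results: iterable of dicts with the AXES keys plus a boolean
    'correct' saying whether the model handled that clip."""
    totals = defaultdict(lambda: [0, 0])  # (num correct, num total)
    for r in results:
        for axis in AXES:
            key = (axis, r[axis])
            totals[key][0] += int(r["correct"])
            totals[key][1] += 1
    return {key: c / n for key, (c, n) in totals.items()}

# Toy example: a tracker that fails on a dimly lit clip
results = [
    {"age_group": "18-30", "gender": "female", "skin_tone": "type-6",
     "lighting": "dim", "correct": False},
    {"age_group": "18-30", "gender": "female", "skin_tone": "type-2",
     "lighting": "bright", "correct": True},
]
acc = subgroup_accuracy(results)
print(acc[("lighting", "dim")])     # 0.0 -- a deficiency surfaces here
print(acc[("lighting", "bright")])  # 1.0
```

A gap like the one between the two lighting buckets is exactly the kind of deficiency Canton says the data set is meant to surface.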

Facebook’s researchers experimented with the dataset by testing it on the top five winners of last year’s Deepfake Detection Challenge, a competition to develop tools designed to automatically spot fraudulent media. In a research paper and blog post released today, they reported that, while all five algorithms struggled with darker skin tones, the model that performed most consistently across the dimensions of age, gender, and lighting conditions was not that of first-place winner Selim Seferbekov, but rather that of the third-place team, NTechLab. The fourth-place team, Eighteen Years Old, turned out ironically to be best at analyzing videos of subjects in the oldest cohort, above age 45.
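The paper's exact scoring method isn't described here, but one simple way to quantify "performed most consistently across subgroups" is the spread between a model's best- and worst-served subgroup; under that illustrative metric, a slightly less accurate but more even model can rank higher, as with NTechLab:

```python
# Illustrative only: score "consistency" as the gap between the
# best and worst per-subgroup accuracy. The numbers below are made
# up; the Facebook paper's actual metric may differ.
def consistency_gap(per_subgroup_acc):
    vals = list(per_subgroup_acc.values())
    return max(vals) - min(vals)

model_a = {"18-30": 0.91, "31-45": 0.90, "46+": 0.70}  # accurate but uneven
model_b = {"18-30": 0.84, "31-45": 0.83, "46+": 0.82}  # lower but consistent

print(round(consistency_gap(model_a), 2))  # 0.21
print(round(consistency_gap(model_b), 2))  # 0.02 -- more consistent
```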

Performing evenly across different demographics was not part of the judging criteria for the Deepfake Detection Challenge, as the full Casual Conversations dataset was not yet available.

Said Canton: “If we were to redo the competition today, maybe we would consider looking for a more inclusive approach.”

The Casual Conversations dataset released this week is just the beginning of the work needed to create fairness in AI, Canton says. For one, he points out, the problem is multifaceted, and, while having this kind of data is helpful, it is not a complete solution.

These pie charts show the frequency of the different tags for age, gender, apparent skin tone, and lighting conditions in the 45,186 videos that make up the Casual Conversations data set. Image: Facebook

And as for the dataset development itself, he says, the team is just on “the first step of a long journey. We have identified age, gender, skin tone, and light conditions, but [these videos were] all recorded in the U.S. Maybe if we record in other countries, we will find we need to consider diversity axes that we haven’t yet seen.”

The audio part of the recordings, Canton indicated, also represents untapped potential. The audio files, created by asking subjects to respond to simple conversational prompts like “What is your favorite dish?,” are currently tagged only for age and gender.

“We haven’t annotated the accents yet, but that is a potential avenue for future implementations. We do think there will be some interesting outcomes from the speech part of this. We want to test inclusivity of audio models.”

Canton hopes that releasing this data into the wild will elicit feedback that can be used to make the data set richer and more inclusive. “I would love to see adoption, and then for my colleagues and academics to tell us what they think. We want to be self-critical. With feedback, we can keep improving it. We hope it becomes a standard way to measure AI fairness.”

Canton also hopes this data set’s development will set a new standard. He is proud of the way this data set was created, including the fact that it was responsibly sourced. He stressed a number of times during our conversation that the 3000-plus subjects were paid for their efforts, were made fully aware of how their voice and video images are intended to be used, and can withdraw their consent later if they change their minds about participation.

“We are trying to set the standard for what responsible AI should look like in the future,” he says, adding that the Facebook team wishes “to inspire other people recording data sets. It is important to do the right things and use the right tools.”

