When Gamers Get Nasty

Researchers grapple with subjectivity as they develop algorithms to detect toxicity in online gaming


Online gaming gives players a chance to come together, socialize, and enjoy some friendly competition. Unfortunately, abusive language and toxicity can mar the experience and cause psychological harm. Gendered and racial toxicity, in particular, are all too common in online gaming.

To combat this issue, several research groups have been developing artificial-intelligence models that can detect toxic behavior in real time as people play. One group recently developed such a model, described in a study published 23 May in IEEE Transactions on Games. While the model detects toxicity with a fair amount of accuracy, its development demonstrates just how challenging it can be to decide what counts as toxic, which is a subjective matter.

“Differentiating what individuals perceive as toxic or not is a big challenge in this context when players accept such toxic language as the norm in their communities or use language that others may consider as toxic without malice within their friend group. Furthermore, these norms differ among various gaming communities,” explains Julian Frommel, an assistant professor at Utrecht University, who was involved in the study.

In earlier work, Frommel and his colleagues found that toxicity may be increasingly normalized in gaming communities. This prompted them to develop the new AI model, which could be useful for filtering out problematic language or assisting human moderators in deciding on sanctions for toxic players.

To create their model, the researchers first had participants watch videos of gamers playing Overwatch and rate on a scale how toxic the players were in the game’s voice chats. The scale included sliders from 1 (“disagree strongly”) to 100 (“agree strongly”) on eight characteristics of the players in the match—for example, to what extent they agreed that players were angry, offensive, or toxic.
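One way to picture this rating setup is as code. The sketch below is hypothetical (the trait names and the 50-point cutoff are illustrative assumptions, not details from the study): each rater scores a clip on a 1–100 slider, the mean score gives a soft label for the clip, and the spread across raters captures exactly the disagreement the researchers observed.

```python
# Hypothetical sketch of turning 1-100 slider ratings into labeled data.
# Trait name "toxic" and the 50-point threshold are illustrative assumptions.
from statistics import mean, stdev

def summarize(ratings, trait="toxic", threshold=50.0):
    """ratings: list of per-rater dicts mapping trait -> score in [1, 100]."""
    scores = [r[trait] for r in ratings]
    return {
        "mean": mean(scores),                # soft label for the clip
        "spread": stdev(scores),             # how much raters disagree
        "label": mean(scores) >= threshold,  # hard binary label
    }

# Three raters, same clip: the average crosses the threshold,
# but the wide spread shows how subjective the judgment is.
clip = [{"toxic": 72}, {"toxic": 35}, {"toxic": 61}]
print(summarize(clip))  # mean 56, spread 19.0, label True
```

The spread value is one simple way to quantify the inter-rater disagreement Frommel describes below; clips with high spread are precisely the ones where an "agreed-upon ground truth" is hardest to define.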

“We were surprised at how much human raters differed in their toxicity ratings of the same clip, but in retrospect, this highlights a significant challenge in detecting toxicity with automated methods—the challenge of subjectivity,” says Frommel.

“More recent work has revealed that what one person experiences as toxic can be experienced by another person as completely benign. This has to do with our differing identities and backgrounds and previous gaming experiences,” explains Frommel. “And it makes sense that we are individually different in our sensitivities, but this creates a challenge for machine-learning models that rely on an agreed-upon ground truth.”

Using the data from the study participants, the researchers built their AI model and validated it, finding that it can predict whether a match was toxic with an accuracy of 86.3 percent. This is on par with similar AI models developed for the same purpose, although one research group has achieved 95.7 percent accuracy using a different type of machine-learning algorithm for detecting toxicity.
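The overall pipeline can be sketched in a few lines, though with heavy caveats: the sketch below uses synthetic data and a plain logistic-regression classifier, not the authors' actual features or architecture. It only mirrors the general shape of the approach: featurize each match (here, eight averaged ratings scaled to [0, 1]), fit a binary classifier on labeled matches, and report accuracy on held-out matches.

```python
# Minimal sketch, NOT the authors' model: logistic regression on synthetic
# per-match feature vectors, scored by held-out accuracy.
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(X, y, lr=0.1, epochs=500):
    """Plain logistic regression fit by stochastic gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict(w, b, xi):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5

# Synthetic matches: 8 averaged ratings in [0, 1]; toxic matches skew high.
random.seed(0)
data = [([random.uniform(0.5, 1.0) for _ in range(8)], 1) for _ in range(50)]
data += [([random.uniform(0.0, 0.5) for _ in range(8)], 0) for _ in range(50)]
random.shuffle(data)
train_set, test_set = data[:80], data[80:]

w, b = train([x for x, _ in train_set], [y for _, y in train_set])
acc = sum(predict(w, b, x) == bool(y) for x, y in test_set) / len(test_set)
print(f"held-out accuracy: {acc:.3f}")
```

On real rating data, the held-out accuracy of such a simple model depends entirely on how consistent the human labels are, which is why the subjectivity problem above directly limits what any classifier can achieve.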

“The advantages of [our] approach include that it is quite a simple model without substantial computation costs and that it can be automated and applied as a noninvasive approach in many games that use in-game voice chats,” says Frommel.

However, he notes that many more steps are needed before AI models can address toxicity in online gaming, including confirming the results, improving prediction accuracy, tackling issues such as bias, privacy, and ethics, and accounting for contextual factors and subjectivity.
