This article is part of our exclusive IEEE Journal Watch series in partnership with IEEE Xplore.
Hate speech on social media is increasing, deterring some people from participating while creating a toxic environment for those who remain online. Many different AI models have been developed to detect hate speech in social media posts, but it has remained challenging to develop ones that are computationally efficient and are able to account for the context of the post—that is, determine whether the post truly contains hate speech or not.
A group of researchers in the United Kingdom has developed a new AI model, called BiCapsHate, that overcomes both of these challenges. They describe it in a study published 19 January in IEEE Transactions on Computational Social Systems.
Tarique Anwar, a lecturer in the department of computer science at the University of York, was involved in the study. He notes that online arguments often lead to negative, hateful, and abusive comments, and that the existing content-moderation practices of social media platforms fail to control this.
“Also, the online hate speech sometimes shows its reflection in the real environment, leading to crime and violence,” he says, noting that there have been several instances where online hate speech has led to physical violence and riots.
To help address this issue, Anwar’s team developed BiCapsHate, which differs in several ways from other AI models that detect hate speech. The model consists of several advanced layers of deep neural networks, each one dedicated to capturing different properties of hate speech. Significantly, it includes a deep-learning layer that translates the language of a social media post into a sequence of numerical values and evaluates this sequence both forward and backward. In this way, the AI is able to “understand” the context behind the social media post, and better determine if the post is hateful or not.
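To make the idea of reading a post in both directions concrete, here is a toy sketch in Python. It is not the BiCapsHate architecture: the vocabulary, the function names, and the simple decaying-sum recurrence are all assumptions chosen for illustration. It shows only that, after a forward and a backward pass, every word position carries a summary of the words before it and the words after it.

```python
def encode(tokens, vocab):
    """Map each token to an integer id; unknown tokens get id 0."""
    return [vocab.get(tok, 0) for tok in tokens]


def running_state(values, decay=0.5):
    """Toy recurrence (an assumption, not a real neural layer):
    state_t = decay * state_{t-1} + value_t."""
    states, state = [], 0.0
    for v in values:
        state = decay * state + v
        states.append(state)
    return states


def bidirectional_states(values, decay=0.5):
    """Pair each position's left-to-right state with its right-to-left state."""
    forward = running_state(values, decay)
    # Run the same recurrence over the reversed sequence, then flip the
    # result back so index i again refers to the i-th word.
    backward = running_state(values[::-1], decay)[::-1]
    return list(zip(forward, backward))


# Hypothetical four-word post and vocabulary.
vocab = {"that": 1, "movie": 2, "was": 3, "sick": 4}
ids = encode(["that", "movie", "was", "sick"], vocab)
states = bidirectional_states(ids)

for token_id, (fwd, bwd) in zip(ids, states):
    print(token_id, fwd, bwd)
```

In this toy version, the first word's backward state already reflects the final word, which is the kind of right-hand context a purely left-to-right reading would miss.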
As Anwar points out, language can be ambiguous in some circumstances, whereby a word can be hateful in one context and anodyne in another. He cites some existing AIs, such as HateBERT, ToxicBERT, and fBERT, that are able to capture hateful context to some extent. “But these are still not good enough and consistent in their performance,” he emphasizes.
In their study, Anwar and his colleagues compared BiCapsHate with these models and found that it significantly outperformed them. BiCapsHate achieved F-scores of 94 percent and 92 percent on balanced and imbalanced data sets, respectively. An F-score evaluates a classifier by combining its precision (how many of the posts it flags are truly hateful) and its recall (how many of the truly hateful posts it flags). The higher the F-score, the better the model's performance.
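For readers who want to see how such a score is computed, the following minimal Python sketch implements the standard F-score formula from counts of true positives, false positives, and false negatives. The counts in the example are invented for illustration and are not taken from the study. The second example suggests why an F-score is more informative than plain accuracy on an imbalanced data set.

```python
def f_score(tp, fp, fn, beta=1.0):
    """Standard F-beta score; beta=1 gives the usual F1 score,
    the harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts: 8 hateful posts correctly flagged,
# 2 harmless posts wrongly flagged, 2 hateful posts missed.
balanced = f_score(tp=8, fp=2, fn=2)

# Hypothetical imbalanced case: 1,000 posts, 10 of them hateful,
# and a classifier that never flags anything.
tp, fp, fn, tn = 0, 0, 10, 990
accuracy = (tp + tn) / (tp + fp + fn + tn)
imbalanced = f_score(tp, fp, fn)

print(balanced, accuracy, imbalanced)
```

In the second case the classifier looks 99 percent accurate while catching no hate speech at all, and the F-score of zero exposes that.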
Another advantage of BiCapsHate is that the model is able to perform computations using limited hardware resources. “[The other models] require high-end hardware resources like GPU, and high-end systems for computation,” explains Anwar. “On the contrary, BiCapsHate…can be executed on a CPU machine with even 8 gigabytes of RAM.”
Notably, the AI has so far been developed and tested for analyzing hate speech only in English, so it would need to be adapted for other languages. It was also less adept at detecting offensive words with a mildly or subtly hateful tone compared with more intense examples of hate speech.
The researchers hope next to explore ways of assessing the mental health of users who express hate online. If there are concerns that the person is mentally unstable and might be physically violent toward people in the real world, early interventions may be considered to lower the chances of this happening.
This article appears in the June 2023 print issue as “Tech to Combat Social Media Hate Speech.”