Understanding Bias in Toxic Language Detection: Annotator Beliefs and Identities
The paper "Annotators with Attitudes: How Annotator Beliefs And Identities Bias Toxic Language Detection" presents a comprehensive analysis of biases introduced by annotators' identities and beliefs in the domain of toxic language detection. This paper is distinctive in its approach, as it explores the subjective nature of toxicity perceptions and the implications these human biases have on automated toxic language detection systems.
The research is based on the premise that toxic language detection is inherently subjective and involves complex socio-pragmatic judgments. Earlier work in toxic language detection has often relied on simplistic annotations, ignoring variations in raters' perspectives. This paper addresses this gap by examining how annotators' demographic characteristics and beliefs influence their judgment of toxicity. The paper uses the lens of social psychology to explore the impact of factors such as racial biases, political leanings, and traditionalist values on the assessment of diverse language types, including anti-Black language, African American English (AAE), and vulgarity.
Two studies were conducted to unravel these biases. The first, referred to as the breadth-of-workers study, collected toxicity ratings from a demographically diverse group of 641 participants on a curated set of 15 posts. The second, the breadth-of-posts study, involved 173 annotators rating approximately 600 posts to simulate a typical large-scale annotation setting in toxic language research. These studies yielded significant insights into how biases manifest in toxicity ratings. For example, annotators who scored higher on conservative or traditionalist beliefs were more likely to rate AAE and vulgar language as toxic, while those with high scores on racist beliefs were less likely to label anti-Black content as toxic.
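To make this kind of analysis concrete, the sketch below illustrates the general idea of relating each annotator's attitude score to their mean toxicity rating for a post category. The column names and toy data are placeholders, and the paper's actual analysis relies on more elaborate statistical modeling; this is only a minimal illustration of the correlational question being asked.

```python
# Minimal, hypothetical sketch: does an annotator's attitude score relate to
# how toxic they rate a given category of posts? Data and column names are
# illustrative, not taken from the paper's released materials.
import pandas as pd
from scipy.stats import pearsonr

# Each row: one annotator's attitude score and their mean toxicity rating
# for posts of one category (e.g., AAE, vulgar, or anti-Black posts).
ratings = pd.DataFrame({
    "traditionalism_score": [0.2, 0.8, 0.5, 0.9, 0.1],
    "mean_toxicity_aae":    [1.0, 3.5, 2.0, 4.0, 1.5],
})

r, p = pearsonr(ratings["traditionalism_score"], ratings["mean_toxicity_aae"])
print(f"Pearson r = {r:.2f}, p = {p:.3f}")
```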
A critical observation was that political orientation and beliefs about free speech and the harm of hate speech significantly influenced perceptions of toxicity. This suggests that annotators' personal ideologies strongly correlate with their subjective assessments of what constitutes toxic language. The paper further illustrates these biases through a case study using PerspectiveAPI, a popular commercial toxicity detection tool. The tool's predictions were found to align with ratings from annotators with specific ideologies, thereby mirroring human biases in automated systems.
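For reference, a toxicity score from PerspectiveAPI can be obtained with a single HTTP request. The minimal sketch below assumes you have an API key and follows the request shape in the API's public documentation; the key and example text are placeholders, and this is not the evaluation pipeline used in the paper.

```python
# Sketch of scoring one post with PerspectiveAPI; error handling omitted.
import requests

API_KEY = "YOUR_API_KEY"  # placeholder
URL = (
    "https://commentanalyzer.googleapis.com/v1alpha1/"
    f"comments:analyze?key={API_KEY}"
)

payload = {
    "comment": {"text": "example post text"},
    "requestedAttributes": {"TOXICITY": {}},
}

response = requests.post(URL, json=payload)
score = response.json()["attributeScores"]["TOXICITY"]["summaryScore"]["value"]
print(f"Toxicity score: {score:.2f}")
```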
The implications of these findings are profound. They call into question current practices in data annotation and in the development of toxic language detection models. Given the subjective nature of toxicity detection, the paper advocates contextualizing toxicity labels within the social variables that shape them and recommends that dataset creators provide detailed documentation of annotator demographics and attitudes. Additionally, there is a call to move beyond binary classification models towards frameworks that can model the distribution of annotations and explain system predictions in a socio-politically aware context. This approach could help keep AI systems from replicating societal biases, making them more inclusive and representative of a diverse set of perspectives.
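As a rough sketch of what modeling the distribution of annotations could look like in practice, the example below trains a classifier against the empirical label distribution produced by multiple annotators rather than a single majority-vote label. The architecture, embedding dimension, and data are illustrative placeholders, not the paper's proposal.

```python
# Hedged sketch: fit predicted label probabilities to the fraction of
# annotators who chose each label, using a KL-divergence loss. All values
# below are toy placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftLabelClassifier(nn.Module):
    def __init__(self, dim: int, n_classes: int = 2):
        super().__init__()
        self.linear = nn.Linear(dim, n_classes)

    def forward(self, x):
        return self.linear(x)  # logits

model = SoftLabelClassifier(dim=768)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Toy batch: text embeddings plus, for each post, the fraction of
# annotators who chose each label (e.g., 7 of 10 said "toxic").
features = torch.randn(4, 768)
annotator_dist = torch.tensor([
    [0.3, 0.7], [0.9, 0.1], [0.5, 0.5], [0.2, 0.8]
])

log_probs = F.log_softmax(model(features), dim=-1)
loss = F.kl_div(log_probs, annotator_dist, reduction="batchmean")
loss.backward()
optimizer.step()
```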
In conclusion, the paper makes a noteworthy contribution to research on bias in computational language technologies by highlighting the inherent subjectivity of toxic language detection and the role of annotator beliefs and identities. It sets the stage for future work on more equitable language technologies that account for the diverse ways individuals perceive toxicity, ultimately aspiring to create systems that empower, rather than marginalize, users from varied backgrounds.