Understanding Bias in Toxic Language Detection: Annotator Beliefs and Identities
The paper "Annotators with Attitudes: How Annotator Beliefs And Identities Bias Toxic Language Detection" presents a comprehensive analysis of biases introduced by annotators' identities and beliefs in the domain of toxic language detection. This paper is distinctive in its approach, as it explores the subjective nature of toxicity perceptions and the implications these human biases have on automated toxic language detection systems.
The research is based on the premise that toxic language detection is inherently subjective and involves complex socio-pragmatic judgments. Earlier work in toxic language detection has often relied on simplistic annotations, ignoring variations in raters' perspectives. This paper addresses this gap by examining how annotators' demographic characteristics and beliefs influence their judgment of toxicity. The paper uses the lens of social psychology to explore the impact of factors such as racial biases, political leanings, and traditionalist values on the assessment of diverse language types, including anti-Black language, African American English (AAE), and vulgarity.
Two studies were conducted to unravel these biases. The first, referred to as the breadth-of-workers study, collected toxicity ratings from a demographically diverse group of 641 participants on a curated set of 15 posts. The second, the breadth-of-posts study, involved 173 annotators rating approximately 600 posts to simulate a typical large-scale annotation setting in toxic language research. These studies yielded significant insights into how biases manifest in toxicity ratings. For example, annotators who scored higher on conservative or traditionalist beliefs were more likely to rate AAE and vulgar language as toxic, while those with high scores on racist beliefs were less likely to label anti-Black content as toxic.
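To make this kind of analysis concrete, the sketch below illustrates the general idea of relating each annotator's attitude score to their mean toxicity rating for a post category. The column names and toy data are placeholders, and the paper's actual analysis relies on more elaborate statistical modeling; this is only a minimal illustration of the correlational question being asked.

```python
# Minimal, hypothetical sketch: does an annotator's attitude score relate to
# how toxic they rate a given category of posts? Data and column names are
# illustrative, not taken from the paper's released materials.
import pandas as pd
from scipy.stats import pearsonr

# Each row: one annotator's attitude score and their mean toxicity rating
# for posts of one category (e.g., AAE, vulgar, or anti-Black posts).
ratings = pd.DataFrame({
    "traditionalism_score": [0.2, 0.8, 0.5, 0.9, 0.1],
    "mean_toxicity_aae":    [1.0, 3.5, 2.0, 4.0, 1.5],
})

r, p = pearsonr(ratings["traditionalism_score"], ratings["mean_toxicity_aae"])
print(f"Pearson r = {r:.2f}, p = {p:.3f}")
```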
A critical observation was that political orientation and beliefs about free speech and the harm of hate speech significantly influenced perceptions of toxicity. This suggests that annotators' personal ideologies strongly correlate with their subjective assessments of what constitutes toxic language. The paper further illustrates these biases through a case study using PerspectiveAPI, a popular commercial toxicity detection tool. The tool's predictions were found to align with ratings from annotators with specific ideologies, thereby mirroring human biases in automated systems.
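For reference, a toxicity score from PerspectiveAPI can be obtained with a single HTTP request. The minimal sketch below assumes you have an API key and follows the request shape in the API's public documentation; the key and example text are placeholders, and this is not the evaluation pipeline used in the paper.

```python
# Sketch of scoring one post with PerspectiveAPI; error handling omitted.
import requests

API_KEY = "YOUR_API_KEY"  # placeholder
URL = (
    "https://commentanalyzer.googleapis.com/v1alpha1/"
    f"comments:analyze?key={API_KEY}"
)

payload = {
    "comment": {"text": "example post text"},
    "requestedAttributes": {"TOXICITY": {}},
}

response = requests.post(URL, json=payload)
score = response.json()["attributeScores"]["TOXICITY"]["summaryScore"]["value"]
print(f"Toxicity score: {score:.2f}")
```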
The implications of these findings are profound. They call into question current practices in data annotation and in the development of toxic language detection models. Given the subjective nature of toxicity detection, the paper advocates contextualizing toxicity labels within the social variables that shape them and recommends that dataset creators provide detailed documentation of annotator demographics and attitudes. Additionally, there is a call to move beyond binary classification models towards frameworks that can model the distribution of annotations and explain system predictions in a socio-politically aware context. This approach could help keep AI systems from replicating societal biases, making them more inclusive and representative of a diverse set of perspectives.
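As a rough sketch of what modeling the distribution of annotations could look like in practice, the example below trains a classifier against the empirical label distribution produced by multiple annotators rather than a single majority-vote label. The architecture, embedding dimension, and data are illustrative placeholders, not the paper's proposal.

```python
# Hedged sketch: fit predicted label probabilities to the fraction of
# annotators who chose each label, using a KL-divergence loss. All values
# below are toy placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftLabelClassifier(nn.Module):
    def __init__(self, dim: int, n_classes: int = 2):
        super().__init__()
        self.linear = nn.Linear(dim, n_classes)

    def forward(self, x):
        return self.linear(x)  # logits

model = SoftLabelClassifier(dim=768)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Toy batch: text embeddings plus, for each post, the fraction of
# annotators who chose each label (e.g., 7 of 10 said "toxic").
features = torch.randn(4, 768)
annotator_dist = torch.tensor([
    [0.3, 0.7], [0.9, 0.1], [0.5, 0.5], [0.2, 0.8]
])

log_probs = F.log_softmax(model(features), dim=-1)
loss = F.kl_div(log_probs, annotator_dist, reduction="batchmean")
loss.backward()
optimizer.step()
```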
In conclusion, the paper makes a noteworthy contribution to research on bias in computational language technologies by highlighting the inherent subjectivity of toxic language detection and the role of annotator beliefs and identities. It sets the stage for future work on more equitable language technologies that account for the diverse ways individuals perceive toxicity, ultimately aspiring to create systems that empower, rather than marginalize, users from varied backgrounds.