Hate Personified: Investigating the role of LLMs in content moderation (2410.02657v1)

Published 3 Oct 2024 in cs.CL and cs.CY

Abstract: For subjective tasks such as hate detection, where people perceive hate differently, the Large Language Model's (LLM) ability to represent diverse groups is unclear. By including additional context in prompts, we comprehensively analyze LLM's sensitivity to geographical priming, persona attributes, and numerical information to assess how well the needs of various groups are reflected. Our findings on two LLMs, five languages, and six datasets reveal that mimicking persona-based attributes leads to annotation variability. Meanwhile, incorporating geographical signals leads to better regional alignment. We also find that the LLMs are sensitive to numerical anchors, indicating the ability to leverage community-based flagging efforts and exposure to adversaries. Our work provides preliminary guidelines and highlights the nuances of applying LLMs in culturally sensitive cases.

Analyzing the Role of LLMs in Content Moderation

The paper "Hate Personified: Investigating the Role of LLMs in Content Moderation" conducts a comprehensive investigation into the performance and adaptability of LLMs in handling the nuanced task of hate speech detection. The authors, Masud et al., focus on understanding the interplay between artificial intelligence and human annotations, especially concerning subjective evaluations where demographic and cultural factors influence the perception of hate speech.

Key Insights and Methodology

This research explores how LLMs align with human perspectives on hate speech across different regions and persona-based contexts. The paper evaluates two LLMs, FlanT5-XXL and GPT-3.5, across five languages and six datasets. Among these, the CREHate dataset, which contains posts annotated for hate speech by individuals from five different countries, is used to quantify the variability in LLM outputs when prompts are primed with demographic or geographical context.
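
As a rough illustration of this prompting setup, the sketch below builds a plain zero-shot classification prompt; the helper names and prompt wording are assumptions for illustration, not the paper's exact templates.

```python
# Minimal sketch of the zero-shot prompting setup (illustrative only).
# `query_llm` is a hypothetical placeholder for a call to FlanT5-XXL or
# GPT-3.5; the prompt wording is not the paper's exact template.

def build_baseline_prompt(post: str) -> str:
    """Zero-shot prompt with no additional context."""
    return (
        "Classify the following post as 'hate' or 'non-hate'.\n"
        f"Post: {post}\n"
        "Answer:"
    )

def query_llm(prompt: str) -> str:
    """Placeholder: send the prompt to an LLM and return its label."""
    raise NotImplementedError("Wire this to FlanT5-XXL, GPT-3.5, or another model.")
```

The context variants below (geographical, persona, numerical) each modify only the prefix of this baseline prompt, so differences in the returned labels can be attributed to the added context.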

Geographical Sensitivity: By introducing geographical cues into prompts, the paper evaluates whether LLMs can reflect regional perspectives. Surprisingly, results indicate that geographical priming improves FlanT5-XXL's alignment with annotators across all considered countries, suggesting that geographic metadata could help refine content moderation systems. This result also underscores the influence of the training corpus and model architecture on an LLM's geographical sensitivity and bias.
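
A minimal sketch of how such a geographical cue might be injected into the prompt (the wording is illustrative, not the paper's template):

```python
def build_geo_prompt(post: str, country: str) -> str:
    """Prepend a geographical cue so the model answers from a regional viewpoint."""
    return (
        f"You are annotating social media content posted in {country}.\n"
        "Classify the following post as 'hate' or 'non-hate'.\n"
        f"Post: {post}\n"
        "Answer:"
    )

# Comparing labels obtained under different country cues against country-specific
# human annotations (as in CREHate) quantifies regional alignment, e.g.:
# for country in ["Country A", "Country B"]:
#     label = query_llm(build_geo_prompt(post, country))
```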

Demographic Simulation: The authors further investigate whether LLMs can mimic demographic personas. They test attributes such as gender, ethnicity, political orientation, religion, and education level by stating these attributes directly in the prompt. The results show that annotation outputs vary with these traits, revealing how sensitive and biased the LLMs are toward different demographic groups. In particular, GPT-3.5 shows heightened sensitivity toward vulnerable groups such as non-binary individuals and historically marginalized ethnicities when presented with such prompts.
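
A sketch of persona-based priming under the same assumptions; the attribute names and values below are hypothetical, not the paper's exact schema:

```python
def build_persona_prompt(post: str, persona: dict) -> str:
    """Prefix the prompt with persona attributes (gender, religion, etc.)."""
    traits = ", ".join(f"{key}: {value}" for key, value in persona.items())
    return (
        f"Imagine you are a person with the following attributes: {traits}.\n"
        "Classify the following post as 'hate' or 'non-hate'.\n"
        f"Post: {post}\n"
        "Answer:"
    )

# Hypothetical persona for illustration only.
example_persona = {
    "gender": "non-binary",
    "political orientation": "liberal",
    "education": "graduate degree",
}
```

Holding the post fixed while swapping persona attributes exposes the annotation variability the paper reports.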

Anchoring Bias: When LLMs are given explicit numerical context, such as the percentage of community members who voted a post hateful, anchoring bias notably influences labeling decisions, particularly in zero-shot settings. The analysis highlights that LLMs may over-rely on these numerical anchors, leading to misclassifications, especially in adversarial contexts.
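
A sketch of how a numerical anchor might be added to the prompt (the phrasing and percentages are illustrative assumptions):

```python
def build_anchor_prompt(post: str, pct_flagged: int) -> str:
    """Include a community-vote percentage as a numerical anchor."""
    return (
        f"{pct_flagged}% of community members flagged this post as hateful.\n"
        "Classify the following post as 'hate' or 'non-hate'.\n"
        f"Post: {post}\n"
        "Answer:"
    )

# Holding the post fixed while sweeping the anchor (e.g. 10 vs. 90) shows how
# strongly the model's label tracks the anchor rather than the post's content.
```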

Implications and Future Directions

The findings emphasize the nuanced approach needed when implementing LLMs for tasks requiring cultural sensitivity and demographic awareness. Content moderation using LLMs necessitates transparency and precision, particularly when demographic or numerical cues could significantly skew their outputs.

The research suggests several notable implications:

  1. Geographic and Cultural Alignment: Integrating geographical cues can enhance LLM alignment with regional perspectives, offering improved accuracy in content moderation systems that operate on global platforms.
  2. Demographic Simulation Caution: While LLMs may simulate certain demographic perspectives, care should be taken not to over-rely on them as proxy annotators, particularly in demographically diverse or sensitive environments.
  3. Numerical Cues in Prompting: Validating numerical statistics and pairing them with qualitative assessment can counter anchoring bias and help mitigate misclassifications in content moderation.

By addressing the contextual parity between humans and LLMs, the paper offers valuable insights into optimizing LLMs for diverse linguistic and cultural contexts. Future research could explore finer-grained demographic assessment and more transparent reporting of LLM training processes to bolster robustness and fairness. The paper thereby contributes substantially to leveraging AI for sensitive and subjective decision-making tasks such as content moderation.

Authors (5)
  1. Sarah Masud (18 papers)
  2. Sahajpreet Singh (4 papers)
  3. Viktor Hangya (11 papers)
  4. Alexander Fraser (50 papers)
  5. Tanmoy Chakraborty (224 papers)