- The paper demonstrates that structured explanations reduce moderation time by 1.34 seconds per post (7.4%), significantly enhancing efficiency.
- It used a controlled experiment with 25 professional moderators and the PLEAD dataset to evaluate the impact of explanation types.
- The study suggests that integrating structured explanations can bolster transparency and decision-making speed in social media moderation.
Examining the Role of Explainability in Social Media Moderation
The paper "Explainability and Hate Speech: Structured Explanations Make Social Media Moderators Faster" by Agostina Calabrese et al. investigates the efficacy of explainability in supporting social media moderators, particularly in the context of identifying and moderating hate speech. This paper focuses on whether providing structured explanations can expedite the decision-making process without sacrificing accuracy.
Research Motivation and Questions
The core motivation is the sheer volume of content that social media moderators must review, a significant bottleneck in content moderation pipelines. Prior research has largely focused on automating hate speech detection, but little of it has involved real-world moderators. The authors aim to fill this gap by exploring:
- Whether explanations can make moderators faster.
- The impact of explanation type on decision speed.
- Moderators' preferences regarding explanations.
Experimental Design
The authors employed a robust experimental design involving 25 professional moderators from a large-scale social media platform. These moderators were asked to analyze posts under three conditions: viewing the post only, viewing the post with a generic policy explanation (post+policy), and viewing the post with structured explanations highlighting specific harmful elements (post+tags).
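As a rough illustration of how these conditions differ in what the moderator sees, the sketch below models each item as a simple record and renders it as text. The field names and rendering logic are hypothetical assumptions for illustration, not the interface used in the study.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class ModerationItem:
    """One item shown to a moderator (hypothetical structure, for illustration)."""
    post: str                                   # post text, shown in all three conditions
    policy: Optional[str] = None                # generic policy text (post+policy condition)
    tags: Dict[str, str] = field(default_factory=dict)  # span-level tags (post+tags condition)

def render(item: ModerationItem) -> str:
    """Assemble the text a moderator would see under a given condition."""
    lines = [f"POST: {item.post}"]
    if item.policy:
        lines.append(f"POLICY: {item.policy}")
    for label, span in item.tags.items():
        lines.append(f"{label.upper()}: {span!r}")
    return "\n".join(lines)

# post-only condition
print(render(ModerationItem(post="<post text>")))

# post+tags condition, with structured span-level justifications
print(render(ModerationItem(
    post="<post text>",
    tags={"target": "<protected group>", "abusive_span": "<offending phrase>"},
)))
```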
Dataset and Error Simulation
The researchers used the PLEAD dataset, which comprises hateful and non-hateful posts annotated with user intent and structured parse trees that provide span-level justifications (e.g., targets, abusive language types). Importantly, to simulate the imperfect predictions moderators would face in practice, 10% of both hateful and non-hateful posts were deliberately shown with a wrong classification or a wrong explanation.
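A minimal sketch of how such a fixed error rate could be injected into the displayed labels and explanations is shown below, assuming a simple per-item corruption step; the field names and corruption strategy are illustrative assumptions, not the paper's exact procedure.

```python
import random

ERROR_RATE = 0.10  # fraction of posts shown with a wrong label/explanation

def inject_errors(posts, error_rate=ERROR_RATE, seed=0):
    """Return a copy of `posts` where roughly `error_rate` of items have their
    displayed label flipped and their explanation marked as incorrect.
    (Illustrative: the study fixed the rate at 10% for each class.)"""
    rng = random.Random(seed)
    shown = []
    for post in posts:
        item = dict(post)  # copy so the gold annotations stay intact
        if rng.random() < error_rate:
            item["shown_label"] = "non-hateful" if post["gold_label"] == "hateful" else "hateful"
            item["explanation_correct"] = False
        else:
            item["shown_label"] = post["gold_label"]
            item["explanation_correct"] = True
        shown.append(item)
    return shown

posts = [
    {"text": "<hateful example>", "gold_label": "hateful"},
    {"text": "<benign example>", "gold_label": "non-hateful"},
]
print(inject_errors(posts))
```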
Results
Speed Improvement
The findings indicate that structured explanations significantly reduced moderation time by 1.34 seconds per post (a 7.4% improvement) with no loss in accuracy. Generic explanations, by contrast, did not significantly affect moderation speed; moderators tended to overlook these less detailed cues.
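For scale, if the 7.4% figure is taken relative to the average decision time in the baseline (post-only) condition, the two reported numbers imply a baseline of roughly

$$
t_{\text{baseline}} \approx \frac{1.34\ \text{s}}{0.074} \approx 18\ \text{s per post},
$$

an inference from the reported figures rather than a value quoted directly from the paper.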
Moderator Preferences
Post-experiment surveys corroborated the quantitative findings, with 84% of moderators showing a strong preference for structured explanations. Only a minor fraction (8%) favored generic explanations, with 12% ignoring them altogether.
Implications and Future Work
Practical Implications
The paper highlights the practical benefits of integrating structured explanations into moderation tools. These findings suggest that platforms can improve moderator efficiency by implementing systems that not only flag potential policy violations but also provide detailed reasons in an easily digestible format. Given the significant improvement in speed, platforms can potentially scale this approach to manage the overwhelming volume of content effectively.
Theoretical Implications
From a theoretical perspective, these results underscore the importance of explainability in automated decision-support systems. This paper contributes to the growing body of literature advocating for transparency in AI and offers a promising direction for developing more nuanced, context-aware explainability models.
Future Research Directions
While this research provides strong evidence for the benefits of structured explanations, future studies could extend these findings to other types of content and other languages. Another avenue is explanations for multimodal content, reflecting the increasingly diverse nature of social media posts. Additionally, refining how these explanations are generated, in particular reducing the occasional inaccuracies, could improve both speed and accuracy.
Conclusion
In sum, the paper by Calabrese et al. presents compelling evidence that structured explanations can meaningfully enhance the efficiency of social media content moderators. This work bridges a notable gap between theory and practice in the domain of hate speech detection and moderation, providing valuable insights for both researchers and industry practitioners aiming to create safer and more efficient online environments.