- The paper presents HateDay, a globally representative dataset of 240,000 tweets spanning eight languages, enabling analysis of hate speech detection under real-world conditions.
- The paper demonstrates that models perform significantly worse on the realistic HateDay dataset than on traditional academic benchmarks, particularly for non-European languages.
- The paper highlights the urgent need for context-aware moderation strategies, as current automated systems struggle to reliably distinguish hate speech from offensive language.
Insights from a Global Hate Speech Dataset: An Analysis of HateDay
The paper "HateDay: Insights from a Global Hate Speech Dataset Representative of a Day on Twitter" presents a comprehensive analysis of hate speech detection capabilities across various languages and countries using a dataset reflective of real-world social media scenarios. This dataset, comprising 240,000 annotated tweets sampled from Twitter on September 21, 2022, includes representations from eight languages and four countries, designed to address significant gaps in current hate speech detection methodologies.
Key Objectives and Methodology
The primary objective of this research is to evaluate the performance of hate speech detection models in real-world settings, countering the sampling biases of existing academic datasets. HateDay is curated to represent global social media conversation realistically: it spans eight languages (Arabic, English, French, German, Indonesian, Portuguese, Spanish, and Turkish) and, for English, distinguishes among four countries where the language is widely used on Twitter (United States, India, Nigeria, and Kenya).
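To make the dataset's structure concrete, here is a minimal sketch of how one might slice a HateDay-style release by language and country. The file name and column names ("text", "language", "country", "label") are illustrative assumptions, not the official schema; check the actual release before relying on them.

```python
import pandas as pd

# Hypothetical local export of the dataset; the real release may differ.
df = pd.read_csv("hateday.csv")

# Per-language tweet counts and the fraction labeled as hate speech.
summary = df.groupby("language").agg(
    n_tweets=("text", "size"),
    hate_rate=("label", lambda s: (s == "hateful").mean()),
)
print(summary)

# English is further split across four countries (US, India, Nigeria, Kenya).
english = df[df["language"] == "english"]
print(english["country"].value_counts())
```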
Annotation was carried out by a diverse team following predefined guidelines, with each tweet classified as hate speech, offensive content, or a neutral statement. This structured process supports consistent labels across varied cultural and linguistic contexts.
Findings on Detection Performance
The paper reports stark disparities between detection performance on HateDay and on traditional academic datasets, which tend to overestimate model capabilities. Average precision drops sharply on HateDay relative to academic datasets and functional tests, with both supervised and zero-shot models showing the largest deficits on non-European languages. A key driver is prevalence: hate speech is rare in a representative sample of tweets, so even a modest false-positive rate produces far more false alarms than true detections, collapsing precision. The authors conclude that existing models cannot moderate hate speech reliably without extensive human intervention, which they show to be financially and logistically impractical.
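The prevalence effect is easy to see with a short worked example. The sketch below computes precision from an assumed true-positive rate, false-positive rate, and hate speech prevalence; all numbers are illustrative, not figures from the paper.

```python
def precision(tpr: float, fpr: float, prevalence: float) -> float:
    """Precision of a classifier with the given true/false positive rates
    when hate speech makes up `prevalence` of the evaluation sample."""
    tp = tpr * prevalence            # expected true positives per tweet
    fp = fpr * (1.0 - prevalence)    # expected false positives per tweet
    return tp / (tp + fp)

tpr, fpr = 0.80, 0.05  # an apparently strong detector (assumed values)

# Academic benchmarks are often heavily enriched with hate speech...
print(f"precision at 30% prevalence:  {precision(tpr, fpr, 0.30):.2f}")   # ~0.87

# ...but on a representative day of tweets, hate speech is rare.
print(f"precision at 0.3% prevalence: {precision(tpr, fpr, 0.003):.2f}")  # ~0.05
```

The same classifier goes from looking deployable to producing roughly twenty false positives for every true detection, purely because the evaluation data changed from enriched to representative.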
A significant observation is how often models confuse hate speech with merely offensive language. Offensive content is far more prevalent and is frequently misclassified as hate speech because of lexical similarities. The paper also finds a notable mismatch between the hate targets that academic research focuses on and those most prevalent in the real world, suggesting that better alignment would improve detection performance.
Implications for Hate Speech Moderation
The implications for social media moderation are significant. The paper shows that large-scale human-in-the-loop moderation is impractical: even when automation filters content first, reviewing the resulting volume of flagged tweets carries substantial costs, as the estimate below illustrates. The results point to a pressing need for more contextually aware models and for resources targeting underrepresented hate speech types, such as political hate, which is prevalent in real-world data but rarely targeted in academic datasets.
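To see why human review does not scale, consider a back-of-the-envelope estimate. Every number in the sketch below is an assumption chosen for illustration, not a figure from the paper.

```python
# Rough cost of human review of model-flagged tweets at platform scale.
tweets_per_day = 500_000_000   # assumed order of magnitude of daily tweets
flag_rate = 0.01               # assumed fraction of tweets flagged for review
seconds_per_review = 15        # assumed moderator time per flagged tweet
hourly_wage = 15.0             # assumed moderator wage in USD

flagged = tweets_per_day * flag_rate
review_hours = flagged * seconds_per_review / 3600
daily_cost = review_hours * hourly_wage

print(f"flagged tweets/day:  {flagged:,.0f}")        # 5,000,000
print(f"moderator-hours/day: {review_hours:,.0f}")   # ~20,833
print(f"daily review cost:   ${daily_cost:,.0f}")    # ~$312,500
```

Even under these conservative assumptions, a 1% flag rate implies thousands of full-time moderators per language, which is why the paper argues that automated filtering alone cannot rescue the economics of review.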
Future Directions and Recommendations
The paper underscores the importance of developing detection models that perform well not only on academic benchmarks but also in practical deployment across diverse linguistic and cultural landscapes. It advocates evaluating models on datasets representative of the settings in which they will be applied, and calls on platforms to be transparent about real-world model performance so that more effective moderation strategies can be designed.
In conclusion, HateDay provides a critical resource for further research, offering a benchmark for hate speech detection that aligns more closely with the complexities of social media environments worldwide. The research sets a vital precedent for future work aimed at refining hate speech detection and developing robust, fair, and effective moderation strategies.