Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment
The paper "Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment" presents a substantial contribution to the field of hate speech detection on social media platforms. The authors introduce a novel dataset, GOTHate, which seeks to address the limitations of existing benchmarks that predominantly rely on hate lexicons and fail to capture the nuanced nature of online hate speech. This dataset is significant as it encompasses a diverse range of socio-political topics and languages, including English, Hindi, and Hinglish, offering a more realistic representation of the variability in online discourse.
One of the paper's key contributions is the GOTHate dataset itself, which consists of roughly 51,000 Twitter posts annotated with four labels: Hate, Offensive, Provocative, and Neutral. The dataset is neutrally seeded, meaning that the posts were not collected using predefined hate lexicons, which tend to skew a corpus toward explicit content. As a result, GOTHate provides a more nuanced and challenging environment for classification models, emphasizing context over explicit keyword triggers. The authors argue that this approach reduces linguistic and syntactic biases and yields a more representative sample of real-world hate speech.
In comparison with existing datasets such as those of Davidson et al. (2017) and Founta et al. (2018), GOTHate exhibits lower inter-class divergence, as measured by Jensen-Shannon divergence, indicating greater overlap between classes and fewer surface cues for a classifier to exploit. This arguably makes it one of the more challenging datasets for classification models. The paper also reports adversarial validation and cross-dataset experiments against other hate speech corpora, finding that GOTHate occupies a distinct distributional space relative to prior benchmarks and thus complements, rather than duplicates, them.
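To make the divergence comparison concrete, the sketch below estimates average pairwise Jensen-Shannon divergence over class-wise unigram distributions. The tokenization, add-one smoothing, and choice of unigram features are illustrative assumptions rather than the authors' exact measurement procedure; under any such setup, lower average divergence signals heavier overlap between classes.

```python
# Minimal sketch: estimating inter-class Jensen-Shannon divergence from
# class-wise unigram distributions. Tokenization, smoothing, and label names
# are illustrative assumptions, not the authors' exact procedure.
from collections import Counter
from itertools import combinations

import numpy as np
from scipy.spatial.distance import jensenshannon

def class_unigram_distributions(texts, labels):
    """Build a smoothed unigram probability distribution per class."""
    counts = {}
    for text, label in zip(texts, labels):
        counts.setdefault(label, Counter()).update(text.lower().split())
    vocab = sorted({tok for c in counts.values() for tok in c})
    dists = {}
    for label, counter in counts.items():
        freqs = np.array([counter[tok] for tok in vocab], dtype=float) + 1.0  # add-one smoothing
        dists[label] = freqs / freqs.sum()
    return dists

def mean_pairwise_jsd(dists):
    """Average Jensen-Shannon divergence over all class pairs (lower = more overlap)."""
    # SciPy's jensenshannon returns the JS *distance*, so square it to get divergence.
    return np.mean([jensenshannon(dists[a], dists[b]) ** 2
                    for a, b in combinations(dists, 2)])

# Toy usage with hypothetical posts, one per GOTHate-style label.
texts = ["they should all leave", "what a stupid take", "go protest now", "nice weather today"]
labels = ["Hate", "Offensive", "Provocative", "Neutral"]
print(mean_pairwise_jsd(class_unigram_distributions(texts, labels)))
```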
The authors also propose HEN-mBERT, a modular model enhancement to the multilingual BERT architecture, which incorporates user-centric auxiliary signals like timelines and ego networks to improve model performance. The use of a modular mixture-of-experts approach enables the model to handle the inherent variability and complexity in hate speech detection tasks more effectively. The experimental results suggest that HEN-mBERT significantly outperforms previous baselines, especially in harder-to-detect classes such as Hate and Provocative, with up to a 5% improvement in the F1 score for hate detection.
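As a rough illustration of how a modular, mixture-of-experts design can fuse text with such auxiliary signals, the sketch below gates three "experts", one each for the tweet, a precomputed user-timeline embedding, and a precomputed ego-network embedding, on top of mBERT's [CLS] representation. The module names, embedding dimensions, and softmax gating are assumptions made for exposition, not the authors' released HEN-mBERT implementation.

```python
# Hedged sketch of a mixture-of-experts fusion of mBERT text features with
# user-centric auxiliary signals (timeline and ego-network embeddings).
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class AuxiliaryMoEClassifier(nn.Module):
    def __init__(self, model_name="bert-base-multilingual-cased",
                 aux_dim=128, num_labels=4):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        # One "expert" per signal: tweet text, user timeline, ego network.
        self.text_expert = nn.Linear(hidden, hidden)
        self.timeline_expert = nn.Linear(aux_dim, hidden)
        self.network_expert = nn.Linear(aux_dim, hidden)
        # A per-example gate decides how much each expert contributes.
        self.gate = nn.Linear(hidden + 2 * aux_dim, 3)
        self.classifier = nn.Linear(hidden, num_labels)

    def forward(self, input_ids, attention_mask, timeline_emb, network_emb):
        text = self.encoder(input_ids=input_ids,
                            attention_mask=attention_mask).last_hidden_state[:, 0]  # [CLS]
        experts = torch.stack([
            self.text_expert(text),
            self.timeline_expert(timeline_emb),
            self.network_expert(network_emb),
        ], dim=1)                                                    # (batch, 3, hidden)
        weights = torch.softmax(
            self.gate(torch.cat([text, timeline_emb, network_emb], dim=-1)), dim=-1)
        fused = (weights.unsqueeze(-1) * experts).sum(dim=1)         # weighted expert mixture
        return self.classifier(fused)  # logits over Hate/Offensive/Provocative/Neutral

# Toy usage with random auxiliary embeddings standing in for real signals.
tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
batch = tok(["example tweet"], return_tensors="pt", padding=True)
model = AuxiliaryMoEClassifier()
logits = model(batch["input_ids"], batch["attention_mask"],
               torch.randn(1, 128), torch.randn(1, 128))
```

The intuition behind the per-example gate is that the model can lean on user history or network context when the text alone is ambiguous, which is exactly where lexicon-free examples are hardest.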
Practically, the implications of this research are significant. By developing robust methods for context-aware hate speech detection, the authors point to applications in automated content moderation that could curb the spread of harmful content on social media. In collaboration with Wipro AI, the researchers are building a semi-automated content moderation system that uses the HEN-mBERT framework to support human moderators in identifying and flagging hate speech.
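The paper does not detail the deployment pipeline, but semi-automated moderation systems of this kind typically route model predictions by confidence, auto-flagging clear-cut cases and deferring uncertain ones to human reviewers. The sketch below shows one hypothetical routing policy; the thresholds, labels, and actions are invented for illustration and are not taken from the Wipro AI system.

```python
# Hedged sketch of a confidence-based routing policy for semi-automated
# moderation. All thresholds and actions are illustrative assumptions.
from dataclasses import dataclass

import torch

LABELS = ["Hate", "Offensive", "Provocative", "Neutral"]

@dataclass
class ModerationDecision:
    label: str
    confidence: float
    action: str  # "auto_flag", "human_review", or "allow"

def route(logits: torch.Tensor, flag_threshold: float = 0.9,
          review_threshold: float = 0.6) -> ModerationDecision:
    """Turn classifier logits for one post into a moderation action."""
    probs = torch.softmax(logits, dim=-1)
    confidence, idx = probs.max(dim=-1)
    label = LABELS[idx.item()]
    if label != "Neutral" and confidence >= flag_threshold:
        action = "auto_flag"       # high-confidence harmful content
    elif label != "Neutral" or confidence < review_threshold:
        action = "human_review"    # borderline or uncertain: defer to moderators
    else:
        action = "allow"           # confidently neutral
    return ModerationDecision(label, confidence.item(), action)

# Example: route the logits produced by a classifier such as the sketch above.
print(route(torch.tensor([5.0, 0.1, 0.4, -1.0])))
```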
Theoretically, the work is a call to the community to reconsider how hate speech datasets are curated and used for training and evaluation. The nuanced approach reflected in GOTHate could open new directions for hate speech detection research that emphasize context and user behavior over simple keyword matching.
Future work could explore broader applications of the proposed methods to other forms of online toxicity beyond hate speech, such as misinformation and harassment. Moreover, the integration of this work into real-world systems raises intriguing possibilities for longitudinal studies on the impact of advanced detection systems on reducing hate speech prevalence.
In conclusion, the introduction of the GOTHate dataset and the HEN-mBERT model represents a notable shift towards more sophisticated and realistic hate speech detection systems. The authors' work motivates further research into embedding user- and context-aware signals in detection frameworks, with significant implications for both academic research and practical applications in online safety.