Spread Love Not Hate: Undermining the Importance of Hateful Pre-training for Hate Speech Detection (2210.04267v3)

Published 9 Oct 2022 in cs.CL and cs.AI

Abstract: Pre-training large neural language models, such as BERT, has led to impressive gains on many NLP tasks. Although this method has proven to be effective for many domains, it might not always provide desirable benefits. In this paper, we study the effects of hateful pre-training on low-resource hate speech classification tasks. While previous studies on the English language have emphasized its importance, we aim to augment their observations with some non-obvious insights. We evaluate different variations of tweet-based BERT models pre-trained on hateful, non-hateful, and mixed subsets of a 40M tweet dataset. This evaluation is carried out for the Indian languages Hindi and Marathi. This paper presents empirical evidence that hateful pre-training is not the best pre-training option for hate speech detection. We show that pre-training on non-hateful text from the target domain provides similar or better results. Further, we introduce HindTweetBERT and MahaTweetBERT, the first publicly available BERT models pre-trained on Hindi and Marathi tweets, respectively. We show that they provide state-of-the-art performance on hate speech classification tasks. We also release hateful BERT models for the two languages and gold hate speech evaluation benchmarks, HateEval-Hi and HateEval-Mr, each consisting of 2000 manually labeled tweets. The models and data are available at https://github.com/l3cube-pune/MarathiNLP .

Authors (5)
  1. Omkar Gokhale (6 papers)
  2. Aditya Kane (14 papers)
  3. Shantanu Patankar (8 papers)
  4. Tanmay Chavan (11 papers)
  5. Raviraj Joshi (76 papers)
Citations (7)

Summary

Analyzing Hate Speech Detection Through Pre-training Strategies in Hindi and Marathi

The research paper "Spread Love Not Hate: Undermining the Importance of Hateful Pre-training for Hate Speech Detection" provides a comprehensive analysis of how different pre-training strategies affect hate speech detection in low-resource languages, specifically Hindi and Marathi. The authors challenge the prevailing assumption that pre-training language models on hateful text enhances hate speech detection capabilities. Through empirical analysis, they demonstrate that models pre-trained on non-hateful or random datasets can achieve similar, if not superior, performance in detecting hate speech compared to those pre-trained on hateful content.

Research Methodology and Findings

The researchers conducted extensive experiments utilizing variations of BERT models pre-trained on distinct datasets: hateful, non-hateful, and random subsets from a corpus of 40 million tweets. These investigations targeted Hindi and Marathi due to their significant presence in India and their linguistic characteristics derived from Sanskrit. The results consistently showed that large-scale pre-training on a mixed corpus provided the best performance across various downstream tasks.
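The continued pre-training step at the heart of this comparison is standard masked language modeling over each tweet subset. The following is a minimal sketch using the HuggingFace transformers Trainer; the base checkpoint (google/muril-base-cased), file names, and hyperparameters are illustrative assumptions and not the paper's exact configuration.

```python
# Sketch of domain-adaptive pre-training (MLM) on a tweet subset.
# The base checkpoint, data file, and hyperparameters are assumptions.
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base_model = "google/muril-base-cased"  # multilingual starting point (assumption)
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForMaskedLM.from_pretrained(base_model)

# One plain-text file per subset: hateful, non-hateful, or mixed tweets.
dataset = load_dataset("text", data_files={"train": "mixed_tweets.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Dynamic masking with the usual 15% mask probability.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="tweet-bert-mlm",
    per_device_train_batch_size=32,
    num_train_epochs=1,
    learning_rate=5e-5,
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=collator,
).train()
```

Swapping the training file between the hateful, non-hateful, and mixed subsets reproduces the kind of controlled comparison the authors describe.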

In the case of Marathi, the newly introduced MahaTweetBERT model consistently outperformed alternatives, achieving accuracy upwards of 89% on hate speech detection tasks. Similarly, HindTweetBERT for Hindi showed notable performance improvements over existing baselines such as MuRIL and HindBERT, demonstrating the efficacy of pre-training on extensive corpora.
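For the downstream evaluation, the released models can be fine-tuned as ordinary sequence classifiers. The sketch below assumes a Hugging Face Hub identifier following the L3Cube naming scheme and a simple two-column CSV dataset; both are illustrative assumptions, and the paper's exact splits, label scheme, and hyperparameters will differ (see https://github.com/l3cube-pune/MarathiNLP for the released checkpoints).

```python
# Hedged sketch of fine-tuning for binary hate speech classification.
# The model identifier and data files below are assumptions for illustration.
import numpy as np
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_id = "l3cube-pune/marathi-tweets-bert"  # assumed Hub name for MahaTweetBERT
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

# Hypothetical CSVs with "text" and "label" columns (0 = not hate, 1 = hate).
data = load_dataset("csv", data_files={"train": "train.csv", "test": "test.csv"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

data = data.map(tokenize, batched=True)

def accuracy(eval_pred):
    preds = np.argmax(eval_pred.predictions, axis=-1)
    return {"accuracy": float((preds == eval_pred.label_ids).mean())}

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="hate-clf",
        num_train_epochs=3,
        per_device_train_batch_size=16,
    ),
    train_dataset=data["train"],
    eval_dataset=data["test"],
    compute_metrics=accuracy,
)
trainer.train()
print(trainer.evaluate())
```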

Key Observations

  1. Ineffectiveness of Hateful Pre-training: The paper reveals that models pre-trained exclusively on hateful data do not necessarily provide superior results in hate speech detection. This finding challenges prior assumptions and suggests that domain adaptation to hateful content may not be as beneficial as previously thought.
  2. Superiority of Large Corpus Pre-training: Both Hindi and Marathi models achieved the best results when pre-trained on a comprehensive corpus encompassing diverse tweet content. This underscores the significance of large, mixed datasets in enhancing model generalization and effectiveness in NLP tasks.
  3. Monolingual Advantage Over Multilingual Models: The models specific to Marathi and Hindi showed a clear advantage over multilingual models like MuRIL, indicating that focused language-specific pre-training can better capture the nuances required for tasks in those languages.

Implications and Future Directions

The paper's findings suggest a shift in how practitioners working on low-resource languages should approach model pre-training for hate speech detection and potentially other NLP tasks. By showing that hate-centric pre-training data is less necessary than assumed, the paper motivates a move towards leveraging comprehensive mixed datasets.

Future research can explore the incorporation of more diverse data sources and analyze model robustness across various domains within hate speech. Additionally, extending such studies to other low-resource languages could further corroborate these findings and offer tailored strategies for enhancing language model capabilities.

Overall, this research offers useful guidance on pre-training strategies, pointing towards more efficient and inclusive approaches to NLP model training, particularly for hate speech detection in linguistically diverse settings.