A BERT-Based Transfer Learning Approach for Hate Speech Detection in Online Social Media
The paper "A BERT-Based Transfer Learning Approach for Hate Speech Detection in Online Social Media" presents an innovative methodological framework aimed at advancing the automatic detection of hate speech within social media platforms. Using BERT (Bidirectional Encoder Representations from Transformers), the authors tackle the persistent challenge of bias and inadequate annotated datasets in hate speech detection tasks.
Methodological Approach
The core of this research is a transfer learning approach built on the pre-trained BERT language model. Notably, the authors work to overcome the limitations of previous models by fine-tuning BERT with multiple strategies. The paper identifies four: using BERT's [CLS] output directly for classification, adding nonlinear layers on top of it, inserting Bi-LSTM layers, and integrating CNN layers over the outputs of the transformer encoders. Each strategy builds on BERT's ability to capture syntactic and contextual information at different layers, adapting the pre-trained representations to the specific needs of hate speech detection. A minimal sketch of two of these classification heads follows.
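To make the strategies concrete, here is a minimal sketch of the first and fourth heads using PyTorch and the Hugging Face transformers library. The class names, filter count, kernel size, and the choice of bert-base-uncased are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn
from transformers import BertModel

class BertClsClassifier(nn.Module):
    """Strategy 1: classify directly from the [CLS] token's final hidden state."""
    def __init__(self, num_labels, model_name="bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # [CLS] embedding, (batch, hidden)
        return self.classifier(cls)

class BertCnnClassifier(nn.Module):
    """Strategy 4: convolve over the hidden states of all encoder layers."""
    def __init__(self, num_labels, model_name="bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name, output_hidden_states=True)
        hidden = self.bert.config.hidden_size
        # 13 channels = embedding layer + 12 transformer layers (BERT-base)
        self.conv = nn.Conv2d(in_channels=13, out_channels=32, kernel_size=(3, hidden))
        self.classifier = nn.Linear(32, num_labels)

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        states = torch.stack(out.hidden_states, dim=1)     # (batch, 13, seq, hidden)
        feats = torch.relu(self.conv(states)).squeeze(-1)  # (batch, 32, seq - 2)
        pooled = feats.max(dim=-1).values                  # max-pool over positions
        return self.classifier(pooled)
```

The second head is the one that exploits "all the information encoded in BERT's transformers": rather than reading only the final [CLS] vector, it stacks every layer's hidden states and lets the convolution learn which layers and positions matter.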
Data and Experimentation
The approach was evaluated on two publicly available Twitter datasets: Waseem and Hovy's racism and sexism tweets and Davidson et al.'s hate speech and offensive language tweets. Both datasets exhibit the real-world problem of class imbalance, containing far more neutral examples than hateful or offensive ones. This imbalance poses a significant challenge for model training and evaluation.
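One common mitigation for such imbalance, shown here purely as an illustration and not necessarily what the authors did, is to weight the training loss inversely to class frequency:

```python
import numpy as np
import torch
from sklearn.utils.class_weight import compute_class_weight

# Toy label distribution: 0 = neither, 1 = hateful, 2 = offensive
labels = np.array([0, 0, 0, 0, 0, 0, 1, 0, 2, 0])
weights = compute_class_weight("balanced", classes=np.unique(labels), y=labels)
# Rare classes receive proportionally larger weights in the loss
loss_fn = torch.nn.CrossEntropyLoss(weight=torch.tensor(weights, dtype=torch.float))
print(dict(zip(np.unique(labels), weights.round(2))))  # {0: 0.42, 1: 3.33, 2: 3.33}
```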
The preprocessing steps detailed in the paper, such as shortening elongated words, handling hashtags, and replacing user mentions, demonstrate a comprehensive strategy for data refinement, which is crucial for effective model performance; the sketch below illustrates the kind of rules involved.
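A minimal regex-based sketch of such tweet cleaning (the exact rules are the authors'; this version and its placeholder tokens are illustrative):

```python
import re

def preprocess_tweet(text: str) -> str:
    text = re.sub(r"@\w+", "<user>", text)         # replace user mentions
    text = re.sub(r"https?://\S+", "<url>", text)  # replace URLs
    text = re.sub(r"#(\w+)", r"\1", text)          # strip '#' but keep hashtag word
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)     # shorten elongated words: "soooo" -> "soo"
    return text.lower().strip()

print(preprocess_tweet("@troll This is soooo offensive!!! #HateSpeech http://t.co/x"))
# -> "<user> this is soo offensive!! hatespeech <url>"
```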
Results and Performance
The experimental results show significant improvements, with the fine-tuning strategies surpassing existing baselines. Notably, integrating a CNN on top of BERT's pre-trained layers yielded the highest F1 scores: 88% on the Waseem and Hovy dataset and 92% on the Davidson dataset. This supports the hypothesis that leveraging the information encoded across all of BERT's transformer layers leads to a more flexible, context-aware classifier.
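For reference, F1 on multi-class, imbalanced datasets such as these is conventionally reported in weighted form, which averages per-class F1 in proportion to class support. A toy computation with scikit-learn (illustrative labels, not the paper's data):

```python
from sklearn.metrics import f1_score

# 0 = neither, 1 = hateful, 2 = offensive (toy predictions)
y_true = [0, 0, 0, 0, 1, 1, 2, 0]
y_pred = [0, 0, 0, 1, 1, 0, 2, 0]
print(round(f1_score(y_true, y_pred, average="weighted"), 3))
```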
Implications and Future Directions
The research offers a critical insight into the biases inherent in training datasets and suggests that the BERT-based method can also serve a debiasing function, flagging mislabeled or bias-driven annotations and thereby improving dataset integrity for future hate speech detection tasks.
Practically, the approach holds promise for deployment in social media monitoring, improving both the accuracy and reliability of hate speech detection systems. Its value extends beyond immediate detection: by helping to surface data biases, it marks a significant step toward more equitable AI systems.
Future work could explore BERT's embeddings for other languages and cultural contexts, addressing the more complex challenges inherent in multilingual social media environments. Moreover, the methodology's adaptability to other forms of toxic digital content paves the way for broader applications in AI-driven content moderation.