- The paper establishes a lexical baseline for three-class classification, distinguishing hate speech, offensive language, and non-offensive content on social media.
- It applies a linear SVM with diverse text features, achieving 78% accuracy using character 4-grams on a dataset of 14,509 tweets.
- The study highlights the nuanced challenge of separating hate speech from profanity and suggests future exploration with ensemble classifiers and meta-learning techniques.
The research paper presents a focused investigation into detecting hate speech on social media, specifically the nuanced task of distinguishing hate speech from general profanity. It takes a supervised classification approach, using a newly annotated dataset of English tweets labeled with three categories: hate speech, offensive language without hate speech, and non-offensive content. The study's primary goal is to establish a lexical baseline for this three-class classification task, drawing on text features such as character n-grams, word n-grams, and word skip-grams.
Methodological Approach
Employing a linear Support Vector Machine (SVM) as the classification model, the study undertook a detailed feature analysis to identify the most effective predictors for the task. The dataset of 14,509 English tweets was annotated to capture the nuanced distinction between hate speech and profanity, a step beyond the binary labeling of previous studies. The approach is compared against a majority-class baseline and against an oracle that estimates an upper bound on classifier performance.
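The setup described above can be sketched with scikit-learn. The tweets and labels below are toy stand-ins (the paper's annotated corpus is not reproduced here), and the exact vectorizer settings are assumptions; the paper specifies only a linear SVM over features such as character 4-grams, with a majority-class baseline for comparison.

```python
# Sketch of the paper's setup: a linear SVM over character 4-grams,
# plus a majority-class baseline. Data is illustrative, not the corpus.
from sklearn.dummy import DummyClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

tweets = [
    "you people are vermin",       # hate
    "this traffic is damn awful",  # offensive, not hate
    "lovely weather today",        # non-offensive
    "they should all disappear",   # hate
    "what a stupid movie",         # offensive, not hate
    "meeting friends for lunch",   # non-offensive
]
labels = ["hate", "offensive", "ok", "hate", "offensive", "ok"]

# Character 4-grams taken from within word boundaries ("char_wb")
svm = make_pipeline(
    CountVectorizer(analyzer="char_wb", ngram_range=(4, 4)),
    LinearSVC(),
)
svm.fit(tweets, labels)

# Majority-class baseline for comparison
baseline = DummyClassifier(strategy="most_frequent").fit(tweets, labels)

print(svm.predict(["such a stupid idea"]))
```

Swapping `ngram_range` and the `analyzer` argument reproduces the other feature families the paper evaluates (word n-grams, other character n-gram orders).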
The results indicate that the best single-feature performance was achieved with character 4-grams, yielding an accuracy of 78%. Other character n-grams were also competitive, while word-based features performed worse by comparison. A learning-curve analysis suggested that a larger dataset could improve accuracy further, although the gains diminish as data volume grows. Notably, the oracle reached an accuracy of 91.6%, indicating that even an ideal selection among the tested models cannot fully separate the classes and underscoring the difficulty of predicting hate speech on real-world data.
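An oracle of this kind typically counts an example as correct if at least one of the candidate classifiers predicts it correctly. A minimal sketch of that computation, assuming this definition and using made-up predictions rather than the paper's outputs:

```python
# Oracle upper bound: an example counts as correct if ANY candidate
# model predicts it correctly. Data and model names are illustrative.
def oracle_accuracy(y_true, predictions_per_model):
    correct = 0
    for i, gold in enumerate(y_true):
        if any(preds[i] == gold for preds in predictions_per_model):
            correct += 1
    return correct / len(y_true)

y_true = ["hate", "offensive", "ok", "hate"]
model_a = ["hate", "ok", "ok", "offensive"]              # right on items 0, 2
model_b = ["offensive", "offensive", "ok", "offensive"]  # right on items 1, 2

print(oracle_accuracy(y_true, [model_a, model_b]))  # → 0.75
```

No single model above exceeds 50% accuracy, yet the oracle reaches 75%, which is why such a figure serves as an upper bound rather than an achievable score.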
Implications and Future Directions
The research highlights the critical issue of differentiating profanity from hate speech, revealing the limits of relying on offensive words as the primary discriminators. The confusion matrix showed substantial misclassification between the hate and offensive categories, underlining the task's intrinsic difficulty.
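The kind of overlap the confusion matrix reveals can be illustrated with a small example; the predictions below are invented to show the pattern, not the paper's actual counts.

```python
# Illustrative confusion matrix showing hate/offensive overlap.
# Rows are gold labels, columns are predictions.
from sklearn.metrics import confusion_matrix

y_true = ["hate", "hate", "hate", "offensive", "offensive", "ok", "ok"]
y_pred = ["hate", "offensive", "offensive", "offensive", "hate", "ok", "ok"]

labels = ["hate", "offensive", "ok"]
cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)
```

Here two of three hate-speech tweets land in the offensive column while the non-offensive class is untouched, mirroring the reported pattern: errors concentrate between the two offensive-language classes rather than spilling into benign content.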
Future explorations could enhance this baseline study through several avenues. Developing ensemble classifiers and incorporating meta-learning techniques could afford better aggregate performance than individual models. Moreover, performing a linguistic analysis of the topmost informative features and conducting a comprehensive error analysis would yield deeper insights into the system's weaknesses and failure modes. Such analyses could drive more sophisticated feature engineering strategies, tailored to account for the dataset's idiosyncrasies and variability.
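One such ensemble direction can be sketched as a hard-voting combination of classifiers over different feature views. The specific classifiers, feature settings, and toy data below are assumptions for illustration, not choices made in the paper.

```python
# Sketch of a possible extension: hard-voting ensemble over two
# feature views (character 4-grams vs. word 1-2 grams). Illustrative only.
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

char_svm = make_pipeline(
    CountVectorizer(analyzer="char_wb", ngram_range=(4, 4)), LinearSVC()
)
word_lr = make_pipeline(
    CountVectorizer(ngram_range=(1, 2)), LogisticRegression()
)

ensemble = VotingClassifier(
    [("char", char_svm), ("word", word_lr)], voting="hard"
)

tweets = ["you people are vermin", "what a stupid movie", "lovely weather today"]
labels = ["hate", "offensive", "ok"]
ensemble.fit(tweets, labels)
print(ensemble.predict(["lovely weather today"]))
```

A meta-learning variant would instead train a second-level model on the base classifiers' outputs (stacking), which scikit-learn supports via `StackingClassifier`.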
Conclusion
In conclusion, this work underscores the challenges of discerning hate speech from non-hate but offensive language on social media platforms. By establishing a measurable baseline and a robust evaluation framework, it sets the stage for future innovation in hate speech detection methodologies. The study's methods and findings contribute to safer online communication by providing a basis for more accurate and context-sensitive identification of harmful content. Future research might incorporate cross-linguistic data to test the generalizability of these classifiers, potentially extending their applicability across languages and cultural contexts.