Countering Hate on Social Media: An Examination of Automated Classification Techniques
The paper "Countering Hate on Social Media: Large Scale Classification of Hate and Counter Speech" addresses the pressing issue of hate speech proliferating on social media platforms and introduces a novel approach to understanding and classifying hate and counter speech via automated methods. The authors leverage a unique dataset from German Twitter and apply ensemble learning techniques to advance the automatic detection of counter speech, paving the way for more effective moderation strategies.
Summary and Methodology
Data Collection and Pre-Processing
The paper capitalizes on a unique dataset from self-labeling groups on German Twitter, where hate groups such as "Reconquista Germanica" and counter speech groups like "Reconquista Internet" actively engaged in discourse. This self-identification was essential in building a large corpus of labeled tweets. More than 9 million tweets were collected, with over 4 million tagged as originating from hate accounts and 4.3 million from counter accounts. This dataset was unprecedented in size for this domain, addressing a significant gap in training data availability for machine learning classifiers.
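The core of this labeling strategy is that a tweet inherits its label from the self-identified group of the account that posted it. A minimal sketch of that idea (not the authors' actual pipeline; the account names and tweets below are invented placeholders):

```python
# Toy stand-ins for the self-labeled group rosters described in the paper.
HATE_ACCOUNTS = {"hate_user_1", "hate_user_2"}
COUNTER_ACCOUNTS = {"counter_user_1"}

def label_tweet(author):
    """Assign a corpus label based on the author's group membership."""
    if author in HATE_ACCOUNTS:
        return "hate"
    if author in COUNTER_ACCOUNTS:
        return "counter"
    return None  # accounts outside both groups stay unlabeled

tweets = [
    {"author": "hate_user_1", "text": "..."},
    {"author": "counter_user_1", "text": "..."},
    {"author": "bystander", "text": "..."},
]
labeled = [(t["text"], label_tweet(t["author"])) for t in tweets]
```

Because labels come from group membership rather than per-tweet annotation, the corpus can grow to millions of examples without manual effort, at the cost of some label noise from off-topic tweets.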
Feature Extraction and Model Training
The paper employs paragraph embeddings (doc2vec models) for feature extraction, which capture semantic information from the text data. Several parameterizations were evaluated, balancing document granularity against semantic retention. Classification was performed using regularized logistic regression, chosen for its simplicity and robustness, with the decision boundary tuned across training sets to improve classification accuracy.
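The embed-then-classify step can be sketched with scikit-learn. Note one deliberate substitution: the paper uses doc2vec paragraph embeddings, while this sketch swaps in TF-IDF vectors as a simpler self-contained feature extractor; the texts and labels are invented toy data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = [
    "attack them all",        # toy stand-ins for hate-account tweets
    "they should disappear",
    "be kind to each other",  # toy stand-ins for counter-account tweets
    "hate helps nobody",
]
train_labels = [1, 1, 0, 0]  # 1 = hate account, 0 = counter account

# C is the inverse regularization strength: smaller C means stronger L2
# regularization, mirroring the paper's "regularized" logistic regression.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression(C=1.0))
clf.fit(train_texts, train_labels)

preds = clf.predict(["stop the hate, be kind"])
```

In the paper's setup, the vectorizer would be replaced by inferring a doc2vec vector for each tweet; the downstream classifier is unchanged.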
Ensemble Learning Approach
A novel ensemble learning approach was employed, where multiple classifiers, deemed "experts," voted on the classification of each tweet. This design improved accuracy and generalizability by leveraging the diversity in training data partitions and model parameterizations. The ensemble achieved macro F1 scores ranging from 0.76 to 0.97 on balanced test sets, showcasing competitive performance compared to existing state-of-the-art methods.
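The vote-aggregation step can be sketched in a few lines (a simplified reading of the paper's panel of experts, with each expert classifier represented only by its per-tweet vote):

```python
from collections import Counter

def ensemble_vote(expert_votes):
    """Return the majority label among the experts' votes for one tweet.

    Ties are broken by whichever label appeared first, so in practice an
    odd number of experts is the cleaner choice.
    """
    return Counter(expert_votes).most_common(1)[0][0]

# Three experts classify one tweet: two say "counter", one says "hate".
majority = ensemble_vote(["counter", "hate", "counter"])  # -> "counter"
```

Because each expert is trained on a different data partition and parameterization, their errors are partly independent, which is what lets the aggregate vote outperform any single classifier.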
Empirical Validation and Human Alignment
The paper validates its classification framework through a crowdsourced human-annotation study, demonstrating a strong correlation (r = 0.94) between automated classifications and human judgments. Such validation is crucial as it confirms the practical applicability of the model in real-world scenarios.
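The alignment check amounts to computing a Pearson correlation between the model's scores and mean human ratings. A stdlib-only sketch; the scores below are invented placeholders, not the paper's data (which reported r = 0.94):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

model_scores = [0.1, 0.4, 0.5, 0.8, 0.9]   # classifier hate-score per tweet
human_scores = [0.2, 0.3, 0.6, 0.7, 0.95]  # mean crowdworker rating per tweet
r = pearson_r(model_scores, human_scores)  # close to 1 => strong alignment
```

A high r means the model's continuous scores track human judgments well even where the hard labels might disagree at the decision boundary.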
Implications and Future Directions
The research holds both theoretical and practical implications:
- Automated Moderation Systems: By improving the detection accuracy of hate and counter speech, social media platforms can automate the moderation process more effectively, potentially decreasing the prevalence of hateful content online.
- Dynamics of Online Discourse: The paper’s findings on the interaction dynamics between hate and counter speech will inform strategies to promote constructive conversation and quell hateful rhetoric across platforms.
- Future Research: Given the success of ensemble methods in this context, future research might explore further diversification of model architectures and combinations, including neural network approaches and other advanced NLP techniques.
- Policy Implications: The results may influence policy by providing evidence for the efficacy of counter speech, challenging the need for heavy-handed censorship which can infringe on civil liberties.
The methodological framework established in this investigation represents a significant stride towards scalable solutions for countering hate speech in digital environments. It establishes a foundation for ongoing research into adaptive, automated systems to foster healthier online communities. As social media continues to evolve, such research will be critical in informing both platform design and public policy.