ToXCL: A Unified Framework for Toxic Speech Detection and Explanation
Introduction
Toxic speech, both explicit and implicit, poses significant challenges for online communities and platforms. Detecting and explaining toxic speech, especially when it is implicit and lacks overtly offensive language, requires models not only to flag such content accurately but also to justify their decisions. The task is further complicated by the subtlety and context dependency of implicit toxic speech. Existing research has largely cast detection and explanation as a single text generation task, an approach prone to error propagation and suboptimal detection performance.
A Novel Approach
In response, the paper introduces ToXCL, a framework built from three components: a Target Group Generator, an Encoder-Decoder Model, and a Teacher Classifier. The design aims to detect implicit toxic speech accurately while also generating relevant explanations. ToXCL stands apart by assigning detection to the encoder and explanation to the decoder of a single encoder-decoder model, mitigating the error propagation seen in prior approaches that cast both tasks as one generation problem.
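To make the flow concrete, here is a minimal inference-time sketch of the three-component design, assuming Hugging Face Transformers and Flan-T5 checkpoints. The prompt formats, the "Target: ... Post: ..." input augmentation, the mean pooling, and the linear detection head are illustrative assumptions, not the authors' released code; the Teacher Classifier does not appear because, under the distillation setup, it is only used during training.

```python
# Illustrative wiring of the ToXCL pipeline; checkpoints and prompts are assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

device = "cuda" if torch.cuda.is_available() else "cpu"
tok = AutoTokenizer.from_pretrained("google/flan-t5-base")

# 1) Target Group Generator: a seq2seq model that names the group a post targets.
tg_model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base").to(device)

# 2) Encoder-Decoder Model: the encoder drives detection, the decoder writes the
#    explanation. The head is randomly initialized here; in ToXCL it would be
#    trained jointly (and distilled from the teacher), so outputs are placeholders.
main_model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base").to(device)
detect_head = torch.nn.Linear(main_model.config.d_model, 2).to(device)  # 0 = non-toxic, 1 = toxic (assumed)

def classify_and_explain(post: str) -> tuple[int, str]:
    # Step 1: generate the target group and prepend it to the post.
    tg_in = tok(f"Which group does this post target? {post}", return_tensors="pt").to(device)
    group = tok.decode(tg_model.generate(**tg_in, max_new_tokens=16)[0],
                       skip_special_tokens=True)
    enriched = f"Target: {group}. Post: {post}"

    # Step 2: detection via the encoder (mean-pooled hidden states + linear head).
    enc_in = tok(enriched, return_tensors="pt").to(device)
    enc_out = main_model.get_encoder()(**enc_in).last_hidden_state
    label = int(detect_head(enc_out.mean(dim=1)).argmax(dim=-1))

    # Step 3: decode an explanation only for posts flagged as toxic.
    if label == 0:
        return label, ""
    expl_ids = main_model.generate(**enc_in, max_new_tokens=48)
    return label, tok.decode(expl_ids[0], skip_special_tokens=True)
```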
Key Contributions
- Unified Framework: ToXCL embeds both detection and explanation within a single model, removing the need for separate models for each task and improving both computational efficiency and overall performance.
- Target Group Generator: The Target Group Generator identifies the minority group(s) a post targets and feeds that signal back into the model, supplying contextual and demographic cues that improve both detection and explanation of implicit toxic speech.
- Encoder-Decoder Model with Teacher Classifier: The Encoder-Decoder Model serves dual purposes: detection via the encoder and explanation generation via the decoder. A Teacher Classifier, a stronger standalone detector, transfers its knowledge to the encoder through knowledge distillation, sharpening the encoder's ability to identify implicit toxic speech (a minimal distillation sketch follows this list).
- State-of-the-Art Results: ToXCL outperforms prior baselines on two benchmark datasets, the Implicit Hate Corpus and the Social Bias Inference Corpus, achieving state-of-the-art results on both the implicit toxic speech detection and explanation tasks.
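The distillation step from the third bullet can be made concrete with a short loss function. The following is a minimal sketch of standard logit distillation, not the paper's implementation: the temperature T, the mixing weight alpha, and the binary-label setup are assumptions. In ToXCL's setting, the student logits would come from the encoder's detection head and the teacher logits from the frozen Teacher Classifier.

```python
# A minimal knowledge-distillation objective (Hinton-style); hyperparameters are assumed.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      T: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """Combine hard-label cross-entropy with soft-label KL distillation."""
    # Hard-label term: cross-entropy against gold toxic / non-toxic labels.
    ce = F.cross_entropy(student_logits, labels)
    # Soft-label term: KL divergence between temperature-scaled distributions.
    # kl_div expects log-probabilities as input and probabilities as target.
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradient magnitudes match the hard-label term
    return alpha * ce + (1.0 - alpha) * kl

# Toy usage: a batch of 4 posts, 2 classes (non-toxic / toxic).
student = torch.randn(4, 2, requires_grad=True)   # from the encoder's detection head
teacher = torch.randn(4, 2)                       # from the frozen teacher, no gradient
labels = torch.tensor([0, 1, 1, 0])
loss = distillation_loss(student, teacher, labels)
loss.backward()
```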
Implications and Future Directions
The development and validation of ToXCL open several avenues for future research and practical application. Theoretically, it underscores the value of modeling target groups to ground the context and intent of toxic speech. Practically, the framework is a promising foundation for more nuanced and effective content moderation tools. Future work may extend ToXCL's handling of coded language and sarcasm, further increasing its robustness and applicability across diverse online platforms.
Conclusion
ToXCL represents a significant advancement in the field of natural language processing, particularly in the context of online safety and content moderation. By providing a robust framework for detecting and explaining implicit toxic speech, it addresses both theoretical challenges and practical needs in managing online discourse. Through continual refinement and adaptation, models like ToXCL can significantly contribute to fostering safer, more inclusive online communities.