
COLD: A Benchmark for Chinese Offensive Language Detection (2201.06025v2)

Published 16 Jan 2022 in cs.CL and cs.AI

Abstract: Offensive language detection is increasingly crucial for maintaining a civilized social media platform and deploying pre-trained LLMs. However, this task in Chinese is still under exploration due to the scarcity of reliable datasets. To this end, we propose a benchmark --COLD for Chinese offensive language analysis, including a Chinese Offensive Language Dataset --COLDATASET and a baseline detector --COLDETECTOR which is trained on the dataset. We show that the COLD benchmark contributes to Chinese offensive language detection, which is challenging for existing resources. We then deploy the COLDETECTOR and conduct detailed analyses on popular Chinese pre-trained LLMs. We first analyze the offensiveness of existing generative models and show that these models inevitably expose varying degrees of offensive issues. Furthermore, we investigate the factors that influence the offensive generations, and we find that anti-bias contents and keywords referring to certain groups or revealing negative attitudes trigger offensive outputs more easily.

Authors (7)
  1. Jiawen Deng (19 papers)
  2. Jingyan Zhou (16 papers)
  3. Hao Sun (383 papers)
  4. Chujie Zheng (35 papers)
  5. Fei Mi (56 papers)
  6. Helen Meng (204 papers)
  7. Minlie Huang (226 papers)
Citations (79)

Summary

Overview of COLD: A Benchmark for Chinese Offensive Language Detection

The paper "COLD: A Benchmark for Chinese Offensive Language Detection" addresses the challenge of detecting offensive language within the Chinese digital landscape—an endeavor that has been constrained by the lack of adequate datasets. The authors introduce a comprehensive benchmark named COLD, which includes both a Chinese Offensive Language Dataset (COLDataset) and an associated baseline detector named COLDetector, which is trained on this dataset.

Contributions and Findings

  1. COLDataset: This paper contributes a novel dataset comprising 37,480 Chinese comments, labeled as offensive or non-offensive. The dataset addresses complexities surrounding topics such as race, gender, and region, enhancing its applicability across diverse content categories. Notably, the test set is annotated at a granular level with subcategories like attacking individuals, attacking groups, anti-bias speech, and other non-offensive content.
  2. COLDetector: Utilizing the COLDataset, the authors develop COLDetector, a baseline detector based on a Chinese BERT model fine-tuned for binary offensive/non-offensive classification (a minimal fine-tuning sketch follows this list). The experimental results underscore its efficacy over existing screening methods: the detector achieved an accuracy of 81% on the COLDataset test set, outperforming alternatives such as keyword matching and models trained on datasets translated from English sources.
  3. Analysis of Chinese Generative Models: The paper evaluates multiple Chinese generative LLMs, including CDialGPT and EVA, to understand their susceptibility to generating offensive content. It was observed that not only offensive but also non-offensive and anti-bias prompts could result in offensive generations. Variations in performance were attributed to the models' structure, training datasets, and sentence generation lengths, accentuating potential biases present in the pre-training data.
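
The following is a minimal sketch of a COLDetector-style setup: fine-tuning a Chinese BERT encoder on binary offensive/non-offensive labels. The file-free toy data, column layout, and hyperparameters are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch of a COLDetector-style classifier: fine-tuning bert-base-chinese
# on binary offensive / non-offensive labels. Hyperparameters and the toy data
# below are assumptions for illustration, not the paper's exact setup.
import torch
from torch.utils.data import DataLoader, Dataset
from transformers import BertTokenizerFast, BertForSequenceClassification

class CommentDataset(Dataset):
    """Wraps (text, label) pairs, e.g. rows from the COLDataset train split."""
    def __init__(self, texts, labels, tokenizer, max_len=128):
        self.enc = tokenizer(texts, truncation=True, padding="max_length",
                             max_length=max_len, return_tensors="pt")
        self.labels = torch.tensor(labels)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        item = {k: v[i] for k, v in self.enc.items()}
        item["labels"] = self.labels[i]
        return item

tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
model = BertForSequenceClassification.from_pretrained("bert-base-chinese", num_labels=2)

# Two toy comments stand in for the real 37,480-comment dataset (0 = non-offensive, 1 = offensive).
train_data = CommentDataset(["这是一条普通评论", "带有攻击性的评论示例"], [0, 1], tokenizer)
loader = DataLoader(train_data, batch_size=2, shuffle=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for epoch in range(1):                      # the real detector is presumably trained longer
    for batch in loader:
        optimizer.zero_grad()
        loss = model(**batch).loss          # cross-entropy over the two classes
        loss.backward()
        optimizer.step()
```

In practice the full training split would replace the toy examples, and accuracy would be reported on the fine-grained COLDataset test set described above.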

Implications and Speculation on Future Developments

The paper's findings hold multiple implications for both practical applications and future research in the field of AI and natural language processing:

  • Practical Deployment: The development of COLDataset and COLDetector provides a critical foundation for content moderation systems on Chinese digital platforms. This is particularly crucial for maintaining civil discourse and for the ethical deployment of AI systems in sensitive social contexts; an illustrative sketch of screening generated text with such a detector follows this list.
  • Cross-Linguistic Transferability: The paper highlights the limitations of relying solely on cross-linguistic translated datasets for training LLMs in different cultural contexts. It advocates for autonomous datasets that encapsulate the linguistic and cultural specifics inherent to the language being moderated, in this case, Chinese.
  • Advancement in AI Ethics: The link between input prompts and the models' propensity to generate offensive content underscores the need for further research into biases in pre-training corpora. Addressing these biases remains a pivotal challenge for future work, as they directly shape the fairness and social impact of downstream machine learning applications.
  • Further Research Directions: Future research may explore defensive strategies for generative models to mitigate offensive content output. This research could also extend to analyzing context-specific interactions, as most current models, including COLDetector, are limited to sentence-level detection.
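
As a complement to the deployment point above, the sketch below shows how a trained detector could be used to screen a generative model's outputs sentence by sentence. The checkpoint path and the convention that label index 1 means "offensive" are assumptions made for this example.

```python
# Illustrative sketch of deploying a trained offensive-language detector to screen
# generated responses. The checkpoint path and the 0/1 label convention
# (1 = offensive) are assumptions, not values specified by the paper.
import torch
from transformers import BertTokenizerFast, BertForSequenceClassification

tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
detector = BertForSequenceClassification.from_pretrained("path/to/coldetector-checkpoint")  # hypothetical path
detector.eval()

def offensive_prob(sentence: str) -> float:
    """Return the detector's probability that a single sentence is offensive."""
    inputs = tokenizer(sentence, truncation=True, max_length=128, return_tensors="pt")
    with torch.no_grad():
        logits = detector(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()

# Screen each response produced by a generative model for a given prompt set.
responses = ["模型生成的回复一", "模型生成的回复二"]
flagged = [r for r in responses if offensive_prob(r) > 0.5]
```

Because scoring happens one sentence at a time, this kind of screening inherits the sentence-level limitation noted above; extending it to dialogue context is one of the open directions the paper points to.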

In conclusion, the paper adds significant value to the domain of Chinese offensive language detection by proposing COLD as a robust benchmark that facilitates a more nuanced analysis of LLMs' generation patterns within Chinese contexts. As AI systems continue to evolve, the insights garnered from this research will likely play a foundational role in forming a safer and more controlled deployment of language technologies.