Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
41 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
41 tokens/sec
o3 Pro
7 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Facilitating Fine-grained Detection of Chinese Toxic Language: Hierarchical Taxonomy, Resources, and Benchmarks (2305.04446v1)

Published 8 May 2023 in cs.CL and cs.AI

Abstract: The widespread dissemination of toxic online posts is increasingly damaging to society. However, research on detecting toxic language in Chinese has lagged significantly. Existing datasets lack fine-grained annotation of toxic types and expressions, and ignore the samples with indirect toxicity. In addition, it is crucial to introduce lexical knowledge to detect the toxicity of posts, which has been a challenge for researchers. In this paper, we facilitate the fine-grained detection of Chinese toxic language. First, we built Monitor Toxic Frame, a hierarchical taxonomy to analyze toxic types and expressions. Then, a fine-grained dataset ToxiCN is presented, including both direct and indirect toxic samples. We also build an insult lexicon containing implicit profanity and propose Toxic Knowledge Enhancement (TKE) as a benchmark, incorporating the lexical feature to detect toxic language. In the experimental stage, we demonstrate the effectiveness of TKE. After that, a systematic quantitative and qualitative analysis of the findings is given.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (6)
  1. Junyu Lu (31 papers)
  2. Bo Xu (212 papers)
  3. Xiaokun Zhang (29 papers)
  4. Changrong Min (2 papers)
  5. Liang Yang (102 papers)
  6. Hongfei Lin (34 papers)
Citations (19)