
Leashing the Inner Demons: Self-Detoxification for Language Models (2203.03072v1)

Published 6 Mar 2022 in cs.CL, cs.AI, and cs.LG

Abstract: Language models (LMs) can reproduce (or amplify) toxic language seen during training, which poses a risk to their practical application. In this paper, we conduct extensive experiments to study this phenomenon. We analyze the impact of prompts, decoding strategies and training corpora on the output toxicity. Based on our findings, we propose a simple yet effective method for LMs to "detoxify" themselves without an additional large corpus or external discriminator. Compared to a supervised baseline, our proposed method shows better toxicity reduction while maintaining good generation quality under multiple settings. Warning: some examples shown in the paper may contain uncensored offensive content.

Authors (4)
  1. Canwen Xu (32 papers)
  2. Zexue He (23 papers)
  3. Zhankui He (27 papers)
  4. Julian McAuley (238 papers)
Citations (21)