Exploring the Limits of Domain-Adaptive Training for Detoxifying Large-Scale Language Models (2202.04173v3)

Published 8 Feb 2022 in cs.CL, cs.AI, cs.CY, and cs.LG

Abstract: Pre-trained language models (LMs) are shown to easily generate toxic language. In this work, we systematically explore domain-adaptive training to reduce the toxicity of LMs. We conduct this study on three dimensions: training corpus, model size, and parameter efficiency. For the training corpus, we propose to leverage the generative power of LMs and generate nontoxic datasets for domain-adaptive training, which mitigates exposure bias and is shown to be more data-efficient than using a curated pre-training corpus. We demonstrate that the self-generation method consistently outperforms the existing baselines across various model sizes on both automatic and human evaluations, even when it uses a 1/3 smaller training corpus. We then comprehensively study detoxifying LMs with parameter sizes ranging from 126M up to 530B (3x larger than GPT-3), a scale that has never been studied before. We find that i) large LMs have similar toxicity levels to smaller ones given the same pre-training corpus, and ii) large LMs require more effort to detoxify. We also explore parameter-efficient training methods for detoxification. We demonstrate that adding and training adapter-only layers in LMs not only saves many parameters but also achieves a better trade-off between toxicity and perplexity than whole-model adaptation for the large-scale models.
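The parameter-efficient finding in the abstract centers on adapter-only training: small bottleneck modules are inserted into a frozen LM and only their weights are updated during domain-adaptive fine-tuning on the nontoxic corpus. Below is a minimal PyTorch sketch of that idea, assuming a generic Houlsby-style bottleneck adapter; the names `Adapter`, `BlockWithAdapter`, `adapterize`, the `base_lm.blocks` attribute, and the bottleneck width are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, residual."""

    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.act = nn.ReLU()
        self.up = nn.Linear(bottleneck, hidden_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection, so the frozen LM's representation is only nudged.
        return x + self.up(self.act(self.down(x)))


class BlockWithAdapter(nn.Module):
    """Wraps one transformer block and applies an adapter to its output."""

    def __init__(self, block: nn.Module, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.block = block
        self.adapter = Adapter(hidden_size, bottleneck)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.adapter(self.block(x))


def adapterize(base_lm: nn.Module, hidden_size: int) -> list[nn.Parameter]:
    """Freeze the whole LM, wrap each block with an adapter, and return the
    adapter parameters (the only ones the optimizer should update).
    Assumes a hypothetical `base_lm.blocks` ModuleList whose blocks map a
    tensor to a tensor; a real architecture needs matching plumbing."""
    for p in base_lm.parameters():
        p.requires_grad = False
    base_lm.blocks = nn.ModuleList(
        BlockWithAdapter(b, hidden_size) for b in base_lm.blocks
    )
    return [p for b in base_lm.blocks for p in b.adapter.parameters()]


# Usage sketch:
#   params = adapterize(lm, hidden_size=1024)
#   optimizer = torch.optim.Adam(params, lr=1e-4)
#   ...then fine-tune on the (self-generated) nontoxic corpus.
```

Only the adapter weights, a small fraction of the full model, are trained; this is the toxicity-versus-perplexity trade-off the abstract contrasts with whole-model adaptation at large scale.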

Authors (9)
  1. Boxin Wang (28 papers)
  2. Wei Ping (51 papers)
  3. Chaowei Xiao (110 papers)
  4. Peng Xu (357 papers)
  5. Mostofa Patwary (34 papers)
  6. Mohammad Shoeybi (60 papers)
  7. Bo Li (1107 papers)
  8. Anima Anandkumar (236 papers)
  9. Bryan Catanzaro (123 papers)
Citations (56)