Adding Instructions during Pretraining: Effective Way of Controlling Toxicity in Language Models (2302.07388v1)

Published 14 Feb 2023 in cs.CL and cs.AI

Abstract: Pretrained LLMs have become indispensable for solving various NLP tasks. However, safely deploying them in real-world applications is challenging because they can generate toxic content. To address this challenge, we propose two novel pretraining data augmentation strategies that significantly reduce model toxicity without compromising its utility. Our two strategies are: (1) MEDA: adds the raw toxicity score as meta-data to the pretraining samples, and (2) INST: adds instructions to those samples indicating their toxicity. Our results indicate that our best performing strategy (INST) reduces the toxicity probability by up to 61% while preserving accuracy on five benchmark NLP tasks, as well as improving AUC scores on four bias detection tasks by 1.3%. We also demonstrate the generalizability of our techniques by scaling the number of training samples and the number of model parameters.
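The two strategies could be sketched roughly as follows. This is a hypothetical illustration, not the paper's exact format: the prefix templates, the toxicity threshold, and the function names are all assumptions; in the paper, toxicity scores would come from a classifier scoring the pretraining corpus.

```python
# Hypothetical sketch of the two pretraining-data augmentation strategies.
# The exact prefix wording, threshold, and score format are assumptions.

def meda_augment(text: str, toxicity_score: float) -> str:
    """MEDA: prepend the raw toxicity score as meta-data to a sample."""
    return f"toxicity: {toxicity_score:.2f}\n{text}"

def inst_augment(text: str, toxicity_score: float, threshold: float = 0.5) -> str:
    """INST: prepend a natural-language instruction indicating toxicity."""
    label = "toxic" if toxicity_score >= threshold else "non-toxic"
    return f"Generate {label} text:\n{text}"

sample = "Thanks for the helpful review!"
print(meda_augment(sample, 0.03))  # score prepended as meta-data
print(inst_augment(sample, 0.03))  # instruction prepended instead
```

At inference time, prompting the model with the "non-toxic" instruction (for INST) or a low score prefix (for MEDA) would then steer generation away from toxic continuations.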

Authors (4)
  1. Shrimai Prabhumoye (40 papers)
  2. Mostofa Patwary (34 papers)
  3. Mohammad Shoeybi (60 papers)
  4. Bryan Catanzaro (123 papers)
Citations (16)