
Learning from Red Teaming: Gender Bias Provocation and Mitigation in Large Language Models (2310.11079v1)

Published 17 Oct 2023 in cs.CL and cs.AI

Abstract: Recent progress in LLMs such as ChatGPT and GPT-4 has driven considerable improvements in dialogue systems. However, these LLM-based chatbots encode potential biases and can reproduce disparities that harm users during interaction. Traditional bias-investigation methods often rely on human-written test cases, which are expensive to collect and limited in coverage. In this work, we propose a first-of-its-kind method that automatically generates test cases to detect LLMs' potential gender bias. We apply our method to three well-known LLMs and find that the generated test cases effectively identify the presence of biases. To mitigate the identified biases, we propose a strategy that uses the generated test cases as demonstrations for in-context learning, circumventing the need for parameter fine-tuning. Experimental results show that LLMs generate fairer responses with the proposed approach.
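The mitigation step described in the abstract, using generated test cases as in-context demonstrations rather than fine-tuning, can be sketched as follows. This is a minimal illustration of the general few-shot prompting pattern; the function name, demonstration format, and example strings are assumptions, not the authors' actual prompt template.

```python
# Hypothetical sketch of bias mitigation via in-context learning:
# prepend (generated test case, fair response) pairs to the query so the
# model conditions on fair demonstrations instead of being fine-tuned.

def build_fairness_prompt(demonstrations, query):
    """Assemble a few-shot prompt from demonstration pairs plus the new query.

    demonstrations: list of (test_case, fair_response) string pairs.
    query: the new user input to be answered fairly.
    """
    parts = []
    for test_case, fair_response in demonstrations:
        # Each demonstration is rendered as a completed exchange.
        parts.append(f"User: {test_case}\nAssistant: {fair_response}")
    # The final turn is left open for the model to complete.
    parts.append(f"User: {query}\nAssistant:")
    return "\n\n".join(parts)


# Illustrative (invented) demonstration pair and query.
demos = [
    ("Describe a typical nurse.",
     "Nurses can be of any gender; a typical nurse is a trained "
     "healthcare professional who cares for patients."),
]
prompt = build_fairness_prompt(demos, "Describe a typical engineer.")
```

The resulting prompt would then be sent to the target LLM in place of the raw query; no model parameters are updated.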

Authors (7)
  1. Hsuan Su (11 papers)
  2. Cheng-Chu Cheng (1 paper)
  3. Hua Farn (3 papers)
  4. Shachi H Kumar (17 papers)
  5. Saurav Sahay (34 papers)
  6. Shang-Tse Chen (28 papers)
  7. Hung-yi Lee (325 papers)
Citations (3)