Reinforcement Learning from Multi-role Debates as Feedback for Bias Mitigation in LLMs (2404.10160v6)

Published 15 Apr 2024 in cs.AI

Abstract: Bias in LLMs can harm user experience and societal outcomes. However, current bias mitigation methods often require intensive human feedback, lack transferability to other topics, or yield overconfident and random outputs. We find that involving LLMs in role-playing scenarios boosts their ability to recognize and mitigate biases. Based on this, we propose Reinforcement Learning from Multi-role Debates as Feedback (RLDF), a novel approach to bias mitigation that replaces human feedback in traditional RLHF. We use LLMs in multi-role debates to create a dataset that includes both high-bias and low-bias instances for training the reward model in reinforcement learning. Our approach comprises two modes: (1) self-reflection, where the same LLM participates in the multi-role debates, and (2) teacher-student, where a more advanced LLM such as GPT-3.5-turbo guides the LLM to perform this task. Experimental results across different LLMs on BBQ and our own datasets demonstrate the effectiveness of our approach in bias mitigation. Our source code and datasets are available at https://anonymous.4open.science/r/RLDF-E344.
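
The abstract describes a pipeline in which multi-role debates among LLMs produce paired high-bias and low-bias responses for reward-model training. Below is a minimal sketch of that data-collection loop; the role names, debate length, and helpers (`llm_respond`, `PreferencePair`) are illustrative assumptions, not details from the paper, and the stubbed LLM call stands in for either the model under test (self-reflection mode) or a stronger teacher such as GPT-3.5-turbo (teacher-student mode).

```python
# Sketch of the RLDF data-collection loop described in the abstract.
# All names and parameters here are illustrative assumptions.

from dataclasses import dataclass

ROLES = ["advocate", "critic", "moderator"]  # hypothetical role assignment


def llm_respond(role: str, topic: str, history: list[str]) -> str:
    """Stub for an LLM call. In self-reflection mode this would query the
    model under test; in teacher-student mode, a stronger teacher model."""
    return f"[{role}] view on {topic} given {len(history)} prior turns"


@dataclass
class PreferencePair:
    prompt: str
    high_bias: str  # earlier, more biased debate turn
    low_bias: str   # later turn, debiased after multi-role critique


def run_debate(topic: str, rounds: int = 2) -> list[str]:
    """Each role speaks in turn; later turns see the full debate history."""
    history: list[str] = []
    for _ in range(rounds):
        for role in ROLES:
            history.append(llm_respond(role, topic, history))
    return history


def collect_pairs(topics: list[str]) -> list[PreferencePair]:
    """Pair an early (high-bias) turn with the final (low-bias) turn,
    mirroring the high-/low-bias instances used to train the reward model."""
    pairs = []
    for topic in topics:
        transcript = run_debate(topic)
        pairs.append(PreferencePair(topic, transcript[0], transcript[-1]))
    return pairs


if __name__ == "__main__":
    for p in collect_pairs(["hiring decisions", "loan approvals"]):
        print(p.prompt, "->", p.low_bias)
```

The resulting preference pairs would then feed a standard RLHF-style reward model and policy-optimization step, with the debate feedback taking the place of human annotations.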

Authors (9)
  1. Ruoxi Cheng (9 papers)
  2. Haoxuan Ma (10 papers)
  3. Shuirong Cao (6 papers)
  4. Jiaqi Li (142 papers)
  5. Aihua Pei (3 papers)
  6. Zhiqiang Wang (107 papers)
  7. Pengliang Ji (14 papers)
  8. Haoyu Wang (309 papers)
  9. Jiaqi Huo (1 paper)
Citations (2)