
Curiosity-Driven Reinforcement Learning from Human Feedback (2501.11463v1)

Published 20 Jan 2025 in cs.CL

Abstract: Reinforcement learning from human feedback (RLHF) has proven effective in aligning LLMs with human preferences, but often at the cost of reduced output diversity. This trade-off between diversity and alignment quality remains a significant challenge. Drawing inspiration from curiosity-driven exploration in reinforcement learning, we introduce curiosity-driven RLHF (CD-RLHF), a framework that incorporates intrinsic rewards for novel states, alongside traditional sparse extrinsic rewards, to optimize both output diversity and alignment quality. We demonstrate the effectiveness of CD-RLHF through extensive experiments on a range of tasks, including text summarization and instruction following. Our approach achieves significant gains in diversity on multiple diversity-oriented metrics while maintaining alignment with human preferences comparable to standard RLHF. We make our code publicly available at https://github.com/ernie-research/CD-RLHF.
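The abstract describes combining a sparse extrinsic reward (the usual reward-model score) with an intrinsic reward for novel states. The snippet below is a minimal sketch of that reward composition, assuming an ICM-style forward-dynamics prediction error as the curiosity signal and a simple weighted sum with coefficient `beta`; the curiosity network, its inputs, and the coefficient are illustrative assumptions, not the paper's exact design (see the released code for the actual implementation).

```python
import torch
import torch.nn as nn


class ForwardDynamicsModel(nn.Module):
    """Hypothetical curiosity module: predicts the next hidden state from the
    current hidden state and the chosen token's embedding. Its prediction
    error is used as an intrinsic (novelty) reward."""

    def __init__(self, hidden_dim: int, embed_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_dim + embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )

    def forward(self, state: torch.Tensor, action_emb: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action_emb], dim=-1))


def intrinsic_reward(fwd_model: ForwardDynamicsModel,
                     state: torch.Tensor,
                     action_emb: torch.Tensor,
                     next_state: torch.Tensor) -> torch.Tensor:
    """Curiosity bonus: states the forward model predicts poorly
    (i.e. novel states) receive a larger reward."""
    with torch.no_grad():
        pred = fwd_model(state, action_emb)
    return 0.5 * (pred - next_state).pow(2).mean(dim=-1)


def combined_reward(extrinsic: torch.Tensor,
                    intrinsic: torch.Tensor,
                    beta: float = 0.05) -> torch.Tensor:
    """Per-step reward fed to the RL update (e.g. PPO): the sparse
    reward-model score plus a scaled curiosity bonus."""
    return extrinsic + beta * intrinsic
```

In this sketch the curiosity term encourages the policy to visit under-explored generation states, which is the mechanism the abstract credits for recovering output diversity while the extrinsic term preserves alignment quality.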

Authors (6)
  1. Haoran Sun (65 papers)
  2. Yekun Chai (18 papers)
  3. Shuohuan Wang (30 papers)
  4. Yu Sun (226 papers)
  5. Hua Wu (191 papers)
  6. Haifeng Wang (194 papers)