Everyone Deserves A Reward: Learning Customized Human Preferences (2309.03126v2)

Published 6 Sep 2023 in cs.CL

Abstract: Reward models (RMs) are essential for aligning LLMs with human preferences to improve interaction quality. However, the real world is pluralistic, which leads to diverse human preferences across religions, politics, cultures, and so on. Moreover, each individual can have unique preferences on various topics. Neglecting this diversity, current human-feedback alignment methods consider only a general reward model, which falls short in customized or personalized application scenarios. To explore customized preference learning, we collect a domain-specific preference (DSP) dataset, which includes preferred responses for each given query from four practical domains. In addition, from a data-efficiency perspective, we propose a three-stage customized RM learning scheme, then empirically verify its effectiveness on both general preference datasets and our DSP set. Furthermore, we test multiple training and data strategies across the three learning stages. We find several ways to better preserve the general preference ability while training the customized RMs, especially general preference enrichment and customized preference imitation learning. The DSP dataset and code are available at https://github.com/Linear95/DSP.
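For context, reward models of this kind are typically trained with a pairwise (Bradley-Terry style) ranking loss over preferred/rejected response pairs. The minimal sketch below illustrates that standard objective only; the paper's specific three-stage scheme and data strategies are in the linked repository, and all names here (`reward_model`, the batch keys) are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F

def pairwise_rm_loss(chosen_rewards: torch.Tensor,
                     rejected_rewards: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry style pairwise loss: push the scalar reward of the
    preferred response above that of the rejected one for the same query."""
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Hypothetical usage sketch (names are assumptions, not from the paper):
#   chosen_rewards   = reward_model(batch["chosen_input_ids"])    # shape: (batch,)
#   rejected_rewards = reward_model(batch["rejected_input_ids"])  # shape: (batch,)
#   loss = pairwise_rm_loss(chosen_rewards, rejected_rewards)
# where `reward_model` is any scalar-output head over an LLM backbone and the
# batch holds tokenized (query + response) pairs from a preference dataset such as DSP.
```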

Authors (5)
  1. Pengyu Cheng (23 papers)
  2. Jiawen Xie (4 papers)
  3. Ke Bai (10 papers)
  4. Yong Dai (33 papers)
  5. Nan Du (66 papers)
Citations (24)