Orchestrating LLMs with Different Personalizations (2407.04181v1)

Published 4 Jul 2024 in cs.AI and cs.CL

Abstract: This paper presents a novel approach to aligning LLMs with individual human preferences, sometimes referred to as Reinforcement Learning from *Personalized* Human Feedback (RLPHF). Given stated preferences along multiple dimensions, such as helpfulness, conciseness, or humor, the goal is to produce, without re-training, an LLM that best adheres to this specification. Starting from specialized expert LLMs, each trained for one particular preference dimension, we propose a black-box method that merges their outputs on a per-token level. We train a lightweight Preference Control Model (PCM) that dynamically translates the preference description and current context into next-token prediction weights. By combining the expert models' outputs at the token level, our approach dynamically generates text that optimizes for the given preference. Empirical tests show that our method matches or surpasses existing preference-merging techniques, providing a scalable, efficient alternative to fine-tuning LLMs for individual personalization.
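The core mechanism described in the abstract is a per-token mixture: at each decoding step, the lightweight Preference Control Model maps the stated preference and the current context to a weight per expert, and the experts' next-token distributions are combined under those weights. Below is a minimal sketch of that idea under stated assumptions; the class names (`ExpertLM`, `PreferenceControlModel`), the logit interface, and the greedy decoding step are illustrative, not the paper's actual implementation.

```python
# Minimal sketch of per-token output merging, assuming each black-box expert
# exposes next-token logits and a small preference control model (PCM)
# produces mixture weights from the preference text and current context.
# All names here are illustrative assumptions, not the paper's API.
from dataclasses import dataclass
from typing import Callable, List
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()


@dataclass
class ExpertLM:
    """Black-box expert: returns next-token logits for a given context."""
    name: str
    next_token_logits: Callable[[str], np.ndarray]


@dataclass
class PreferenceControlModel:
    """Maps (preference description, context) to one score per expert."""
    score: Callable[[str, str, int], np.ndarray]

    def weights(self, preference: str, context: str, n_experts: int) -> np.ndarray:
        # Softmax over expert scores gives a convex combination of experts.
        return softmax(self.score(preference, context, n_experts))


def merged_next_token(experts: List[ExpertLM],
                      pcm: PreferenceControlModel,
                      preference: str,
                      context: str) -> int:
    """Combine the experts' next-token distributions with PCM weights."""
    w = pcm.weights(preference, context, len(experts))
    probs = sum(wi * softmax(ex.next_token_logits(context))
                for wi, ex in zip(w, experts))
    return int(np.argmax(probs))  # greedy pick for simplicity
```

In the paper's setting the PCM is a trained model and decoding would typically sample from the merged distribution rather than take the argmax; the sketch only shows where the per-token weighting enters the loop.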

Authors (6)
  1. Jin Peng Zhou (28 papers)
  2. Katie Z Luo (11 papers)
  3. Jingwen Gu (2 papers)
  4. Jason Yuan (1 paper)
  5. Kilian Q. Weinberger (105 papers)
  6. Wen Sun (124 papers)