Personalized Adaptation via In-Context Preference Learning (2410.14001v1)

Published 17 Oct 2024 in cs.LG and cs.CL

Abstract: Reinforcement Learning from Human Feedback (RLHF) is widely used to align language models (LMs) with human preferences. However, existing approaches often neglect individual user preferences, leading to suboptimal personalization. We present the Preference Pretrained Transformer (PPT), a novel approach for adaptive personalization using online user feedback. PPT leverages the in-context learning capabilities of transformers to dynamically adapt to individual preferences. Our approach consists of two phases: (1) an offline phase where we train a single policy model using a history-dependent loss function, and (2) an online phase where the model adapts to user preferences through in-context learning. We demonstrate PPT's effectiveness in a contextual bandit setting, showing that it achieves personalized adaptation superior to existing methods while significantly reducing computational costs. Our results suggest the potential of in-context learning for scalable and efficient personalization in LLMs.
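
The abstract describes a two-phase recipe: a history-conditioned policy is pretrained offline, then adapts online to a specific user purely by conditioning on the growing history of preference feedback, with no gradient updates. The sketch below illustrates that online loop in a toy contextual bandit; it is a minimal reading of the abstract, not the paper's implementation. The class and function names, dimensions, and the simulated preference signal are illustrative assumptions.

```python
import torch
import torch.nn as nn


class HistoryConditionedPolicy(nn.Module):
    """Toy history-conditioned policy (illustrative, not the paper's PPT model).

    Each history element encodes (context, one-hot action, preference signal);
    the transformer attends over the history plus the current query context and
    outputs action logits for that query.
    """

    def __init__(self, ctx_dim: int, n_actions: int, d_model: int = 64,
                 n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        self.n_actions = n_actions
        self.hist_embed = nn.Linear(ctx_dim + n_actions + 1, d_model)
        self.query_embed = nn.Linear(ctx_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_actions)

    def forward(self, history: torch.Tensor, query_ctx: torch.Tensor) -> torch.Tensor:
        # history: (B, T, ctx_dim + n_actions + 1), query_ctx: (B, ctx_dim)
        tokens = torch.cat(
            [self.hist_embed(history), self.query_embed(query_ctx).unsqueeze(1)], dim=1
        )
        hidden = self.encoder(tokens)
        return self.head(hidden[:, -1])  # action logits for the query context


def online_adaptation(policy: HistoryConditionedPolicy, user_pref: torch.Tensor,
                      ctx_dim: int, n_actions: int, n_rounds: int = 20) -> None:
    """Online phase: no weight updates; the policy adapts only by conditioning
    on the in-context history of (context, action, feedback) tuples."""
    history = torch.zeros(1, 0, ctx_dim + n_actions + 1)  # empty history
    policy.eval()
    with torch.no_grad():
        for _ in range(n_rounds):
            ctx = torch.randn(1, ctx_dim)                      # new context for this round
            logits = policy(history, ctx)
            action = torch.distributions.Categorical(logits=logits).sample()
            # Simulated binary preference feedback for the chosen action
            # (a stand-in for a real user's comparison or approval signal).
            feedback = (user_pref[action] > 0.5).float().view(1, 1)
            one_hot = nn.functional.one_hot(action, n_actions).float()
            step = torch.cat([ctx, one_hot, feedback], dim=-1).unsqueeze(1)
            history = torch.cat([history, step], dim=1)        # grow the in-context history


if __name__ == "__main__":
    ctx_dim, n_actions = 8, 4
    policy = HistoryConditionedPolicy(ctx_dim, n_actions)      # assumed pretrained offline
    user_pref = torch.rand(n_actions)                          # hidden per-user preference
    online_adaptation(policy, user_pref, ctx_dim, n_actions)
```

In the paper's offline phase, such a policy would be trained across many simulated users with a history-dependent loss so that conditioning on longer feedback histories yields better-personalized actions; the sketch above only shows the gradient-free online interaction that this pretraining is meant to enable.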

Authors (6)
  1. Allison Lau (2 papers)
  2. Younwoo Choi (4 papers)
  3. Vahid Balazadeh (8 papers)
  4. Keertana Chidambaram (3 papers)
  5. Vasilis Syrgkanis (106 papers)
  6. Rahul G. Krishnan (45 papers)