
Teaching Language Models to Evolve with Users: Dynamic Profile Modeling for Personalized Alignment (2505.15456v1)

Published 21 May 2025 in cs.CL

Abstract: Personalized alignment is essential for enabling LLMs to engage effectively in user-centric dialogue. While recent prompt-based and offline optimization methods offer preliminary solutions, they fall short in cold-start scenarios and long-term personalization due to their inherently static and shallow designs. In this work, we introduce the Reinforcement Learning for Personalized Alignment (RLPA) framework, in which an LLM interacts with a simulated user model to iteratively infer and refine user profiles through dialogue. The training process is guided by a dual-level reward structure: the Profile Reward encourages accurate construction of user representations, while the Response Reward incentivizes generation of responses consistent with the inferred profile. We instantiate RLPA by fine-tuning Qwen-2.5-3B-Instruct, resulting in Qwen-RLPA, which achieves state-of-the-art performance in personalized dialogue. Empirical evaluations demonstrate that Qwen-RLPA consistently outperforms prompting and offline fine-tuning baselines, and even surpasses advanced commercial models such as Claude-3.5 and GPT-4o. Further analysis highlights Qwen-RLPA's robustness in reconciling conflicting user preferences, sustaining long-term personalization and delivering more efficient inference compared to recent reasoning-focused LLMs. These results emphasize the potential of dynamic profile inference as a more effective paradigm for building personalized dialogue systems.
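The abstract describes a dual-level reward: a Profile Reward for accurate user-profile inference and a Response Reward for profile-consistent replies. The sketch below illustrates one plausible way such a combined reward could be scored; the weighting, scoring functions, and profile representation are hypothetical assumptions for illustration, not taken from the paper.

```python
# Hypothetical sketch of a dual-level reward in the spirit of RLPA.
# The additive combination, alpha weight, and toy scoring heuristics
# below are illustrative assumptions, not the paper's actual method.

def profile_reward(inferred_profile: dict, true_profile: dict) -> float:
    """Fraction of ground-truth profile fields inferred correctly."""
    if not true_profile:
        return 0.0
    hits = sum(1 for k, v in true_profile.items()
               if inferred_profile.get(k) == v)
    return hits / len(true_profile)

def response_reward(response: str, inferred_profile: dict) -> float:
    """Toy consistency check: fraction of inferred preference
    values that the response actually reflects."""
    if not inferred_profile:
        return 0.0
    mentioned = sum(1 for v in inferred_profile.values()
                    if str(v).lower() in response.lower())
    return mentioned / len(inferred_profile)

def rlpa_reward(response: str, inferred_profile: dict,
                true_profile: dict, alpha: float = 0.5) -> float:
    """Dual-level reward: weighted sum of profile accuracy and
    response consistency with the inferred profile."""
    return (alpha * profile_reward(inferred_profile, true_profile)
            + (1 - alpha) * response_reward(response, inferred_profile))
```

In an RL loop, a reward of this shape would be computed per dialogue turn against the simulated user's ground-truth profile and fed to the policy-gradient update.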

Authors (9)
  1. Weixiang Zhao (21 papers)
  2. Xingyu Sui (9 papers)
  3. Yulin Hu (37 papers)
  4. Jiahe Guo (12 papers)
  5. Haixiao Liu (1 paper)
  6. Biye Li (6 papers)
  7. Yanyan Zhao (39 papers)
  8. Bing Qin (186 papers)
  9. Ting Liu (329 papers)
