Aligning LLMs with Individual Preferences via Interaction
The paper introduces a methodology for aligning LLMs with individual user preferences through dynamic, multi-turn interaction. Rather than relying on generalized alignment strategies built around broad principles such as helpfulness, harmlessness, and honesty, the authors focus on cultivating the ability of LLMs to infer a user's unstated, personal preferences from the conversation and adapt their responses accordingly.
Technical Approach:
- Persona Construction: The authors construct diverse user personas to guide LLM interactions. Starting from 20 seed profiles, they use iterative self-generation with GPT-4o followed by semantic filtering to expand the pool to 3,310 distinct personas (a filtering sketch follows this list). Each persona combines a detailed profile, the conversation topics it gravitates toward, and personality traits that shape its communication style.
- Data Formation: Using these personas, a multi-turn preference dataset is built with a multi-LLM collaboration framework: one model role-plays the user, persona-revealing information is extracted from the dialogue to condition personalized response generation, and four LLMs produce both 'preferred' and 'rejected' responses at each turn. The result is a tree-structured dataset suitable for training.
- Training and Fine-Tuning Techniques: The preference dataset supports a two-phase training methodology:
- Supervised Fine-Tuning (SFT): The model is first fine-tuned on only the 'preferred' responses, establishing a base model able to infer and follow user preferences during interaction.
- Preference Optimization: The SFT model is then further optimized with Direct Preference Optimization (DPO), using the pairwise 'preferred'/'rejected' data to refine how responses are calibrated to the inferred preferences (a minimal loss sketch follows this list).
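The persona-expansion step is essentially a generate-then-filter loop. Below is a minimal sketch of how such iterative self-generation with semantic filtering could look, assuming the OpenAI Python client for the GPT-4o calls, sentence-transformers embeddings, and a cosine-similarity threshold as the deduplication criterion; the paper does not specify its exact filtering implementation, so these choices are illustrative assumptions.

```python
# Sketch of the iterative persona-expansion loop with semantic filtering.
# Assumptions (not from the paper): OpenAI client for GPT-4o, sentence-transformers
# embeddings, and a cosine-similarity threshold as the dedup criterion.
import torch
from openai import OpenAI
from sentence_transformers import SentenceTransformer, util

client = OpenAI()
embedder = SentenceTransformer("all-MiniLM-L6-v2")

def propose_persona(examples: list[str]) -> str:
    """Ask GPT-4o for one new persona, conditioned on a few existing ones."""
    prompt = (
        "Here are example user personas:\n"
        + "\n".join(examples)
        + "\nWrite one new, clearly different persona "
        "(profile, preferred topics, personality traits)."
    )
    resp = client.chat.completions.create(
        model="gpt-4o", messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

def expand_personas(seeds: list[str], target: int = 3310,
                    sim_threshold: float = 0.85) -> list[str]:
    pool = list(seeds)
    embs = embedder.encode(pool, convert_to_tensor=True)
    while len(pool) < target:
        candidate = propose_persona(pool[-5:])
        cand_emb = embedder.encode(candidate, convert_to_tensor=True)
        # Semantic filter: keep only personas sufficiently different from the pool.
        if util.cos_sim(cand_emb, embs).max().item() < sim_threshold:
            pool.append(candidate)
            embs = torch.cat([embs, cand_emb.unsqueeze(0)], dim=0)
    return pool
```

A real pipeline would also batch the generation calls and persist the pool, but the kept/rejected decision shown here is the core of the semantic filter.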
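For the second training stage, the DPO objective itself is standard. The sketch below shows the pairwise loss over a batch of (preferred, rejected) pairs, assuming the summed log-probabilities of each response under the policy and a frozen reference model have already been computed; the beta value and any handling of the tree-structured conversations are illustrative, not taken from the paper.

```python
# Minimal sketch of the standard DPO pairwise loss. Inputs are per-example summed
# log-probabilities of the chosen (preferred) and rejected responses under the
# policy and a frozen reference model. beta=0.1 is an illustrative default, not
# the paper's reported setting.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    # Log-ratio of policy vs. reference for each response.
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # -log sigmoid(beta * margin) pushes the policy toward the preferred response.
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()
```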
Evaluation and Benchmarks:
The authors also present ALOE (ALign with custOmized prEferences), a benchmark of 100 curated cases with well-defined metrics for measuring LLMs' capacity for preference alignment. Evaluation uses an LLM-as-a-Judge to score alignment level. On this benchmark, the proposed methodology substantially improves the personalized alignment capabilities of mainstream LLMs, with average relative improvements in alignment level of up to 32%.
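In practice, this kind of LLM-as-a-Judge evaluation reduces to scoring each model reply against the hidden persona with a judge prompt. The sketch below is a minimal illustration, assuming an OpenAI-style judge model and a 1-5 alignment rubric; ALOE's actual prompt, metric definitions, and score scale are not reproduced here.

```python
# Sketch of an LLM-as-a-Judge scoring loop for preference alignment.
# Assumptions (not from the paper): an OpenAI-style judge, a 1-5 rubric, and a
# plain-text persona description.
import re
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = """You are evaluating whether an assistant's reply is aligned with
the user's (possibly unstated) preferences.

User persona:
{persona}

Conversation so far:
{history}

Assistant reply to judge:
{reply}

Rate the alignment level from 1 (ignores the persona) to 5 (fully tailored).
Answer with a single integer."""

def judge_alignment(persona: str, history: str, reply: str,
                    judge_model: str = "gpt-4o") -> int:
    resp = client.chat.completions.create(
        model=judge_model,
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(
            persona=persona, history=history, reply=reply)}],
    )
    match = re.search(r"[1-5]", resp.choices[0].message.content)
    return int(match.group()) if match else 1  # fall back to lowest score if unparsable
```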
Implications and Future Directions:
This research highlights the potential of LLMs to deliver personalized user experiences, an increasingly important consideration in AI alignment research. The ability to infer and dynamically adapt to individual user preferences can improve interaction quality and inclusivity across user demographics. The framework is designed to scale, suggesting a practical path toward training LLMs for more nuanced human-LLM interaction. Future work could extend conversations to more turns to capture deeper conversational nuance and expand the benchmark for broader evaluation coverage.