The success of AI assistants based on Large Language Models (LLMs) hinges on Reinforcement Learning from Human Feedback (RLHF) to comprehend and align with user intentions. However, traditional alignment algorithms, such as PPO, are hampered by complex annotation and training requirements. This reliance limits the applicability of RLHF and hinders the development of professional assistants tailored to diverse human preferences. In this work, we introduce \textit{Linear Alignment}, a novel algorithm that aligns language models with human preferences in a single inference step, eliminating the reliance on data annotation and model training. Linear alignment incorporates a new parameterization for policy optimization under divergence constraints, which enables the extraction of the optimal policy in closed form and facilitates direct estimation of the aligned response. Extensive experiments on both general and personalized preference datasets demonstrate that linear alignment significantly enhances the performance and efficiency of LLM alignment across diverse scenarios. Our code and dataset are published at \url{https://github.com/Wizardcoast/Linear_Alignment.git}.
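The abstract refers to extracting the optimal policy in closed form from a divergence-constrained objective. A standard result of this kind (familiar from KL-regularized RLHF) is $\pi^*(y \mid x) \propto \pi_{\text{ref}}(y \mid x)\, \exp(r(x, y)/\beta)$, i.e., an exponential tilting of the reference policy by a preference signal. The sketch below illustrates this reweighting at the level of next-token distributions only; the function name and the `value_estimates` signal are illustrative assumptions, not the paper's actual method, which should be consulted in the linked repository.

```python
import numpy as np

def aligned_next_token_probs(ref_logits, value_estimates, beta=1.0):
    """Reweight a reference policy's next-token distribution using the
    closed-form solution of KL-constrained reward maximization:
        pi*(y|x)  ~  pi_ref(y|x) * exp(value(y|x) / beta).
    `value_estimates` stands in for whatever per-token preference signal
    is available; here it is an arbitrary illustrative vector.
    """
    # log-softmax of the reference logits
    ref_logprobs = ref_logits - np.logaddexp.reduce(ref_logits)
    # exponential tilting in log space, scaled by the constraint strength beta
    tilted = ref_logprobs + value_estimates / beta
    # renormalize back to a probability distribution
    return np.exp(tilted - np.logaddexp.reduce(tilted))

# Toy usage: a 5-token vocabulary where tokens 2 and 4 are "preferred".
rng = np.random.default_rng(0)
ref_logits = rng.normal(size=5)
values = np.array([0.0, 0.0, 2.0, 0.0, 1.0])
print(aligned_next_token_probs(ref_logits, values, beta=0.5))
```

Note that smaller `beta` (a looser divergence constraint) pushes the distribution further from the reference policy toward the preferred tokens; as `beta` grows, the aligned distribution collapses back onto the reference one.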