$f$-PO: Generalizing Preference Optimization with $f$-divergence Minimization (2410.21662v2)
Abstract: Preference optimization has made significant progress recently, with numerous methods developed to align LLMs with human preferences. This paper introduces $f$-divergence Preference Optimization ($f$-PO), a novel framework that generalizes and extends existing approaches. $f$-PO minimizes $f$-divergences between the optimized policy and the optimal policy, encompassing a broad family of alignment methods using various divergences. Our approach unifies previous algorithms like DPO and EXO, while offering new variants through different choices of $f$-divergences. We provide theoretical analysis of $f$-PO's properties and conduct extensive experiments on state-of-the-art LLMs using benchmark datasets. Results demonstrate $f$-PO's effectiveness across various tasks, achieving superior performance compared to existing methods on popular benchmarks such as AlpacaEval 2, Arena-Hard, MT-Bench, and Open LLM Leaderboard v2. Additionally, we present ablation studies exploring the impact of different $f$-divergences, offering insights into the trade-offs between regularization and performance in offline preference optimization. Our work contributes both practical algorithms and theoretical understanding to the field of LLM alignment. Code is available at https://github.com/MinkaiXu/fPO.
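For orientation, here is a minimal sketch of the objective the abstract describes, using only the textbook definition of an $f$-divergence (a convex $f$ with $f(1)=0$); the argument order and the KL special cases below are illustrative assumptions, not necessarily the paper's exact formulation:

```latex
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
% Sketch of the f-PO objective from the abstract: minimize an f-divergence between
% the optimized policy pi_theta and the target (optimal) policy pi^*.
% The argument order below is assumed for illustration only.
\[
  \min_{\theta}\; D_f\!\left(\pi^{*} \,\middle\|\, \pi_{\theta}\right),
  \qquad
  D_f\!\left(\pi^{*} \,\middle\|\, \pi_{\theta}\right)
  = \mathbb{E}_{y \sim \pi_{\theta}(\cdot\mid x)}
    \left[ f\!\left(\frac{\pi^{*}(y\mid x)}{\pi_{\theta}(y\mid x)}\right) \right],
  \quad f \text{ convex},\ f(1)=0.
\]
% Standard identities: f(t) = t log t gives KL(pi^* || pi_theta) (forward KL), while
% f(t) = -log t gives KL(pi_theta || pi^*) (reverse KL). Sweeping over f in this way
% is what yields the family of alignment objectives (e.g., DPO- and EXO-style losses)
% that the abstract says f-PO unifies.
\end{document}
```

Different choices of $f$ penalize mass-covering versus mode-seeking behaviour differently, which is one way to read the regularization-versus-performance trade-off the abstract's ablations refer to.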
- A general class of coefficients of divergence of one distribution from another. Journal of the Royal Statistical Society: Series B (Methodological), 28(1):131–142.
- A general language assistant as a laboratory for alignment. arXiv preprint arXiv:2112.00861.
- A general theoretical paradigm to understand learning from human preferences. arXiv preprint arXiv:2310.12036.
- Training a helpful and harmless assistant with reinforcement learning from human feedback.
- On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pages 610–623.
- Pythia: A suite for analyzing large language models across training and scaling.
- Rank analysis of incomplete block designs: I. The method of paired comparisons. Biometrika, 39(3/4):324–345.
- Deep reinforcement learning from human preferences. In Guyon, I., von Luxburg, U., Bengio, S., Wallach, H. M., Fergus, R., Vishwanathan, S. V. N., and Garnett, R., editors, Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, pages 4299–4307.
- Information theory and statistics: A tutorial. Foundations and Trends® in Communications and Information Theory, 1(4):417–528.
- UltraFeedback: Boosting language models with high-quality feedback.
- Enhancing chat language models by scaling high-quality instructional conversations. In EMNLP.
- The Llama 3 herd of models.
- Length-controlled AlpacaEval: A simple way to debias automatic evaluators. arXiv preprint arXiv:2404.04475.
- KTO: Model alignment as prospect theoretic optimization. arXiv preprint arXiv:2402.01306.
- ORPO: Monolithic preference optimization without reference model. arXiv preprint arXiv:2403.07691.
- Towards efficient and exact optimization of language model alignment. arXiv preprint arXiv:2402.00856.
- Survey of hallucination in natural language generation. ACM Computing Surveys, 55(12):1–38.
- Mistral 7B.
- Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
- Scalable agent alignment via reward modeling: A research direction. arXiv preprint arXiv:1811.07871.
- From crowdsourced data to high-quality benchmarks: Arena-Hard and BenchBuilder pipeline.
- AlpacaEval: An automatic evaluator of instruction-following models. https://github.com/tatsu-lab/alpaca_eval.
- On divergences and informations in statistics and information theory. IEEE Transactions on Information Theory, 52(10):4394–4412.
- LiPO: Listwise preference optimization through learning-to-rank.
- Statistical rejection sampling improves preference optimization.
- SimPO: Simple preference optimization with a reference-free reward. arXiv preprint arXiv:2405.14734.
- On the chi square and higher-order chi distances for approximating f-divergences. IEEE Signal Processing Letters, 21(1):10–13.
- f-GAN: Training generative neural samplers using variational divergence minimization. Advances in Neural Information Processing Systems, 29.
- Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35:27730–27744.
- Disentangling length from quality in direct preference optimization. arXiv preprint arXiv:2403.19159.
- Reinforcement learning by reward-weighted regression for operational space control. In Proceedings of the 24th International Conference on Machine Learning, pages 745–750.
- Direct preference optimization: Your language model is secretly a reward model. arXiv preprint arXiv:2305.18290.
- Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.
- Preference ranking optimization for human alignment.
- Learning to summarize with human feedback. Advances in Neural Information Processing Systems, 33:3008–3021.
- Generalized preference optimization: A unified approach to offline alignment. arXiv preprint arXiv:2402.05749.
- TL;DR: Mining Reddit to learn automatic summarization. In Wang, L., Cheung, J. C. K., Carenini, G., and Liu, F., editors, Proceedings of the Workshop on New Frontiers in Summarization, pages 59–63, Copenhagen, Denmark. Association for Computational Linguistics.
- Beyond reverse KL: Generalizing direct preference optimization with diverse divergence constraints. arXiv preprint arXiv:2309.16240.
- Contrastive preference optimization: Pushing the boundaries of LLM performance in machine translation. arXiv preprint arXiv:2401.08417.
- RRHF: Rank responses to align language models with human feedback. In NeurIPS.
- SLiC-HF: Sequence likelihood calibration with human feedback. arXiv preprint arXiv:2305.10425.
- Judging LLM-as-a-judge with MT-Bench and Chatbot Arena. In NeurIPS Datasets and Benchmarks Track.