Optimal Transport-Based Token Weighting scheme for Enhanced Preference Optimization (2505.18720v1)

Published 24 May 2025 in cs.CL, cs.AI, and cs.LG

Abstract: Direct Preference Optimization (DPO) has emerged as a promising framework for aligning LLMs with human preferences by directly optimizing the log-likelihood difference between chosen and rejected responses. However, existing methods assign equal importance to all tokens in the response, while humans focus on more meaningful parts. This leads to suboptimal preference optimization, as irrelevant or noisy tokens disproportionately influence the DPO loss. To address this limitation, we propose an Optimal Transport-based token weighting scheme for enhancing direct Preference Optimization (OTPO). By emphasizing semantically meaningful token pairs and de-emphasizing less relevant ones, our method introduces a context-aware token weighting scheme that yields a more contrastive reward difference estimate. This adaptive weighting enhances reward stability, improves interpretability, and ensures that preference optimization focuses on meaningful differences between responses. Extensive experiments validate OTPO's effectiveness in improving instruction-following ability across various settings. Code is available at https://github.com/Mimasss2/OTPO.
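To make the role of token weights concrete, the following is a minimal sketch of a DPO-style loss in which each token's log-likelihood ratio is scaled by a weight before summing. It is not the authors' implementation: the function name `weighted_dpo_loss`, its parameters, and the example weights are assumptions for illustration, and the abstract does not specify how OTPO derives its weights from optimal transport. With uniform weights the function reduces to the standard DPO loss.

```python
import math


def _softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def weighted_dpo_loss(policy_chosen, ref_chosen, policy_rejected, ref_rejected,
                      w_chosen=None, w_rejected=None, beta=0.1):
    """Token-weighted DPO-style loss for one preference pair (illustrative).

    Each of the first four arguments is a list of per-token log-probabilities
    under the policy or the reference model. `w_chosen` and `w_rejected` are
    hypothetical per-token weights; None means uniform weights of 1.0,
    which gives the standard DPO loss.
    """
    if w_chosen is None:
        w_chosen = [1.0] * len(policy_chosen)
    if w_rejected is None:
        w_rejected = [1.0] * len(policy_rejected)

    # Weighted sum of per-token log-ratios: the implicit reward of each response.
    reward_chosen = sum(w * (p - r) for w, p, r
                        in zip(w_chosen, policy_chosen, ref_chosen))
    reward_rejected = sum(w * (p - r) for w, p, r
                          in zip(w_rejected, policy_rejected, ref_rejected))

    margin = beta * (reward_chosen - reward_rejected)
    # -log(sigmoid(margin)) == softplus(-margin)
    return _softplus(-margin)


# Usage: up-weighting the token where the policy improves on the reference
# widens the reward margin and lowers the loss relative to uniform weights.
uniform = weighted_dpo_loss([-1.0, -2.0], [-1.5, -2.0], [-3.0], [-2.0])
weighted = weighted_dpo_loss([-1.0, -2.0], [-1.5, -2.0], [-3.0], [-2.0],
                             w_chosen=[2.0, 0.0])
print(uniform, weighted)
```

In this sketch the weights are supplied by the caller; any scheme that scores token importance could be plugged in at that point.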

Authors (5)

GitHub

GitHub - Mimasss2/OTPO: [ACL 2025] Optimal Transport-Based Token Weighting scheme for Enhanced Preference Optimization (1 star)
