Omni-DPO: A Dual-Perspective Paradigm for Dynamic Preference Learning of LLMs (2506.10054v1)

Published 11 Jun 2025 in cs.LG, cs.AI, cs.CL, and cs.CV

Abstract: Direct Preference Optimization (DPO) has become a cornerstone of reinforcement learning from human feedback (RLHF) due to its simplicity and efficiency. However, existing DPO-based approaches typically treat all preference pairs uniformly, ignoring critical variations in their inherent quality and learning utility, leading to suboptimal data utilization and performance. To address this challenge, we propose Omni-DPO, a dual-perspective optimization framework that jointly accounts for (1) the inherent quality of each preference pair and (2) the model's evolving performance on those pairs. By adaptively weighting samples according to both data quality and the model's learning dynamics during training, Omni-DPO enables more effective training data utilization and achieves better performance. Experimental results on various models and benchmarks demonstrate the superiority and generalization capabilities of Omni-DPO. On textual understanding tasks, Gemma-2-9b-it finetuned with Omni-DPO beats the leading LLM, Claude 3 Opus, by a significant margin of 6.7 points on the Arena-Hard benchmark. On mathematical reasoning tasks, Omni-DPO consistently outperforms the baseline methods across all benchmarks, providing strong empirical evidence for the effectiveness and robustness of our approach. Code and models will be available at https://github.com/pspdada/Omni-DPO.
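
The abstract describes Omni-DPO only at a high level: each preference pair is weighted by both its inherent data quality and how well the model currently handles it. The exact weighting functions are not given here, so the PyTorch sketch below is purely illustrative and not the authors' implementation; it assumes a hypothetical per-pair quality score (e.g., a reward-model margin) and a focal-style learning-dynamics weight layered on top of the standard DPO objective.

# Illustrative sketch only: the abstract describes dual-perspective sample
# weighting but gives no formulas, so the quality and dynamics weights below
# are hypothetical placeholders, not the paper's definitions.
import torch
import torch.nn.functional as F

def weighted_dpo_loss(policy_chosen_logps, policy_rejected_logps,
                      ref_chosen_logps, ref_rejected_logps,
                      quality_scores, beta=0.1, gamma=1.0):
    """Per-sample weighted DPO loss.

    quality_scores: hypothetical per-pair quality estimates in [0, 1]
        (e.g., derived from a reward-model margin), standing in for the
        data-quality perspective described in the abstract.
    """
    # Standard DPO implicit-reward margin for each preference pair.
    pi_logratio = policy_chosen_logps - policy_rejected_logps
    ref_logratio = ref_chosen_logps - ref_rejected_logps
    logits = beta * (pi_logratio - ref_logratio)

    # Learning-dynamics weight (assumption): pairs the policy already
    # separates well (high implicit-reward margin) are down-weighted,
    # so hard, still-unlearned pairs dominate the gradient.
    p_correct = torch.sigmoid(logits).detach()
    dynamics_weight = (1.0 - p_correct) ** gamma

    # Combine both perspectives into a single per-sample weight.
    weights = quality_scores * dynamics_weight

    per_sample_loss = -F.logsigmoid(logits)
    return (weights * per_sample_loss).mean()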

Authors (10)
  1. Shangpin Peng (1 paper)
  2. Weinong Wang (8 papers)
  3. Zhuotao Tian (38 papers)
  4. Senqiao Yang (19 papers)
  5. Xing Wu (69 papers)
  6. Haotian Xu (48 papers)
  7. Chengquan Zhang (29 papers)
  8. Takashi Isobe (12 papers)
  9. Baotian Hu (67 papers)
  10. Min Zhang (630 papers)
