ROPO: Robust Preference Optimization for Large Language Models (2404.04102v2)

Published 5 Apr 2024 in cs.LG, cs.AI, and cs.CL

Abstract: Preference alignment is pivotal for empowering LLMs to generate helpful and harmless responses. However, the performance of preference alignment is highly sensitive to the prevalent noise in preference data. Recent efforts to address this problem either marginally alleviate the impact of noise without actually reducing its presence, or rely on costly teacher LLMs prone to reward misgeneralization. To address these challenges, we propose the RObust Preference Optimization (ROPO) framework, an iterative alignment approach that integrates noise tolerance and filtering of noisy samples without the aid of external models. Specifically, ROPO iteratively solves a constrained optimization problem in which we dynamically assign a quality-aware weight to each sample and constrain the sum of the weights to the number of samples we intend to retain. For noise-tolerant training and effective noise identification, we derive a robust loss that suppresses the gradients of samples with high uncertainty. We demonstrate both empirically and theoretically that the derived loss is critical for distinguishing noisy samples from clean ones. Furthermore, inspired by the derived loss, we propose a robustness-guided rejection sampling technique to compensate for the potentially important information in discarded queries. Experiments on three widely used datasets with Mistral-7B and Llama-2-7B demonstrate that ROPO significantly outperforms existing preference alignment methods, with its superiority growing as the noise rate increases.
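
To make the abstract's core idea concrete, the sketch below illustrates the general pattern it describes: a per-sample quality-aware weight whose sum is constrained to the number of samples to retain, combined with a noise-tolerant preference loss that damps gradients from high-uncertainty pairs. This is a minimal, illustrative sketch only. It assumes a DPO-style implicit-reward margin and a simple sigmoid-based confidence score as the weight; it is not the paper's exact ROPO loss, weighting function, or constrained-optimization procedure, and all function and variable names are hypothetical.

```python
# Illustrative sketch only; not the paper's exact formulation.
import torch
import torch.nn.functional as F

def dpo_margin(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO-style implicit reward margin for each preference pair."""
    return beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))

def quality_weights(margin, num_keep):
    """Assign a weight in [0, 1] per sample and rescale so the weights sum to
    roughly `num_keep` (the constrained-sum idea from the abstract). Pairs with
    uncertain margins (near or below zero) receive small weights."""
    raw = torch.sigmoid(margin).detach()            # higher margin -> more likely clean
    scaled = raw * num_keep / raw.sum().clamp_min(1e-8)
    return scaled.clamp(max=1.0)

def robust_preference_loss(margin, weights):
    """Weighted preference loss; the weights suppress gradients from
    high-uncertainty (likely noisy) pairs."""
    per_sample = -F.logsigmoid(margin)              # standard DPO-style loss on the margin
    return (weights * per_sample).sum() / weights.sum().clamp_min(1e-8)

# Toy usage with hypothetical log-probabilities for a batch of 4 preference pairs.
logp_c = torch.tensor([-10.0, -12.0, -11.0, -15.0], requires_grad=True)
logp_r = torch.tensor([-13.0, -11.5, -14.0, -14.5], requires_grad=True)
ref_c  = torch.tensor([-10.5, -12.0, -11.5, -15.0])
ref_r  = torch.tensor([-12.5, -11.0, -13.5, -14.0])

m = dpo_margin(logp_c, logp_r, ref_c, ref_r)
w = quality_weights(m, num_keep=3)                  # intend to retain roughly 3 of 4 samples
loss = robust_preference_loss(m, w)
loss.backward()
print(loss.item(), w)
```

In this sketch, samples whose margin is near or below zero (the model is unsure which response is better, or disagrees with the label) get small weights and therefore contribute little gradient, which mirrors the abstract's description of suppressing gradients of high-uncertainty samples while retaining a fixed budget of presumably clean ones.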
