
Impact of Preference Noise on the Alignment Performance of Generative Language Models (2404.09824v1)

Published 15 Apr 2024 in cs.CL

Abstract: A key requirement in developing Generative Language Models (GLMs) is to have their values aligned with human values. Preference-based alignment is a widely used paradigm for this purpose, in which preferences over generation pairs are first elicited from human annotators or AI systems and then fed into an alignment technique such as Direct Preference Optimization. However, a substantial fraction (20-40%) of the preference pairs used in GLM alignment are noisy, and it remains unclear how this noise affects alignment performance and how to mitigate its negative impact. In this paper, we propose a framework for injecting controlled amounts and types of noise into the preferences, and we systematically study the impact of preference noise on alignment performance in two tasks (summarization and dialogue generation). We find that alignment performance can be highly sensitive to the noise rate in the preference data: for example, a 10 percentage point (pp) increase in the noise rate can lead to a 30 pp drop in alignment performance (measured by win rate). To mitigate the impact of noise, confidence-based data filtering shows significant benefit when certain types of noise are present. We hope our work helps the community better understand and mitigate the impact of preference noise in GLM alignment.
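
To make the setup concrete, below is a minimal, illustrative sketch (not the authors' implementation) of the two operations the abstract describes: flipping the labels of a chosen fraction of preference pairs to simulate noise, and confidence-based filtering that discards low-confidence pairs before alignment. The PreferencePair structure, its confidence field, and the filtering threshold are assumptions made for illustration only.

```python
import random
from dataclasses import dataclass
from typing import List

@dataclass
class PreferencePair:
    prompt: str
    chosen: str        # response labeled as preferred by the annotator
    rejected: str      # response labeled as dispreferred
    confidence: float  # annotator/model confidence in the label (assumed field)

def inject_label_noise(pairs: List[PreferencePair], noise_rate: float,
                       seed: int = 0) -> List[PreferencePair]:
    """Simulate preference noise by flipping the chosen/rejected labels
    of a random `noise_rate` fraction of the pairs."""
    rng = random.Random(seed)
    noisy = []
    for p in pairs:
        if rng.random() < noise_rate:
            noisy.append(PreferencePair(p.prompt, p.rejected, p.chosen, p.confidence))
        else:
            noisy.append(p)
    return noisy

def filter_by_confidence(pairs: List[PreferencePair],
                         threshold: float = 0.7) -> List[PreferencePair]:
    """Keep only pairs whose confidence meets an (illustrative) threshold,
    mimicking confidence-based data filtering before alignment."""
    return [p for p in pairs if p.confidence >= threshold]
```

In this sketch, one would inject noise at a target rate into a clean preference set, train the alignment method (e.g., DPO) on the noisy data, and compare win rates with and without confidence-based filtering applied first.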
