Aligning Language Models with Human Preferences via a Bayesian Approach (2310.05782v3)

Published 9 Oct 2023 in cs.CL

Abstract: In the quest to advance human-centric natural language generation (NLG) systems, ensuring alignment between NLG models and human preferences is crucial. Current popular methods achieve this alignment through reinforcement learning (RL) with a reward model trained on human feedback. However, inherent disagreements arising from the subjective nature of human preferences pose a significant challenge for training the reward model and degrade NLG performance. Previous approaches typically rely on majority voting or averaging to consolidate multiple inconsistent preferences into a single merged one. Although straightforward to understand and execute, such methods cannot capture the nuanced degrees of disagreement among humans, may represent only a specialized subset of individuals, and thus cannot quantitatively reveal how universal a preference is. To address this challenge, this paper proposes a novel approach that employs a Bayesian framework to account for the distribution of disagreements among human preferences when training a preference model, named d-PM. In addition, given the inefficiency and complexity of RL-based training, we further propose training the NLG model with a contrastive learning strategy using the preference scores derived from the d-PM model. Extensive experiments on two human-centric NLG tasks, i.e., emotional support conversation and integrity "Rule-of-Thumb" generation, show that our method consistently exceeds previous SOTA models in both automatic and human evaluations.
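The second idea in the abstract, training the generator with contrastive learning on d-PM preference scores instead of RL, can be sketched as a pairwise ranking loss. The snippet below is a minimal, hypothetical illustration rather than the paper's implementation: it assumes a set of candidate generations with length-normalized log-probabilities under the NLG model and preference scores from a d-PM-style model, and penalizes cases where a less-preferred candidate is assigned higher likelihood. The rank-scaled margin follows the style of calibration losses such as BRIO and SLiC; the exact loss in the paper may differ.

```python
import torch
import torch.nn.functional as F


def preference_contrastive_loss(seq_logprobs: torch.Tensor,
                                pref_scores: torch.Tensor,
                                margin: float = 0.01) -> torch.Tensor:
    """Pairwise contrastive loss over candidate generations ranked by preference scores.

    seq_logprobs: (num_candidates,) length-normalized log-probabilities of each
                  candidate under the NLG model.
    pref_scores:  (num_candidates,) preference scores for the same candidates,
                  e.g. produced by a d-PM-style preference model (hypothetical here).
    """
    # Sort candidates from most to least preferred according to the preference scores.
    order = torch.argsort(pref_scores, descending=True)
    sorted_lp = seq_logprobs[order]

    loss = seq_logprobs.new_zeros(())
    n = sorted_lp.size(0)
    for i in range(n):
        for j in range(i + 1, n):
            # Candidate i is preferred over candidate j: its log-probability should
            # exceed candidate j's by a margin that grows with the rank gap.
            loss = loss + F.relu(sorted_lp[j] - sorted_lp[i] + margin * (j - i))
    return loss


# Example usage with dummy values (in practice the scores come from the preference model):
logprobs = torch.tensor([-1.2, -0.8, -2.5])   # generator log-probs for three candidates
scores = torch.tensor([0.4, 0.9, 0.1])        # preference scores for the same candidates
print(preference_contrastive_loss(logprobs, scores))
```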

Authors (4)
  1. Jiashuo Wang (19 papers)
  2. Haozhao Wang (52 papers)
  3. Shichao Sun (15 papers)
  4. Wenjie Li (183 papers)
Citations (16)
