RLHF from Heterogeneous Feedback via Personalization and Preference Aggregation (2405.00254v2)

Published 30 Apr 2024 in cs.AI and cs.LG

Abstract: Reinforcement learning from human feedback (RLHF) has been an effective technique for aligning AI systems with human values, with remarkable successes in fine-tuning large language models (LLMs) recently. Most existing RLHF paradigms make the underlying assumption that human preferences are relatively homogeneous and can be encoded by a single reward model. In this paper, we focus on addressing the issues due to the inherent heterogeneity in human preferences, as well as their potential strategic behavior in providing feedback. Specifically, we propose two frameworks to address heterogeneous human feedback in principled ways: a personalization-based one and an aggregation-based one. For the former, we propose two approaches based on representation learning and clustering, respectively, for learning multiple reward models that trade off the bias (due to preference heterogeneity) and the variance (due to the use of fewer data for learning each model by personalization). We then establish sample complexity guarantees for both approaches. For the latter, we aim to adhere to the single-model framework, as already deployed in the current RLHF paradigm, by carefully aggregating diverse and truthful preferences from humans. We propose two approaches based on reward and preference aggregation, respectively: the former utilizes both utilitarian and Leximin approaches to aggregate the individual reward models, with sample complexity guarantees; the latter directly aggregates the human feedback in the form of probabilistic opinions. Under the probabilistic-opinion-feedback model, we also develop an approach to handle strategic human labelers who may bias and manipulate the aggregated preferences with untruthful feedback. Based on ideas from mechanism design, our approach ensures truthful preference reporting, with the induced aggregation rule maximizing social welfare functions.
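
The reward-aggregation side of the abstract admits a compact illustration. The sketch below is a minimal, hypothetical example (not the paper's implementation): it assumes per-labeler reward models are already available as callables and shows how a utilitarian rule and a Leximin-style (max-min) rule could combine them into a single reward for downstream policy optimization. The function names, toy reward functions, and candidate responses are all invented for illustration.

```python
# Minimal sketch of aggregating per-labeler reward models into one reward
# signal, in the spirit of the aggregation-based framework described above.
# All names and data here are illustrative, not from the paper.

from typing import Callable, Sequence

RewardFn = Callable[[str], float]  # maps a response to a scalar reward


def utilitarian_aggregate(rewards: Sequence[RewardFn]) -> RewardFn:
    """Utilitarian rule: average (equivalently, sum) of individual rewards."""
    def aggregated(x: str) -> float:
        return sum(r(x) for r in rewards) / len(rewards)
    return aggregated


def leximin_aggregate(rewards: Sequence[RewardFn]) -> RewardFn:
    """Leximin-style rule: favor the worst-off labeler.

    Full Leximin compares sorted reward vectors lexicographically; using only
    the minimum is a common scalar surrogate and is what this sketch does.
    """
    def aggregated(x: str) -> float:
        return min(r(x) for r in rewards)
    return aggregated


if __name__ == "__main__":
    # Two toy "reward models" with heterogeneous preferences over responses.
    r1: RewardFn = lambda x: float(len(x))          # prefers longer responses
    r2: RewardFn = lambda x: -abs(len(x) - 20.0)    # prefers ~20-character ones

    candidates = ["short answer",
                  "a somewhat longer candidate answer",
                  "medium reply"]

    util = utilitarian_aggregate([r1, r2])
    lexi = leximin_aggregate([r1, r2])

    print("utilitarian pick:", max(candidates, key=util))
    print("leximin pick:    ", max(candidates, key=lexi))
```

The same aggregation idea extends to the probabilistic-opinion-feedback setting, where distributions rather than scalars are pooled; the mechanism-design step that keeps strategic labelers truthful is not modeled in this sketch.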

Authors (5)
  1. Chanwoo Park (24 papers)
  2. Mingyang Liu (18 papers)
  3. Kaiqing Zhang (70 papers)
  4. Asuman Ozdaglar (102 papers)
  5. Dingwen Kong (4 papers)
Citations (14)