Mapping Social Choice Theory to RLHF (arXiv:2404.13038v1)

Published 19 Apr 2024 in cs.AI and cs.CY

Abstract: Recent work on the limitations of using reinforcement learning from human feedback (RLHF) to incorporate human preferences into model behavior often raises social choice theory as a reference point. Social choice theory's analysis of settings such as voting mechanisms provides technical infrastructure that can inform how to aggregate human preferences amid disagreement. We analyze the problem settings of social choice and RLHF, identify key differences between them, and discuss how these differences may affect the RLHF interpretation of well-known technical results in social choice.
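To make the aggregation problem the abstract refers to concrete, here is a minimal, hypothetical sketch (not taken from the paper): three annotators hold cyclically conflicting rankings over three candidate responses, and we aggregate their pairwise preferences in two ways — a social-choice-style pairwise majority tally, and a Bradley-Terry fit of the kind typically assumed by RLHF reward modeling. All names and numbers are illustrative assumptions.

```python
# Illustrative sketch: social-choice vs. RLHF-style aggregation of
# conflicting pairwise preferences. Hypothetical data, not from the paper.
from itertools import combinations
import numpy as np

responses = ["A", "B", "C"]
# Each annotator supplies a strict ranking, best first.
rankings = [
    ["A", "B", "C"],
    ["B", "C", "A"],
    ["C", "A", "B"],
]  # a classic Condorcet cycle: no response beats every other by majority

# (1) Pairwise majority margins (social-choice view).
margins = {pair: 0 for pair in combinations(responses, 2)}
for r in rankings:
    for x, y in combinations(responses, 2):
        margins[(x, y)] += 1 if r.index(x) < r.index(y) else -1
print("majority margins:", margins)  # every margin is +/-1, so preferences cycle

# (2) Bradley-Terry maximum likelihood on the same comparisons,
#     i.e. the scalar utilities an RLHF reward model would typically learn.
pairs = []
for r in rankings:
    for x, y in combinations(responses, 2):
        winner, loser = (x, y) if r.index(x) < r.index(y) else (y, x)
        pairs.append((responses.index(winner), responses.index(loser)))

theta = np.zeros(len(responses))  # latent utilities, one per response
for _ in range(500):  # plain gradient ascent on the Bradley-Terry log-likelihood
    grad = np.zeros_like(theta)
    for w, l in pairs:
        p = 1.0 / (1.0 + np.exp(-(theta[w] - theta[l])))  # P(winner beats loser)
        grad[w] += 1 - p
        grad[l] -= 1 - p
    theta += 0.1 * grad
theta -= theta.mean()
print("Bradley-Terry utilities:", dict(zip(responses, theta.round(3))))
# With a symmetric cycle, the fit assigns (near-)equal utilities to all three
# responses, smoothing over a disagreement that social-choice analysis
# (e.g. Condorcet cycles, impossibility results) would surface explicitly.
```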
