Mapping Social Choice Theory to RLHF (arXiv:2404.13038v1)

Published 19 Apr 2024 in cs.AI and cs.CY

Abstract: Recent work on the limitations of using reinforcement learning from human feedback (RLHF) to incorporate human preferences into model behavior often raises social choice theory as a reference point. Social choice theory's analysis of settings such as voting mechanisms provides technical infrastructure that can inform how to aggregate human preferences amid disagreement. We analyze the problem settings of social choice and RLHF, identify key differences between them, and discuss how these differences may affect the RLHF interpretation of well-known technical results in social choice.
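To make the aggregation problem the abstract refers to concrete, here is a minimal, hypothetical sketch (not taken from the paper): three annotators hold cyclically conflicting rankings over three candidate responses, and we aggregate their pairwise preferences in two ways — a social-choice-style pairwise majority tally, and a Bradley-Terry fit of the kind typically assumed by RLHF reward modeling. All names and numbers are illustrative assumptions.

```python
# Illustrative sketch: social-choice vs. RLHF-style aggregation of
# conflicting pairwise preferences. Hypothetical data, not from the paper.
from itertools import combinations
import numpy as np

responses = ["A", "B", "C"]
# Each annotator supplies a strict ranking, best first.
rankings = [
    ["A", "B", "C"],
    ["B", "C", "A"],
    ["C", "A", "B"],
]  # a classic Condorcet cycle: no response beats every other by majority

# (1) Pairwise majority margins (social-choice view).
margins = {pair: 0 for pair in combinations(responses, 2)}
for r in rankings:
    for x, y in combinations(responses, 2):
        margins[(x, y)] += 1 if r.index(x) < r.index(y) else -1
print("majority margins:", margins)  # every margin is +/-1, so preferences cycle

# (2) Bradley-Terry maximum likelihood on the same comparisons,
#     i.e. the scalar utilities an RLHF reward model would typically learn.
pairs = []
for r in rankings:
    for x, y in combinations(responses, 2):
        winner, loser = (x, y) if r.index(x) < r.index(y) else (y, x)
        pairs.append((responses.index(winner), responses.index(loser)))

theta = np.zeros(len(responses))  # latent utilities, one per response
for _ in range(500):  # plain gradient ascent on the Bradley-Terry log-likelihood
    grad = np.zeros_like(theta)
    for w, l in pairs:
        p = 1.0 / (1.0 + np.exp(-(theta[w] - theta[l])))  # P(winner beats loser)
        grad[w] += 1 - p
        grad[l] -= 1 - p
    theta += 0.1 * grad
theta -= theta.mean()
print("Bradley-Terry utilities:", dict(zip(responses, theta.round(3))))
# With a symmetric cycle, the fit assigns (near-)equal utilities to all three
# responses, smoothing over a disagreement that social-choice analysis
# (e.g. Condorcet cycles, impossibility results) would surface explicitly.
```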
