
The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values (2310.07629v1)

Published 11 Oct 2023 in cs.CL and cs.CY

Abstract: Human feedback is increasingly used to steer the behaviours of LLMs. However, it is unclear how to collect and incorporate feedback in a way that is efficient, effective and unbiased, especially for highly subjective human preferences and values. In this paper, we survey existing approaches for learning from human feedback, drawing on 95 papers primarily from the ACL and arXiv repositories. First, we summarise the past, pre-LLM trends for integrating human feedback into LLMs. Second, we give an overview of present techniques and practices, as well as the motivations for using feedback; conceptual frameworks for defining values and preferences; and how feedback is collected and from whom. Finally, we encourage a better future of feedback learning in LLMs by raising five unresolved conceptual and practical challenges.

Authors (5)
  1. Hannah Rose Kirk (33 papers)
  2. Andrew M. Bean (7 papers)
  3. Bertie Vidgen (35 papers)
  4. Paul Röttger (37 papers)
  5. Scott A. Hale (48 papers)
Citations (29)
