Mechanism Design for LLM Fine-tuning with Multiple Reward Models (2405.16276v3)

Published 25 May 2024 in cs.GT

Abstract: Fine-tuning LLMs to aggregate multiple preferences has attracted considerable research attention. As aggregation algorithms advance, a potential economic scenario arises in which fine-tuning services are provided to agents with different preferences. In this context, agents may benefit from strategically misreporting their preferences, which can distort the fine-tuned outcome. This paper addresses such incentive issues by framing them as a mechanism design problem: an LLM provider determines the fine-tuning objective (training rule) and the pricing scheme (payment rule) for the agents. We primarily focus on a representative class of training rules that maximize social welfare subject to certain regularizations. First, we show that under most circumstances, truthful reporting is sub-optimal under a training rule alone, highlighting the necessity of payments. Second, we design affine maximizer payment rules that implement these training rules in dominant-strategy incentive compatibility (DSIC). We characterize sufficient conditions for payment equivalence properties: for a training rule satisfying these conditions, we identify all payment rules that implement it in DSIC, and they differ from one another only by a constant term that is independent of agents' reports. Third, we demonstrate that our mechanism remains approximately DSIC even with perturbed input, showcasing its robustness against the errors that are inevitable in real-world applications. Experiments on real LLM setups further confirm the practical implications of our results.
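
The display equations below give a minimal illustrative sketch of this setup, assuming a KL-regularized social-welfare training objective and a Clarke-pivot-style instance of the affine maximizer payments; the symbols $\lambda$, $\pi_{\mathrm{ref}}$, and the particular regularizer are illustrative assumptions rather than the paper's exact formulation. Each agent $i$ reports a reward model $r_i$, and the provider fine-tunes a policy $\pi$:

\[
\pi^{*} \in \arg\max_{\pi} \; \sum_{i} \mathbb{E}_{x \sim \pi}\big[r_i(x)\big] \;-\; \lambda\, D_{\mathrm{KL}}\!\big(\pi \,\|\, \pi_{\mathrm{ref}}\big),
\]

\[
p_i \;=\; \max_{\pi} \Big( \sum_{j \neq i} \mathbb{E}_{x \sim \pi}\big[r_j(x)\big] - \lambda\, D_{\mathrm{KL}}\big(\pi \,\|\, \pi_{\mathrm{ref}}\big) \Big)
\;-\; \Big( \sum_{j \neq i} \mathbb{E}_{x \sim \pi^{*}}\big[r_j(x)\big] - \lambda\, D_{\mathrm{KL}}\big(\pi^{*} \,\|\, \pi_{\mathrm{ref}}\big) \Big).
\]

Under this pivot-style payment, agent $i$'s utility equals the regularized social welfare at $\pi^{*}$ minus a term that does not depend on $i$'s own report, so truthful reporting maximizes that utility whenever the training rule maximizes the regularized welfare; this is the standard route to DSIC for welfare-maximizing rules.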
