
Simulating Human Strategic Behavior: Comparing Single and Multi-agent LLMs (2402.08189v2)

Published 13 Feb 2024 in cs.HC

Abstract: When creating policies, plans, or designs for people, it is challenging for designers to foresee all of the ways in which people may reason and behave. Recently, LLMs have been shown to be able to simulate human reasoning. We extend this work by measuring LLMs' ability to simulate strategic reasoning in the ultimatum game, a classic economics bargaining experiment. Experimental evidence shows human strategic reasoning is complex; people will often choose to punish other players to enforce social norms even at personal expense. We test if LLMs can replicate this behavior in simulation, comparing two structures: single LLMs and multi-agent systems. We compare their abilities to (1) simulate human-like reasoning in the ultimatum game, (2) simulate two player personalities, greedy and fair, and (3) create robust strategies that are logically complete and consistent with personality. Our evaluation shows that multi-agent systems are more accurate than single LLMs (88 percent vs. 50 percent) in simulating human reasoning and actions for personality pairs. Thus, there is potential to use LLMs to simulate human strategic reasoning to help decision and policy-makers perform preliminary explorations of how people behave in systems.

Simulating Human Strategic Behavior: An Evaluation of Single and Multi-agent LLMs

This essay provides an analysis of the research presented in the paper titled "Simulating Human Strategic Behavior: Comparing Single and Multi-agent LLMs" by Karthik Sreedhar and Lydia Chilton. The paper investigates the capability of LLMs to simulate human-like strategic behavior, particularly in the context of the ultimatum game. Two LLM architectures are compared: single-agent and multi-agent frameworks. The paper evaluates their performance in modeling human behavior, especially focusing on strategic and personality-consistent actions.

The ultimatum game serves as the experimental framework. This classic economics game offers valuable insights into human strategic interactions and deviation from purely profit-maximizing strategies. Human subjects typically engage in altruistic punishment—often declining small, but nonzero, amounts—in order to enforce fairness. This complex understanding and resultant behavior provide a challenging scenario for LLM simulations. The paper assesses LLM performance in simulating this game through three primary investigative lenses: the ability to simulate human-like actions, accurately model distinct player personalities (greedy vs. fair), and create robust, consistent strategic plans.
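The ultimatum game's mechanics can be made concrete with a short sketch. This is a minimal illustration of the standard game rules described above, not code from the paper; the function name and the example thresholds are illustrative assumptions.

```python
# Minimal sketch of one ultimatum-game round (standard game mechanics,
# not code from the paper). The responder "punishes" unfair offers by
# rejecting them, even though rejection costs both players everything.

def play_round(pot: int, offer: int, min_acceptable: int) -> tuple[int, int]:
    """Return (proposer_payoff, responder_payoff) for one round.

    offer: amount the proposer gives the responder out of `pot`.
    min_acceptable: responder's fairness threshold; offers below it
    are rejected (altruistic punishment), so both players get 0.
    """
    if offer >= min_acceptable:
        return pot - offer, offer   # offer accepted: the pot is split
    return 0, 0                     # offer rejected: both get nothing

# A purely profit-maximizing responder would accept any offer > 0, but
# human responders often reject offers they judge unfair.
print(play_round(100, 40, 30))  # fair offer accepted -> (60, 40)
print(play_round(100, 5, 30))   # unfair offer rejected -> (0, 0)
```

The gap between the profit-maximizing strategy (accept anything) and the thresholded behavior above is exactly the human deviation the paper asks LLMs to reproduce.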

Key Findings and Methodology

Simulation Infrastructure and Evaluation:

  1. Single vs. Multi-agent Architectures:
    • The single-agent architecture uses one GPT-4 instance to simulate the entire game, playing both players itself.
    • The multi-agent architecture represents each player as a distinct GPT-4 instance, allowing interaction dynamics more akin to those of independent agents.
  2. Main Results:
    • The multi-agent architecture achieved high accuracy (88%) in emulating human strategies and behavioral adherence to distinct personalities, substantially outperforming the single LLM setup (50% accuracy).
    • The majority of errors in single-agent simulations were attributable to incomplete strategic plans, underscoring the multi-agent approach's advantage in producing complete, consistent strategies.
  3. Gameplay Accuracy and Personality Modeling:
    • Multi-agent LLMs demonstrated effective modeling for both personality archetypes across various pairings.
    • Errors primarily stemmed from strategy inconsistencies rather than gameplay deviations, suggesting gaps in pre-simulation strategic formulation rather than dynamic interactions.
  4. Methodological Approach and Parameters:
    • Simulations were conducted across 40 different sessions for each condition.
    • GPT-4 models were responsible for reasoned outputs by incorporating personality-driven strategies reflective of human behavior.
    • The ultimatum game was iterated over five rounds, emphasizing longitudinal interaction dynamics.
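The multi-agent setup described above can be sketched as a loop in which each player is a separate model instance with its own personality prompt, exchanging offers over five rounds. This is a hedged illustration under stated assumptions: `query_model` is a hypothetical stand-in for a real per-agent GPT-4 API call, stubbed here with fixed rules so the sketch runs end to end, and the specific offer amounts are invented for illustration.

```python
# Hedged sketch of the paper's multi-agent structure: two independent
# "agents", each driven by its own personality prompt, alternate roles
# over five rounds. `query_model` is a hypothetical stand-in for a
# per-agent LLM call, stubbed with simple rules for illustration.

POT, ROUNDS = 100, 5

def query_model(personality: str, role: str, history: list) -> int:
    """Hypothetical stand-in for a per-agent LLM call.

    Returns an offer (role == "proposer") or a minimum acceptable
    amount (role == "responder") based on the personality prompt.
    """
    if role == "proposer":
        return 20 if personality == "greedy" else 50
    return 40 if personality == "fair" else 10

def simulate(p1: str, p2: str) -> list:
    """Run five rounds, alternating which personality proposes."""
    history = []
    for rnd in range(ROUNDS):
        proposer, responder = (p1, p2) if rnd % 2 == 0 else (p2, p1)
        offer = query_model(proposer, "proposer", history)
        threshold = query_model(responder, "responder", history)
        history.append((proposer, offer, offer >= threshold))
    return history

for entry in simulate("greedy", "fair"):
    print(entry)
```

The key design point is isolation: because each agent only sees its own prompt plus the shared history, neither can read the other's strategy, which is what makes the interaction dynamics closer to those of independent human players than a single model narrating both sides.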

Implications and Potential for AI

The findings suggest a significant potential application of multi-agent LLMs in simulating strategic human behaviors. Such simulations could benefit fields like policy-making, economics, human-computer interaction design, and strategic planning initiatives. By modeling various personality-driven behavioral strategies realistically, LLM-based simulations can enhance the predictive accuracy of how individuals respond in strategically competitive environments.

The strong performance of multi-agent frameworks in this context suggests substantial potential for leveraging AI to replicate complex, multi-faceted human cognitive behaviors. However, it is important to acknowledge the study's constraints, including its reliance on a single controlled experimental game and the open question of how faithfully these results transfer to real-world settings. Whether LLMs can scale this behavioral fidelity to more intricate, high-stakes strategic contexts remains an open avenue for exploration.

Conclusions and Future Directions

The paper significantly advances the understanding of LLM capabilities in simulating nuanced human-like behaviors. Multi-agent architectures proved proficient at producing strategic interactions consistent with human experimental baselines. This work paves the way for further investigation into advanced interaction dynamics, extending beyond simple economic games to more intricate, real-life scenarios. Key areas for future research include adaptive strategy development, variability in environmental context, and broader agent-based simulations. As such, this foundational work bodes well for the emerging landscape of AI-driven behavior simulation in varied socio-economic and policy-driven settings.
