Is this the real life? Is this just fantasy? The Misleading Success of Simulating Social Interactions With LLMs (2403.05020v4)

Published 8 Mar 2024 in cs.CL and cs.AI

Abstract: Recent advances in large language models (LLMs) have enabled richer social simulations, allowing for the study of various social phenomena. However, most recent work has used a more omniscient perspective on these simulations (e.g., a single LLM generating all interlocutors), which is fundamentally at odds with the non-omniscient, information-asymmetric interactions that involve humans and AI agents in the real world. To examine these differences, we develop an evaluation framework to simulate social interactions with LLMs in various settings (omniscient, non-omniscient). Our experiments show that LLMs perform better in unrealistic, omniscient simulation settings but struggle in ones that more accurately reflect real-world conditions with information asymmetry. Our findings indicate that addressing information asymmetry remains a fundamental challenge for LLM-based agents.

An Analysis of the Challenges in Simulating Social Interactions with LLMs

The paper "Is this the real life? Is this just fantasy? The Misleading Success of Simulating Social Interactions With LLMs" investigates the efficacy of using LLMs to simulate human social interactions. The authors identify a fundamental misalignment between how LLMs are used to simulate these interactions and the inherent non-omniscient, information asymmetric nature of human communications.

The authors develop a structured evaluation framework that distinguishes between two modes of simulation: Script mode, where a single LLM has omniscient access to all participants' information and goals, and Agents mode, where multiple LLMs independently simulate distinct agents without access to each other's internal states. Their experiments show that Script mode leads to an overestimation of social goal achievement and interaction naturalness compared to the more realistic Agents mode.
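
To make the distinction concrete, here is a minimal sketch of the two modes, assuming a generic `llm(prompt)` completion helper; all names and prompt wording are illustrative, not the authors' actual implementation:

```python
def llm(prompt: str) -> str:
    """Placeholder for any chat/completion API call; returns one utterance."""
    raise NotImplementedError

def simulate_script(goal_a: str, goal_b: str, turns: int = 10) -> str:
    # Script mode: ONE model sees BOTH private goals and writes the
    # entire dialogue omnisciently in a single generation.
    prompt = (
        f"Agent A's private goal: {goal_a}\n"
        f"Agent B's private goal: {goal_b}\n"
        f"Write a {turns}-turn conversation between A and B."
    )
    return llm(prompt)

def simulate_agents(goal_a: str, goal_b: str, turns: int = 10) -> list[str]:
    # Agents mode: each agent's model sees only ITS OWN goal plus the
    # public dialogue history, mirroring real information asymmetry.
    history: list[str] = []
    goals = {"A": goal_a, "B": goal_b}
    for t in range(turns):
        speaker = "A" if t % 2 == 0 else "B"
        prompt = (
            f"You are agent {speaker}. Your private goal: {goals[speaker]}\n"
            "Dialogue so far:\n" + "\n".join(history) +
            "\nYour next utterance:"
        )
        history.append(f"{speaker}: {llm(prompt)}")
    return history
```

The key design difference is what each generation call can condition on: in Script mode, every utterance is produced with full knowledge of both goals, whereas in Agents mode each turn is generated from a strictly partial view.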

Quantitative findings underscore a significant performance disparity: in Script mode, agents achieved social goals more often, with higher completion rates and more fluid dialogue. Agents mode, which better emulates human information processing through its information asymmetry, produced less natural dialogue and poorer goal completion. Interestingly, an intermediate setting in which agents are granted access to each other's mental states (referred to as Mindreaders mode) also outperformed the truly asymmetric, human-like setting, indicating that the gains depend heavily on artificial information sharing.
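
The Mindreaders setting sits between the two extremes: agents still alternate turns as separate models, but each prompt also reveals the partner's private goal. A hedged sketch, reusing the illustrative `llm` stub from above:

```python
def simulate_mindreaders(goal_a: str, goal_b: str, turns: int = 10) -> list[str]:
    # Mindreaders mode: separate per-turn agents (as in Agents mode),
    # but each prompt also exposes the OTHER agent's private goal.
    history: list[str] = []
    goals = {"A": goal_a, "B": goal_b}
    for t in range(turns):
        speaker, other = ("A", "B") if t % 2 == 0 else ("B", "A")
        prompt = (
            f"You are agent {speaker}. Your private goal: {goals[speaker]}\n"
            f"You also know {other}'s private goal: {goals[other]}\n"
            "Dialogue so far:\n" + "\n".join(history) +
            "\nYour next utterance:"
        )
        history.append(f"{speaker}: {llm(prompt)}")
    return history
```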

The paper ventures further, exploring whether training LLMs on data from Script simulations could improve more realistic interaction simulations. Finetuning LLMs on Script data improved dialogue naturalness but did not significantly improve goal completion in cooperative scenarios, where precisely inferring an interlocutor's unknown mental states is vital. The authors attribute this limited improvement to inherent biases in Script simulations: omniscient setups tend to produce overly agreeable or unnatural decision-making strategies because of their unrestricted access to internal states.
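
One plausible reading of such a finetuning setup (a sketch under assumptions; the exact data format is not specified here) is that each Script-generated transcript is re-cut into per-agent training examples, where the context contains only what that agent could legitimately see:

```python
def script_to_agent_examples(goal_a: str, goal_b: str,
                             transcript: list[tuple[str, str]]) -> list[dict]:
    # transcript: list of (speaker, utterance) pairs from a Script simulation.
    # Each example hides the partner's goal, so the finetuned model is
    # conditioned the same way it will later be prompted in Agents mode.
    goals = {"A": goal_a, "B": goal_b}
    examples = []
    for i, (speaker, utterance) in enumerate(transcript):
        history = "\n".join(f"{s}: {u}" for s, u in transcript[:i])
        examples.append({
            "prompt": (
                f"You are agent {speaker}. Your private goal: {goals[speaker]}\n"
                f"Dialogue so far:\n{history}\nYour next utterance:"
            ),
            "completion": utterance,
        })
    return examples
```

Note that even though the partner's goal is hidden from each training context, the target utterances themselves were written omnisciently, which is one way to see why a Script-trained model can inherit overly agreeable behavior.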

The authors recommend careful reporting and a clear delineation of simulation modes in related research, advocating for transparency while acknowledging the limitations laid out in their findings. By analogy to model cards, they propose "simulation cards" that document simulation procedures in detail, facilitating better discourse on the application and evaluation of LLM-based agents in simulating social interactions.
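
The paper's summary here does not prescribe a card format, but by analogy with model cards one might record fields like the following; every field name below is purely illustrative:

```python
# Hypothetical fields for a "simulation card"; names are illustrative only.
simulation_card = {
    "simulation_mode": "agents",  # e.g., script | agents | mindreaders
    "information_access": "own goal + public dialogue history",
    "generator_models": ["one model instance per agent"],
    "turn_taking": "strict alternation, fixed turn cap",
    "evaluation": {
        "goal_completion": "how goal achievement was judged",
        "naturalness": "how dialogue quality was judged",
    },
    "known_limitations": "e.g., no mid-dialogue belief updates",
}
```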

Looking toward future developments, the paper calls for more human-like modeling approaches that move beyond simple omniscience and embrace techniques simulating human strategic reasoning under information asymmetry. Such modeling might involve more explicit scaffolding of LLM responses based on inferred beliefs and shared knowledge within dialogues.
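
As a concrete but hypothetical instance of such scaffolding, an agent could first query the model for an explicit inference about the partner's goal, then condition its reply on that inference (reusing the `llm` stub from the earlier sketches):

```python
def scaffolded_turn(own_goal: str, history: list[str]) -> str:
    # Step 1: explicit theory-of-mind inference from public evidence only.
    belief = llm(
        "Given this dialogue, what does the OTHER speaker most likely want?\n"
        + "\n".join(history)
    )
    # Step 2: condition the reply on the inferred (never revealed) goal.
    return llm(
        f"Your private goal: {own_goal}\n"
        f"Your current belief about the other speaker's goal: {belief}\n"
        "Dialogue so far:\n" + "\n".join(history) +
        "\nYour next utterance:"
    )
```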

This research presents a cautionary perspective on the oversimplification involved in LLM-based social simulations and urges the field to recognize its current limitations, aiming for better alignment with human cognitive and social processes. The paper ultimately highlights the enduring challenge of bridging the gap between machine perception and the complexity of human interaction, driving toward more nuanced and practical applications in social AI.

Authors (5)
  1. Xuhui Zhou
  2. Zhe Su
  3. Tiwalayo Eisape
  4. Hyunwoo Kim
  5. Maarten Sap