Exploring Collaboration Mechanisms for LLM Agents: A Social Psychology View

Published 3 Oct 2023 in cs.CL, cs.AI, cs.CY, cs.LG, and cs.MA | (2310.02124v3)

Abstract: As NLP systems are increasingly employed in intricate social environments, a pressing query emerges: Can these NLP systems mirror human-esque collaborative intelligence, in a multi-agent society consisting of multiple LLMs? This paper probes the collaboration mechanisms among contemporary NLP systems by melding practical experiments with theoretical insights. We fabricate four unique `societies' comprised of LLM agents, where each agent is characterized by a specific `trait' (easy-going or overconfident) and engages in collaboration with a distinct `thinking pattern' (debate or reflection). Through evaluating these multi-agent societies on three benchmark datasets, we discern that certain collaborative strategies not only outshine previous top-tier approaches, but also optimize efficiency (using fewer API tokens). Moreover, our results further illustrate that LLM agents manifest human-like social behaviors, such as conformity and consensus reaching, mirroring foundational social psychology theories. In conclusion, we integrate insights from social psychology to contextualize the collaboration of LLM agents, inspiring further investigations into the collaboration mechanism for LLMs. We commit to sharing our code and datasets\footnote{\url{https://github.com/zjunlp/MachineSoM}.}, hoping to catalyze further research in this promising avenue.


Summary

  • The paper presents a novel framework where LLM agents employ structured debate and reflection rounds to simulate human-like collaboration dynamics.
  • It systematically evaluates multi-agent societies across varied tasks, demonstrating that collaboration strategy and trait composition critically drive performance and consensus.
  • The study reveals that optimal coordination, rather than sheer agent count, yields cost-effective improvements in complex reasoning tasks.

Socially-Inspired Collaboration Mechanisms for LLM Agents: A Technical Analysis

Introduction

"Exploring Collaboration Mechanisms for LLM Agents: A Social Psychology View" (2310.02124) presents a systematic investigation into the collaborative behaviors of LLM agents in multi-agent settings, drawing direct inspiration from social psychology principles and the Society of Mind (SoM) paradigm. By instantiating artificial societies composed of LLM agents characterized by distinct traits and cognitive strategies, the work assesses whether these agents can not only solve tasks more effectively via collaboration, but also exhibit robust, human-like social phenomena such as conformity and consensus seeking.

Framework for Simulating Machine Societies

The authors design simulations involving multi-agent societies in which each agent is endowed with an individual trait—either easy-going or overconfident—and a cognitive or “thinking pattern”: debate (multi-agent mutual critique) or reflection (individual self-examination).

Figure 1: The society simulation pipeline with agents reflecting personality traits and cycles of debate or reflection.

Agents operate over multiple rounds, communicating in accordance with predefined collaborative strategies—specified as permutations of debate and reflection across rounds—to jointly complete complex reasoning tasks. The societies vary in composition:

  • S₁: All overconfident,
  • S₂: Majority overconfident, one easy-going,
  • S₃: Majority easy-going, one overconfident,
  • S₄: All easy-going.
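A minimal sketch of how these four compositions can be instantiated in code. The prompt wording and helper name are hypothetical illustrations, not the paper's actual implementation:

```python
def build_society(traits):
    """Build a society as a list of agents, one dict per agent.

    The system-prompt wording below is a hypothetical stand-in for the
    paper's actual trait prompts.
    """
    prompts = {
        "overconfident": "You are confident in your answers and defend them.",
        "easy-going": "You are open-minded and willing to adopt others' views.",
    }
    return [{"id": i, "trait": t, "system_prompt": prompts[t]}
            for i, t in enumerate(traits)]

# The four 3-agent societies studied in the paper:
societies = {
    "S1": build_society(["overconfident"] * 3),
    "S2": build_society(["overconfident", "overconfident", "easy-going"]),
    "S3": build_society(["easy-going", "easy-going", "overconfident"]),
    "S4": build_society(["easy-going"] * 3),
}
```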

The simulation framework is evaluated across three datasets: MMLU (multidomain multiple choice), MATH (competition-level problem solving), and Chess Move Validity (synthetic state tracking; see Figure 2).

Figure 2: The chess move validity task, testing collaborative inference and consensus.
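The per-round mechanics described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions: `llm` is a stand-in callable, and the prompt format is hypothetical rather than taken from the paper:

```python
def run_round(agents, pattern, question, llm, history):
    """One collaboration round: every agent answers, conditioned either
    on all agents' previous answers (debate) or only its own (reflection).

    `agents` is a list of dicts with "id" and "system_prompt" keys;
    `history` is a list of per-round answer lists; `llm` maps a prompt
    string to an answer string.
    """
    new_answers = []
    for agent in agents:
        if pattern == "debate":
            # Mutual critique: the agent sees everyone's last answers.
            context = "\n".join(f"Agent {a['id']}: {ans}"
                                for a, ans in zip(agents, history[-1]))
        else:  # reflection
            # Self-examination: the agent sees only its own last answer.
            context = f"Your previous answer: {history[-1][agent['id']]}"
        prompt = (f"{agent['system_prompt']}\nQuestion: {question}\n"
                  f"{context}\nAnswer:")
        new_answers.append(llm(prompt))
    return new_answers
```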

Experimental Evaluation: Strategies, Composition, and Task Dependency

The quantitative evaluation systematically explores three axes:

  1. Collaborative strategy (ordering and mixture of debate and reflection rounds)
  2. Societal composition (distribution of individual agent traits)
  3. Task and domain impact (difficulty and semantic diversity)
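The first axis is combinatorial: with two thinking patterns fixed per round, a collaborative strategy is an ordered sequence of pattern choices, so three rounds yield 2³ = 8 strategies. A short sketch (function name is ours):

```python
from itertools import product

PATTERNS = ("debate", "reflection")

def enumerate_strategies(rounds=3):
    """All collaborative strategies for a given round count: each
    strategy fixes one thinking pattern per round, giving
    2**rounds ordered sequences."""
    return list(product(PATTERNS, repeat=rounds))

strategies = enumerate_strategies(3)
# e.g. ('debate', 'debate', 'reflection') means two debate rounds
# followed by a final reflection round.
```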

Strong numerical results indicate that, contrary to the common assumption that scaling agent count alone ensures better performance, the composition and coordination strategy are decisive factors.

  • Societal composition (easy-going vs. overconfident) does not consistently impact accuracy, but is highly predictive of the tendency to reach consensus (Figure 3).
  • Collaborative strategies starting with or dominated by debate rounds consistently outperform others, especially on high-difficulty reasoning tasks (Figure 4).
  • Merely increasing agent number or rounds of discussion does not yield monotonic performance gains; three agents and three rounds are the most cost-effective configuration (Figure 5, Figure 6).

    Figure 5: Non-monotonic effect of agent count on accuracy and consensus in collaborative chess move validity.

    Figure 6: Diminishing returns and plateauing of accuracy as collaboration rounds increase in complex reasoning tasks.
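Evaluating accuracy and consensus across configurations requires only light bookkeeping per round; a minimal sketch of the helpers such an evaluation needs (names are ours, not the paper's):

```python
from collections import Counter

def reached_consensus(answers):
    """A society reaches consensus when all agents give the same answer."""
    return len(set(answers)) == 1

def majority_answer(answers):
    """Majority vote over agents' final answers, used as the society's
    prediction when full consensus does not emerge."""
    return Counter(answers).most_common(1)[0][0]
```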

Task-specific effects are documented. For instance, less structured tasks (e.g., Chess Move Validity) exhibit less pronounced gains from advanced collaboration strategies, while open-domain or high-difficulty tasks (e.g., MATH Level 5) benefit substantially from collaborative debate-reflection permutations.

Analysis of Social Behaviors: Conformity and Consensus in LLM Societies

The study provides in-depth behavioral analysis, revealing robust conformity effects and consensus formation dynamics akin to those modeled in classical human social psychology.

  • Easy-going societies converge toward consensus more readily than overconfident societies or mixed compositions (Figure 7, Figure 3).
  • Agents frequently shift their answers in response to majority views over multiple rounds (conformity), with both beneficial and detrimental effects, paralleling phenomena like “groupthink” and its pathologies (Figure 8, Figure 9).
  • The proportion of beneficial conformity (False→True synchronization) versus detrimental conformity (True→False) depends on both the agent trait mixture and the collaboration strategy (Figure 8).

    Figure 8: Distribution of answer correctness changes due to conformity, stratified by collaboration round.

    Figure 3: Consensus cluster analysis reveals that debate strategies more efficiently drive societies toward single-answer convergence compared to reflection.
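The beneficial/detrimental distinction above can be made concrete with a small classifier over answer transitions. This is a hypothetical bookkeeping helper illustrating the categories, not code from the paper:

```python
def classify_conformity(prev_answer, new_answer, majority_answer, gold):
    """Label an answer change relative to the round's majority view.

    Conformity means switching to the majority answer: beneficial when
    that flips a wrong answer to the correct one (False -> True),
    detrimental when it does the opposite (True -> False).
    """
    if new_answer == prev_answer or new_answer != majority_answer:
        return "no_conformity"
    was_right = prev_answer == gold
    now_right = new_answer == gold
    if not was_right and now_right:
        return "beneficial"    # False -> True synchronization
    if was_right and not now_right:
        return "detrimental"   # True -> False
    return "neutral"           # one wrong answer swapped for another
```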

Qualitative Manifestations and Dialog Trajectory

The paper supplements quantitative findings with dialogic trace analysis and word cloud visualizations:

  • Overconfident traits are attenuated in group contexts—the language becomes more accommodating as collaboration proceeds (Figure 10).
  • Detailed dialog cases demonstrate how group negotiation leads to answer correction or to the persistence of systematic errors, underpinned by the interaction of agent traits and strategy (Figure 11, Figure 12).

    Figure 10: Word cloud analysis showing lexical convergence toward group norms in easy-going vs. overconfident societies.

    Figure 11: Dialog trace on chess move prediction, illustrating how a single dissenting easy-going agent can be gradually convinced by overconfident peers.

Theoretical and Practical Implications

Theoretical Insights

The results challenge the assumption that larger groups are universally more capable, highlighting the primacy of strategic cognitive alignment and controlled diversity over raw scale. The explicit mapping from agent psychology to group-level behavior offers a credible pathway to simulating collective intelligence phenomena within LLM agent networks. The findings also concretely demonstrate the spontaneous emergence of conformity, casting new light on alignment and robustness challenges in human-AI and multi-AI collaborative settings.

Practical Implications and Future Directions

  • Collaborative strategies that emphasize early, repeated debate and end with consensus reflection achieve the best cost-accuracy tradeoff.
  • Engineering multi-agent LLM systems requires not just attention to agent count, but careful orchestration of trait diversity and communication structure.
  • These insights are directly transferable to the design of socially-aware AI, group-aided scientific discovery, intelligent tutoring, and collective decision-support systems.
  • Future research is warranted on autonomous strategy selection, agent adaptation under dynamic trait and task perturbations, and on exploring collaboration among agents instantiated by distinct LLM architectures.

Conclusion

The paper rigorously demonstrates that effective multi-agent collaboration among LLM agents benefits far less from brute scale than from socially-inspired composition and interaction mechanisms. Permutational debate and reflection, mapped to foundational social psychology constructs, produce both higher accuracy and richer emergent behavior than naive voting or prolonged self-reflection. These results lend both empirical and theoretical support to the principles of the Society of Mind, and set a technical agenda for future exploration of artificial collective intelligence in language agent systems.

