Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
38 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
41 tokens/sec
o3 Pro
7 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Cooperation, Competition, and Maliciousness: LLM-Stakeholders Interactive Negotiation (2309.17234v2)

Published 29 Sep 2023 in cs.CL, cs.CY, and cs.LG

Abstract: There is an growing interest in using LLMs in multi-agent systems to tackle interactive real-world tasks that require effective collaboration and assessing complex situations. Yet, we still have a limited understanding of LLMs' communication and decision-making abilities in multi-agent setups. The fundamental task of negotiation spans many key features of communication, such as cooperation, competition, and manipulation potentials. Thus, we propose using scorable negotiation to evaluate LLMs. We create a testbed of complex multi-agent, multi-issue, and semantically rich negotiation games. To reach an agreement, agents must have strong arithmetic, inference, exploration, and planning capabilities while integrating them in a dynamic and multi-turn setup. We propose multiple metrics to rigorously quantify agents' performance and alignment with the assigned role. We provide procedures to create new games and increase games' difficulty to have an evolving benchmark. Importantly, we evaluate critical safety aspects such as the interaction dynamics between agents influenced by greedy and adversarial players. Our benchmark is highly challenging; GPT-3.5 and small models mostly fail, and GPT-4 and SoTA large models (e.g., Llama-3 70b) still underperform.

Overview of LLM-Deliberation: Evaluating LLMs with Interactive Multi-Agent Negotiation Games

The paper "LLM-Deliberation: Evaluating LLMs with Interactive Multi-Agent Negotiation Games" presents a novel evaluation framework designed to comprehensively assess the reasoning and decision-making abilities of LLMs in complex negotiation scenarios. Addressing a gap in existing benchmarks, the authors introduce interactive multi-agent negotiation games as a platform to evaluate the performance of LLMs in tasks requiring negotiation, compromise, and adaptative strategies.

Key Contributions

The primary contribution of this work is the formulation of negotiation games as an evaluation benchmark for LLMs. These games are multi-issue, text-based, semantically rich, and allow tunable difficulty to cater to varying levels of LLM capability. Key characteristics of these games necessitate that agents involved in the negotiation demonstrate proficiency in arithmetic, inference, exploration, and planning.

The authors employ zero-shot Chain-of-Thought (CoT) prompting techniques to facilitate reasoning processes in LLMs, specifically utilizing models like GPT-4. Performance metrics reveal a substantial performance gap between newer and older models, with GPT-4 exhibiting advanced negotiation skills compared to its predecessors.

Evaluation and Results

The researchers developed a systematic evaluation method using diverse metrics, such as the success rate of achieving feasible agreements and the ability to generalize across different negotiation scenarios. Notably, they observe that GPT-4 consistently outperforms earlier architectures from GPT iterations, indicating significant enhancements in negotiation-related faculties.

A major facet of the research involves assessing generalization capabilities, ensuring that the LLMs can adapt to novel games and configurations beyond their initial training data. Additionally, the framework examines critical interaction dynamics, particularly in the presence of adversarial or greedy agents within the negotiation process.

Implications and Future Directions

The implications of this research straddle theoretical and practical domains. Theoretically, this framework provides insights into the social intelligence and strategic reasoning capacities of LLMs. Practically, understanding these capabilities has ramifications for deploying LLMs as autonomous negotiation agents in real-world applications, such as customer service and collaborative interfaces.

One of the bold claims made in the paper is that complex games, such as those introduced, can serve as a robust benchmark for future developments in AI negotiation frameworks. This anticipates newer, more capable LLMs which must be continually tested against evolving standards and challenges.

The paper also opens prospects for future research, such as integrating more advanced frameworks that incorporate long-term strategic planning and adaptive interaction strategies beyond the current CoT prompting methods. Furthermore, it suggests the potential of developing defenses and detection mechanisms against adversarial manipulations, a critical concern as LLMs become more prevalent as interactive agents.

Conclusion

In sum, the authors of "LLM-Deliberation" advance an innovative assessment framework for evaluating the decision-making and adaptive reasoning abilities of LLMs through complex multi-agent negotiation games. By thoroughly investigating the performance and limitations of LLMs like GPT-4 within this context, the paper provides valuable insights into their potential applications and necessary future developments in LLM-based systems. The work further paves the way for enhancing the robustness and reliability of AI-driven negotiation tools in dynamic, real-world environments.

Definition Search Book Streamline Icon: https://streamlinehq.com
References (44)
  1. Playing repeated games with large language models. arXiv, 2023.
  2. Jacob Andreas. Language models as agent models. In Findings of EMNLP, 2022.
  3. Human-level play in the game of diplomacy by combining language models with strategic reasoning. Science, 378(6624):1067–1074, 2022.
  4. Language models are few-shot learners. In NeurIPS, 2020.
  5. Google Duplex. A.i. assistant calls local businesses to make appointments. [Link].
  6. Improving language model negotiation with self-play and in-context learning from ai feedback. arXiv, 2023.
  7. Understanding social reasoning in language models with language models. arXiv, 2023a.
  8. Strategic reasoning with language models. arXiv, 2023b.
  9. Reasoning with language model is planning with world model. arXiv, 2023.
  10. HBR. How walmart automated supplier negotiations. [Link].
  11. What would jiminy cricket do? towards agents that behave morally. NeurIPS, 2022.
  12. Icertis. Negotiate better outcomes and reduce risk across high-volume enterprise contracts with ai-powered insights. [Link].
  13. Negotiation and honesty in artificial intelligence methods for the board game of diplomacy. Nature Communications, 13(1):7214, 2022.
  14. Passive learning of active causal strategies in agents and language models. arXiv, 2023.
  15. Agentbench: Evaluating llms as agents. arXiv, 2023.
  16. LSB. Article: Negotiation planning. [Link].
  17. Chameleon: Plug-and-play compositional reasoning with large language models. arXiv, 2023a.
  18. Are emergent abilities in large language models just in-context learning? arXiv, 2023b.
  19. Luminance. Luminance announces ai-powered chatbot in latest application of its legal-grade large language model. [Link].
  20. Microsoft. Building the new bing. [Link], 2023a.
  21. Microsoft. Reinventing search with a new ai-powered microsoft bing and edge, your copilot for the web. [Link], 2023b.
  22. Microsoft. Introducing microsoft 365 copilot – your copilot for work. [Link], 2023c.
  23. OpenAI. Chatgpt plugins. [Link], 2023a.
  24. OpenAI. Gpt-4 technical report. arXiv, 2023b.
  25. Pactum. Autonomous negotiations for companies with revenue over $5 billion. [Link].
  26. Do the rewards justify the means? measuring trade-offs between rewards and ethical behavior in the machiavelli benchmark. In ICML, 2023.
  27. No-press diplomacy: Modeling multi-agent gameplay. NeurIPS, 2019.
  28. Generative agents: Interactive simulacra of human behavior. arXiv, 2023.
  29. Gorilla: Large language model connected with massive apis. arXiv, 2023.
  30. Social iqa: Commonsense reasoning about social interactions. In EMNLP-IJCNLP, 2019.
  31. Neural theory-of-mind? on the limits of social intelligence in large lms. In EMNLP, 2022.
  32. Toolformer: Language models can teach themselves to use tools. arXiv, 2023.
  33. Minding language models’(lack of) theory of mind: A plug-and-play multi-character belief tracker. arXiv, 2023.
  34. Beyond the imitation game: Quantifying and extrapolating the capabilities of language models. Transactions on Machine Learning Research, 2023.
  35. Lawrence E Susskind. Scorable games: A better way to teach negotiation. Negot. J., 1:205, 1985.
  36. Using simulations to teach negotiation: Pedagogical theory and practice. Teaching negotiation: Ideas and innovations, pp.  285–310, 2000.
  37. Commonsenseqa: A question answering challenge targeting commonsense knowledge. In ACL: HLT, 2019.
  38. Tomer Ullman. Large language models fail on trivial alterations to theory-of-mind tasks. arXiv, 2023.
  39. Generating role-playing game quests with gpt language models. IEEE Transactions on Games, 2022.
  40. Chain-of-thought prompting elicits reasoning in large language models. NeurIPS, 2022.
  41. Tree of thoughts: Deliberate problem solving with large language models. arXiv, 2023a.
  42. React: Synergizing reasoning and acting in language models. In ICLR, 2023b.
  43. I cast detect thoughts: Learning to converse and guide with intents and theory-of-mind in dungeons and dragons. In ACL, 2023.
  44. Universal and transferable adversarial attacks on aligned language models. arXiv, 2023.
User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (5)
  1. Sahar Abdelnabi (21 papers)
  2. Amr Gomaa (18 papers)
  3. Sarath Sivaprasad (8 papers)
  4. Lea Schönherr (23 papers)
  5. Mario Fritz (160 papers)
Citations (18)
Github Logo Streamline Icon: https://streamlinehq.com