Cooperation, Competition, and Maliciousness: LLM-Stakeholders Interactive Negotiation (2309.17234v2)

Published 29 Sep 2023 in cs.CL, cs.CY, and cs.LG

Abstract: There is an growing interest in using LLMs in multi-agent systems to tackle interactive real-world tasks that require effective collaboration and assessing complex situations. Yet, we still have a limited understanding of LLMs' communication and decision-making abilities in multi-agent setups. The fundamental task of negotiation spans many key features of communication, such as cooperation, competition, and manipulation potentials. Thus, we propose using scorable negotiation to evaluate LLMs. We create a testbed of complex multi-agent, multi-issue, and semantically rich negotiation games. To reach an agreement, agents must have strong arithmetic, inference, exploration, and planning capabilities while integrating them in a dynamic and multi-turn setup. We propose multiple metrics to rigorously quantify agents' performance and alignment with the assigned role. We provide procedures to create new games and increase games' difficulty to have an evolving benchmark. Importantly, we evaluate critical safety aspects such as the interaction dynamics between agents influenced by greedy and adversarial players. Our benchmark is highly challenging; GPT-3.5 and small models mostly fail, and GPT-4 and SoTA large models (e.g., Llama-3 70b) still underperform.

PDF HTML Abstract

Overview of LLM-Deliberation: Evaluating LLMs with Interactive Multi-Agent Negotiation Games

The paper "LLM-Deliberation: Evaluating LLMs with Interactive Multi-Agent Negotiation Games" presents a novel evaluation framework designed to comprehensively assess the reasoning and decision-making abilities of LLMs in complex negotiation scenarios. Addressing a gap in existing benchmarks, the authors introduce interactive multi-agent negotiation games as a platform to evaluate the performance of LLMs in tasks requiring negotiation, compromise, and adaptative strategies.

Key Contributions

The primary contribution of this work is the formulation of negotiation games as an evaluation benchmark for LLMs. These games are multi-issue, text-based, semantically rich, and allow tunable difficulty to cater to varying levels of LLM capability. Key characteristics of these games necessitate that agents involved in the negotiation demonstrate proficiency in arithmetic, inference, exploration, and planning.

The authors employ zero-shot Chain-of-Thought (CoT) prompting techniques to facilitate reasoning processes in LLMs, specifically utilizing models like GPT-4. Performance metrics reveal a substantial performance gap between newer and older models, with GPT-4 exhibiting advanced negotiation skills compared to its predecessors.

Evaluation and Results

The researchers developed a systematic evaluation method using diverse metrics, such as the success rate of achieving feasible agreements and the ability to generalize across different negotiation scenarios. Notably, they observe that GPT-4 consistently outperforms earlier architectures from GPT iterations, indicating significant enhancements in negotiation-related faculties.

A major facet of the research involves assessing generalization capabilities, ensuring that the LLMs can adapt to novel games and configurations beyond their initial training data. Additionally, the framework examines critical interaction dynamics, particularly in the presence of adversarial or greedy agents within the negotiation process.

Implications and Future Directions

The implications of this research straddle theoretical and practical domains. Theoretically, this framework provides insights into the social intelligence and strategic reasoning capacities of LLMs. Practically, understanding these capabilities has ramifications for deploying LLMs as autonomous negotiation agents in real-world applications, such as customer service and collaborative interfaces.

One of the bold claims made in the paper is that complex games, such as those introduced, can serve as a robust benchmark for future developments in AI negotiation frameworks. This anticipates newer, more capable LLMs which must be continually tested against evolving standards and challenges.

The paper also opens prospects for future research, such as integrating more advanced frameworks that incorporate long-term strategic planning and adaptive interaction strategies beyond the current CoT prompting methods. Furthermore, it suggests the potential of developing defenses and detection mechanisms against adversarial manipulations, a critical concern as LLMs become more prevalent as interactive agents.

Conclusion

In sum, the authors of "LLM-Deliberation" advance an innovative assessment framework for evaluating the decision-making and adaptive reasoning abilities of LLMs through complex multi-agent negotiation games. By thoroughly investigating the performance and limitations of LLMs like GPT-4 within this context, the paper provides valuable insights into their potential applications and necessary future developments in LLM-based systems. The work further paves the way for enhancing the robustness and reliability of AI-driven negotiation tools in dynamic, real-world environments.