- The paper introduces a novel AMR-based methodology for evaluating Cicero’s performance in both strategic play and communicative negotiation.
- The study finds that while Cicero wins 84% of games, experienced human opponents readily identify its messages as machine-generated, and its persuasion attempts succeed less often than those of human players.
- The research reveals a gap between Cicero’s superior strategy and its less human-like communication, pointing the way for future improvements to AI play in Diplomacy.
Assessing Cicero’s Diplomacy Capabilities in a Communicative Context
This paper presents an evaluation of Cicero, the AI system known for playing the strategy board game Diplomacy. It assesses Cicero’s proficiency not only in strategic gameplay but also in communication. The paper tests popular claims, made in prior studies and media coverage, that Cicero can negotiate and deceive within the game like a human.
Strategic Versus Communicative Skills
Previous evaluations of Cicero concentrated predominantly on its strategic success, namely its ability to win games. However, Diplomacy as a test bed also demands effective communication, chiefly persuasion and deception. Mastery of these skills is considered integral for an AI to genuinely compete at a human level in the game, so this paper introduces new methods to test these communicative aspects rigorously.
The methodology involved annotating in-game messages with Abstract Meaning Representation (AMR) to capture the intents players express, separately from the strategic actions they actually take. The authors built a parser that achieves a Smatch score of 66.6 after adjustments for game-specific nuances, while acknowledging that its parsing accuracy leaves room for improvement. Separating intent from action let the researchers flag deception (broken commitments) and persuasion attempts in the AMR-annotated interactions.
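Smatch scores a parsed AMR graph against a gold annotation by treating each graph as a set of triples, finding the variable mapping that maximizes the number of matching triples, and reporting the F1 of that match. The sketch below illustrates the metric on a toy example and is not the authors' evaluation code. It assumes graphs are given as hypothetical `(relation, source, target)` tuples whose variables are introduced by `instance` triples, and it searches all variable mappings exhaustively. That search is exact but feasible only for tiny graphs; the reference Smatch implementation instead uses hill-climbing with random restarts and also scores a TOP triple, both omitted here.

```python
from itertools import permutations

# Hypothetical representation: an AMR graph is a list of
# (relation, source, target) triples. ("instance", var, concept) triples
# introduce variables; other targets are variables or constants.
Triple = tuple[str, str, str]


def variables(triples: list[Triple]) -> list[str]:
    """Variables are exactly the sources of instance triples."""
    return sorted({src for rel, src, _ in triples if rel == "instance"})


def rename(triples: list[Triple], mapping: dict[str, str]) -> set[Triple]:
    """Apply a variable mapping; concepts in instance triples are never renamed."""
    renamed = set()
    for rel, src, tgt in triples:
        new_tgt = tgt if rel == "instance" else mapping.get(tgt, tgt)
        renamed.add((rel, mapping.get(src, src), new_tgt))
    return renamed


def smatch_f1(test: list[Triple], gold: list[Triple]) -> float:
    """Exhaustive Smatch-style F1: best triple overlap over all variable mappings."""
    test_vars, gold_vars = variables(test), variables(gold)
    gold_set = set(gold)
    best = 0
    # Each test variable maps to a distinct gold variable or to a unique
    # placeholder that cannot match anything in the gold graph.
    slots = gold_vars + [None] * len(test_vars)
    for choice in permutations(slots, len(test_vars)):
        mapping = {
            v: g if g is not None else f"<unmapped:{v}>"
            for v, g in zip(test_vars, choice)
        }
        best = max(best, len(rename(test, mapping) & gold_set))
    if best == 0:
        return 0.0
    precision = best / len(set(test))
    recall = best / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Toy example: the parser gets the supported power wrong.
gold = [
    ("instance", "s", "support-01"),
    ("instance", "e", "england"),
    ("instance", "f", "france"),
    ("ARG0", "s", "e"),
    ("ARG1", "s", "f"),
]
test = [
    ("instance", "a", "support-01"),
    ("instance", "b", "england"),
    ("instance", "c", "germany"),
    ("ARG0", "a", "b"),
    ("ARG1", "a", "c"),
]
print(round(smatch_f1(test, gold), 3))  # 0.8
```

Four of the five triples match under the best mapping (a→s, b→e, c→f), so precision and recall are both 0.8. Variable names themselves never matter, only the graph structure they induce.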
Key Findings
The authors ran 24 games between human players and Cicero, which yielded the following findings:
- Strategic Dominance: Cicero continues to demonstrate superior strategic capabilities relative to human players, securing victories in 84% of games.
- Detectability of Communication: Despite its frequent success in gameplay, Cicero's messages are readily identified as AI-generated by experienced human players, which suggests that it does not convincingly simulate human-like interaction.
- Perceived vs. Actual Deception: Humans perceived Cicero’s messages as deceptive more often than messages from other human players. However, when deception was measured through detected broken commitments, humans broke commitments more frequently than Cicero did.
- Effectiveness in Persuasion: Although humans and Cicero attempted persuasion at similar rates, humans succeeded more often, revealing a gap in the AI’s ability to exert strategic influence through dialogue.
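One simple way to operationalize the last two findings is as per-speaker rates: a commitment counts as broken when the promised order is missing from what the committing power actually submitted, and a persuasion attempt counts as successful when the recipient submits the requested order. The sketch below computes both rates under heavy simplifying assumptions and is not the paper's pipeline. The `Commitment` and `Proposal` records, the `(power, phase)` order lookup, and exact matching on normalized order strings are all hypothetical stand-ins for information the authors extract from AMR annotations, and the toy data is invented for illustration rather than taken from the study.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical records; in practice these would be derived from AMR-parsed messages.
@dataclass(frozen=True)
class Commitment:
    speaker_type: str    # "human" or "cicero"
    power: str           # power making the promise, e.g. "FRANCE"
    phase: str           # e.g. "S1901M" (Spring 1901, movement)
    promised_order: str  # normalized order string, e.g. "A PAR - BUR"


@dataclass(frozen=True)
class Proposal:
    speaker_type: str     # who is trying to persuade
    recipient: str        # power being asked to act
    phase: str
    requested_order: str  # order the speaker wants the recipient to submit


# (power, phase) -> set of orders that power actually submitted in that phase
Orders = dict[tuple[str, str], set[str]]


def broken_commitment_rate(commitments: list[Commitment], orders: Orders) -> dict[str, float]:
    """Fraction of each speaker type's commitments with no matching submitted order."""
    total, broken = defaultdict(int), defaultdict(int)
    for c in commitments:
        total[c.speaker_type] += 1
        if c.promised_order not in orders.get((c.power, c.phase), set()):
            broken[c.speaker_type] += 1
    return {k: broken[k] / total[k] for k in total}


def persuasion_success_rate(proposals: list[Proposal], orders: Orders) -> dict[str, float]:
    """Fraction of each speaker type's proposals the recipient actually carried out."""
    total, success = defaultdict(int), defaultdict(int)
    for p in proposals:
        total[p.speaker_type] += 1
        if p.requested_order in orders.get((p.recipient, p.phase), set()):
            success[p.speaker_type] += 1
    return {k: success[k] / total[k] for k in total}


# Invented toy data for a single phase.
orders: Orders = {
    ("FRANCE", "S1901M"): {"A PAR - BUR", "F BRE - MAO"},
    ("GERMANY", "S1901M"): {"A MUN - RUH"},
    ("ENGLAND", "S1901M"): {"F LON - ENG"},
}
commitments = [
    Commitment("cicero", "FRANCE", "S1901M", "A PAR - BUR"),  # kept
    Commitment("human", "GERMANY", "S1901M", "A MUN - BUR"),  # broken
    Commitment("human", "ENGLAND", "S1901M", "F LON - ENG"),  # kept
]
proposals = [
    Proposal("cicero", "GERMANY", "S1901M", "A MUN - BUR"),  # ignored
    Proposal("human", "FRANCE", "S1901M", "F BRE - MAO"),    # followed
]

print(broken_commitment_rate(commitments, orders))
print(persuasion_success_rate(proposals, orders))
```

Exact string matching keeps the sketch short, but a real pipeline would need to normalize order notation and decide how to treat conditional or vague promises, which this simple comparison cannot capture.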
These results challenge the notion of Cicero as a "master" of human-like communicative interaction in Diplomacy.
Implications and Future Outlook
The paper highlights that Cicero wins through strategic execution rather than communicative finesse: while it excels tactically, it reaches human-level competence in neither deception nor persuasion through dialogue. Progress in these areas remains crucial for building AI that convincingly mirrors human strategic communication in competitive settings.
The authors suggest future directions for refining AI communication in Diplomacy-like scenarios. They advocate more nuanced analysis and improved models that understand not only tactical and strategic needs but also the subtleties of human collaboration, misdirection, and persuasion. Studying AI behavior in games like Diplomacy is a stepping-stone toward intelligent systems that integrate strategic decision-making with complex, human-like communication. Overall, the research opens a dialogue on advancing AI's communicative competence beyond tactical automation.