Overview of "Nicer Than Humans: How do LLMs Behave in the Prisoner's Dilemma?"
The paper "Nicer Than Humans: How do LLMs Behave in the Prisoner's Dilemma?" by Nicoló Fontana, Francesco Pierri, and Luca Maria Aiello, is a meticulous paper probing the behavior of LLMs when subjected to the Iterated Prisoner's Dilemma (IPD). The core objective of the paper is to evaluate how Llama2, a state-of-the-art LLM, navigates the intricacies of cooperative behavior against opponents with varying levels of hostility. This research provides a comprehensive and systematic approach to understanding the decision-making processes and social norms encoded within LLMs.
Key Contributions
The paper makes several notable contributions:
- Methodological Framework: The authors develop a meta-prompting technique to assess the LLM's comprehension of the IPD game's rules and of the game history reported in the prompt. This technique addresses a main shortcoming of previous studies, which often assumed that LLMs understood complex game rules without validation.
- Simulation Setup and Analysis: The authors ran extensive simulations of 100-round IPD games to analyze Llama2's decision-making process (see the simulation sketch after this list). The paper evaluates different memory-window sizes to determine how much game history the model needs in order to play coherently.
- Behavioral Insights: The paper measures Llama2's cooperative tendencies along several behavioral dimensions and compares these behaviors with established human strategies in economic game theory.
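To make the simulation setup concrete, here is a minimal sketch of a 100-round IPD loop in Python. The payoff values follow the canonical IPD ordering (T > R > P > S), not necessarily the paper's exact point scheme, and `query_llm` is a hypothetical stand-in for prompting Llama2:

```python
import random

# Canonical IPD payoffs (T > R > P > S); the paper's exact point values may differ.
PAYOFFS = {
    ("C", "C"): (3, 3),  # mutual cooperation: reward for both
    ("C", "D"): (0, 5),  # sucker's payoff vs. temptation
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),  # mutual defection: punishment for both
}

def query_llm(history):
    """Hypothetical stand-in for prompting Llama2 with the game history."""
    return "C"  # placeholder move

def play_ipd(p_opponent_coop=0.7, rounds=100, seed=0):
    """Play one 100-round IPD game against a random opponent that
    cooperates with a fixed probability, mirroring the paper's setup."""
    rng = random.Random(seed)
    history, llm_score, opp_score = [], 0, 0
    for _ in range(rounds):
        llm_move = query_llm(history)
        opp_move = "C" if rng.random() < p_opponent_coop else "D"
        llm_pay, opp_pay = PAYOFFS[(llm_move, opp_move)]
        llm_score += llm_pay
        opp_score += opp_pay
        history.append((llm_move, opp_move))
    return history, llm_score, opp_score
```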
Methodology
The methodology section is where the paper earns its rigor and replicability. The authors employ a three-pronged approach:
- Meta-prompting Technique: Before analyzing gameplay, the researchers verified Llama2's understanding of the game mechanics and of the history reported in the prompt by asking comprehension questions. These covered the game's rules, the chronological sequence of past moves, and cumulative game statistics (a sketch of such checks follows this list).
- Memory Window Analysis: The experiments measured how different memory-window sizes (the number of recent rounds included in the prompt) affected Llama2's strategic play (see the prompt-building sketch below). The authors concluded that a window of 10 rounds offered the best balance between completeness of information and practical effectiveness.
- Behavioral Profiling and Strategy Analysis: Using dimensions drawn from classic game theory, such as niceness, forgiveness, retaliation, and troublemaking, the authors profiled the LLM's behavior. They also applied the Strategy Frequency Estimation Method (SFEM) to evaluate how closely Llama2's play aligns with known human strategies (a simplified SFEM sketch follows below).
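A minimal sketch of the kind of comprehension audit described above. The question wording and the `ask_llm` callable are assumptions for illustration, not the paper's verbatim prompts or interface; `history` holds `(llm_move, opponent_move)` pairs as in the earlier simulation sketch:

```python
def comprehension_checks(history):
    """Ground-truth answers to the kinds of comprehension questions the
    authors pose; the wording here is illustrative, not the paper's
    verbatim prompts. Assumes at least one round has been played."""
    return {
        "How many rounds have been played so far?": len(history),
        "What did your opponent play in the last round?": history[-1][1],
        "How many times has your opponent defected so far?":
            sum(1 for _, opp in history if opp == "D"),
    }

def audit_llm(history, ask_llm):
    """Score the model's answers against ground truth. `ask_llm` is a
    hypothetical callable that sends one question to the model and
    returns its reply as a string."""
    checks = comprehension_checks(history)
    return {q: str(ask_llm(q)).strip() == str(truth) for q, truth in checks.items()}
```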
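The memory-window manipulation amounts to rendering only the last N rounds of history into the prompt. A sketch, with prompt wording that is illustrative rather than the paper's:

```python
def build_prompt(history, window=10):
    """Render only the last `window` rounds into the prompt, mirroring the
    paper's memory-window manipulation (the wording is illustrative)."""
    recent = history[-window:]
    start = len(history) - len(recent)
    lines = [f"Round {start + i + 1}: you played {me}, opponent played {opp}"
             for i, (me, opp) in enumerate(recent)]
    return ("You are playing an iterated game over many rounds.\n"
            "History of the most recent rounds:\n"
            + "\n".join(lines)
            + "\nReply with a single move: C (cooperate) or D (defect).")
```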
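SFEM, as used in the behavioral-economics literature, fits a mixture of candidate strategies to observed play by maximum likelihood, with an error ("tremble") parameter for off-strategy moves. The sketch below is a heavily simplified single-game version: each candidate strategy is a callable mapping the rounds played so far to "C" or "D" (concrete examples appear under Numerical Insights below), and likelihoods are merely normalized rather than jointly fitted:

```python
import math

def strategy_log_likelihood(history, strategy, gamma=0.9):
    """Log-likelihood that `history` was produced by `strategy`, assuming
    each move matches the strategy's prescription with probability gamma
    (a simplified version of SFEM's error model)."""
    ll = 0.0
    for t, (move, _) in enumerate(history):
        predicted = strategy(history[:t])  # strategy sees only prior rounds
        ll += math.log(gamma if move == predicted else 1.0 - gamma)
    return ll

def sfem_scores(history, strategies, gamma=0.9):
    """Turn per-strategy likelihoods into normalized scores. The real SFEM
    jointly fits mixture weights across many games by maximum likelihood;
    this normalization is a rough single-game stand-in."""
    lls = {name: strategy_log_likelihood(history, s, gamma)
           for name, s in strategies.items()}
    peak = max(lls.values())
    weights = {name: math.exp(ll - peak) for name, ll in lls.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}
```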
Findings
One of the salient findings is that Llama2 exhibits a stronger inclination towards cooperative behavior than human players typically show. Notably, Llama2 tends not to initiate defection and adopts a forgiving stance once the opponent's defection rate drops below roughly 30%. The paper identifies a sigmoid relationship between Llama2's probability of cooperating and the opponent's cooperation level: the transition from predominantly defecting to predominantly cooperating happens abruptly once the opponent's cooperation probability exceeds a threshold of roughly 0.6-0.7.
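The shape of that relationship can be pictured with a logistic curve. A minimal sketch, where `midpoint` and `steepness` are assumptions chosen only to reproduce the reported sharp 0.6-0.7 transition, not values fitted in the paper:

```python
import math

def coop_probability(p_opp, midpoint=0.65, steepness=20.0):
    """Illustrative logistic curve for Llama2's probability of cooperating
    as a function of the opponent's cooperation level. `midpoint` and
    `steepness` are assumptions for illustration, not fitted values."""
    return 1.0 / (1.0 + math.exp(-steepness * (p_opp - midpoint)))

for p in (0.3, 0.5, 0.6, 0.65, 0.7, 0.9):
    print(f"opponent coop = {p:.2f} -> P(cooperate) ~ {coop_probability(p):.2f}")
```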
Numerical Insights
The paper's numerical analysis yields several concrete results:
- Initial Cooperation: Llama2 does not defect first, displaying cooperative intent from the outset of each game.
- Forgiveness Threshold: Llama2 shifts sharply towards cooperation once the opponent's defection rate dips below roughly 30%.
- Strategy Transition: Analyzing SFEM scores, the authors found that Llama2's behavior transitions from a Grim-like strategy to Always Cooperate as the opponent's cooperation probability increases (canonical definitions of these strategies are sketched below).
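For reference, minimal sketches of the canonical strategies named above, written in the callable form expected by the earlier `sfem_scores` sketch (Tit-for-Tat is included as another common SFEM candidate, though the highlighted transition is Grim to Always Cooperate):

```python
def always_cooperate(history):
    """ALLC: cooperate unconditionally."""
    return "C"

def grim(history):
    """Grim Trigger: cooperate until the opponent defects once, then defect forever."""
    return "D" if any(opp == "D" for _, opp in history) else "C"

def tit_for_tat(history):
    """TFT: cooperate first, then mirror the opponent's previous move."""
    return history[-1][1] if history else "C"

STRATEGIES = {"ALLC": always_cooperate, "Grim": grim, "TFT": tit_for_tat}
# e.g. sfem_scores(history, STRATEGIES) with the earlier sketch
```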
Implications and Future Directions
The findings have profound implications for the deployment of LLMs in socially interactive contexts:
- Behavioral Consistency: The observed higher propensity for cooperation suggests that LLMs like Llama2 are aligned with norms of cooperative human behavior, at least in experimentally controlled environments.
- Auditing and Alignment: The paper's methodology contributes to the broader field of LLM auditing and alignment, providing tools to systematically evaluate how these models adhere to desired behavioral norms and values.
- Emergent Social Dynamics: By expanding the range of opponents and scenarios, future research could further explore how LLMs handle more complex and sophisticated social interactions.
Conclusion
Overall, this paper is a significant step towards understanding the social behaviors encoded within LLMs. The systematic approach and rigorous experimental design set a high standard for future research in this domain. As LLMs become increasingly integrated into daily technological applications, such studies are indispensable for ensuring that these models operate within acceptable social and ethical parameters. The methods and findings presented can serve as a benchmark for future investigations and applications in AI-driven social simulations.