I Want to Break Free! Persuasion and Anti-Social Behavior of LLMs in Multi-Agent Settings with Social Hierarchy (2410.07109v2)

Published 9 Oct 2024 in cs.CL, cs.AI, cs.CY, and cs.MA

Abstract: As LLM-based agents become increasingly autonomous and will more freely interact with each other, studying interactions between them becomes crucial to anticipate emergent phenomena and potential risks. Drawing inspiration from the widely popular Stanford Prison Experiment, we contribute to this line of research by studying interaction patterns of LLM agents in a context characterized by strict social hierarchy. We do so by specifically studying two types of phenomena: persuasion and anti-social behavior in simulated scenarios involving a guard and a prisoner agent who seeks to achieve a specific goal (i.e., obtaining additional yard time or escape from prison). Leveraging 200 experimental scenarios for a total of 2,000 machine-machine conversations across five different popular LLMs, we provide a set of noteworthy findings. We first document how some models consistently fail in carrying out a conversation in our multi-agent setup where power dynamics are at play. Then, for the models that were able to engage in successful interactions, we empirically show how the goal that an agent is set to achieve impacts primarily its persuasiveness, while having a negligible effect with respect to the agent's anti-social behavior. Third, we highlight how agents' personas, and particularly the guard's personality, drive both the likelihood of successful persuasion from the prisoner and the emergence of anti-social behaviors. Fourth, we show that even without explicitly prompting for specific personalities, anti-social behavior emerges by simply assigning agents' roles. These results bear implications for the development of interactive LLM agents as well as the debate on their societal impact.

PDF HTML Abstract

Analysis of LLM Anti-Social Behavior and Persuasion in Hierarchical Multi-Agent Systems

The paper, "I Want to Break Free! Anti-Social Behavior and Persuasion Ability of LLMs in Multi-Agent Settings with Social Hierarchy," investigates the dynamics of LLM-based agents within a structured social hierarchy. Inspired by the Stanford Prison Experiment (SPE), the research examines the interaction patterns of these agents, specifically focusing on persuasion and anti-social behavior.

Research Context and Setup

In the context of increasing autonomy for LLMs, understanding the interactions among these agents becomes critical, especially in scenarios where power dynamics are central. The paper models interactions between a guard and a prisoner agent across 200 experimental scenarios, employing five popular LLMs. The experiment culminates in 2,000 agent conversations, probing how goals, agent personas, and contextual factors drive persuasion and anti-social behavior.

Key Findings

Persona Dynamics and Persuasion: The paper finds variance in the ability of LLMs to engage effectively in conversations involving power dynamics. Notably, persuasion success is highly influenced by the nature of the goal; agents aiming for smaller concessions, such as yard time, achieve higher success rates compared to those attempting to escape. This aligns with the hypothesis that less onerous requests are more likely to succeed. The findings suggest that the guard's personality is integral to the success of persuasion efforts.
Anti-Social Behavior Emergence: Anti-social behavior manifests regardless of explicit personality instructions. The guard's persona, particularly when abusive, significantly influences the conversation's toxicity. Interestingly, rebellious prisoners do not markedly impact these dynamics. The experiments also reveal that anti-social behavior is not contingent upon the goals pursued, suggesting that the hierarchical interaction setting inherently fosters such behaviors.
LLM Differences: The paper observed notable differences among the LLMs, with only three out of the five models—Llama3, Command-r, and Orca2—successfully facilitating conversations without significant breakdowns. This highlights the varying capacity of LLMs to maintain coherence and adhere to assigned roles in multi-agent contexts.
Temporal Trends in Behavior: The research documents two primary behavioral trajectories in conversations: consistent anti-social behavior throughout or initial peaks in toxicity followed by decline. Moreover, there is no demonstrable predictive relationship (via Granger causality) between the anti-social behavior of one agent prompting reactions in the other.

Implications for AI Development

The findings underscore the challenges of deploying LLMs in environments mirroring human social hierarchies, particularly where power imbalances exist. The manifestation of anti-social tendencies without explicit prompts necessitates a reconsideration of how LLMs are conditioned for role-specific interactions.

From a theoretical standpoint, this research contributes to understanding the socio-cognitive propensities encoded in machine learning models, which could parallel human behavior under certain simulated conditions. Practically, the paper advocates for implementing more robust safeguards and ethical guidelines in the development of LLMs to mitigate unintended outcomes in hierarchical settings.

Future Directions

The exploration of multi-agent dynamics presented in this paper suggests several avenues for future inquiry. Expanding beyond binary agent setups to include more complex, interconnected systems could yield richer insights into emergent behaviors in AI collectives. Additionally, focusing on refining LLM architectures to enhance coherence and adaptability in hierarchical scenarios could bolster their application in real-world contexts.

In summary, this paper provides a detailed examination of how LLMs navigate social hierarchies, offering critical insights into their potential and limitations in simulating intricate human-like interactions. The research paves the way for further interdisciplinary studies that intersect AI, psychology, and social dynamics.

PDF Markdown Bookmark Chat (Pro)

Authors (7)

Gian Maria Campedelli (12 papers)
Nicolò Penzo (4 papers)
Massimo Stefan (1 paper)
Roberto Dessì (12 papers)
Marco Guerini (40 papers)
Bruno Lepri (120 papers)
Jacopo Staiano (38 papers)

Related Papers

Find Related Papers

Tweets

https://twitter.com/omarsar0/status/1844427182141211054

https://twitter.com/CampedelliGian/status/1844348444661723548

https://twitter.com/m_guerini/status/1844351959358374374

https://twitter.com/penzo_nicolo/status/1844408325926158758

https://twitter.com/MobS_FBK/status/1844725098487419248

https://twitter.com/penzo_nicolo/status/1879836507705684394