
LLM-Based Societies & Emergent Dynamics

Updated 19 November 2025
  • LLM-based societies are computational systems where advanced LLM agents simulate human social behavior with persistent identity, memory, and coordinated roles.
  • Rigorous PIMMUR principles and large-scale simulations enable precise evaluation of cooperation, legal structures, and political persuasion within these systems.
  • Insights from LLM-based societies inform decentralized governance, ethical regulation, and value alignment in complex, emergent multi-agent environments.

A society based on LLMs—an LLM-based society—is any computational system in which a population of LLM-driven agents engages in social, economic, legal, or collaborative processes that collectively exhibit emergent or structured social phenomena. These systems span a spectrum from multi-agent simulations in computational social science to decentralized service ecosystems, to practical deployments in domains such as law, engineering, or democratic persuasion. LLM-based societies are characterized by agents with persistent identity, memory, communication abilities, internalized roles or values, and mechanisms for coordination or conflict, which together instantiate collective behaviors analogous to those of human social systems.

1. Foundational Principles and Methodological Standards

Rigorous simulation and study of LLM-based societies require strict methodological guarantees to ensure validity and generalizability. The PIMMUR principles have emerged as the canonical requirements for credible multi-agent LLM research (Zhou et al., 22 Sep 2025):

  • Profile: Agents must be heterogeneous; each is sampled from a non-degenerate distribution over psychological or demographic traits (e.g., Dirichlet, multivariate Gaussian).
  • Interaction: Agents exchange information via explicit interaction graphs, enforcing turn-taking and direct communication; "statistical" or aggregated pseudo-interactions are insufficient.
  • Memory: Each agent maintains an evolving private memory $\mathbf{M}_i$, updated by an explicit rule, capturing history and supporting belief formation (see the sketch after this list).
  • Minimal-Control: Prompts must use minimal steering; all intentional outcome- or bias-inducing language must be quantified and kept below strict thresholds.
  • Unawareness: Agents must not infer the experimental hypothesis, structure, or evaluation metric.
  • Realism: Validation of simulation behaviors must rely on empirical data, not just stylized mathematical models; e.g., degree distributions, resource allocations, or action sequences are compared via $D_{KL}$, RMSE, or other standard divergence metrics.
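
As a concrete illustration, the Profile and Memory requirements might be instantiated as below. This is a minimal sketch, with hypothetical trait dimensions and a simple append-only memory rule; PIMMUR itself does not prescribe an implementation.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

class Agent:
    """A PIMMUR-style agent: heterogeneous profile plus evolving private memory."""

    def __init__(self, agent_id: int, trait_names: list[str], alpha: np.ndarray):
        self.agent_id = agent_id
        # Profile: traits drawn from a non-degenerate Dirichlet distribution,
        # so no two agents share an identical psychological makeup.
        self.traits = dict(zip(trait_names, rng.dirichlet(alpha)))
        # Memory: a private history M_i, visible only to this agent.
        self.memory: list[str] = []

    def update_memory(self, observation: str) -> None:
        """Memory update rule: here, a plain append of the new observation."""
        self.memory.append(observation)

# Illustrative population of 100 agents over three hypothetical traits.
trait_names = ["cooperativeness", "risk_aversion", "sociability"]
population = [Agent(i, trait_names, alpha=np.ones(3)) for i in range(100)]
population[0].update_memory("agent 7 defected in round 3")
```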

Failure to enforce these principles can lead to spurious emergence of phenomena—a problem extensively documented by re-analyses showing that 4 of 5 "classic" LLM-society effect claims do not survive under strict PIMMUR conditions (Zhou et al., 22 Sep 2025).

2. Social Dynamics, Cooperation, and Norm Evolution

LLM-based societies robustly instantiate both classical and novel dynamics in the evolution of cooperation, punishment, and social contracts.

  • Diner's Dilemma and Boyd–Richerson Models: LLM agent simulations of n-player social dilemmas replicate the emergence of cooperation and the critical role of explicit punishment (Warnakulasuriya et al., 28 Apr 2025). Agents prompted to enact roles such as "Moralist" or "Cooperator-Punisher" only establish high-cooperation regimes (and overtake defectors) when punishment is both available and the fine-to-cost ratio is sufficiently high (e.g., $p:k = 6:1$). Strategy updating proceeds via pairwise imitation using the Fermi function:

$$P(s_i \to s_j) = \frac{1}{1 + \exp\left[-\beta(\pi_j - \pi_i)\right]},$$

where $\pi_i$ and $\pi_j$ are cumulative utilities (a code sketch of this imitation step follows). Without punishment or with weak punishment, defection predominates.
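
A minimal sketch of this imitation step (neighbor selection and payoff bookkeeping are left to the surrounding simulation; names here are assumptions):

```python
import math
import random

def fermi_adopt(pi_i: float, pi_j: float, beta: float = 1.0) -> bool:
    """Return True if agent i adopts agent j's strategy under the Fermi rule
    P(s_i -> s_j) = 1 / (1 + exp(-beta * (pi_j - pi_i)))."""
    p = 1.0 / (1.0 + math.exp(-beta * (pi_j - pi_i)))
    return random.random() < p

# A higher-earning neighbor is imitated with high probability;
# equal payoffs give exactly p = 0.5.
print(fermi_adopt(pi_i=2.0, pi_j=5.0))
```

Larger $\beta$ makes imitation nearly deterministic, while $\beta \to 0$ reduces the rule to random copying.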

  • Iterated Donor Game and Indirect Reciprocity: Cultural evolution in LLM societies mirrors human-like indirect reciprocity when agents can observe the recent past behavior of others (Vallinder et al., 13 Dec 2024). Base model differences are pronounced: Claude 3.5 Sonnet agents evolve donation rates rising from 50% to 77% over 10 generations, while GPT-4o societies see declining cooperation (< 15% donation). Costly punishment can amplify or destroy cooperative equilibria depending on the base model's social reasoning capacity.
  • Hobbesian Social Contract and Artificial Leviathan: Simulation of Hobbesian social contract theory shows that LLM agents, endowed with psychological drives (aggressiveness, covetousness, strength), transition from a conflictual "state of nature" to a peaceful commonwealth with an absolute sovereign once a critical authority threshold is crossed (Dai et al., 20 Jun 2024). This is operationalized with

$$A_{ij}(t) = 1 \Leftrightarrow j\ \text{has conceded to}\ i,$$

and the emergence of an agent $k$ such that $A_{k,i} = 1\ \forall i \neq k$ terminates unrestrained conflict (see the sketch below).
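
The termination condition translates directly into a check over the concession matrix; a minimal sketch, using the convention defined above:

```python
import numpy as np

def find_sovereign(A: np.ndarray) -> int | None:
    """Return an agent k with A[k, i] == 1 for all i != k (every other agent
    has conceded to k), or None if the state of nature persists.
    Convention: A[i, j] = 1 means j has conceded to i."""
    n = A.shape[0]
    for k in range(n):
        if all(A[k, i] == 1 for i in range(n) if i != k):
            return k
    return None

# Three agents; both 1 and 2 have conceded to agent 0, so 0 is sovereign.
A = np.array([[0, 1, 1],
              [0, 0, 0],
              [0, 0, 0]])
print(find_sovereign(A))  # -> 0
```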

  • Moral Evolution Through Expanding Circle: Agent-based simulations where LLMs are prompted with varying moral radii (selfish → kin-focused → reciprocal → universal) demonstrate that kin-focused agents dominate under moderate coordination costs, aligning with Hamilton's kin selection theory and Singer's expanding circle hypothesis (Ziheng et al., 22 Sep 2025). Universal altruists are often outcompeted due to their inability to effectively punish exploiters.

3. Persuasion, Deliberation, and Political Implications

The persuasive capabilities of LLM societies in political domains have been quantitatively benchmarked. Structured experiments with over 10,000 participants show that, under forced exposure, LLM chatbots (Claude 3.5/3.7) are as persuasive as human-generated campaign material in both immediate and five-week outcomes, with average treatment effects (ATE) on policy indices indistinguishable from standard ads (e.g., immediate ATE: $\Delta = 0.363$ for chatbot vs. $\Delta = 0.349$ for human) (Chen et al., 29 Apr 2025).

Cost analysis demonstrates that LLM-based persuasion can achieve cost-per-persuaded-voter figures of \$48–\$74, outperforming traditional campaign ads at \$100 per vote (a back-of-the-envelope sketch follows). However, the exposure constraint ($P(E_i = 1)$) imposes a substantial bottleneck on real-world scale; voluntary engagement with political chatbots remains low, so the overall persuasive potential is limited until better "pull" mechanisms can be institutionalized.
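
For intuition, the metric reduces to exposure cost divided by the per-exposure persuasion rate. The inputs below are illustrative placeholders, not the study's figures:

```python
def cost_per_persuaded_voter(cost_per_exposure: float, persuasion_rate: float) -> float:
    """Cost to persuade one voter = cost of one exposure divided by the
    probability that a single exposure persuades (both inputs hypothetical)."""
    return cost_per_exposure / persuasion_rate

# E.g., a $20 conversation persuading 35% of exposed voters implies ~$57
# per persuaded voter; the study reports a $48-$74 range.
print(round(cost_per_persuaded_voter(20.0, 0.35), 2))  # -> 57.14
```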

Recommendations for democratic resilience include mandatory content provenance disclosure, ad-platform policy updates to include LLM-driven persuasion, and active funding for research into counter-messaging and detection of AI-generated persuasion at scale.

4. Legal Regimes, Rule Evolution, and Institutional Adaptation

LLM-based societies offer a basis for simulating legal regimes, rule evolution, and institutional adaptation:

  • NomicLaw Collaborative Law-Making: In structured, multi-agent legal vignettes, LLM agents propose, justify, and vote on regulatory rules. Homogeneous LLM groups quickly solidify self-supporting voting blocs (self-voting rates up to 87%), while heterogeneous mixes of models induce dynamic coalition reshuffling (coalition switch rates ≈ 39%) and more context-sensitive legal rhetoric (Hota et al., 7 Aug 2025). Trust and reciprocity are measured by bilateral trust matrices

$$T_{i \rightarrow j} = \frac{\text{votes cast by } i \text{ for } j}{\text{total proposals by } j},$$

and a reciprocity index quantifying the symmetry of mutual support (a computational sketch of both quantities follows).
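
A computational sketch of both quantities (the reciprocity index below is one plausible symmetry measure; the paper's exact definition may differ):

```python
import numpy as np

def trust_matrix(votes: np.ndarray, proposals: np.ndarray) -> np.ndarray:
    """T[i, j] = votes cast by i for j's proposals / total proposals by j.
    `votes` is an (n, n) count matrix; `proposals` is a length-n vector."""
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(proposals > 0, votes / proposals, 0.0)

def reciprocity_index(T: np.ndarray) -> float:
    """Symmetry of mutual support: 1 - ||T - T^T|| / ||T + T^T||, which
    equals 1 when support is perfectly mutual (an illustrative metric)."""
    den = np.linalg.norm(T + T.T)
    return 1.0 - np.linalg.norm(T - T.T) / den if den > 0 else 1.0
```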

  • Law in Silico Macro and Micro-Societies: Institutional agent frameworks instantiate legislature, judiciary, and enforcement roles atop populations of LLM-driven citizens with demographically and socially grounded profiles (Wang et al., 28 Oct 2025). Macro-level simulations—using up to 10,000 agents—reproduce observed real-world crime rates with mean errors $\leq 0.002$ (Qwen2.5), while micro-level games elucidate the impact of transparency, enforcement adaptivity, litigation cost, and corruption probability on exploitation and welfare dynamics. A representative finding is that iterative legal closure cycles (the legislature closes a loophole, the company shifts tactics) induce a persistent regulatory chase, and that bias in law perception significantly affects the welfare of vulnerable populations.

5. Large-Scale Social Simulation, Scaling Laws, and Coordination Limits

Recent frameworks allow LLM-based societies to be simulated at planetary scales:

  • Light Society: A modular, event-driven agent-based platform achieves simulations with $\sim 10^9$ human-like LLM agents (Guan et al., 7 Jun 2025). Social processes (trust games, opinion propagation) are formalized as state transitions

$$(s_i^{(t)}, e^{(t)}) \xrightarrow{v} (s_i^{(t+1)}, e^{(t+1)})$$

orchestrated by perception, decision (policy), agent- and environment-evolution, and update functions, with event-queue management providing temporal coherence and scalability (schematized below). Surrogate models and prompt-caching reduce LLM call counts by up to 90%.
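
A schematic of that cycle under assumed interfaces; the perceive/decide/evolve callables stand in for Light Society's actual components, which this summary does not specify:

```python
import heapq
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass(order=True)
class Event:
    time: float
    payload: Any = field(compare=False)

def run(agents: dict, env: dict, events: list[Event], horizon: float,
        perceive: Callable, decide: Callable, evolve: Callable) -> None:
    """Pop events in timestamp order; each applies one transition
    (s_i, e) -> (s_i', e') for the addressed agent."""
    heapq.heapify(events)
    while events and events[0].time <= horizon:
        ev = heapq.heappop(events)
        agent_id = ev.payload["agent"]
        obs = perceive(agents[agent_id], env, ev)      # perception
        action = decide(agents[agent_id], obs)         # policy (LLM or cached surrogate)
        agents[agent_id], env, followups = evolve(agents[agent_id], env, action)
        for f in followups:                            # new events preserve temporal coherence
            heapq.heappush(events, f)
```

Prompt-caching and surrogate models would slot into `decide`, replacing most LLM calls with cheap lookups.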

  • Consensus and Coordination Constraints: Coordination emerges below a model- and size-dependent threshold determined by the "majority force" parameter $\mu$ in the adoption probability curve:

$$P_+(m) = \frac{1}{2}\left[1 + \tanh(\mu m)\right],$$

where $m$ is the mean opinion (Marzo et al., 4 Sep 2024). The critical group size $N_c$ at which global consensus becomes unattainable ($\mu(N_c) = 1$) grows exponentially with model capability (MMLU score). For top-end LLMs (GPT-4 Turbo), $N_c \gg 1000$, surpassing informal human group sizes (i.e., Dunbar's number); less capable models fragment at much smaller $N$. Scaling strategies such as hierarchical summary agents, centralized memory, and role specialization are required to exceed these intrinsic coordination bounds.
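
The dynamic can be reproduced with a toy simulation (a sketch; the study's exact update schedule and opinion encoding are assumptions here):

```python
import numpy as np

def consensus_time(n: int, mu: float, max_steps: int = 100_000, seed: int = 0):
    """Binary opinions s_i in {-1, +1}; each step one random agent adopts +1
    with probability P_+(m) = (1 + tanh(mu * m)) / 2, where m is the mean
    opinion. Returns steps to unanimity, or None if it never occurs."""
    rng = np.random.default_rng(seed)
    s = rng.choice([-1, 1], size=n)
    for t in range(max_steps):
        m = s.mean()
        if abs(m) == 1.0:                  # unanimous
            return t
        i = rng.integers(n)
        s[i] = 1 if rng.random() < 0.5 * (1 + np.tanh(mu * m)) else -1
    return None

# A strong majority force reaches consensus quickly; a weak one may never.
print(consensus_time(n=50, mu=3.0), consensus_time(n=50, mu=0.2))
```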

6. Moral Reasoning, Value Alignment, and Cultural Divergence

LLM-based societies internalize distinctive moral topographies, reflecting both training data artifacts and emergent, system-level dynamics:

  • Word Association and Moral Foundations: Propagating Moral Foundation Theory seed words through LLM- and human-generated word association graphs reveals that LLMs align tightly with human populations for positively valenced virtues but diverge on vices: LLMs emphasize abstract, systemic constructs ("prejudice," "betrayal"), while humans ground negative moral concepts in emotional, sensory content ("vomit," "disgusting") (Xiang et al., 26 May 2025). Quantitatively, LLMs' moral maps achieve an overall Spearman $\rho \approx 0.29$ with held-out human annotations, closely paralleling human-association graphs, but systematic gaps persist in emotionality and concreteness (see the snippet after this list).
  • Societal Impacts: LLM-based societies may underweight the visceral urgency of harm or contamination in critical contexts (e.g., crisis response), an outcome of de-sensitized and abstract moral reasoning. Future deployments require multi-modal calibration and explicit monitoring for abstract-concrete imbalances, especially as deployment extends into non-Western cultural domains.
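
The reported alignment statistic is an ordinary rank correlation over per-word moral scores from the two association graphs; it would be computed as below (scores invented for the example):

```python
from scipy.stats import spearmanr

# Illustrative per-word moral-foundation scores from LLM vs. human
# association graphs (values are placeholders; the study reports an
# overall rho of about 0.29).
llm_scores   = [0.81, 0.42, 0.13, 0.55, 0.27, 0.90]
human_scores = [0.75, 0.60, 0.22, 0.31, 0.45, 0.88]

rho, p_value = spearmanr(llm_scores, human_scores)
print(f"Spearman rho = {rho:.2f}")
```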

7. Decentralized Service Ecosystems and Collective Expertise

LLM-based societies extend to decentralized, reputation-driven expert marketplaces:

  • LLM-Net: A blockchain-based network supports an economy of specialized LLM providers who collaborate, debate, and deliver expert services under auditability, reputation accumulation, and economic incentive structures (Chong et al., 13 Jan 2025). Node roles (requester, coordinator, respondent, validator) interact via multi-turn debate, on-chain recordkeeping, and reputation-based respondent selection:

$$P(i \mid C_d) = \frac{\mathrm{sim}(C_d, \mathrm{cap}_i) \cdot \exp(\beta R_i)}{\sum_j \mathrm{sim}(C_d, \mathrm{cap}_j) \cdot \exp(\beta R_j)},$$

producing consensus outputs through weighted aggregation and transparent reward flows. Rapid pruning of low-quality nodes and governance via on-chain voting present prototypes for self-governing LLM societies at service scale (a sampling sketch follows).
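
The selection rule is a similarity-weighted softmax over reputations; a sketch, assuming cosine similarity for sim(·,·) (the similarity function is not specified in this summary):

```python
import numpy as np

def select_respondent(query_emb: np.ndarray, cap_embs: np.ndarray,
                      reputations: np.ndarray, beta: float = 1.0,
                      seed: int = 0) -> int:
    """Sample respondent i with probability proportional to
    sim(C_d, cap_i) * exp(beta * R_i), following the rule above.
    `cap_embs` holds one capability embedding per node, row-wise."""
    rng = np.random.default_rng(seed)
    sims = cap_embs @ query_emb / (
        np.linalg.norm(cap_embs, axis=1) * np.linalg.norm(query_emb))
    weights = np.clip(sims, 0.0, None) * np.exp(beta * reputations)  # no negative weights
    probs = weights / weights.sum()
    return int(rng.choice(len(probs), p=probs))
```

Raising `beta` concentrates selection on high-reputation nodes, which is what drives the rapid pruning of low-quality providers.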

Table: Key Experimental Platforms for LLM-Based Societies

| Framework | Population Scale | Domain |
|---|---|---|
| Light Society (Guan et al., 7 Jun 2025) | $\sim 10^9$ agents | General social processes |
| Law in Silico (Wang et al., 28 Oct 2025) | $10^4$ macro, 3–10 micro | Legal/economic simulations |
| NomicLaw (Hota et al., 7 Aug 2025) | 5–10 agents | Collaborative lawmaking |
| LLM-Net (Chong et al., 13 Jan 2025) | 100s–1000s of models | Decentralized expert services |
| MetaOpenFOAM (Chen et al., 31 Jul 2024) | 4–8 agents/role | Scientific (CFD) collaboration |

Each environment provides distinctive methodologies, scaling properties, and testbeds for probing the emergent properties, robustness, and vulnerabilities of LLM-based societies. The field is converging on hybrid approaches—combining procedural rigor (e.g., via PIMMUR), large-scale simulation, institutional modularity, and empirical benchmarking—to chart the limits and societal implications of advanced LLM collectives.
