
FAIRGAME Framework for LLM Strategic Analysis

Updated 15 December 2025
  • FAIRGAME framework is a modular infrastructure that simulates, audits, and analyzes LLM agents’ strategic behaviors using game theory and repeated games.
  • It quantifies cooperation, competition, and bias through controlled experiments across different languages and personality settings.
  • The system integrates dynamic prompt configuration, statistical analysis, and strategy recognition to support robust AI governance research.

The FAIRGAME framework is a methodological and software infrastructure for the systematic analysis, simulation, and auditing of strategic and bias-driven behaviors among AI agents—particularly LLMs—in multi-agent settings. Rooted in classical and modern game theory, FAIRGAME formalizes the strategic interactions of LLMs as players in normal-form and repeated games, supports controlled experimentation across prompt languages, agent personalities, and payoff structures, and enables in-depth quantitative auditing of emergent cooperative, competitive, and fairness-related patterns. Extended to multi-agent and multilingual domains, FAIRGAME connects model outputs to latent intentions, supporting AI governance and collective decision-making research (Huynh et al., 8 Dec 2025, Buscemi et al., 19 Apr 2025, Buscemi et al., 30 Jul 2025).

1. Architectural and Methodological Foundations

FAIRGAME is structured as a modular simulation pipeline with the following main components: a declarative configuration file specifying the game (payoff matrices, number of rounds, agent identities, languages, personalities, LLM models), a set of natural-language prompt templates per language (automatically translated and validated), an Agent Manager for instantiation and personality injection, a Simulation Engine for agent–agent or multi-agent repeated play, a Data Logger for full trajectory storage, and a Results Analyzer for statistical analysis and bias quantification (Buscemi et al., 19 Apr 2025, Buscemi et al., 30 Jul 2025). The architecture supports arbitrary numbers of repetitions and permutations (languages × personalities × knowledge), separating scenario definition from runtime execution and analysis.
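The cited papers do not publish the configuration schema verbatim, but the following sketch illustrates what such a declarative scenario file could look like; all field names here are illustrative assumptions, not the framework's actual schema:

```python
# Hypothetical FAIRGAME-style scenario configuration (field names are
# illustrative assumptions, not the framework's actual schema).
scenario = {
    "game": "prisoners_dilemma",
    # Payoff matrix as (row_action, col_action) -> (row_payoff, col_payoff),
    # respecting the canonical ranking T > R > P > S.
    "payoffs": {
        ("C", "C"): (3, 3),   # R: mutual cooperation
        ("C", "D"): (0, 5),   # S, T: sucker vs. temptation
        ("D", "C"): (5, 0),
        ("D", "D"): (1, 1),   # P: mutual defection
    },
    "rounds": 10,
    "reveal_horizon": True,           # whether agents are told the round count
    "languages": ["en", "vi", "ar"],  # prompt languages to permute over
    "personalities": ["cooperative", "selfish", None],
    "models": ["gpt-4o", "claude-3-5-haiku", "mistral-large"],
    "repetitions": 20,                # independent runs per permutation
}
```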

Games are encoded in normal-form as $G = (N, \{S_i\}_{i \in N}, \{u_i\}_{i \in N})$, with LLMs mapped to distinct agents. In each round, prompt templates are auto-populated with current game state, round index, payoff parameters, and full history; the LLM output is discretized to an action, payoffs are computed, and the process is logged. Analysis modules compute empirical distributions, divergence from theoretical equilibria (e.g. Nash, mixed), and a suite of bias metrics (Kullback–Leibler divergence, Total Variation, Wasserstein-1). FAIRGAME supports both two-player and $n$-player games, one-shot and repeated horizons, and the injection of cooperative, selfish, or neutral personalities (Huynh et al., 8 Dec 2025, Buscemi et al., 19 Apr 2025).
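A minimal sketch of the bias-metric computation, assuming empirical and theoretical action distributions over the same discrete support; the metric definitions are standard, while the function name and interface are hypothetical:

```python
import numpy as np
from scipy.stats import entropy, wasserstein_distance

def bias_metrics(empirical, theoretical):
    """Divergence of an empirical action distribution from a theoretical one."""
    p = np.asarray(empirical, dtype=float)
    q = np.asarray(theoretical, dtype=float)
    support = np.arange(len(p))                # discrete action indices
    return {
        "kl": entropy(p, q),                   # Kullback-Leibler divergence
        "tv": 0.5 * np.abs(p - q).sum(),       # Total Variation distance
        "w1": wasserstein_distance(support, support, p, q),  # Wasserstein-1
    }

# e.g., 70% observed cooperation vs. a 50/50 mixed equilibrium:
print(bias_metrics([0.3, 0.7], [0.5, 0.5]))
```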

2. Game Types, Extensions, and Parameterization

FAIRGAME was initially designed for repeated two-player matrix games, most notably the Prisoner's Dilemma (PD), with configuration for conventional, "harsh", or "mild" payoff structures defined by controlled manipulation of the reward ranking $T > R > P > S$. The primary innovation in (Huynh et al., 8 Dec 2025) is the extension to (1) a payoff-scaled repeated PD, parameterized by a multiplicative scaling $\lambda$ such that $(T, R, P, S) = \lambda \cdot (T_0, R_0, P_0, S_0)$, enabling isolation of incentive-magnitude sensitivity (i.e., differentiating between high- and low-stakes behavior without altering the underlying strategic structure); and (2) a genuinely multi-agent Public Goods Game (PGG, $n = 3$) with dynamic per-round payoffs, vector-valued action histories, and a synergy factor $r$ controlling the marginal gain from collective investment versus free-riding.
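A small sketch of the $\lambda$-scaling, showing that the stakes change while the ranking $T > R > P > S$, and with it the game's strategic structure, is preserved (the payoff values here are illustrative):

```python
# Multiplicative payoff scaling for the repeated PD: lambda changes the
# stakes while preserving the ranking T > R > P > S.
def scale_payoffs(base, lam):
    return tuple(lam * x for x in base)

base = (5, 3, 1, 0)                      # conventional (T0, R0, P0, S0)
low = scale_payoffs(base, 0.5)           # low-stakes variant
high = scale_payoffs(base, 10)           # high-stakes variant
assert list(low) == sorted(low, reverse=True)   # ordering is invariant
print(low, high)                         # (2.5, 1.5, 0.5, 0.0) (50, 30, 10, 0)
```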

This multi-agent PGG is specified by per-agent actions $s_{i,t} \in \{0,1\}$ (contribute or not), with per-round payoff for agent $i$ given by $\pi_{i,t} = \frac{r}{n} \sum_{j=1}^{n} (s_{j,t}\, c) - s_{i,t}\, c$, where $c$ is the fixed round cost. Agent prompts are dynamically populated with full multi-agent action and payoff history, facilitating exploration of complex group phenomena such as free-riding, coalition formation, and end-game defection (Huynh et al., 8 Dec 2025).
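A direct transcription of this payoff rule in Python (parameter values are illustrative); note that with $r < n$ the per-capita return on a contribution is below its cost, so free-riding is individually dominant:

```python
# Per-round Public Goods Game payoff, following the formula above:
# each contributor pays cost c; the pooled contributions are multiplied
# by synergy factor r and split equally among all n agents.
def pgg_payoffs(actions, r=1.6, c=10.0):
    """actions: list of 0/1 contribution decisions, one per agent."""
    n = len(actions)
    pool_share = (r / n) * sum(a * c for a in actions)
    return [pool_share - a * c for a in actions]

# With r < n, a defector keeps the pool share without paying the cost:
print(pgg_payoffs([1, 1, 0]))  # the free-rider earns the most
```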

3. Language, Personality, and Communication Effects

Central to FAIRGAME is the parameterization and measurement of systemic biases induced by prompt language (e.g., English, Vietnamese, Arabic), personality priming (explicit system-level instructions for "cooperative", "selfish", or null behavior), and, optionally, inter-agent pre-move communication. Language choice $L$ acts as a discrete framing parameter, generating empirically measurable shifts in baseline cooperation rates, with such differences sometimes as substantial as those induced by switching LLM architectures (Buscemi et al., 30 Jul 2025, Huynh et al., 8 Dec 2025). The framework supports controlled studies on horizon knowledge (whether the round count $T$ is revealed to agents) and enables toggling of communication protocols (one-shot and repeated games with/without messaging), capturing both lexical and strategic adaptation (Buscemi et al., 30 Jul 2025).
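The experimental grid can be pictured as a simple Cartesian product over framing parameters; the sketch below is schematic, with run_simulation standing in as a placeholder for the actual engine call:

```python
# Schematic sweep over framing parameters: every combination of prompt
# language, personality priming, and horizon knowledge is simulated.
from itertools import product

languages = ["en", "vi", "ar"]
personalities = ["cooperative", "selfish", None]   # None = no priming
horizon_known = [True, False]

for lang, persona, known in product(languages, personalities, horizon_known):
    condition = {"language": lang, "personality": persona,
                 "horizon_known": known}
    # run_simulation(condition)  # hypothetical call into the engine
    print(condition)
```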

In multi-agent and communication-enabled settings, analysis extends to cooperation gradients, coordination scores, message length/frequency statistics, and lexical adaptation, revealing nuanced sensitivity to language, role order, and agent pairing (Buscemi et al., 30 Jul 2025).
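As one concrete interpretation (the cited papers do not give the exact formula), a cooperation gradient can be computed as the round-over-round change in mean cooperation rate across logged runs:

```python
import numpy as np

# rows = independent runs, columns = rounds; 1 = cooperate, 0 = defect
trajectories = np.array([
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
])
coop_rate = trajectories.mean(axis=0)   # cooperation rate per round
gradient = np.diff(coop_rate)           # negative values signal decay
print(coop_rate, gradient)
```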

4. Behavioral Analysis and Strategy Recognition

To move beyond aggregate statistics, FAIRGAME implements downstream supervised classification models for latent strategy recognition. Trajectories from repeated games are encoded as time series of (outcome, action) pairs, optionally supplemented by cumulative payoff trajectories. Canonical repeated-game strategies—ALLC (always cooperate), ALLD (always defect), TFT (tit-for-tat), WSLS (win-stay–lose-shift)—are used to synthetically label large training sets, with controlled “execution noise” ($\epsilon$) injected to approximate LLM stochasticity. Multiple classifier architectures are benchmarked: logistic regression, random forest, feedforward neural networks, and Long Short-Term Memory (LSTM) models, with the LSTM proving most robust by exploiting temporal context under noise ($\sim 94\%$ accuracy at $\epsilon = 0.05$) (Huynh et al., 8 Dec 2025).
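A sketch of the synthetic labeling procedure under these definitions: canonical strategies played against a random opponent, with execution noise $\epsilon$ flipping the intended action. The opponent model and round count are assumptions for illustration:

```python
import random

def intended(strategy, my_hist, opp_hist):
    """Intended next action (1 = cooperate, 0 = defect) for canonical strategies."""
    if strategy == "ALLC":
        return 1
    if strategy == "ALLD":
        return 0
    if strategy == "TFT":                        # copy opponent's last move
        return opp_hist[-1] if opp_hist else 1
    if strategy == "WSLS":                       # win-stay, lose-shift (Pavlov)
        if not my_hist:
            return 1
        # opponent cooperated last round -> payoff was T or R -> "win" -> stay
        return my_hist[-1] if opp_hist[-1] == 1 else 1 - my_hist[-1]
    raise ValueError(strategy)

def play(strategy, rounds=20, eps=0.05):
    """One labeled trajectory: a list of (my_action, opp_action) pairs."""
    my_hist, opp_hist = [], []
    for _ in range(rounds):
        action = intended(strategy, my_hist, opp_hist)
        if random.random() < eps:                # execution noise flips the move
            action = 1 - action
        my_hist.append(action)
        opp_hist.append(random.randint(0, 1))    # random opponent (assumption)
    return list(zip(my_hist, opp_hist)), strategy

dataset = [play(s) for s in ("ALLC", "ALLD", "TFT", "WSLS") for _ in range(250)]
```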

These classifiers are deployed in post hoc or real-time mode to assign high-confidence ($p > 0.9$) latent-strategy labels to LLM agent trajectories, facilitating detection of systematic model- and language-specific behavioral signatures, end-game effects, and deviations from prompt-instructed policy.
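The high-confidence labeling rule amounts to thresholding the classifier's class probabilities; a minimal sketch, assuming any fitted probabilistic classifier exposing a scikit-learn-style predict_proba (the LSTM's softmax output would play the same role):

```python
import numpy as np

def label_trajectories(clf, X, classes, threshold=0.9):
    """Assign a latent-strategy label only above the confidence threshold."""
    probs = clf.predict_proba(X)                 # shape: (n_samples, n_classes)
    best = probs.argmax(axis=1)
    confident = probs.max(axis=1) > threshold
    return [classes[i] if ok else "uncertain"
            for i, ok in zip(best, confident)]
```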

5. Key Empirical Findings and Strategic Implications

Extensive experiments using FAIRGAME reveal several robust, model- and language-dependent phenomena (Huynh et al., 8 Dec 2025, Buscemi et al., 30 Jul 2025):

  • Incentive-sensitive cooperation: PD cooperation rates shift systematically with payoff scale ($\lambda$); lower stakes drive higher defection, with model-specific modulation (e.g., GPT-4o highly sensitive, Claude less so).
  • Cross-linguistic divergence: English consistently elicits greater and more stable cooperation; in multi-agent PGGs, cooperation curves induced by English-language prompts decay less steeply under identical experimental conditions.
  • End-game defection and convergence: Explicitly disclosing the horizon yields a coherent, non-chaotic collapse to defection in later rounds of repeated games, with the “selfish” persona accelerating convergence to uniform non-cooperation.
  • Model-specific strategic biases: Each LLM exhibits distinct, irreducible tendencies: Claude 3.5 Haiku displays substantial residual cooperation under selfish frames; GPT-4o is strictly instruction-following but highly language-sensitive; Mistral Large balances instruction adherence with minimal cross-linguistic variance.
  • Effects of prompt order, role assignment, and personality disclosure: These factors measurably modulate outcome distributions and emergent cooperation, beyond what differences in LLM or language alone explain.

A summary comparison of strategic behaviors:

| Model | Cooperative Bias | Language Sensitivity | Adherence to Persona |
|---|---|---|---|
| Claude 3.5 | High (prosocial) | Moderate | Persistent cooperation under selfish priming |
| GPT-4o | Low (obedient) | High | Perfect (zero cooperation under selfish priming) |
| Mistral Large | Medium | Low | Strong, lowest internal variance |

6. Applications, Extensibility, and Limitations

FAIRGAME underpins methodological rigor for auditing LLM agency and bias in competitive, cooperative, and mixed-motive settings. The modular configuration and dynamic prompt population enable rapid adaptation for new games (e.g., Stag Hunt, Battle of the Sexes), new languages, new communication protocols, and new agent taxonomies (Buscemi et al., 19 Apr 2025, Buscemi et al., 30 Jul 2025). Practical applications include evaluating candidate LLMs for deployment in decentralized coordination, collective decision environments, or AI-driven governance architectures; real-time strategy recognition to flag emergent free-riding or exploitation; and bias analysis for cross-linguistic or multi-modal system rollouts.

Limitations include restriction (in core implementations) to normal-form, finitely repeated games with fixed personalities, static prompt templates, and incomplete support for coalition formation or adaptively evolving agent personas (Buscemi et al., 19 Apr 2025). Automated translation is subject to rare semantic or idiomatic errors, and current deployments focus on dyadic and small-group settings, though generalization to nn-player games is supported in principle via the configuration architecture (Huynh et al., 8 Dec 2025, Buscemi et al., 30 Jul 2025).

7. Implications for AI Governance and Future Directions

Findings from FAIRGAME highlight the criticality of benchmark-driven, game-theoretic auditing for multi-agent AI systems. Model choice is not neutral: selection of LLM affects baseline cooperation and competitive behaviors, sensitivity to prompt language, and intra-system variance. Prompt template framing, language, and agent role assignment interact with model biases to shape emergent global dynamics, with governance protocols needing to address these interactions to prevent unintended hierarchies or failures of coordination. Real-time strategy classification can enable corrective or preemptive interventions in deployed multi-agent systems. A plausible implication is that, as LLMs assume larger roles in distributed decision-making, continued expansion and integration of FAIRGAME-type pipelines will become foundational to robust, transparent AI governance (Huynh et al., 8 Dec 2025).

Ongoing research directions include adaptive prompts, real-time persona evolution, expanded multi-agent and coalition modeling, and deeper integration with legal and ethical regulatory frameworks. The convergence behaviors, stability properties, and manipulation-resistance of such interactive multi-agent infrastructures remain key theoretical open problems.
