Program Equilibria: Code-Based Strategic Interaction

Updated 6 December 2025

Program equilibria are a generalization of Nash equilibrium where strategies are computer programs that inspect opponents’ source code to decide actions.
Key methods include proof-based and simulation-based bots, ensuring cooperation through recursive reasoning and probabilistic decision rules.
SDP-based techniques and MPEC frameworks provide tractable solutions for optimizing equilibria in both two-player dilemmas and multi-agent systems.

Program equilibria generalize Nash equilibrium to settings in which strategies are themselves computer programs capable of inspecting and simulating the opponents’ source code before selecting actions. This paradigm models interactions among mutually transparent agents, including AI systems and contractual institutions. Major frameworks include the study of “program games” for classical dilemmas such as the Prisoner’s Dilemma, multi-agent generalizations, simulation-based approaches for robustness, and mathematical programs with equilibrium constraints (MPECs) for optimization problems. Conceptual advances focus on guaranteeing robust cooperative outcomes, characterizing equilibria under informational and computational constraints, and providing SDP-based solution methods for polynomial program games.

1. Formal Definitions and Program Game Structures

In a two-player program game for the Prisoner’s Dilemma, each player submits a program $p_i \in \mathit{PROG}_i$ which, given read access to the opponent’s source code $p_{-i}$, outputs either $C$ (cooperate) or $D$ (defect) (Oesterheld, 2022). Programs are assumed to halt almost surely; any non-terminating run is interpreted as $D$. Payoffs are $u_i(p_1, p_2) = u_i(a_1, a_2)$, where $a_i = p_i(p_{-i})$ and $u_i$ is the standard Prisoner’s Dilemma payoff function with $T > R > P > S$ (temptation, reward for mutual cooperation, punishment for mutual defection, and sucker’s payoff, respectively).

Generalizing to $n$ players, let $G = \bigl(N, \{\mathcal{A}_i\}_{i \in N}, \{u_i\}_{i \in N}\bigr)$ be an $n$-player normal-form game with finite action sets $\mathcal{A}_i$. A program $p_i$ for player $i$ is a function

$p_i \colon \bigl(\Pi_{-i} \times [r]^{\infty}\bigr) \to \mathcal{A}_i$

where $[r]^\infty$ denotes a random bitstream (sources of randomness may be independent or shared among players for coordination) (Cooper et al., 19 Dec 2024). Program equilibria are profiles $(p_1, \ldots, p_n)$ such that no player benefits from unilaterally switching to a different halting program.
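
Written out, the equilibrium condition can be sketched as follows. The expectation over the random bitstream and the use of $\mathit{PROG}_i$ for player $i$'s set of halting programs (borrowed from the two-player definition) are notational assumptions made here for illustration, not a formula quoted from the cited papers.

$$
\mathbb{E}\bigl[u_i(p_i, p_{-i})\bigr] \;\ge\; \mathbb{E}\bigl[u_i(p_i', p_{-i})\bigr] \quad \text{for all } i \in N \text{ and all } p_i' \in \mathit{PROG}_i .
$$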

2. Types of Program Equilibria: Fair Bots and Simulation-Based Strategies

The literature distinguishes proof-based bots (which inspect opponent code for provable cooperation), simulation-based bots (which actively run opponent programs on simulated inputs), and hybrids. Two principal constructions are:

$\epsilon$-Grounded Fair Bot ($\epsilon\mathrm{GFB}$): Given the opponent's program $q$, return $C$ with probability $\epsilon$; otherwise run $q$ on $\epsilon\mathrm{GFB}$'s own source and copy its output $q(\epsilon\mathrm{GFB})$ (Oesterheld, 2022). This introduces randomization and ensures almost sure halting.
Proof-Based Fair Bot (PFB): Given the opponent's program $q$, return $C$ if and only if $q(\mathrm{PFB}) = C$ is provable within Peano arithmetic ($PA \vdash q(\mathrm{PFB}) = C$); otherwise return $D$ (Oesterheld, 2022).
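
A minimal Python sketch of the $\epsilon$GFB rule may help fix ideas. It models programs as callables that receive the opponent as an argument, which is a simplification of reading source code. The names `make_egfb`, `cooperate_bot`, `defect_bot`, and `play` are hypothetical helpers introduced here, and both players draw from a single `random.Random` stream purely for convenience. The proof-based bot is not sketched, since it would require a proof search in Peano arithmetic.

```python
import random

C, D = "C", "D"


def cooperate_bot(opponent, rng):
    """Hypothetical baseline: always cooperates, ignoring the opponent."""
    return C


def defect_bot(opponent, rng):
    """Hypothetical baseline: always defects, ignoring the opponent."""
    return D


def make_egfb(epsilon):
    """Build an epsilon-grounded fair bot (sketch; programs are callables)."""

    def egfb(opponent, rng):
        # Grounding step: with probability epsilon, cooperate unconditionally.
        if rng.random() < epsilon:
            return C
        # Otherwise run the opponent against this very program and copy it.
        return opponent(egfb, rng)

    return egfb


def play(p1, p2, rng):
    """Run both programs on each other and return the action pair."""
    return p1(p2, rng), p2(p1, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    egfb = make_egfb(0.2)
    print(play(egfb, egfb, rng))
    print(play(egfb, defect_bot, rng))
```

In this sketch, two $\epsilon$GFBs always end at $(C, C)$ once the recursion grounds out, while against `defect_bot` the $\epsilon$GFB cooperates only when its grounding step fires, that is, with probability $\epsilon$.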

Simulation-based strategies (e.g., the $\epsilon$-Grounded $\pi$Bot) generalize $\epsilon\mathrm{GFB}$: players simulate opponents on truncated random streams, halting with probability $\epsilon$, and select actions via policies $\pi_i$ based on accumulated simulated histories (Cooper et al., 19 Dec 2024). These are robust to code obfuscation and enable extension to the multi-player case.

3. Existence and Characterization of Cooperative Program Equilibria

For the two-player Prisoner’s Dilemma, several robust cooperative Nash equilibria have been formally established:

$(\epsilon\mathrm{GFB}, \epsilon\mathrm{GFB})$ is a Nash equilibrium yielding $(C, C)$, as the recursion ensures mutual cooperation almost surely.
$(\mathrm{PFB}, \mathrm{PFB})$ constitutes an equilibrium yielding $(C, C)$, established via Löb’s theorem: by the definition of $\mathrm{PFB}$, $PA$ proves that provability of $\mathrm{PFB}(\mathrm{PFB}) = C$ implies $\mathrm{PFB}(\mathrm{PFB}) = C$, so Löb’s theorem gives $PA \vdash \mathrm{PFB}(\mathrm{PFB}) = C$ and the two bots therefore cooperate (Oesterheld, 2022).
$(\epsilon\mathrm{GFB}, \mathrm{PFB})$ also achieves cooperation, as $\mathrm{PFB}$ can provably infer the cooperative tendency embedded in $\epsilon\mathrm{GFB}$.
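
As a worked check of the first item, consider one particular deviation: a player replaces $\epsilon\mathrm{GFB}$ with a program that always defects. The calculation below is an illustrative sketch covering only this deviation; it is not the general equilibrium proof from the cited work. Against an unconditional defector, $\epsilon\mathrm{GFB}$ cooperates only when its grounding step fires, so the deviator receives $T$ with probability $\epsilon$ and $P$ otherwise:

$$
\begin{aligned}
\mathbb{E}[u_{\text{dev}}] &= \epsilon T + (1 - \epsilon) P = P + \epsilon (T - P), \\
\mathbb{E}[u_{\text{dev}}] \le R \quad &\Longleftrightarrow \quad \epsilon \le \frac{R - P}{T - P}.
\end{aligned}
$$

With the payoffs $T = 5$, $R = 3$, $P = 1$, this deviation is unprofitable whenever $\epsilon \le 1/2$.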

These constructions are compatible, supporting families of robust cooperative equilibria that tolerate syntactic and algorithmic variation.

In $n$-player settings, simulation-based program equilibria are characterized as follows:

With shared randomness (correlated program game), a folk theorem holds: any feasible and strictly individually rational payoff vector can be supported by a program equilibrium of correlated $\epsilon$-Grounded $\pi$Bots. Coordination on triggers and punishment is enabled by shared random cutoffs (Cooper et al., 19 Dec 2024).
With private randomness (uncorrelated program game), stricter constraints apply. The attainable payoffs are those strictly dominating the best mixtures under undetectable deviations, as codified by a penalty parameter $\lambda$ that controls the detection trade-off. If utilities are additively separable, the set of feasible equilibria is widened (Cooper et al., 19 Dec 2024).

4. Compatibility, Generalizations, and Limits

Syntactically distinct strategies (randomized grounding vs. proof search) are shown to be compatible and jointly support cooperative outcomes. Proof-based bots can cooperate with any $\epsilon\mathrm{GFB}$ program via Löb-style arguments, and multi-parameter mixtures of $\epsilon\mathrm{GFB}$s are stable. PrudentBot variants that use extended consistency assumptions (e.g., $PA+1$) or probabilistic proof requirements also fit into this robust equilibrium family (Oesterheld, 2022).

Key limitations arise in multi-agent settings without shared randomness. In particular, a full folk theorem cannot be enforced: in the pirates’ dilemma (three players, strictly Pareto-optimal payoffs), cooperation cannot be sustained by simulationist programs because unobservable deviations elude collective punishment (Cooper et al., 19 Dec 2024). When random stopping times are private, this coordination failure is fundamental.

5. Mathematical Programs with Equilibrium Constraints (MPECs)

A related but distinct instantiation involves optimization under equilibrium or complementarity constraints (MPECs), especially when the problem data are polynomial. Formally, for $f, g_i, h_j \in \mathbb{R}[x, y]$, one minimizes $f(x, y)$ subject to semialgebraic constraints and complementarity conditions ($g_\ell h_\ell = 0$). Equilibrium can be expressed via a lower-level value function $J(x, y)=\min\{h(x, y, v): v \in B(x)\} \ge 0$ (Jiao et al., 2019).
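
To make the complementarity structure concrete, the sketch below solves a toy problem chosen here for illustration (it does not come from the cited paper): minimize $(x-a)^2 + (y-b)^2$ subject to $x \ge 0$, $y \ge 0$, $xy = 0$. It uses plain enumeration of the two complementarity branches rather than the SDP relaxations discussed in this section, and the function name `solve_toy_mpec` is hypothetical.

```python
def solve_toy_mpec(a, b):
    """Minimize (x - a)^2 + (y - b)^2 s.t. x >= 0, y >= 0, x * y = 0.

    Returns (optimal value, x, y). The constraint x * y = 0 splits the
    feasible set into two branches, each a one-dimensional convex problem.
    """
    candidates = []

    # Branch 1: x = 0. Only y is free; project b onto y >= 0.
    y = max(b, 0.0)
    candidates.append((a ** 2 + (y - b) ** 2, 0.0, y))

    # Branch 2: y = 0. Only x is free; project a onto x >= 0.
    x = max(a, 0.0)
    candidates.append(((x - a) ** 2 + b ** 2, x, 0.0))

    # The global minimizer is the better of the two branch optima.
    return min(candidates)


if __name__ == "__main__":
    print(solve_toy_mpec(1, 2))
```

Branch enumeration grows exponentially with the number of complementarity pairs, which is one reason global methods such as the relaxation hierarchies described here are of interest.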

Global minimizers are found using moment–sum-of-squares (SOS) hierarchies, solved by semidefinite programming (SDP). Each relaxation $P_r$ corresponds to a truncated moment matrix and localizing matrices indexed by degree. The sequence of SDP solutions converges monotonically to the global minimum under Archimedean (compactness) conditions. Experiments demonstrate computational feasibility for small examples; matrix size is the main bottleneck (Jiao et al., 2019).
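
For orientation, the order-$r$ relaxation can be sketched in the standard Lasserre form below. This is the generic textbook formulation for minimizing $f(z)$ subject to $g_j(z) \ge 0$ and $h_k(z) = 0$, with $z = (x, y)$; the exact formulation in the cited paper may differ. The symbols $\boldsymbol{\mu}$ (truncated moment sequence), $L_{\boldsymbol{\mu}}$ (Riesz functional), $M_r$ (moment and localizing matrices), $\rho_r$, and $f^\ast$ are notation introduced here.

$$
\begin{aligned}
\rho_r \;=\; \inf_{\boldsymbol{\mu}} \quad & L_{\boldsymbol{\mu}}(f) \\
\text{s.t.} \quad & \mu_0 = 1, \qquad M_r(\boldsymbol{\mu}) \succeq 0, \\
& M_{r - \lceil \deg g_j / 2 \rceil}(g_j \, \boldsymbol{\mu}) \succeq 0 \quad \text{for all } j, \\
& L_{\boldsymbol{\mu}}(h_k \, z^{\alpha}) = 0 \quad \text{for all } k \text{ and } |\alpha| \le 2r - \deg h_k .
\end{aligned}
$$

Each $\rho_r$ is a lower bound on the global minimum $f^\ast$, and the bounds are nondecreasing in the order: $\rho_r \le \rho_{r+1} \le f^\ast$.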

Program Type	Halting Guarantee	Syntactic Robustness	Multi-Player Generality
Proof-Based Fair Bot (PFB)	Yes	Low–Medium	2-player
$\epsilon$-Grounded Fair Bot	Yes	High	2-player (original)
$\epsilon$-Grounded $\pi$Bot	Yes	High	$n$-player (Cooper et al., 19 Dec 2024)

6. Illustrative Examples and Applications

In the canonical one-shot Prisoner’s Dilemma ($u_i(C, C)=3$, $u_i(D, C)=5$, $u_i(D, D)=1$, $u_i(C, D)=0$), correlated $\epsilon$-Grounded $\pi$Bots with grim-trigger policies support mutual cooperation for small $\epsilon$; a single deviation triggers punishment with probability $\epsilon$, implying approximately optimal payoffs $(3,3)$ (Cooper et al., 19 Dec 2024).

The trust game demonstrates mixed-strategy equilibrium with uncorrelated $\epsilon$-Grounded $\pi$Bots: mixing greedy and charitable responses enforces a stable equilibrium indistinguishable from true mixed strategies, with detection and punishment dependent on random stopping (Cooper et al., 19 Dec 2024).

Polynomial MPECs are solved by SDP relaxations, as evidenced by small-scale benchmarks in (Jiao et al., 2019). Realistic use is currently limited to problems with $n+m \lesssim 5$ and relaxation order $r \lesssim 4$ .
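
The scaling behind these limits can be illustrated with a short calculation. In the standard moment relaxation, the order-$r$ moment matrix in $k$ variables has side length $\binom{k+r}{r}$ and involves $\binom{k+2r}{2r}$ distinct moments. The sketch below assumes $k = n + m$ is the total number of variables; the function names are hypothetical.

```python
from math import comb


def moment_matrix_side(num_vars, order):
    """Side length of the order-`order` moment matrix: the number of
    monomials of degree <= order in `num_vars` variables."""
    return comb(num_vars + order, order)


def num_moment_variables(num_vars, order):
    """Number of distinct moments (SDP decision variables): monomials of
    degree <= 2 * order in `num_vars` variables."""
    return comb(num_vars + 2 * order, 2 * order)


if __name__ == "__main__":
    for order in range(1, 5):
        side = moment_matrix_side(5, order)
        moments = num_moment_variables(5, order)
        print(f"order={order}: matrix {side}x{side}, {moments} moments")
```

For five variables, the matrix side grows from 6 at order 1 to 126 at order 4, which is consistent with matrix size being the limiting factor.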

7. Implications for AI, Multi-Agent Systems, and Economic Mechanisms

The program equilibrium paradigm expands the set of attainable cooperative solutions in strategic settings once agents have transparent access to each other’s source code. Randomization, proof-based reasoning, and simulation protocols are mutually compatible, suggesting robust frameworks for multi-agent trust and commitment without requiring identical implementations. The necessity of shared randomness for full coordination in $n$-player games is fundamental. SDP-based techniques for polynomial games with equilibrium and complementarity constraints provide tractable global optimization methods for small to medium scale problems. These developments underpin advances in programmatic contract design, AI alignment, and mechanism design in transparent agent environments (Oesterheld, 2022; Cooper et al., 19 Dec 2024; Jiao et al., 2019).