Program Equilibrium in Game Theory
- Program equilibrium is a framework where agents submit self-referential programs to choose actions, resulting in equilibria distinct from classical Nash outcomes.
- It employs techniques such as diagonalization, Löb’s theorem, and randomized grounding to enforce robust cooperation and strategic reciprocity.
- The concept has practical applications in cooperative AI, multiagent systems, and cryptography by establishing conditions for mutual cooperation among transparent agents.
Program equilibrium is a game-theoretic meta-concept in which agents commit to strategies by submitting computer programs that interact with each other’s source code. This encoding gives rise to strategies and equilibria fundamentally distinct from classical play, enabling sophisticated forms of cooperation and punishment—often leveraging meta-logical constructs, simulation, and randomness. Canonical examples include the open-source Prisoner’s Dilemma and its generalizations to multiagent normal-form games. Results in this area characterize the existence, construction, and limits of robust cooperative equilibria under various models of program interaction and randomness sharing (Oesterheld, 2022, Cooper et al., 19 Dec 2024).
1. Formal Framework
In an $n$-player normal-form game $G = (A_1, \dots, A_n; u_1, \dots, u_n)$, each player $i$ traditionally chooses an action $a_i \in A_i$ to maximize utility $u_i(a_1, \dots, a_n)$. The program equilibrium paradigm generalizes this by having each player $i$ submit a program $p_i$ drawn from a set rich enough for recursion and self-reference. Programs receive the source code of their opponents and access to random bits; each program then maps $(p_{-i}, \text{random bits}) \mapsto a_i \in A_i$. The payoff to player $i$ is determined by the output profile $(a_1, \dots, a_n)$ according to $u_i$.
- In two-player scenarios, programs run simultaneously, each reading the opponent’s source (and possibly its own), and output an action in $\{C, D\}$ (cooperation or defection), as in the Prisoner’s Dilemma.
- The meta-game’s Nash equilibrium notion: a profile $(p_1, \dots, p_n)$ is a program equilibrium if no player can switch unilaterally to another program and increase expected utility, given how opponent programs respond to changes.
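As a concrete illustration, the following minimal Python sketch shows the shape of the meta-game: each program receives the opponent’s source code plus random bits and returns an action, and payoffs are then read off the underlying stage game. All names and payoff numbers here are illustrative assumptions rather than notation from the cited papers; `clique_bot` is the brittle "cooperate-with-identical-code" construction that the robust bots of the next section generalize.

```python
import inspect
import random

# Prisoner's Dilemma payoffs for the row player (illustrative numbers only).
PAYOFFS = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def defect_bot(opponent_source: str, rng: random.Random) -> str:
    """Ignores the opponent's code and always defects."""
    return "D"

def clique_bot(opponent_source: str, rng: random.Random) -> str:
    """Cooperates only with syntactically identical code (the brittle baseline)."""
    return "C" if opponent_source == inspect.getsource(clique_bot) else "D"

def play(program_1, program_2, seed: int = 0):
    """Feed each program the other's source plus random bits, then score."""
    rng = random.Random(seed)
    src_1, src_2 = inspect.getsource(program_1), inspect.getsource(program_2)
    a_1, a_2 = program_1(src_2, rng), program_2(src_1, rng)
    return PAYOFFS[(a_1, a_2)], PAYOFFS[(a_2, a_1)]

if __name__ == "__main__":          # run as a script so inspect.getsource works
    print(play(clique_bot, clique_bot))  # (3, 3): mutual cooperation
    print(play(clique_bot, defect_bot))  # (1, 1): mutual defection
```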
2. Canonical Program Types and Robust Cooperative Design
Several archetypal constructions embody robust cooperation beyond simple "cooperate-with-identical" strategies:
- $\epsilon$-Grounded Fair Bot: Cooperates with probability $\epsilon$, otherwise defers to its opponent’s action when fed its own code. For sufficiently small $\epsilon > 0$, two such programs yield a mutual-cooperation Nash equilibrium (a behavioral sketch appears at the end of this section).
- Proof-Based Bots:
- DUPOC: Cooperates if it can prove (in Peano Arithmetic) that its opponent cooperates when facing DUPOC; otherwise defects.
- CIMCIC: Cooperates if it can derive (in PA) that its own cooperation implies the opponent’s cooperation.
- Prudent Bot: Cooperates if it can prove that the opponent would cooperate with Prudent Bot and would defect sufficiently often against a DefectBot.
Key technical tools include Gödelian fixed points and Löb’s theorem, enabling logical self-reference needed for robust proof-based coordination. These bots generalize the notion of mutual cooperation, avoiding brittle dependence on syntactic identity.
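The $\epsilon$-grounded construction is simple enough to sketch executably. The Python sketch below takes the simulationist reading for brevity, passing the opponent as a callable rather than as source text; the function names and interface are assumptions, and the proof-based bots (DUPOC, CIMCIC, Prudent Bot) have no equally compact executable analogue because they require proof search.

```python
import random

def make_epsilon_fair_bot(epsilon: float):
    """Behavioral sketch of an epsilon-grounded fair bot: with probability
    epsilon it cooperates unconditionally (the 'grounding'); otherwise it
    runs the opponent against this very bot and copies the resulting action."""
    def bot(opponent, rng: random.Random) -> str:
        if rng.random() < epsilon:
            return "C"              # randomized grounding breaks the regress
        return opponent(bot, rng)   # defer to what the opponent does against me
    return bot

if __name__ == "__main__":
    rng = random.Random(0)
    fb1, fb2 = make_epsilon_fair_bot(0.05), make_epsilon_fair_bot(0.05)
    # The mutual recursion halts with probability 1: each level independently
    # grounds out with probability epsilon, so the depth is geometric.
    print(fb1(fb2, rng), fb2(fb1, rng))   # prints: C C
```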
3. Existence and Compatibility Theorems
The principal equilibrium results are:
- Self-Play Robustness: Each of the $\epsilon$-Grounded Fair Bot, DUPOC, CIMCIC, and Prudent Bot achieves mutual cooperation with itself, forming Nash equilibria in which neither player benefits from unilateral defection.
- Cross-Compatibility: For any pair of bots drawn from this class (with suitable parameter choices), the resulting profile is a cooperative Nash equilibrium yielding mutual cooperation in the two-player program-game PD. Proofs typically invoke Löb’s theorem and diagonalization to establish meta-logical guarantees for cooperation even among logically and statistically divergent agents (Oesterheld, 2022).
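The core of these proofs is a Löbian derivation. The following is a hedged sketch of the standard self-play argument, glossing over proof-search details; the cross-compatibility results interleave two such derivations, one per bot.

```latex
% Let C abbreviate the arithmetized sentence "the proof-based bot cooperates
% against its own copy", and write \Box\varphi for "PA proves \varphi".

% Step 1 (from the bot's definition): provable cooperation triggers cooperation,
% and this implication is itself verifiable in PA:
\mathrm{PA} \vdash \Box C \rightarrow C

% Step 2 (Löb's theorem): whenever \mathrm{PA} \vdash \Box\varphi \rightarrow \varphi,
% also \mathrm{PA} \vdash \varphi.  Applied to Step 1 with \varphi = C:
\mathrm{PA} \vdash C

% Hence cooperation is provable, the bot's proof search succeeds, and both
% copies actually cooperate.
```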
4. Simulation-Based Program Equilibria: General Theory and Limits
Simulationist approaches restrict interaction to program execution rather than code inspection:
- Simulation-Based Programs: Programs can only interact by simulating their opponents on various inputs (especially different random seeds), capturing a behaviorist paradigm.
- $\epsilon$-Grounded $\pi$-Bot (Oesterheld 2019): Recursively simulates opponents using a geometric stopping rule, yielding a discounted repeated-game perspective with memory-1 policies (see the sketch after this list).
- Generalized Models (Cooper, Oesterheld, Conitzer): In settings with shared randomness, simulation-based $\epsilon$-Grounded $\pi$-Bots can realize a full folk theorem: for any feasible, strictly individually rational payoff profile, there is a corresponding correlated program equilibrium. Without shared randomness, achievable equilibria satisfy more stringent incentive constraints; additively separable games still admit a full folk theorem, but others (e.g., multi-player pirate games, full-information games) can be impossible to coordinate without correlation (Cooper et al., 19 Dec 2024).
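A minimal Python sketch of the recursive-simulation construction follows; the names (`make_epsilon_grounded_pi_bot`, `tit_for_tat`) and the callable-based interface are assumptions for illustration, not the formal model of the cited papers.

```python
import random

def make_epsilon_grounded_pi_bot(pi_initial: str, pi, epsilon: float):
    """Sketch of a simulation-based epsilon-grounded pi-bot: with probability
    epsilon play pi's initial action (the geometric stopping rule); otherwise
    simulate the opponent against this very bot and answer with the memory-1
    policy pi applied to the opponent's simulated action."""
    def bot(opponent, rng: random.Random) -> str:
        if rng.random() < epsilon:
            return pi_initial                  # grounding / "discounting" step
        simulated_action = opponent(bot, rng)  # one level of recursive simulation
        return pi(simulated_action)            # memory-1 response
    return bot

def tit_for_tat(opponent_action: str) -> str:
    """Memory-1 policy: copy the opponent's (simulated) action."""
    return opponent_action

def defect_bot(opponent, rng: random.Random) -> str:
    return "D"

if __name__ == "__main__":
    rng = random.Random(2)
    bot_a = make_epsilon_grounded_pi_bot("C", tit_for_tat, 0.1)
    bot_b = make_epsilon_grounded_pi_bot("C", tit_for_tat, 0.1)
    print(bot_a(bot_b, rng), bot_b(bot_a, rng))  # C C: mutual cooperation
    print(bot_a(defect_bot, rng))                # D with prob 0.9, C with prob 0.1
```

With $\pi$ set to tit-for-tat this reduces to the $\epsilon$-grounded fair bot above; other memory-1 policies let the construction reward or punish simulated behavior, which is the lever behind the folk-theorem results summarized here.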
| Program Type | Foundation | Robustness Feature |
|---|---|---|
| $\epsilon$-Grounded Fair Bot | Probabilistic | Random grounding |
| DUPOC / CIMCIC | Proof-based | Provability in PA |
| Prudent Bot | Proof + stats | Defection test against DefectBot |
| Sim-Based $\epsilon$-Grounded $\pi$-Bots | Simulationist | Behavioral equivalence |
5. Key Technical Mechanisms
- Diagonalization & Fixed-Point Logic: Critical for expressing mutually referential conditions (e.g., “cooperate if I can prove my opponent cooperates with me”).
- Löb’s Theorem: Ensures that certain meta-logical implications guarantee actual cooperation in equilibrium, fundamental to proof-based strategies (stated formally after this list).
- Randomized Grounding: Introducing cooperation with small probability resolves issues of undecidability and brittleness.
- Screening of Randomness: In simulation-based models, private vs. shared randomness must be carefully screened to prevent strategic leakage or exploitation.
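For reference, the first two devices admit compact standard statements; the notation below is added for exposition and does not follow any particular paper’s conventions.

```latex
% Diagonal (fixed-point) lemma: for any formula \psi(x) with one free variable,
% there is a sentence \varphi with
\mathrm{PA} \vdash \varphi \leftrightarrow \psi(\ulcorner \varphi \urcorner)
% This is what lets a bot's cooperation condition refer to provable facts
% about that very bot.

% Löb's theorem in its formalized (Gödel–Löb) form, with \Box for provability in PA:
\mathrm{PA} \vdash \Box(\Box\varphi \rightarrow \varphi) \rightarrow \Box\varphi
% The external form used in the derivation of Section 3 follows by necessitation.
```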
6. Applications and Implications
- Cooperative AI: Program equilibrium results inform the design of agents able to maintain robust cooperation in settings of complete computational transparency. Such agents require only formal or probabilistic certification of reciprocity and are not reliant on rigid code-matching.
- Multiagent Generalizations: With shared randomness, simulation-based agents coordinate more flexibly in multi-player settings, supporting the full folk theorem. Without it, design restrictions arise, leading to a taxonomy of achievable program equilibria linked to the underlying utility structure.
- Security and Cryptography: Screening and correlated halting techniques prevent information leakage or manipulation via random bit exploitation, relevant for cryptographic agent design.
7. Open Problems and Future Directions
- Limits Without Shared Randomness: Fundamental impossibilities remain in enforcing all collaborative equilibria via simulation-based agents absent a shared source of randomness. Characterizing the exact frontier of feasible payoffs (“incentive constraint region”) is an active area of inquiry.
- Hybrid Models: Partial code inspection, communication channels, or proof-based simulation may blend robustness features and expand the folk theorem’s scope.
- Implementation in AI Systems: Embedding $\epsilon$-Grounded Bots in distributed agent environments (e.g., blockchain-secured randomness) offers practical avenues for robust equilibrium construction.
- Extensions to Stochastic and Dynamic Games: Generalizing diagonalization and simulation techniques to broader game models remains an open challenge.
Program equilibrium theory synthesizes logic, simulation, and randomized protocols to advance the foundational understanding of robust strategic coordination among transparent computational agents (Oesterheld, 2022, Cooper et al., 19 Dec 2024).