Optional Prisoner's Dilemma Game

Updated 4 July 2026

The Optional Prisoner’s Dilemma Game is a strategic model that extends the classic dilemma by introducing abstention, offering a fixed loner payoff.
It creates a cyclic dominance where defectors exploit cooperators, cooperators outperform abstainers, and abstainers avoid exploitation by defectors.
Variations in spatial structure, adaptive networks, and probabilistic abstention critically influence evolutionary outcomes and cooperative cluster stability.

The Optional Prisoner’s Dilemma Game (OPD), also called the voluntary prisoner’s dilemma, extends the standard Prisoner’s Dilemma by adding a third action (typically called abstain, loner, or exit) alongside cooperation and defection. In the canonical optional formulation, if one or both players abstain, both receive a fixed loner payoff, written $L$ or $\sigma$, so the game is no longer a $2\times 2$ dilemma but a three-strategy system. Writing $A$ for abstention, this modification changes the strategic geometry from a purely binary tension between $C$ and $D$ to a setting in which $D$ beats $C$, $C$ beats $A$, and $A$ beats $D$, thereby enabling cyclic dominance, coexistence, and structure-dependent support for cooperation (Cardinot et al., 2018; Cardinot et al., 2019).

1. Formal structure and representative payoff schemes

The standard Prisoner’s Dilemma is parameterized by the reward for mutual cooperation $R$, the punishment for mutual defection $P$, the sucker’s payoff $S$, and the temptation payoff $T$. In the classical form used in the probabilistic-abstention literature, the dilemma condition is

$T > R > P > S$

with the common normalization

$\sigma$ 6

The OPD adds a third option, abstention, and if either player abstains both receive the loner payoff $L$, so the ordering becomes

$T > R > L > P > S$

in the formulation that explicitly treats abstention as a third strategic alternative (Cardinot et al., 2018).

A frequently used weak-OPD parametrization in spatial models is

$T = b,\quad R = 1,\quad P = S = 0$

with abstention payoff $L$ satisfying

$0 < L < 1$

Under this specification, the payoff matrix (row player's payoff, with rows and columns ordered $C$, $D$, $A$) is

$\begin{pmatrix} 1 & 0 & L \\ b & 0 & L \\ L & L & L \end{pmatrix}$

so abstention interrupts the exploitative $C$-$D$ interaction and replaces it by a fixed outside option (Cardinot et al., 2016).
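As a minimal sketch, the weak normalization can be written out as a lookup table. The snippet assumes $T = b$, $R = 1$, $P = S = 0$ and a loner payoff strictly between 0 and 1. The function name `weak_opd_matrix` and the sample values `b=1.5`, `loner=0.5` are illustrative choices, not values taken from the cited papers.

```python
C, D, A = 0, 1, 2  # row/column indices: cooperate, defect, abstain


def weak_opd_matrix(b, loner):
    """Row player's payoffs for the assumed weak OPD (T=b, R=1, P=S=0)."""
    if not 0.0 < loner < 1.0:
        raise ValueError("loner payoff is assumed to lie strictly between P=0 and R=1")
    return [
        [1.0, 0.0, loner],      # C against C, D, A
        [b, 0.0, loner],        # D against C, D, A
        [loner, loner, loner],  # A against anyone: both sides get the loner payoff
    ]


M = weak_opd_matrix(b=1.5, loner=0.5)  # illustrative values only

# The three pairwise advantages behind the cycle "D beats C, C beats A, A beats D":
d_exploits_c = M[D][C] > M[C][C]  # facing a cooperator, defecting pays more
c_beats_a = M[C][C] > M[A][A]     # a cooperating pair out-earns an abstaining pair
a_beats_d = M[A][D] > M[D][D]     # facing a defector, abstaining pays more
```

All three flags are true for any `b > 1` and `0 < loner < 1`, which is one way to see that the cycle does not depend on the particular sample values.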

Other papers use different but equivalent normalizations. One study adopts the Axelrod values

$T = 5,\quad R = 3,\quad P = 1,\quad S = 0$

and interprets any interaction involving $A$ as giving both players payoff $L$ (Cardinot et al., 2016). A one-shot anonymous OPD in a human–machine mixed population rescales payoffs by

$2\times 2$ 8

yielding

$2\times 2$ 9

with $C$ 0 written as the loner action and $C$ 1 as the corresponding payoff (Sharma et al., 2023).

Formulation	Strategy set	Characteristic conditions
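Weak spatial OPD	$\{C, D, A\}$	$T = b$, $R = 1$, $P = S = 0$, $0 < L < 1$
Axelrod-style optional strategy	$\{C, D, A\}$	$T = 5$, $R = 3$, $P = 1$, $S = 0$, loner payoff $L$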
One-shot anonymous OPD	$C$ 6	payoff matrix rescaled by $C$ 7

These formulations differ in normalization and application, but they share the same defining feature: abstention is an explicit participation decision with its own payoff consequences.

2. Strategic logic of optional participation

The introduction of abstention changes the strategic relation among the available actions. In the standard account of OPD, the three strategies satisfy the cyclic relation

$D \text{ beats } C,\quad C \text{ beats } A,\quad A \text{ beats } D$

This rock-paper-scissors-like structure is one of the central reasons optional participation can sustain cooperation even when direct $C$-$D$ competition would favor defection (Cardinot et al., 2018).

The logic is straightforward. Defectors exploit cooperators in direct interaction. Cooperators outperform abstainers because abstention yields only the loner payoff, whereas mutual cooperation can yield the higher reward $R$. Abstainers outperform defectors because opting out avoids the low-payoff environments created by defection. In evolutionary formulations, this means abstention can prevent unconditional takeover by defectors, but it does not eliminate strategic turnover; defection often persists because the game becomes cyclic rather than monotone (Cardinot et al., 2019).

A sharper statement appears in coevolutionary spatial OPD. When the population is reduced to only two strategies, the cycle collapses:

with only $C$ and $D$ present, $D$ dominates,
with only $C$ and $A$ present, $C$ dominates,
with only $D$ and $A$ present, $A$ dominates.

Thus the coexistence mechanism is genuinely three-strategy; removing any one component breaks the intransitive loop (Cardinot et al., 2017).
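The collapse under two-strategy reductions can be illustrated in a much simpler setting than the cited coevolutionary lattice model. The sketch below swaps in well-mixed replicator dynamics (an assumption made here for illustration, not the update rule of the cited study) with the weak normalization $T = b$, $R = 1$, $P = S = 0$ and illustrative values `b = 1.5`, `loner = 0.5`. It removes one strategy at a time and records which of the remaining two ends up more common.

```python
def replicator_winner(payoff, x0, dt=0.05, steps=4000):
    """Euler-integrate replicator dynamics; return the index of the most common strategy."""
    x = list(x0)
    n = len(x)
    for _ in range(steps):
        fitness = [sum(payoff[i][j] * x[j] for j in range(n)) for i in range(n)]
        mean = sum(x[i] * fitness[i] for i in range(n))
        # A strategy with frequency 0 stays at 0, so an absent strategy never reappears.
        x = [x[i] + dt * x[i] * (fitness[i] - mean) for i in range(n)]
    return max(range(n), key=lambda i: x[i])


b, loner = 1.5, 0.5  # illustrative values, not taken from the cited papers
M = [[1.0, 0.0, loner], [b, 0.0, loner], [loner, loner, loner]]  # order C, D, A
NAMES = "CDA"

winners = {}  # maps the absent strategy to the winner among the other two
for absent in range(3):
    x0 = [0.5, 0.5, 0.5]
    x0[absent] = 0.0
    winners[NAMES[absent]] = NAMES[replicator_winner(M, x0)]
```

In this toy setting, removing abstainers leaves defectors ahead, removing defectors leaves cooperators ahead, and removing cooperators leaves abstainers ahead, which mirrors the pairwise ordering described above without reproducing the spatial mechanism.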

An early abstract on repeated finite Prisoner’s Dilemma also suggested a variant in which players can choose to opt out. That modification was said to enrich the game and to suggest dominance of cooperative strategies, while also linking bounded rationality, computational limits, and competitive analysis to the study of tractable but sub-optimal play [0701139]. This suggests that optionality entered the literature not only as a payoff perturbation but also as a way of altering the temporal and computational structure of the dilemma.

3. Evolutionary behavior in well-mixed and spatial populations

The OPD behaves differently in non-spatial and spatial environments. In a non-spatial evolutionary model with tournament selection, the threshold between defectors and abstainers is set by the comparison between the loner payoff and mutual defection payoff. For the pairwise defector–abstainer comparison, the paper derives

$D$ 5

Hence defectors and abstainers are tied at

$D$ 6

defectors dominate when $D$ 7, and abstainers dominate when $D$ 8. For the cooperator–abstainer comparison, it derives

$D$ 9

so cooperators always dominate abstainers because $R > L$ in that model (Cardinot et al., 2016).
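One way to see why these pairwise comparisons reduce to $L$ versus $P$ and $R$ versus $L$ is a simplified round-robin accounting, in which every agent plays every other agent once. This is an assumption made for illustration and not necessarily the derivation used in the paper. The function names and the sample values ($P = 1$, $R = 3$, ten agents of each type) are hypothetical.

```python
def defector_vs_abstainer(n_d, n_a, P, L):
    """Total payoff of one defector and one abstainer in a D/A-only population."""
    u_d = (n_d - 1) * P + n_a * L  # P against other defectors, L against abstainers
    u_a = (n_d + n_a - 1) * L      # an abstainer earns L in every game
    return u_d, u_a


def cooperator_vs_abstainer(n_c, n_a, R, L):
    """Total payoff of one cooperator and one abstainer in a C/A-only population."""
    u_c = (n_c - 1) * R + n_a * L  # R against other cooperators, L against abstainers
    u_a = (n_c + n_a - 1) * L
    return u_c, u_a


tie = defector_vs_abstainer(10, 10, P=1, L=1)      # L equal to P: equal totals
low_l = defector_vs_abstainer(10, 10, P=1, L=0.5)  # L below P: defector ahead
high_l = defector_vs_abstainer(10, 10, P=1, L=2)   # L above P: abstainer ahead
coop = cooperator_vs_abstainer(10, 10, R=3, L=2)   # R above L: cooperator ahead
```

Under this accounting the differences are $(n_D - 1)(P - L)$ and $(n_C - 1)(R - L)$, so the sign depends only on the payoff comparison and not on the population split.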

In the same well-mixed setting with all three strategies initially present at equal frequency, the reported outcomes are:

for $D$ 1, defectors dominate;
for $D$ 2, defectors still dominate on most runs, though abstainers occasionally prevail;
for $D$ 3, abstainers become increasingly dominant, and in some runs cooperators can outperform defectors.

The same paper reports that cooperation is fragile in the non-spatial model, surviving mainly when abstainers gain an advantage over defectors and thereby indirectly protect cooperators (Cardinot et al., 2016).

Spatial structure changes the dynamics because it allows clustering. On a $D$ 4 lattice with Moore neighborhoods, pairwise spatial comparisons reveal that adjacent cooperators can reinforce each other and spread against abstainers regardless of $L$, while the $D$-$A$ relation retains the threshold logic tied to $L$ versus $P$. With equal random initial densities of $C$, $D$, and $A$:

for $C$ 3, defectors quickly dominate, but cooperative clusters survive in about 65% of simulations thanks to abstainers;
for $C$ 4, abstainers often dominate, but cooperation may persist in stable clusters;
in about 51.5% of simulations for $C$ 5, a cooperative cluster of minimum size 9 forms early and persists.

The stable morphology described there is a “sandwich” configuration in which cooperators are surrounded by defectors and abstainers occupy the outer region. The same study reports “gliders” for $C$ 6, especially at $C$ 7 and $C$ 8, where defectors and abstainers switch cyclically near the boundary (Cardinot et al., 2016).
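The role of clustering can be made concrete with a small sketch of how a site accumulates payoff over its Moore neighborhood. Periodic boundaries, the weak normalization with `b = 1.5` and loner payoff `0.5`, and the helper names `moore_neighbors` and `site_utility` are all assumptions for illustration; the cited study's exact setup may differ.

```python
PAYOFF = {  # row player's payoff, assumed weak normalization (T=1.5, R=1, P=S=0, L=0.5)
    "C": {"C": 1.0, "D": 0.0, "A": 0.5},
    "D": {"C": 1.5, "D": 0.0, "A": 0.5},
    "A": {"C": 0.5, "D": 0.5, "A": 0.5},
}


def moore_neighbors(row, col, size):
    """The eight surrounding sites on a size x size lattice with periodic boundaries."""
    return [((row + dr) % size, (col + dc) % size)
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0)]


def site_utility(grid, row, col, payoff=PAYOFF):
    """Sum of the focal site's payoffs against each of its eight neighbors."""
    me = grid[row][col]
    return sum(payoff[me][grid[r][c]] for r, c in moore_neighbors(row, col, len(grid)))


all_c = [["C"] * 3 for _ in range(3)]
inside_cluster = site_utility(all_c, 1, 1)   # cooperator surrounded by cooperators

one_d = [row[:] for row in all_c]
one_d[1][1] = "D"
invader = site_utility(one_d, 1, 1)          # lone defector exploiting eight cooperators
exploited = site_utility(one_d, 0, 0)        # cooperator with seven C and one D neighbor
```

The comparison between `inside_cluster`, `invader`, and `exploited` shows the tension that spatial models resolve locally: a defector gains most next to cooperators, but cooperators deep inside a cluster keep most of their mutual reward.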

A different line of work replaces pure abstention by probabilistic abstention. In this hybrid model, each agent is described by

$C$ 9

where $C$ 0 denotes cooperation or defection and $C$ 1 is the probability of abstaining. The paper defines

$C$ 2

so $C$ 3 corresponds to a pure cooperator who always plays, $C$ 4 to a defector or full abstainer, and intermediate values to sporadic participation. The model reduces to the standard PD when $C$ 5 for all players, and to the OPD when $C$ 6. Across the tested parameter ranges, this hybrid sustains higher cooperation than both standard PD and standard OPD under synchronous and asynchronous updating, with intermediate abstention probabilities reported as the most favorable regime for cooperation (Cardinot et al., 2018).
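A minimal sketch of a single interaction under probabilistic abstention is given below. It assumes that the two agents decide independently whether to abstain, that both receive the loner payoff if either abstains, and that the underlying game uses the weak normalization $T = b$, $R = 1$, $P = S = 0$. The function name, the argument names, and the default values are hypothetical.

```python
def expected_payoff(s_i, alpha_i, s_j, alpha_j, b=1.5, loner=0.5):
    """Expected payoff of agent i, who plays s_i in {"C", "D"} and abstains with
    probability alpha_i, against agent j with strategy s_j and probability alpha_j."""
    pd = {("C", "C"): 1.0, ("C", "D"): 0.0, ("D", "C"): b, ("D", "D"): 0.0}
    p_play = (1.0 - alpha_i) * (1.0 - alpha_j)  # both agents take part
    return p_play * pd[(s_i, s_j)] + (1.0 - p_play) * loner


pd_limit = expected_payoff("D", 0.0, "C", 0.0)     # nobody abstains: plain PD payoff T
loner_limit = expected_payoff("C", 1.0, "D", 0.0)  # i always abstains: loner payoff
mixed = expected_payoff("C", 0.5, "C", 0.5)        # sporadic participation on both sides
```

With both abstention probabilities at zero the function returns the ordinary Prisoner's Dilemma payoff, and with either probability at one it returns the loner payoff, which matches the two limiting cases described above.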

4. Coevolution, mobility, and adaptive interaction structure

A major development in OPD research is the move from static lattices to coevolving or diluted interaction structures. In a weighted spatial OPD, agents occupy a $C$ 7 square lattice with Moore neighborhoods, each edge begins with weight

$C$ 8

and utilities are computed as

$C$ 9

Link weights are then adapted according to whether a local interaction utility is above or below the focal player’s average utility, with weights constrained by

$A$ 0

Strategy imitation occurs only if a random neighbor has higher utility, with probability

$A$ 1

Within this framework, abstainers are reported to protect cooperators against exploitation, especially when the link-weight amplitude is large. The paper identifies three qualitative regimes: abstainer-dominated freezing in the static or weakly adaptive case, cyclic dominance for intermediate coevolution strength, and cooperation dominance for strong coevolution combined with sufficiently favorable loner payoff (Cardinot et al., 2016).
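The link-adaptation step can be sketched as a bounded increment rule. The direction of the update (strengthen a link whose interaction did better than the focal player's average, weaken it otherwise), the fixed step size, and the names `step`, `w_min`, and `w_max` are assumptions introduced here for illustration; the source only states that weights respond to this comparison and stay within fixed bounds.

```python
def update_weight(w, u_link, u_avg, step, w_min, w_max):
    """Adjust one link weight after comparing that link's utility with the
    focal player's average utility, then clamp it to [w_min, w_max]."""
    if u_link > u_avg:
        w += step   # interaction did better than average: strengthen the link
    elif u_link < u_avg:
        w -= step   # interaction did worse than average: weaken the link
    return min(w_max, max(w_min, w))


# Hypothetical numbers: weights start at 1.0 and may range over [0.5, 1.5].
strengthened = update_weight(1.0, u_link=2.0, u_avg=1.0, step=0.25, w_min=0.5, w_max=1.5)
capped = update_weight(1.5, u_link=2.0, u_avg=1.0, step=0.25, w_min=0.5, w_max=1.5)
floored = update_weight(0.5, u_link=0.0, u_avg=1.0, step=0.25, w_min=0.5, w_max=1.5)
```

Widening the allowed interval in such a rule corresponds to the larger link-weight amplitude that the paper associates with stronger protection of cooperators.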

A representative cyclic-dominance regime is reported at

$A$ 2

where abstainers invade defectors, defectors invade cooperators, and cooperators invade abstainers. A representative cooperation-dominant regime is

$A$ 3

where defectors are first suppressed by abstainers and small cooperative clusters later expand and invade abstainers (Cardinot et al., 2016).

A related coevolutionary model on a $A$ 4 lattice reports that when

$A$ 5

the three strategies stabilize around

$A$ 6

each. The same study emphasizes that cyclic dominance breaks down under two-strategy reductions and that recovery after severe mutation depends on the continued presence of the full spatial-support chain $A$ 7-near- $A$ 8, $A$ 9-near- $A$ 0, and $A$ 1-near- $A$ 2 (Cardinot et al., 2017).

Mobility on diluted lattices adds another mechanism. In the voluntary prisoner’s dilemma with density

$A$ 3

agents interact on a diluted square lattice with von Neumann neighborhoods and may move to neighboring empty sites according to a Fermi-like rule based on normalized utility. A key geometric threshold is the lattice percolation threshold

$A$ 4

On a fully occupied lattice ( $A$ 5), cyclic dominance survives under noisy imitation, but under fully rational imitation cooperators die out and the system freezes into a $A$ 6 state. With dilution and movement, the same cooperation-supporting mechanism reappears for most $A$ 7- $A$ 8 values when

$A$ 9

At very low density, the situation reverses: for $\sigma$ 00 cooperators die out and abstainers dominate, while around $\sigma$ 01 the system shows bistability, with runs ending in all- $\sigma$ 02 or all- $\sigma$ 03 (Cardinot et al., 2019).

These results collectively indicate that optionality does not operate independently of interaction structure. Its effect depends strongly on whether the environment is well mixed, spatial, weighted, diluted, or mobile.

5. Behavioral, environmental, and institutional extensions

The OPD has also been extended beyond standard evolutionary settings. In a one-shot anonymous human–machine mixed population, simple bots are assigned fixed strategies: always cooperate, always defect, never participate, or choose each action with probability $\sigma$ 04. In well-mixed populations, cooperative bots are reported to facilitate the emergence of cooperation under weak imitation, while loner bots have no meaningful effect. On regular lattices, loner bots become more consequential: under strong imitation they can facilitate the dominance of cooperation, but the effect is nonmonotonic. Around $\sigma$ 05, defectors are eliminated and cooperation can expand; around $\sigma$ 06, loner bots surround cooperative clusters and block further spread, reducing cooperation (Sharma et al., 2023).

A different institutional extension adds pre-game commitment. In that two-stage model, each player first chooses whether to accept commitment,

$\sigma$ 07

and then, in the game stage, chooses among

$\sigma$ 08

A full strategy has the form

$\sigma$ 09

where $\sigma$ 10, $\sigma$ 11 is the action if commitment is formed, and $\sigma$ 12 is the action otherwise, for a total of 18 strategies. The OPD payoff matrix is

$\sigma$ 13

with

$\sigma$ 14

The main result is that optional participation boosts commitment acceptance but fails to foster cooperation, leading instead to widespread exit behavior. Two institutional reward rules are then compared. Under STRICT-COM, only committed players who cooperate are rewarded; under FLEXIBLE-COM, any committed player who does not defect is rewarded. The strict rule is reported to promote cooperation more effectively, while the flexible rule creates an opportunistic exit loophole, though it can yield higher social welfare when $\sigma$ 15 is high and the reward budget $\sigma$ 16 is limited (Song et al., 8 Aug 2025).
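The count of 18 strategies and the difference between the two reward rules can be checked with a short enumeration. The labels `"accept"`, `"reject"`, and `"EXIT"` and the function `rewarded` are hypothetical names chosen for this sketch; only the structure (a commitment choice, an action if commitment is formed, an action otherwise) and the two eligibility rules come from the description above.

```python
from itertools import product

COMMIT_CHOICES = ("accept", "reject")
ACTIONS = ("C", "D", "EXIT")

# (commitment choice, action if commitment is formed, action otherwise)
STRATEGIES = list(product(COMMIT_CHOICES, ACTIONS, ACTIONS))


def rewarded(rule, committed, action):
    """Whether a player is eligible for the institutional reward under a given rule."""
    if not committed:
        return False
    if rule == "STRICT-COM":
        return action == "C"   # only committed cooperators are rewarded
    if rule == "FLEXIBLE-COM":
        return action != "D"   # any committed non-defector is rewarded
    raise ValueError(f"unknown rule: {rule}")


n_strategies = len(STRATEGIES)
exit_strict = rewarded("STRICT-COM", True, "EXIT")
exit_flexible = rewarded("FLEXIBLE-COM", True, "EXIT")
```

The only case in which the two rules disagree is a committed player who exits, which is the loophole attributed to the flexible rule.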

Another extension couples OPD to a dynamic environment. There the payoff matrix depends on an environmental variable $\sigma$ 17,

$\sigma$ 18

and the environment evolves according to

$\sigma$ 19

where $\sigma$ 20 is the cooperator fraction. In the replicator version, the paper reports 11 fixed points and, in its illustrative example, eventual attraction to the all-abstain state $\sigma$ 21. In the pairwise-comparison version inspired by prospect theory, the paper reports 10 fixed points and convergence to an asymptotically stable interior equilibrium with

$\sigma$ 22

Here optionality is not merely a static outside option; it becomes part of a closed game–environment feedback system in which abstention, cooperation, and defection coevolve with environmental quality (Stella et al., 2021).

Not every three-action extension of the Prisoner’s Dilemma is an OPD in the strict sense. A conceptually related but distinct model is the generalized prisoner’s dilemma with strategy set

$\sigma$ 23

where $\sigma$ 24 denotes Silence. In that formulation, $\sigma$ 25 is described as neither cooperation nor defection, an ambiguous attitude, and a special state that may correspond to either $\sigma$ 26 or $\sigma$ 27 but is not distinguishable to the police. The classical Prisoner’s Dilemma is recovered when the third state is not taken into consideration. However, the model provides no explicit numerical payoff entries for the $\sigma$ 28-rows or $\sigma$ 29-columns and no equilibrium analysis for the generalized case. It is therefore conceptually related to optional participation but not equivalent to the standard abstain/exit interpretation of OPD (Deng et al., 2014).

A second conceptual distinction concerns what counts as “optionality.” In standard OPD, abstention is itself a strategic action. In the mobility-based voluntary prisoner’s dilemma, this point is made explicit: abstention is modeled as a true strategy $\sigma$ 30 with loner payoff $\sigma$ 31, not as physical movement away from an interaction. Mobility is a separate coevolutionary process that can restore cyclic dominance when strict rational updating on fully occupied graphs would otherwise create artificial frozen states (Cardinot et al., 2019).

A third distinction concerns whether abstention is a pure strategy or a participation propensity. In probabilistic abstention, abstention is an attribute $\sigma$ 32 attached to each agent rather than a separate pure strategy, and the OPD appears as the limiting case $\sigma$ 33. This suggests that “optional participation” spans a family of formal devices: pure loner strategies, probabilistic participation rates, pre-game exit contingencies, and environment-coupled outside options (Cardinot et al., 2018).

Taken together, the literature describes the Optional Prisoner’s Dilemma not as a single frozen model but as a research program organized around one structural innovation: agents may refuse direct participation in the dilemma. The consequences of that innovation are highly sensitive to payoff normalization, microscopic update rules, spatial organization, link adaptation, mobility, commitment institutions, and game–environment feedback. In some settings abstention mainly preserves biodiversity through cyclic dominance; in others it protects cooperative clusters, induces exit-dominated equilibria, or becomes the basis for stronger institutional design problems. This suggests that the OPD is best regarded as a family of participation-sensitive Prisoner’s Dilemma models rather than a single canonical game.