2-Memory Stochastic-Update Strategy
- A 2-memory stochastic-update strategy is a decision policy in which the current action and the next memory state are determined probabilistically from the present game state and the immediately preceding memory state.
 - It is applied in stochastic games, population protocols, and optimization algorithms, demonstrating its role in managing memory constraints while striving for efficiency and convergence.
 - Despite reducing computational load, 2-memory strategies face inherent limitations in adversarial environments like the Big Match, highlighting the need for scalable memory approaches.
 
A 2-memory stochastic-update strategy is a decision policy or algorithm whose update at each time step depends stochastically on both the most recent information and a selected portion of memory encompassing the immediately preceding step or value. This concept is prominent in the analytical and algorithmic study of stochastic games, population protocols, learning dynamics, and optimization, where the interplay of memory length and stochasticity critically determines attainable efficiency, equilibrium, and convergence rates. The term "2-memory" typically refers to a state space or strategy profile involving two possible memory states, or equivalently, to the use of one bit of memory, but may also denote dependence on the last two time steps.
1. Historical Context and Key Theoretical Results
Early work, notably by Mertens & Neyman (1981), established that uniform ε-optimal strategies exist in finite stochastic games, but the constructed strategies use a number of memory states that grows with the number of stages played. The refinement in (Hansen et al., 5 May 2025) reduces this memory requirement and demonstrates that stochastic memory updating—where transitions between memory states are probabilistic rather than deterministic—enables the improvement.
The critical impossibility result is that bounded public-memory strategies (including 2-memory and, by extension, any finite-memory strategy) are strictly suboptimal in specific stochastic games. The Big Match is the canonical example: any 2-memory public strategy of Player 1, regardless of how it uses its two memory states, cannot guarantee a long-run average payoff exceeding an arbitrarily small ε > 0, because Player 2 can exploit the bounded memory.
2. Formal Definition in Stochastic Games
A 2-memory stochastic-update strategy for a player is one where the action and memory state at each stage are determined by the current game state and the previous memory state, with transition probabilities that may depend on both. For public memory, the memory states and transitions are fully observable by both players; for private memory, only the acting player accesses their memory state.
In formal terms, let $m_t \in \{0, 1\}$ denote the memory state (2-memory, i.e., one bit) and $s_t$ the game state at stage $t$. The strategy specifies a distribution over actions $\sigma(\cdot \mid s_t, m_t)$ for each possible memory state and game state. Stochastic updating means that the transition to $m_{t+1}$ is random, potentially depending on $m_t$, $s_t$, and the actions played.
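As an illustrative sketch of the definition above (the state names, actions, and probabilities below are invented for the example, not taken from any cited paper), one step of a 2-memory stochastic-update strategy can be written as:

```python
import random

def two_memory_step(state, memory, action_dist, memory_dist):
    """One step of a 2-memory stochastic-update strategy.

    state       : current (observable) game state
    memory      : current memory bit, 0 or 1
    action_dist : maps (state, memory) -> {action: probability}
    memory_dist : maps (state, memory, action) -> probability that the
                  next memory bit is 1 (stochastic memory update)
    """
    # Sample the action from the distribution conditioned on (state, memory).
    dist = action_dist[(state, memory)]
    action = random.choices(list(dist), weights=list(dist.values()))[0]
    # Stochastic memory update: the next bit is drawn at random,
    # conditioned on the state, the old memory bit, and the action taken.
    next_memory = 1 if random.random() < memory_dist[(state, memory, action)] else 0
    return action, next_memory

# Hypothetical toy policy: in state "s", memory bit 1 makes "safe" likelier,
# and the memory bit flips to 1 with low probability (then stays at 1).
action_dist = {
    ("s", 0): {"risky": 0.5, "safe": 0.5},
    ("s", 1): {"risky": 0.1, "safe": 0.9},
}
memory_dist = {("s", m, a): 0.05 if m == 0 else 1.0
               for m in (0, 1) for a in ("risky", "safe")}

action, mem = two_memory_step("s", 0, action_dist, memory_dist)
```

The key distinction from a deterministic-update strategy is that `memory_dist` returns a probability rather than a fixed successor state.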
3. Applications and Significance Across Domains
a) Stochastic Games with Long-Run Average Payoff
- For general finite stochastic games, public-memory strategies with a slowly growing number of states suffice for uniform ε-optimality ((Hansen et al., 5 May 2025), Theorem 1), achieved via stochastic updating that increments the memory state with low probability—an approximate counting mechanism. This enables adaptation without linear memory growth.
 - Crucially, for the Big Match and related absorbing games, no 2-memory public strategy can secure a positive average payoff in the long run. The negative result holds for all finite-memory public strategies: for any ε > 0, there exists a counterstrategy of the opponent driving the limsup average payoff below ε.
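The "approximate counting mechanism" mentioned above can be illustrated with a Morris-style counter, a standard construction in which memory state $m$ is incremented only with probability $2^{-m}$, so $m$ tracks roughly $\log_2$ of the number of events seen (whether this matches the exact scheme of Hansen et al. is not specified here; this is a generic sketch of the idea):

```python
import random

def approximate_count(n_events, seed=0):
    """Morris-style approximate counter: the memory state m is incremented
    with probability 2**-m, so after n_events steps m is concentrated
    around log2(n_events), using exponentially fewer memory states than
    an exact counter would."""
    rng = random.Random(seed)
    m = 0
    for _ in range(n_events):
        if rng.random() < 2.0 ** -m:
            m += 1
    return m  # the implied count estimate is about 2**m - 1

# An exact counter for 10**6 events needs ~10**6 states;
# the stochastic-update counter reaches only ~20 memory states.
m = approximate_count(10**6)
```

This is why stochastic memory updating can track long play horizons with a small memory state space, while any deterministic update rule must spend a distinct state per distinguishable count.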
 
| Memory Type | General Stochastic Game | Big Match | 
|---|---|---|
| Public, scalable memory with stochastic update | ε-optimal | ε-optimal | 
| Public, 2-memory | ε-optimal only in special cases | Impossible | 
| Finite public | Not known in general | Impossible | 
| Finite private | Open (general), possible in specific cases | Possible | 
b) Population Update in Parallel Minority Games
- Among agents restricted to two options, a strategy using memory of the last visit to the alternative option yields the lowest observed population variance (Vemula et al., 2 Sep 2025). The update depends on historical, delayed information (i.e., 2-memory). This memory-delayed strategy increases efficiency but leads to slow convergence and glassy freezing, with agents trapped in outdated majorities.
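A minimal toy sketch of such a memory-delayed update is given below. All parameters and the exact switching rule are illustrative assumptions, not the protocol of Vemula et al.: each agent remembers the crowd size it observed on its last visit to the other option and switches only when that (possibly stale) observation suggests the alternative is the minority.

```python
import random

def minority_game(n_agents=101, steps=2000, seed=1):
    """Toy two-option minority game with memory-delayed (2-memory-style)
    updates. Agents act in parallel on outdated information, which is the
    mechanism behind the 'glassy freezing' described in the text."""
    rng = random.Random(seed)
    choice = [rng.randint(0, 1) for _ in range(n_agents)]
    # last_seen[i][k]: attendance of option k when agent i last chose it
    last_seen = [[n_agents // 2, n_agents // 2] for _ in range(n_agents)]
    attendance = []
    for _ in range(steps):
        counts = [choice.count(0), choice.count(1)]
        attendance.append(counts[0])
        new_choice = list(choice)
        for i in range(n_agents):
            k = choice[i]
            last_seen[i][k] = counts[k]  # refresh memory of current option
            other = 1 - k
            # switch if the REMEMBERED attendance of the alternative
            # (possibly long outdated) looks like a minority
            if last_seen[i][other] < counts[k]:
                new_choice[i] = other
        choice = new_choice  # parallel update
    # population variance of attendance over the second half of the run
    tail = attendance[steps // 2:]
    mean = sum(tail) / len(tail)
    return sum((a - mean) ** 2 for a in tail) / len(tail)

var = minority_game()
```

Because each agent's view of the alternative is frozen at its last visit, agents can remain locked into choices that were optimal at some earlier time, which is the qualitative "outdated majority" effect.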
 
c) Evolutionary Multi-Objective Optimization
- Stochastic population update (SPU) strategies, where both the main population and a separate archive of elite solutions act as dual-memory elements, yield exponential running-time speedups in algorithms such as NSGA-II and SMS-EMOA (Ren et al., 28 Jan 2025). The archive preserves all elites while the main population is updated stochastically, a useful separation of exploration and exploitation that mimics a practical 2-memory scheme.
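The archive/population separation can be sketched on a toy bi-objective problem. The problem, mutation scheme, and parameters below are illustrative assumptions, not the NSGA-II/SMS-EMOA variants analyzed by Ren et al.; the sketch only shows the dual-memory idea: deterministic elite retention in the archive, stochastic replacement in the working population.

```python
import random

def dominates(a, b):
    """Pareto dominance for minimization of objective vectors."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def spu_with_archive(steps=3000, pop_size=10, seed=0):
    """Toy bi-objective minimization of f(x) = (x**2, (x - 2)**2), whose
    Pareto set is the interval [0, 2]. The archive retains all
    non-dominated solutions; the population is refreshed stochastically."""
    rng = random.Random(seed)
    f = lambda x: (x * x, (x - 2.0) ** 2)
    pop = [rng.uniform(-5.0, 5.0) for _ in range(pop_size)]
    archive = []
    for _ in range(steps):
        # exploration: mutate a randomly chosen parent
        child = rng.choice(pop) + rng.gauss(0.0, 0.3)
        # elite retention: keep the child iff nothing archived dominates it,
        # and drop archived points that the child dominates
        if not any(dominates(f(a), f(child)) for a in archive):
            archive = [a for a in archive if not dominates(f(child), f(a))]
            archive.append(child)
        # stochastic population update: the child replaces a uniformly
        # random member, irrespective of dominance
        pop[rng.randrange(pop_size)] = child
    return archive

archive = spu_with_archive()
```

The design point is that the population may randomly discard good solutions (enabling escape from deceptive regions), while the archive guarantees no elite is ever lost.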
 
d) Learning Dynamics in State-Based Games
- In the "two-memory better reply with inertia dynamics" learning algorithm, each agent conditions its update on the last two periods, balancing stability (inertia), local improvement (better reply), and ergodic exploration (Li et al., 2018). This structure ensures almost sure convergence to recurrent state equilibria under certain accessibility and Markovian self-loop conditions. Universality and time-efficient convergence are not guaranteed in all games.
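A minimal sketch of a two-memory better-reply-with-inertia dynamic is given below for a symmetric two-player coordination game. The game, inertia probability, and update rule are illustrative assumptions rather than the exact dynamics of Li et al.; the sketch shows the three ingredients named above: inertia, better reply, and conditioning on the last two periods.

```python
import random

def two_memory_better_reply(n_actions=3, steps=500, inertia=0.5, seed=0):
    """Toy 'two-memory better reply with inertia' dynamic for a symmetric
    two-player coordination game (payoff 1 for matching actions, else 0).
    With probability `inertia` an agent repeats its action; otherwise it
    switches to an action that strictly improves its payoff against the
    opponent's actions in the last two remembered periods."""
    rng = random.Random(seed)
    payoff = [[1 if a == b else 0 for b in range(n_actions)]
              for a in range(n_actions)]
    acts = [rng.randrange(n_actions), rng.randrange(n_actions)]
    history = [tuple(acts), tuple(acts)]  # the two most recent joint actions
    for _ in range(steps):
        new = list(acts)
        for i in (0, 1):
            if rng.random() < inertia:
                continue  # inertia: keep the current action
            j = 1 - i
            # payoff of a candidate action against the opponent's play
            # in the two remembered periods
            u = lambda a: sum(payoff[a][h[j]] for h in history)
            better = [a for a in range(n_actions) if u(a) > u(acts[i])]
            if better:
                new[i] = rng.choice(better)  # better reply, not best reply
        acts = new
        history = [history[-1], tuple(acts)]  # slide the 2-period window
    return acts

acts = two_memory_better_reply()
```

In this identical-interest example the dynamic settles on a coordinated action pair, which is a recurrent state equilibrium of the toy game; as the text notes, such convergence is not guaranteed universally.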
 
4. Mathematical Characterization and Limitations
In stochastic games, the main mathematical statements are:
- Upper bound: For any ε > 0, uniform ε-optimal public-memory strategies exist whose number of memory states grows only slowly with the number of stages played, achieved with high probability via stochastic memory updating (Hansen et al., 5 May 2025).
 
- Impossibility for 2-memory: For any fixed number of public memory states $k$ (including $k = 2$) and any $\varepsilon > 0$, every $k$-memory public strategy $\sigma$ of Player 1 admits a counterstrategy $\tau$ of Player 2 such that

$$\limsup_{T \to \infty} \ \mathbb{E}_{\sigma,\tau}\!\left[\frac{1}{T}\sum_{t=1}^{T} g_t\right] \le \varepsilon$$

in the Big Match (where $g_t$ denotes the stage payoff), indicating the "worthlessness" of all finite-memory public strategies.
- Stochastic memory updating: Achieves the reduced memory bound by probabilistic increments (approximate counting), in contrast to deterministic memory schemes, which require strictly more memory states.
 
5. Synthesis: Broader Impact, Controversies, and Future Directions
The introduced separation between stochastic memory updating and deterministic finite-memory strategies marks a critical boundary in what is achievable with bounded resources in stochastic environments. The impossibility results for 2-memory (and more broadly, bounded-memory) public strategies, especially in adversarial or highly coupled games (e.g., Big Match), demonstrate that genuine adaptation and robustness require memory architectures capable of scaling—at least logarithmically—with play length or game complexity. These results unify disparate phenomena:
- In algorithmic game theory, strong impossibility theorems preclude low-memory optimality for broad classes of games.
 - In population dynamics and distributed protocols, delayed memory (2-memory) can produce glassy freezing and suboptimal collective states (cf. PMG).
 - In optimization and learning, two-step memory induces strong convergence (see MSTGD, (Aixiang et al., 2022)) or ensures ergodicity with inertia (Li et al., 2018), but once bounded, cannot be universal.
 
A plausible implication is that any strategy or protocol purporting universal near-optimality in stochastic multi-agent settings must incorporate scalable memory and stochastic updating. The theoretical boundary is sharp: bounded-memory strategies are categorically limited, while stochastic updating and scalable memory are indispensable.
6. Summary Table: Sufficiency of 2-memory Stochastic Strategies
| Domain | 2-memory Stochastic Sufficient? | Commentary | 
|---|---|---|
| General stochastic games | No (Big Match: Impossible) | Requires memory scaling with play length; stochastic updating is key | 
| Minority games (PMG) | Yes for lowest variance; not globally optimal | Glassy freezing, not true optimum | 
| Evolutionary optimization | Archive + SPU (dual memory) suffices | Elite retention via archive achieves exponential speedup | 
| State-based learning | Yes for convergence to RSE under conditions | Strong limitations: universality not achievable | 
This suggests a fundamental dichotomy: strategies with strictly bounded memory (such as 2-memory) can attain desirable properties under certain conditions (collective variance reduction, equilibrium convergence with accessibility), but are generically insufficient for optimality in adversarial or expressive stochastic environments due to exploitation risks and incapacity for long-horizon adaptation.