Zero-Sum Sequential Game
- Zero-sum sequential games are dynamic adversarial models where each player's gain exactly offsets the other's loss at every stage.
- They employ history-dependent, Markov, or stationary strategies, solved via dynamic programming (Shapley recursions) and, in continuous-time limits, PDE methods (Hamilton–Jacobi–Isaacs equations).
- These games underpin applications in economics, reinforcement learning, and control, offering insights into equilibrium computation and online learning algorithms.
A zero-sum sequential game is a multi-stage conflict between two or more agents in which the sum of their payoffs at each outcome is identically zero. In these games, each player’s gain corresponds exactly to the other player’s loss at every point in the play, and strategies must be chosen dynamically—at each stage, possibly in reaction to the evolving state, revealed information, and previous actions. Zero-sum sequential games provide the mathematical foundation for adversarial decision processes in dynamic environments, including stochastic control, economics, reinforcement learning, and mechanism design.
1. Formal Framework for Zero-Sum Sequential Games
In the canonical finite zero-sum stochastic game setting, the state space $K$ is finite, the action sets $I$ and $J$ are compact or finite, and the law of motion is Markovian:
- The state at time $t$ is $k_t \in K$.
- Player 1 and player 2 simultaneously choose actions $i_t \in I$ and $j_t \in J$.
- The one-stage payoff to player 1 is $g(k_t, i_t, j_t)$; player 2 receives $-g(k_t, i_t, j_t)$.
- The next state $k_{t+1}$ is drawn from the transition kernel $q(\cdot \mid k_t, i_t, j_t)$. Strategies can be history-dependent, Markov (state-based), or stationary.
Two classical payoff evaluations are used:
- The $n$-stage average payoff: $\gamma_n(\sigma,\tau) = \mathbb{E}_{\sigma,\tau}\left[\frac{1}{n}\sum_{t=1}^{n} g(k_t, i_t, j_t)\right]$.
- The $\lambda$-discounted payoff: $\gamma_\lambda(\sigma,\tau) = \mathbb{E}_{\sigma,\tau}\left[\lambda \sum_{t=1}^{\infty} (1-\lambda)^{t-1} g(k_t, i_t, j_t)\right]$.
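As a quick numerical illustration (using a hypothetical periodic payoff stream, not tied to any particular game), both evaluations can be computed directly; for a stream alternating 1, 0, 1, 0, ... they both approach 1/2:

```python
# The two classical payoff evaluations for a fixed stage-payoff
# stream g_1, g_2, ... (here a simple periodic stream).

def n_stage_average(g, n):
    """gamma_n = (1/n) * sum_{t=1}^{n} g_t"""
    return sum(g(t) for t in range(1, n + 1)) / n

def discounted(g, lam, horizon=10_000):
    """gamma_lambda = lam * sum_{t>=1} (1-lam)^(t-1) g_t (truncated)."""
    return lam * sum((1 - lam) ** (t - 1) * g(t) for t in range(1, horizon + 1))

g = lambda t: t % 2               # payoffs 1, 0, 1, 0, ...
print(n_stage_average(g, 1000))   # -> 0.5
print(discounted(g, 0.01))        # close to 0.5 for small lambda
```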
The central object is the game value $v_n$ or $v_\lambda$, defined via the min-max:
$v = \sup_{\sigma} \inf_{\tau} \gamma(\sigma,\tau) = \inf_{\tau} \sup_{\sigma} \gamma(\sigma,\tau),$
where $\gamma$ is the relevant payoff evaluation. Existence and uniqueness are guaranteed by finite-game theory; optimal strategies are Markov.
The Shapley (Bellman) recursion provides dynamic programming equations: $v_\lambda = \Phi(\lambda, v_\lambda)$, where $\Phi$ is the Shapley operator:
$\Phi(\lambda, f)(k) = \operatorname{val}_{x \in \Delta(I),\, y \in \Delta(J)} \left[ \lambda\, g(k, x, y) + (1-\lambda) \sum_{k' \in K} q(k' \mid k, x, y)\, f(k') \right],$
with $\operatorname{val}$ denoting the value of the one-shot matrix game. These equations generalize MDPs to adversarial multi-agent settings (Renault, 2019).
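For small games this recursion can be iterated directly. The sketch below is a minimal illustration, assuming 2x2 action sets (so the stage-game value has a closed form) and a hypothetical two-state example: matching pennies in state 0, feeding into an absorbing state 1 that always pays 1. The λ-discounted Shapley operator is a contraction, so fixed-point iteration converges to $v_\lambda$:

```python
def value_2x2(A):
    """Value of a 2x2 zero-sum matrix game (row player maximizes)."""
    (a, b), (c, d) = A
    lower = max(min(a, b), min(c, d))   # maximin over pure rows
    upper = min(max(a, c), max(b, d))   # minimax over pure columns
    if lower == upper:                  # pure saddle point exists
        return lower
    return (a * d - b * c) / (a + d - b - c)  # fully mixed value

def shapley_iteration(g, q, lam, iters=500):
    """Iterate v(k) <- val_{x,y}[ lam*g + (1-lam)*E_q v ] to the unique
    fixed point, i.e. the lambda-discounted value of the stochastic game."""
    n = len(g)
    v = [0.0] * n
    for _ in range(iters):
        v = [value_2x2([[lam * g[k][i][j]
                         + (1 - lam) * sum(q[k][i][j][kp] * v[kp] for kp in range(n))
                         for j in range(2)] for i in range(2)])
             for k in range(n)]
    return v

# State 0: matching pennies, then move to either state with prob 1/2.
# State 1: absorbing, every action pair pays 1.
g = [[[1.0, -1.0], [-1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]
q = [[[[0.5, 0.5]] * 2] * 2, [[[0.0, 1.0]] * 2] * 2]
v = shapley_iteration(g, q, lam=0.1)
```

Here one can check by hand that $v_\lambda(1) = 1$ and $v_\lambda(0)$ solves $v_0 = 0.45\,(v_0 + 1)$, i.e. $v_0 = 9/11 \approx 0.818$, which the iteration reproduces.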
2. Structure and Variants
Continuous-Time and Limit Models
Zero-sum sequential games extend naturally to continuous-time Markov processes:
- States evolve via a Markov jump process controlled by both players.
- At each discrete time $t_m$ determined by a partition $\Pi$ of $[0, \infty)$, players choose actions based on the observed state $\omega_{t_m}$.
- Running payoffs are discounted by a density $\rho e^{-\rho t}$.
As the partition mesh vanishes, the value function converges to the unique viscosity solution of a Hamilton–Jacobi–Isaacs PDE of the form
$\rho\, v(\omega) = \operatorname{val}_{x, y}\left[ \rho\, g(\omega, x, y) + (A^{x,y} v)(\omega) \right],$
where $A^{x,y}$ is the generator of the controlled jump process. This reduction to a deterministic differential game justifies the use of continuous-time methods when stage durations vanish (Sorin, 2016).
Asymmetric Information and Stopping Games
Other key variants include games with asymmetric information (Kartik et al., 2019), where strategies depend on private and common signals, and zero-sum stopping games, in which one agent chooses a control process and the other selects a stopping time (Hernandez-Hernandez et al., 2012). In the singular controller/discretionary stopper model, values are characterized by coupled variational inequalities, and strategies can involve reflection and impulse actions, subject to state regions where control or stopping is optimal.
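A minimal discrete-time instance of a stopping game is the finite-horizon Dynkin game. Under the classical ordering condition (the maximizer's stop payoff never exceeds the minimizer's), backward induction in pure stopping rules yields the value; the payoff sequences below are hypothetical, and the recursion follows one standard convention for simultaneous stopping:

```python
def dynkin_value(f, g, terminal):
    """Value process of a finite-horizon stopping (Dynkin) game: at each
    stage t, player 1 (maximizer) may stop for f[t] and player 2
    (minimizer) may stop for g[t]. The condition f[t] <= g[t] ensures
    pure optimal stopping rules, and the value satisfies
        V[t] = max(f[t], min(g[t], V[t+1])),  V[N] = terminal."""
    assert all(ft <= gt for ft, gt in zip(f, g))
    values = [0.0] * len(f) + [terminal]
    for t in reversed(range(len(f))):
        values[t] = max(f[t], min(g[t], values[t + 1]))
    return values

f = [0.0, 0.3, 0.6, 0.2]   # hypothetical payoffs when player 1 stops
g = [1.0, 0.8, 0.7, 0.9]   # hypothetical payoffs when player 2 stops
print(dynkin_value(f, g, terminal=0.5))  # -> [0.6, 0.6, 0.6, 0.5, 0.5]
```

The max/min structure mirrors the coupled variational inequalities of the continuous-time controller/stopper models: each player acts only in the region of the state space where stopping improves on continuation.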
3. Existence and Characterization of Values and Equilibria
For finite zero-sum sequential games, the Mertens-Neyman theorem ensures the existence of a uniform value in the limit $n \to \infty$ or $\lambda \to 0$, with convergence of both $v_n$ and $v_\lambda$ to the same limit $v$ (Renault, 2019). The equilibrium concept coincides with Nash equilibrium in simultaneous-move games but adapts to sequential structure via backward induction, policy iteration, and dynamic programming.
In adversarial (strictly competitive) sequential games, the folk theorem (Fishburn-Roberts, Adler-Daskalakis-Papadimitriou, Raimondo) establishes that any two-player adversarial game is strategically equivalent (via positive affine payoff transformation) to a zero-sum game, ensuring that all equilibrium and solution concepts carry over (Khan et al., 2024).
Computational and Learning Aspects
Efficient algorithms exist for policy and value iteration in Markov zero-sum games, where per-round complexity is polynomial in state-action sizes. For online matrix games with adversarially evolving payoffs, regret-minimizing algorithms such as OMG-RFTL guarantee that players’ sequential payoffs are close to the minimax-optimal benchmark in hindsight (Cardoso et al., 2019). In nonstationary settings, episodic learning via expert ensembles (OFULinMat) yields sublinear saddle-point regret bounds, leveraging side information beyond static best-response adversarial learning (Pan et al., 2021).
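The cited algorithms come with specific guarantees; as a generic, self-contained sketch of the underlying principle (not the OMG-RFTL or OFULinMat algorithms themselves), two multiplicative-weights learners in self-play on a fixed, hypothetical matrix game drive their time-averaged strategies toward a minimax pair, since low average regret for both players bounds the duality gap of the averages:

```python
import math

def softmax(logits):
    """Normalize log-weights into a mixed strategy (numerically stable)."""
    mx = max(logits)
    exps = [math.exp(l - mx) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def mwu_selfplay(A, T=20000, eta=0.02):
    """Both players run multiplicative weights (kept in log space) on the
    zero-sum matrix game A (row player maximizes, entries in [-1, 1]);
    the time-averaged mixed strategies approximate a saddle point."""
    m, n = len(A), len(A[0])
    lx, ly = [0.0] * m, [0.0] * n          # log-weights
    avg_x, avg_y = [0.0] * m, [0.0] * n
    for _ in range(T):
        x, y = softmax(lx), softmax(ly)
        avg_x = [a + xi / T for a, xi in zip(avg_x, x)]
        avg_y = [a + yj / T for a, yj in zip(avg_y, y)]
        # expected payoff of each pure action vs the opponent's current mix
        px = [sum(A[i][j] * y[j] for j in range(n)) for i in range(m)]
        py = [sum(A[i][j] * x[i] for i in range(m)) for j in range(n)]
        lx = [l + eta * p for l, p in zip(lx, px)]   # maximizer ascends
        ly = [l - eta * p for l, p in zip(ly, py)]   # minimizer descends
    return avg_x, avg_y

# Hypothetical game with value 0; optimal mixes are (1/3, 2/3) and (1/2, 1/2).
A = [[1.0, -1.0], [-0.5, 0.5]]
x, y = mwu_selfplay(A)
```

Note that the iterates themselves may cycle or diverge; it is only the time averages whose exploitability shrinks at the regret rate.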
For multi-agent, networked interaction models (zero-sum NMGs), Markov Nash equilibria and coarse correlated equilibria collapse in structure; stationary NE computation is tractable for star interactions but PPAD-hard for triangle or path topologies (Park et al., 2023).
4. Advanced Models: Asymmetric Information and Continuous-Time Games
Sequential zero-sum games with incomplete or asymmetric information (e.g., only one player observes a Brownian motion) are characterized by value functions that solve Hamilton–Jacobi type PDEs on measure spaces. The value is the largest convex subsolution of the corresponding PDE, and is convex in the distribution of the (possibly hidden) state (Gensbittel et al., 2016).
In finite-horizon zero-sum stopping games on random permutations, universal scaling laws govern the optimal expected payoff, established via combinatorial identities and convexity principles in one-player zero-sum games (Dumitrescu et al., 2024).
5. Strategic, Computational, and Learning Implications
- Strategy Synthesis: Markov (state-based or stationary) strategies are typically sufficient for optimality in finite and discounted zero-sum sequential games, but history-dependent strategies may be needed in degenerate cases (cf. the "Big Match").
- Equilibrium Computation: Backward induction, policy iteration, and value iteration efficiently compute equilibrium strategies in finite and discounted settings. Complexity is polynomial for two-player zero-sum, and collapses further in games with separable networked structure.
- Learning in Unknown Games: Online learning approaches (SP-RFTL, OFULinMat) realize minimax optimality even under adversarial or nonstationary payoffs, outperforming naïve no-regret bandit algorithms.
- Effect of Information Structures: In leader-follower (Stackelberg) variants with noisy observation channels, equilibrium payoffs interpolate between Nash and pure Stackelberg values, with existence always assured and tightness conditions dictated by channel informativeness (Sun et al., 2022). Asymmetric information models require measure-valued dynamic programming and convexity considerations in the space of probability distributions over states.
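For perfect-information sequential play, the backward induction mentioned above reduces to a few lines of minimax recursion; the sketch below uses a hypothetical two-ply game tree:

```python
# Backward induction (minimax) on a finite zero-sum game tree.
# A node is either a terminal payoff (float, to the maximizer) or a
# pair (player, children) with player in {"max", "min"}.

def backward_induction(node):
    """Return the value of the subtree rooted at node."""
    if isinstance(node, (int, float)):
        return node                       # terminal payoff
    player, children = node
    values = [backward_induction(c) for c in children]
    return max(values) if player == "max" else min(values)

# Max moves first; Min replies at each of Max's two choices.
tree = ("max", [
    ("min", [3.0, 7.0]),   # after Max's first action, Min secures 3
    ("min", [5.0, 4.0]),   # after Max's second action, Min secures 4
])
print(backward_induction(tree))  # -> 4.0
```

The recursion visits each node once, which is the source of the polynomial-time claim for equilibrium computation in finite perfect-information zero-sum games.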
6. Theoretical and Practical Consequences
Zero-sum sequential games provide a foundational template for adversarial decision processes in dynamic and uncertain environments, with formal guarantees for value existence, explicit characterizations via dynamic programming and viscosity/PDE methods, and deep connections between information structure, learning, and algorithmic hardness. The reduction of adversarial games to zero-sum, the convergence properties under vanishing stage duration, and applicability to both discrete and continuous-time models equip researchers with a robust apparatus for designing and analyzing strategic agents in multi-stage environments (Sorin, 2016, Renault, 2019, Khan et al., 2024, Park et al., 2023).