
Zero-Sum Sequential Game

Updated 30 January 2026
  • Zero-sum sequential games are dynamic adversarial models where each player's gain exactly offsets the other's loss at every stage.
  • They employ history-dependent, Markov, or stationary strategies, solved via dynamic programming and PDE methods such as Shapley recursions.
  • These games underpin applications in economics, reinforcement learning, and control, offering insights into equilibrium computation and online learning algorithms.

A zero-sum sequential game is a multi-stage conflict between two or more agents in which the sum of their payoffs at each outcome is identically zero. In these games, each player’s gain corresponds exactly to the other player’s loss at every point in the play, and strategies must be chosen dynamically—at each stage, possibly in reaction to the evolving state, revealed information, and previous actions. Zero-sum sequential games provide the mathematical foundation for adversarial decision processes in dynamic environments, including stochastic control, economics, reinforcement learning, and mechanism design.

1. Formal Framework for Zero-Sum Sequential Games

In the canonical finite zero-sum stochastic game setting, the state space $K$ is finite, the action sets $A$ and $B$ are compact or finite, and the law of motion is Markovian:

  • The state at time $t$ is $k_t \in K$.
  • Player 1 and player 2 simultaneously choose actions $(a_t, b_t)$.
  • The one-stage payoff to player 1 is $r(k_t, a_t, b_t)$; player 2 receives $-r(k_t, a_t, b_t)$.
  • The next state $k_{t+1}$ is drawn from $P(\cdot \mid k_t, a_t, b_t)$. Strategies can be history-dependent, Markov (state-based), or stationary.
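These primitives can be encoded directly. The sketch below uses a hypothetical two-state toy game (the states, payoffs, and transition probabilities are illustrative assumptions, not taken from any cited paper) and samples a trajectory under Markov strategies:

```python
import random

# Hypothetical two-state zero-sum stochastic game: states K, finite action
# sets A and B, stage payoff r(k, a, b), and transition kernel P(. | k, a, b).
K = ["s0", "s1"]
A = [0, 1]
B = [0, 1]

def r(k, a, b):
    # Player 1's stage payoff; player 2 receives -r(k, a, b).
    return (1 if a == b else -1) * (2 if k == "s1" else 1)

def P(k, a, b):
    # Distribution over next states as {state: probability}.
    if a == b:
        return {"s0": 0.3, "s1": 0.7}
    return {"s0": 0.8, "s1": 0.2}

def sample_trajectory(k0, sigma, tau, n, rng):
    """Play n stages from k0 under Markov strategies sigma and tau."""
    k, payoffs = k0, []
    for _ in range(n):
        a, b = sigma(k, rng), tau(k, rng)
        payoffs.append(r(k, a, b))
        nxt = P(k, a, b)
        k = rng.choices(list(nxt), weights=list(nxt.values()))[0]
    return payoffs

rng = random.Random(0)
uniform = lambda k, rng: rng.choice([0, 1])
traj = sample_trajectory("s0", uniform, uniform, 5, rng)
print(traj)
```

The payoff evaluations below are then just weighted averages over such trajectories.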

Two classical payoff evaluations are used:

  • The $n$-stage average payoff:

$$\Gamma_n^k(\sigma, \tau) = \frac{1}{n}\, \mathbb{E}_k^{\sigma, \tau}\!\left[\sum_{t=1}^{n} r(k_t, a_t, b_t)\right]$$

  • The $\lambda$-discounted payoff:

$$\Gamma_\lambda^k(\sigma, \tau) = \mathbb{E}_k^{\sigma, \tau}\!\left[\sum_{t=1}^{\infty} \lambda (1-\lambda)^{t-1}\, r(k_t, a_t, b_t)\right]$$
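A quick numeric sanity check on the discounting convention (an illustrative sketch with an arbitrary $\lambda$ and constant reward): the weights $\lambda(1-\lambda)^{t-1}$ sum to one, so the $\lambda$-discounted payoff of a constant stream $r \equiv c$ is exactly $c$.

```python
lam = 0.1
c = 3.0

# Truncate the infinite sum at T stages; the neglected tail has total
# weight (1 - lam)**T, which vanishes geometrically.
T = 1000
weights = [lam * (1 - lam) ** (t - 1) for t in range(1, T + 1)]

total_weight = sum(weights)                    # approaches 1
discounted_payoff = sum(w * c for w in weights)  # approaches c

print(total_weight)
print(discounted_payoff)
```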

The central object is the game value $v_n(k)$ or $v_\lambda(k)$, defined via the min-max:

$$v_n(k) = \sup_\sigma \inf_\tau \Gamma_n^k(\sigma, \tau) = \inf_\tau \sup_\sigma \Gamma_n^k(\sigma, \tau)$$

Existence and uniqueness are guaranteed by finite-game theory; optimal strategies are Markov.
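For a single-stage $2 \times 2$ matrix game the min-max value admits a classical closed form, which the sketch below implements (the example matrices are toy choices; the mixed-case formula assumes a nonzero denominator, which holds whenever no pure saddle point exists):

```python
def value_2x2(M):
    """Minimax value of a 2x2 zero-sum matrix game (row player maximizes).

    Checks for a pure saddle point first; otherwise applies the classical
    closed form for fully mixed equilibria of 2x2 games.
    """
    (a, b), (c, d) = M
    lower = max(min(a, b), min(c, d))   # maximin over pure rows
    upper = min(max(a, c), max(b, d))   # minimax over pure columns
    if lower == upper:                  # pure saddle point exists
        return lower
    # Fully mixed case: v = (ad - bc) / (a + d - b - c).
    return (a * d - b * c) / (a + d - b - c)

# Matching pennies: no pure saddle point; value 0 at the uniform mix.
print(value_2x2([[1, -1], [-1, 1]]))   # 0.0
# A game with a pure saddle point at (row 0, column 0).
print(value_2x2([[2, 3], [1, 4]]))     # 2
```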

The Shapley (Bellman) recursion provides the dynamic programming equations $v_{n+1} = T v_n$, where $T$ is the Shapley operator:

$$(Tu)(k) = \operatorname{Val}_{x \in \Delta(A),\, y \in \Delta(B)} \sum_{a, b} x(a)\, y(b) \left[ r(k, a, b) + \sum_{k'} P(k' \mid k, a, b)\, u(k') \right]$$

These equations generalize MDPs to adversarial multi-agent settings (Renault, 2019).
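The recursion can be iterated numerically. The sketch below uses the $\lambda$-discounted variant of the operator, $(Tu)(k) = \operatorname{Val}[\lambda r + (1-\lambda) P u]$, which matches the discounting convention above and is a $(1-\lambda)$-contraction, so iteration converges to $v_\lambda$. The two-state game and its payoffs are hypothetical toy values, and the stage games are solved with the standard $2 \times 2$ closed form:

```python
def val_2x2(M):
    """Value of a 2x2 zero-sum matrix game (row player maximizes)."""
    (a, b), (c, d) = M
    lower = max(min(a, b), min(c, d))
    upper = min(max(a, c), max(b, d))
    if lower == upper:                       # pure saddle point
        return lower
    return (a * d - b * c) / (a + d - b - c)  # fully mixed 2x2 formula

# Hypothetical two-state game, two actions each: r[k][a][b] and P[k][a][b][k'].
K = [0, 1]
r = [[[1, -1], [-1, 1]],        # state 0: matching pennies
     [[3, 1], [0, 2]]]          # state 1: arbitrary toy payoffs
P = [[[[0.5, 0.5], [0.9, 0.1]], [[0.9, 0.1], [0.5, 0.5]]],
     [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]]

def shapley_T(u, lam):
    """One application of the lambda-discounted Shapley operator."""
    out = []
    for k in K:
        M = [[lam * r[k][a][b]
              + (1 - lam) * sum(P[k][a][b][kp] * u[kp] for kp in K)
              for b in range(2)] for a in range(2)]
        out.append(val_2x2(M))
    return out

lam, u = 0.2, [0.0, 0.0]
for _ in range(200):            # T is a (1 - lam)-contraction
    u = shapley_T(u, lam)
print(u)                        # approximate discounted values v_lambda(k)
```

The same loop with the undiscounted operator and a division by $n$ yields the $n$-stage values $v_n$.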

2. Structure and Variants

Continuous-Time and Limit Models

Zero-sum sequential games extend naturally to continuous-time Markov processes:

  • States evolve via a Markov jump process controlled by both players.
  • At each discrete time determined by a partition $\Pi$ of $\mathbb{R}^+$, players choose actions based on the observed state $Z_{t_n}$.
  • Running payoffs $g(Z_t, i_n, j_n)$ are discounted by a density $k(t)$.

As the partition mesh $\|\Pi\| \to 0$, the value function $v_\Pi$ converges to the unique viscosity solution of a Hamilton–Jacobi–Isaacs PDE:

$$0 = \partial_t v(t, z) + \sup_{x \in \Delta(I)} \inf_{y \in \Delta(J)} \left\{ k(t)\, g(z, x, y) + q(x, y)[z, \cdot] \cdot v(t, \cdot) \right\}$$

This reduction to a deterministic differential game justifies the use of continuous-time methods when stage durations vanish (Sorin, 2016).

Asymmetric Information and Stopping Games

Other key variants include games with asymmetric information (Kartik et al., 2019), where strategies depend on private and common signals, and zero-sum stopping games, in which one agent chooses a control process and the other selects a stopping time (Hernandez-Hernandez et al., 2012). In the singular controller/discretionary stopper model, values are characterized by coupled variational inequalities, and strategies can involve reflection and impulse actions, subject to state regions where control or stopping is optimal.

3. Existence and Characterization of Values and Equilibria

For finite zero-sum sequential games, the Mertens–Neyman theorem ensures the existence of a uniform value in the limit $n \to \infty$ or $\lambda \to 0$, with convergence of both $v_n$ and $v_\lambda$ to the same limit $v^*$ (Renault, 2019). The equilibrium concept coincides with Nash equilibrium in simultaneous-move games but adapts to the sequential structure via backward induction, policy iteration, and dynamic programming.

In adversarial (strictly competitive) sequential games, the folk theorem (Fishburn-Roberts, Adler-Daskalakis-Papadimitriou, Raimondo) establishes that any two-player adversarial game is strategically equivalent, via a positive affine payoff transformation, to a zero-sum game, ensuring that all equilibrium and solution concepts carry over (Khan et al., 2024).
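The mechanism behind this equivalence can be sketched in a few lines (the payoff matrices and the transform coefficients below are toy assumptions): applying a positive affine transformation to player 2's payoffs leaves every best response, and hence every equilibrium, unchanged, so a strictly competitive game behaves exactly like its zero-sum counterpart.

```python
# Player 1's payoff matrix in a hypothetical 2x2 adversarial game.
U1 = [[4, 0], [1, 3]]
# Player 2's payoffs: a positive affine transform of -U1 (strictly competitive).
alpha, beta = 2.0, 5.0
U2 = [[alpha * (-u) + beta for u in row] for row in U1]

def col_best_responses(U):
    """Player 2's best-response column against each fixed row, under payoffs U."""
    return [max(range(2), key=lambda j: U[i][j]) for i in range(2)]

# Player 2's best responses under U2 match those under the zero-sum payoffs -U1,
# because argmax is invariant under positive affine transformations.
neg_U1 = [[-u for u in row] for row in U1]
print(col_best_responses(U2))
print(col_best_responses(neg_U1))
```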

Computational and Learning Aspects

Efficient algorithms exist for policy and value iteration in Markov zero-sum games, with per-round complexity polynomial in the state and action set sizes. For online matrix games with adversarially evolving payoffs, regret-minimizing algorithms such as OMG-RFTL guarantee that players' sequential payoffs are close to the minimax-optimal benchmark in hindsight (Cardoso et al., 2019). In nonstationary settings, episodic learning via expert ensembles (OFULinMat) yields saddle-point regret bounds scaling as $\tilde O(S\sqrt{KT})$, leveraging side information to outperform static best-response adversarial learning (Pan et al., 2021).
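The regret-minimization idea can be illustrated with the simplest such algorithm, multiplicative weights (Hedge), rather than OMG-RFTL itself: when both players run Hedge in self-play on matching pennies, the time-averaged payoff approaches the minimax value 0 at a rate governed by the regret bounds. The learning rate and starting weights below are illustrative choices.

```python
import math

A = [[1, -1], [-1, 1]]           # matching pennies; row maximizes, value 0
T = 10000
eta = math.sqrt(math.log(2) / T)  # standard Hedge tuning for horizon T

wx, wy = [1.0, 1.0], [2.0, 1.0]   # asymmetric start for the column player
avg = 0.0
for _ in range(T):
    x = [w / sum(wx) for w in wx]
    y = [w / sum(wy) for w in wy]
    avg += sum(x[i] * A[i][j] * y[j] for i in range(2) for j in range(2)) / T
    # Row player ascends on A y; column player descends on x^T A.
    gx = [sum(A[i][j] * y[j] for j in range(2)) for i in range(2)]
    gy = [sum(x[i] * A[i][j] for i in range(2)) for j in range(2)]
    wx = [wx[i] * math.exp(eta * gx[i]) for i in range(2)]
    wy = [wy[j] * math.exp(-eta * gy[j]) for j in range(2)]

print(round(avg, 3))   # close to the game value 0
```

The deviation of `avg` from the value is bounded by the sum of both players' regrets divided by $T$, which is $O(\sqrt{\log 2 / T})$ here.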

For multi-agent, networked interaction models (zero-sum NMGs), Markov Nash equilibria and coarse correlated equilibria collapse in structure; stationary NE computation is tractable for star interactions but PPAD-hard for triangle or path topologies (Park et al., 2023).

4. Advanced Models: Asymmetric Information and Continuous-Time Games

Sequential zero-sum games with incomplete or asymmetric information (e.g., only one player observes a Brownian motion) are characterized by value functions that solve Hamilton–Jacobi type PDEs on measure spaces. The value is the largest convex subsolution of the corresponding PDE:

$$\partial_t V(t, m) + \frac{1}{2} \int_{\mathbb{R}^d} \operatorname{div}_x\!\left[D_m V(t, m, x)\right] m(dx) + H(t, m) = 0$$

where $V(t, m)$ is convex in the distribution $m$ of the (possibly hidden) state (Gensbittel et al., 2016).

In finite-horizon zero-sum stopping games on random permutations, universal laws ($\mathbb{E}[\text{payoff}] = \Theta(\sqrt{n})$) govern the optimal expected payoff scaling, established via combinatorial identities and convexity principles in one-player zero-sum games (Dumitrescu et al., 2024).

5. Strategic, Computational, and Learning Implications

  • Strategy Synthesis: Markov (state-based or stationary) strategies are typically sufficient for optimality in finite and discounted zero-sum sequential games, but history-dependent strategies may be needed in degenerate cases (cf. the "Big Match").
  • Equilibrium Computation: Backward induction, policy iteration, and value iteration efficiently compute equilibrium strategies in finite and discounted settings. Complexity is polynomial for two-player zero-sum, and collapses further in games with separable networked structure.
  • Learning in Unknown Games: Online learning approaches (SP-RFTL, OFULinMat) realize minimax optimality even under adversarial or nonstationary payoffs, outperforming naïve no-regret bandit algorithms.
  • Effect of Information Structures: In leader-follower (Stackelberg) variants with noisy observation channels, equilibrium payoffs interpolate between Nash and pure Stackelberg values, with existence always assured and tightness conditions dictated by channel informativeness (Sun et al., 2022). Asymmetric information models require measure-valued dynamic programming and convexity considerations in the space of probability distributions over states.

6. Theoretical and Practical Consequences

Zero-sum sequential games provide a foundational template for adversarial decision processes in dynamic and uncertain environments, with formal guarantees for value existence, explicit characterizations via dynamic programming and viscosity/PDE methods, and deep connections between information structure, learning, and algorithmic hardness. The reduction of adversarial games to zero-sum, the convergence properties under vanishing stage duration, and applicability to both discrete and continuous-time models equip researchers with a robust apparatus for designing and analyzing strategic agents in multi-stage environments (Sorin, 2016, Renault, 2019, Khan et al., 2024, Park et al., 2023).
