
Zero-Sum LQ Stochastic Differential Games

Updated 10 November 2025
  • Zero-sum LQ stochastic differential games are a framework for modeling competitive dynamic systems with linear state dynamics, quadratic costs, and stochastic perturbations.
  • They employ algebraic Riccati equations and backward stochastic differential equations to derive stabilizing state-feedback controls and ensure mean-square stability.
  • The framework extends to encompass mean-field, regime-switching, and delay systems, impacting applications in finance, robotics, power systems, and network security.

Zero-sum linear-quadratic stochastic differential games (ZSLQ-SDGs) constitute a canonical framework for modeling, analyzing, and synthesizing optimal strategies in competitive dynamic systems with stochastic perturbations. In such games, two adversarial players interact through continuous-time Itô stochastic differential equations with linear state dynamics, quadratic cost (payoff) functionals, and a zero-sum structure—so that one player’s gain exactly equals the other’s loss. The infinite-horizon setting with constant coefficients is particularly central for understanding ergodic and stationary behavior, feedback synthesis, turnpike phenomena, and the foundation of more general (Markovian, mean-field, regime-switching, or controlled-diffusion) game frameworks.

1. Mathematical Formulation and Structure

Consider the prototypical two-person ZSLQ-SDG on a complete filtered probability space $(\Omega,\mathcal F, \{\mathcal F_t\}_{t\ge0},\mathbb P)$ carrying a standard one-dimensional Brownian motion $W(\cdot)$. The controlled system is

$$dX(t) = [A X(t) + B_1 u_1(t) + B_2 u_2(t)]\,dt + [C X(t) + D_1 u_1(t) + D_2 u_2(t)]\,dW(t), \qquad X(0)=x \in \mathbb R^n,$$

where $A, C \in \mathbb R^{n\times n}$ and $B_i, D_i \in \mathbb R^{n\times m_i}$ are constant matrices, and $u_i(\cdot) \in L^2_{\mathcal F}(0,\infty;\mathbb R^{m_i})$ are the players' admissible controls.

The zero-sum quadratic performance functional is

$$J(x; u_1, u_2) = \mathbb E \int_0^\infty \Big\{ \langle QX, X\rangle + 2\langle S_1 X, u_1\rangle + 2\langle S_2 X, u_2\rangle + \langle R_{11}u_1,u_1\rangle + 2\langle R_{12}u_1,u_2\rangle + \langle R_{22}u_2,u_2\rangle \Big\}\,dt,$$

where $Q \in \mathbb S^n$, $S_1 \in \mathbb R^{n\times m_1}$, $S_2 \in \mathbb R^{n\times m_2}$, $R_{11} \in \mathbb S^{m_1}$, $R_{22} \in \mathbb S^{m_2}$, and $R_{12} \in \mathbb R^{m_1\times m_2}$.

Player 1 seeks to minimize $J$, while Player 2 seeks to maximize it.
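To make the formulation concrete, the following minimal Python sketch simulates the controlled state equation by Euler-Maruyama and estimates a truncated-horizon version of $J$ by Monte Carlo under fixed linear feedback controls. All matrices, the feedback gains, the truncation horizon, and the step size are illustrative assumptions, not values taken from any cited paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative scalar problem data (assumptions, not from any cited paper).
A, C = np.array([[-0.5]]), np.array([[0.3]])
B1, B2 = np.array([[1.0]]), np.array([[0.8]])
D1, D2 = np.array([[0.1]]), np.array([[0.1]])
Q = np.array([[1.0]])
S1, S2, R12 = np.zeros((1, 1)), np.zeros((1, 1)), np.zeros((1, 1))
R11, R22 = np.array([[1.0]]), np.array([[-2.0]])   # maximizer's control weight is negative

# Fixed (assumed) linear feedback gains u_i(t) = Theta_i X(t).
Theta1, Theta2 = np.array([[-0.6]]), np.array([[0.2]])

def truncated_cost(x0=1.0, T=10.0, dt=1e-3):
    """Euler-Maruyama path of X and the truncated cost integral on [0, T]."""
    x, cost = np.array([[x0]]), 0.0
    for _ in range(int(T / dt)):
        u1, u2 = Theta1 @ x, Theta2 @ x
        running = (x.T @ Q @ x + 2 * x.T @ S1.T @ u1 + 2 * x.T @ S2.T @ u2
                   + u1.T @ R11 @ u1 + 2 * u1.T @ R12 @ u2 + u2.T @ R22 @ u2)
        cost += running.item() * dt
        dW = np.sqrt(dt) * rng.standard_normal()
        x = x + (A @ x + B1 @ u1 + B2 @ u2) * dt + (C @ x + D1 @ u1 + D2 @ u2) * dW
    return cost

# Monte Carlo estimate of the truncated performance functional J.
samples = [truncated_cost() for _ in range(100)]
print(f"estimated truncated J: {np.mean(samples):.3f} "
      f"+/- {np.std(samples) / np.sqrt(len(samples)):.3f}")
```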

2. Saddle Points: Open-loop and Closed-loop Notions

Two central solution concepts are relevant:

  • Open-Loop Saddle Point: A control pair $(u_1^*, u_2^*)$ is an open-loop saddle point at $x$ if, for all admissible $(u_1, u_2)$,

$$J(x; u_1^*, u_2) \le J(x; u_1^*, u_2^*) \le J(x; u_1, u_2^*).$$

Open-loop controls are admissible processes specified directly as functions of time (and the underlying randomness), rather than as feedback functions of the evolving state.

  • Closed-Loop Saddle Point: A quadruple $(\Theta_1, v_1^*; \Theta_2, v_2^*)$, with $\Theta_i \in \mathbb R^{m_i\times n}$ and $v_i^*(\cdot) \in L^2_{\mathcal F}(0,\infty; \mathbb R^{m_i})$, is a closed-loop saddle point if the feedback laws $u_i^*(t) = \Theta_i X(t) + v_i^*(t)$ render the closed-loop system $L^2$-stable and

$$J(x; \Theta_1 X + v_1^*, \Theta_2 X + v_2) \le J(x; \Theta_1 X + v_1^*, \Theta_2 X + v_2^*) \le J(x; \Theta_1 X + v_1, \Theta_2 X + v_2^*)$$

for all alternative choices of $v_i(\cdot)$.

Closed-loop saddle points correspond to state-feedback Nash equilibria and are desirable both for robustness and for algebraic tractability in the infinite-horizon setting.

3. Riccati Equation and BSDE Characterization

The existence and explicit construction of closed-loop saddle points are characterized by solutions of an algebraic Riccati equation (ARE) together with a stabilizability condition. Introduce

$$B = (B_1, B_2), \qquad D = (D_1, D_2), \qquad S = (S_1, -S_2), \qquad R = \begin{pmatrix} R_{11} & R_{12} \\ R_{12}^T & -R_{22} \end{pmatrix}.$$

Define

$$\begin{aligned} M(P) &:= P A + A^T P + C^T P C + Q, \\ L(P) &:= P B + C^T P D + S, \\ N(P) &:= R + D^T P D. \end{aligned}$$

The ARE is

$$M(P) - L(P)\, N(P)^\dagger L(P)^T = 0,$$

subject to

$$N(P) \ge 0, \qquad \exists\,\Theta = -N(P)^{-1} L(P)^T \ \text{such that} \ [A + B \Theta,\; C + D \Theta] \ \text{is } L^2\text{-stable}.$$

A solution $P$ is called stabilizing if such a $\Theta$ exists and yields mean-square stability of the closed-loop system.
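As a minimal numerical sketch of the objects just defined, the following Python snippet evaluates $M(P)$, $L(P)$, $N(P)$, the ARE residual, the candidate gain $\Theta = -N(P)^{-1}L(P)^T$, and the scalar mean-square stability test $2(A+B\Theta)+(C+D\Theta)^2<0$ for the one-dimensional case $n=m_1=m_2=1$. The coefficient values and the root-search bracket are assumptions chosen purely for illustration.

```python
import numpy as np
from scipy.optimize import brentq

# Illustrative scalar data (assumptions), following the block definitions above.
A, C = 1.0, 0.2
B = np.array([1.0, 0.5])        # B = (B1, B2)
D = np.array([0.1, 0.1])        # D = (D1, D2)
Q = 1.0
S = np.zeros(2)                 # S = (S1, -S2), taken to be zero here
R = np.diag([1.0, 4.0])         # [[R11, R12], [R12^T, -R22]] with R22 = -4 in the cost

def M(P):  # M(P) = PA + A^T P + C^T P C + Q  (scalar state)
    return 2.0 * A * P + C**2 * P + Q

def L(P):  # L(P) = PB + C^T P D + S  (a 1x2 row here)
    return P * B + C * P * D + S

def N(P):  # N(P) = R + D^T P D
    return R + P * np.outer(D, D)

def are_residual(P):
    l = L(P)
    return M(P) - l @ np.linalg.solve(N(P), l)

# Root search for the ARE M(P) - L(P) N(P)^{-1} L(P)^T = 0 on an assumed bracket.
P = brentq(are_residual, 0.0, 50.0)

# Candidate feedback gain and mean-square (L^2) stability check of the closed loop.
Theta = -np.linalg.solve(N(P), L(P))
closed_drift = A + B @ Theta
closed_diff = C + D @ Theta
is_stabilizing = (2.0 * closed_drift + closed_diff**2 < 0
                  and np.all(np.linalg.eigvalsh(N(P)) >= 0))

print(f"P = {P:.4f}, Theta = {Theta}, stabilizing: {is_stabilizing}")
```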

Once a stabilizing $P$ is found, the feedback gain is

$$\Theta^* = -N(P)^{-1} L(P)^T,$$

and the affine term is obtained by solving the infinite-horizon linear BSDE (here $b(\cdot)$, $q(\cdot)$, and $p(\cdot)$ denote the inhomogeneous terms in the dynamics and cost when the problem data include affine perturbations; in the homogeneous formulation above they vanish, in which case $\eta \equiv 0$, $\zeta \equiv 0$, and $v^* \equiv 0$)

$$d\eta(t) = -\big\{ [A^T - \Theta^{*T} B^T]\,\eta + [C^T - \Theta^{*T} D^T]\,\zeta + P b(t) + q(t) \big\}\,dt + \zeta(t)\,dW(t),$$

and setting

$$v^*(t) = -N(P)^{-1}\big[B^T \eta(t) + D^T \zeta(t) + p(t)\big].$$

The value function has the representation

$$V(x) = x^T P x + 2\,\mathbb E[\eta(0)^T x] + \mathbb E \int_0^\infty \Big( \big\langle (R + D^T P D)\, v^*(t), v^*(t)\big\rangle - \big\langle B^T\eta + D^T\zeta + p,\; N(P)^{-1}[B^T\eta + D^T\zeta + p]\big\rangle \Big)\, dt.$$

The unique solvability of the infinite-horizon BSDE is ensured under $L^2$-stability of $[A, C]$, by classical energy estimates.
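For reference, a commonly used equivalent characterization of mean-square ($L^2$) stability of the uncontrolled pair $[A, C]$ is the stochastic Lyapunov condition:

$$[A, C] \ \text{is } L^2\text{-stable} \quad \Longleftrightarrow \quad \exists\, P \succ 0 \ \text{such that} \ P A + A^T P + C^T P C \prec 0.$$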

The central equivalence established is:

  • (i) The problem admits a closed-loop saddle point if and only if the algebraic Riccati equation has a stabilizing solution.
  • (ii) In that case, the feedback law $u^*(t)=\Theta^* X^*(t) + v^*(t)$ is a closed-loop saddle point, where $X^*$ evolves under the closed-loop SDE.

This solution is unique under the $L^2$-stability and regularity hypotheses.

4. Existence and Comparative Aspects

Several comparisons with related formulations are worth noting:

  • For games with deterministic coefficients and finite horizon, the associated Riccati equation is time-varying and its solution yields finite-horizon saddle laws and value functions (Sun, 2020).
  • In the mean-field extension, zero-sum infinite-horizon games require the solvability of coupled generalized algebraic Riccati equations with a static stabilizing solution; the feedback law then depends on both the individual state and its mean (Li et al., 2020).
  • In the Markovian regime-switching case, the Riccati system is replaced by coupled matrix equations indexed by the regime, but similar stabilizing and feedback synthesis principles apply (Li et al., 11 Sep 2025, Wu et al., 23 Aug 2024).
  • In finite-delay and Volterra integral equation extensions, explicit FBSVIE characterization and operator inequalities yield saddle solutions (Wang et al., 2010).

A saddle point may not exist if the Riccati equation lacks regular (sign-definite) solutions, if $L^2$-stabilizability fails, or if the cost is not coercive in the controls (Sun et al., 2014).

5. Algorithmic and Practical Aspects

Direct computation of the stabilizing $P$ is critical. In more general settings (multi-input, multi-noise), the ARE is matrix-valued and may be high-dimensional. Algorithmic solvers have been developed, e.g., dual-layer iterative defect-correction schemes in which each iteration solves a sequence of single-player AREs to approximate the saddle solution (Wang, 3 Nov 2025). These methods are numerically validated for moderate problem sizes, achieving rapid convergence even in multidimensional and multi-noise cases.

Feedback gains can be computed once $P$ is determined, leading to explicit controllers:

$$u_i^*(t) = \Theta_i^* X^*(t) + v_i^*(t), \qquad i=1,2.$$

Resource requirements are dominated by the matrix Riccati and BSDE solvers, scaling polynomially with the state and control dimensions.
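The defect-correction scheme of (Wang, 3 Nov 2025) is not reproduced here; instead, the following sketch illustrates the degenerate special case $C = D = 0$, $S = 0$, $R_{12} = 0$, where, under the block conventions above, the game ARE reduces to a standard continuous-time ARE with stacked input matrix $B = (B_1, B_2)$ and block weight $R$, solvable with an off-the-shelf routine. All matrix values are assumptions chosen for illustration.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Illustrative data (assumptions) for the degenerate case C = D = 0, S = 0, R12 = 0.
A = np.array([[0.5, 0.2],
              [0.0, 0.3]])          # open-loop unstable drift
B1 = np.array([[1.0], [0.0]])
B2 = np.array([[0.0], [1.0]])
Q = np.eye(2)
R11 = np.array([[1.0]])
R22 = np.array([[-5.0]])            # maximizer's weight in the cost is negative

# Stacked blocks, following the definitions above.
B = np.hstack([B1, B2])
R = np.block([[R11, np.zeros((1, 1))],
              [np.zeros((1, 1)), -R22]])

# With C = D = 0 the ARE becomes PA + A^T P + Q - P B R^{-1} B^T P = 0,
# which a standard continuous-time ARE solver handles directly.
P = solve_continuous_are(A, B, Q, R)

# Combined gain Theta = -R^{-1} B^T P, split into the two players' gains.
Theta = -np.linalg.solve(R, B.T @ P)
Theta1, Theta2 = Theta[:1, :], Theta[1:, :]

# With no diffusion, L^2-stability reduces to A + B Theta being Hurwitz.
closed_loop_eigs = np.linalg.eigvals(A + B @ Theta)
print("P =\n", P)
print("Theta1 =", Theta1, " Theta2 =", Theta2)
print("closed-loop eigenvalues:", closed_loop_eigs)
```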

6. Extensions: Structure and Regime-Switching

The ZSLQ-SDG framework extends in several directions:

  • Mean-field games involve state/control averages in dynamics and costs, requiring coupled Riccati equations and mean-field FBSDE analysis (Li et al., 2020).
  • Regime-switching games introduce a finite-state Markov chain modulating the system coefficients, leading to systems of coupled Riccati equations and feedbacks indexed by the regime (Li et al., 11 Sep 2025, Wu et al., 23 Aug 2024, Wu et al., 3 Sep 2024).
  • Memory and delay systems are governed by Volterra (or delay) SDEs, for which the Riccati theory generalizes to operator- or integral-equation settings (Wang et al., 2010).
  • Turnpike properties: Over long or infinite time horizons, the equilibrium strategies and the value converge toward stationary (steady-state) limits, with exponential rates under suitable uniform convexity/concavity and stability conditions, enabling high-accuracy steady-state approximations for large $T$ (Sun et al., 4 Jun 2024, Li et al., 11 Sep 2025).

7. Applications and Significance

Zero-sum LQ stochastic differential games model competitive control in finance, power systems, distributed robotics, network security, and leader-follower (Stackelberg) interactions under uncertainty. Their tractability via Riccati equations and state-feedback synthesis underlies their widespread adoption as testbeds for more general game-theoretic and robust control developments. Recent results provide rigorous conditions for existence, uniqueness, and optimality of feedback Nash equilibria in broad generalizations, establish practical algorithms for high-dimensional cases, and clarify the link between infinite-horizon control and ergodicity/turnpike phenomena (Sun et al., 2014, Sun et al., 4 Jun 2024, Wang, 3 Nov 2025).

The structure and insights gained from ZSLQ-SDGs are foundational for inverse game design, controller synthesis under partial information or adversarial settings, and for investigating the impact of noise, non-coercivity, or memory on competitive dynamical systems.
