
Zero-Sum LQ Stochastic Differential Games

Updated 10 November 2025
  • Zero-sum LQ stochastic differential games are a framework for modeling competitive dynamic systems with linear state dynamics, quadratic costs, and stochastic perturbations.
  • They employ algebraic Riccati equations and backward stochastic differential equations to derive stabilizing state-feedback controls and ensure mean-square stability.
  • The framework extends to encompass mean-field, regime-switching, and delay systems, impacting applications in finance, robotics, power systems, and network security.

Zero-sum linear-quadratic stochastic differential games (ZSLQ-SDGs) constitute a canonical framework for modeling, analyzing, and synthesizing optimal strategies in competitive dynamic systems with stochastic perturbations. In such games, two adversarial players interact through continuous-time Itô stochastic differential equations with linear state dynamics, quadratic cost (payoff) functionals, and a zero-sum structure—so that one player’s gain exactly equals the other’s loss. The infinite-horizon setting with constant coefficients is particularly central for understanding ergodic and stationary behavior, feedback synthesis, turnpike phenomena, and the foundation of more general (Markovian, mean-field, regime-switching, or controlled-diffusion) game frameworks.

1. Mathematical Formulation and Structure

Consider the prototypical two-person ZSLQ-SDG on a complete filtered probability space $(\Omega,\mathcal F, \{\mathcal F_t\}_{t\ge0},\mathbb P)$ carrying a standard one-dimensional Brownian motion $W(\cdot)$. The controlled system is

$$dX(t) = [A X(t) + B_1 u_1(t) + B_2 u_2(t)]\,dt + [C X(t) + D_1 u_1(t) + D_2 u_2(t)]\,dW(t), \qquad X(0)=x \in \mathbb R^n,$$

where $A, C \in \mathbb R^{n\times n}$ and $B_i, D_i \in \mathbb R^{n\times m_i}$ are constant matrices, and $u_i(\cdot) \in L^2_{\mathcal F}(0,\infty;\mathbb R^{m_i})$ are the players' admissible controls.

The zero-sum quadratic performance functional is

$$J(x; u_1, u_2) = \mathbb E \int_0^\infty \Big\{ \langle QX, X\rangle + 2\langle S_1 X, u_1\rangle + 2\langle S_2 X, u_2\rangle + \langle R_{11}u_1,u_1\rangle + 2\langle R_{12}u_1,u_2\rangle + \langle R_{22}u_2,u_2\rangle \Big\}\,dt,$$

where $Q \in \mathbb S^n$, $S_1 \in \mathbb R^{n\times m_1}$, $S_2 \in \mathbb R^{n\times m_2}$, $R_{11} \in \mathbb S^{m_1}$, $R_{22} \in \mathbb S^{m_2}$, and $R_{12} \in \mathbb R^{m_1\times m_2}$.

Player 1 seeks to minimize $J$, while Player 2 seeks to maximize it.
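To make the formulation concrete, the following minimal Python sketch simulates the controlled state equation by Euler-Maruyama and estimates a truncated-horizon version of $J$ by Monte Carlo under fixed linear feedback controls. All matrices, the feedback gains, the truncation horizon, and the step size are illustrative assumptions, not values taken from any cited paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative scalar problem data (assumptions, not from any cited paper).
A, C = np.array([[-0.5]]), np.array([[0.3]])
B1, B2 = np.array([[1.0]]), np.array([[0.8]])
D1, D2 = np.array([[0.1]]), np.array([[0.1]])
Q = np.array([[1.0]])
S1, S2, R12 = np.zeros((1, 1)), np.zeros((1, 1)), np.zeros((1, 1))
R11, R22 = np.array([[1.0]]), np.array([[-2.0]])   # maximizer's control weight is negative

# Fixed (assumed) linear feedback gains u_i(t) = Theta_i X(t).
Theta1, Theta2 = np.array([[-0.6]]), np.array([[0.2]])

def truncated_cost(x0=1.0, T=10.0, dt=1e-3):
    """Euler-Maruyama path of X and the truncated cost integral on [0, T]."""
    x, cost = np.array([[x0]]), 0.0
    for _ in range(int(T / dt)):
        u1, u2 = Theta1 @ x, Theta2 @ x
        running = (x.T @ Q @ x + 2 * x.T @ S1.T @ u1 + 2 * x.T @ S2.T @ u2
                   + u1.T @ R11 @ u1 + 2 * u1.T @ R12 @ u2 + u2.T @ R22 @ u2)
        cost += running.item() * dt
        dW = np.sqrt(dt) * rng.standard_normal()
        x = x + (A @ x + B1 @ u1 + B2 @ u2) * dt + (C @ x + D1 @ u1 + D2 @ u2) * dW
    return cost

# Monte Carlo estimate of the truncated performance functional J.
samples = [truncated_cost() for _ in range(100)]
print(f"estimated truncated J: {np.mean(samples):.3f} "
      f"+/- {np.std(samples) / np.sqrt(len(samples)):.3f}")
```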

2. Saddle Points: Open-loop and Closed-loop Notions

Two central solution concepts are relevant:

  • Open-Loop Saddle Point: A control pair $(u_1^*, u_2^*)$ is an open-loop saddle point at $x$ if, for all admissible $(u_1, u_2)$,

$$J(x; u_1^*, u_2) \le J(x; u_1^*, u_2^*) \le J(x; u_1, u_2^*).$$

Open-loop controls are admissible processes specified directly as functions of time (and the underlying randomness), rather than as feedback functions of the evolving state.

  • Closed-Loop Saddle Point: A quadruple $(\Theta_1, v_1^*; \Theta_2, v_2^*)$, with $\Theta_i \in \mathbb R^{m_i\times n}$ and $v_i^*(\cdot) \in L^2_{\mathcal F}(0,\infty; \mathbb R^{m_i})$, is a closed-loop saddle point if the feedback laws $u_i^*(t) = \Theta_i X(t) + v_i^*(t)$ render the closed-loop system $L^2$-stable and

$$J(x; \Theta_1 X + v_1^*, \Theta_2 X + v_2) \le J(x; \Theta_1 X + v_1^*, \Theta_2 X + v_2^*) \le J(x; \Theta_1 X + v_1, \Theta_2 X + v_2^*)$$

for all alternative choices of $v_i(\cdot)$.

Closed-loop saddle points correspond to state-feedback Nash equilibria and are desirable both for robustness and for algebraic tractability in the infinite-horizon setting.

3. Riccati Equation and BSDE Characterization

The existence and explicit construction of closed-loop saddle points are characterized by solutions of an algebraic Riccati equation (ARE) together with a stabilizability condition. Introduce

$$B = (B_1, B_2), \qquad D = (D_1, D_2), \qquad S = (S_1, -S_2), \qquad R = \begin{pmatrix} R_{11} & R_{12} \\ R_{12}^T & -R_{22} \end{pmatrix}.$$

Define

$$\begin{aligned} M(P) &:= P A + A^T P + C^T P C + Q, \\ L(P) &:= P B + C^T P D + S, \\ N(P) &:= R + D^T P D. \end{aligned}$$

The ARE is

$$M(P) - L(P)\, N(P)^\dagger L(P)^T = 0,$$

subject to

$$N(P) \ge 0, \qquad \exists\,\Theta = -N(P)^{-1} L(P)^T \ \text{such that} \ [A + B \Theta,\; C + D \Theta] \ \text{is } L^2\text{-stable}.$$

A solution $P$ is called stabilizing if such a $\Theta$ exists and yields mean-square stability of the closed-loop system.
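As a minimal numerical sketch of the objects just defined, the following Python snippet evaluates $M(P)$, $L(P)$, $N(P)$, the ARE residual, the candidate gain $\Theta = -N(P)^{-1}L(P)^T$, and the scalar mean-square stability test $2(A+B\Theta)+(C+D\Theta)^2<0$ for the one-dimensional case $n=m_1=m_2=1$. The coefficient values and the root-search bracket are assumptions chosen purely for illustration.

```python
import numpy as np
from scipy.optimize import brentq

# Illustrative scalar data (assumptions), following the block definitions above.
A, C = 1.0, 0.2
B = np.array([1.0, 0.5])        # B = (B1, B2)
D = np.array([0.1, 0.1])        # D = (D1, D2)
Q = 1.0
S = np.zeros(2)                 # S = (S1, -S2), taken to be zero here
R = np.diag([1.0, 4.0])         # [[R11, R12], [R12^T, -R22]] with R22 = -4 in the cost

def M(P):  # M(P) = PA + A^T P + C^T P C + Q  (scalar state)
    return 2.0 * A * P + C**2 * P + Q

def L(P):  # L(P) = PB + C^T P D + S  (a 1x2 row here)
    return P * B + C * P * D + S

def N(P):  # N(P) = R + D^T P D
    return R + P * np.outer(D, D)

def are_residual(P):
    l = L(P)
    return M(P) - l @ np.linalg.solve(N(P), l)

# Root search for the ARE M(P) - L(P) N(P)^{-1} L(P)^T = 0 on an assumed bracket.
P = brentq(are_residual, 0.0, 50.0)

# Candidate feedback gain and mean-square (L^2) stability check of the closed loop.
Theta = -np.linalg.solve(N(P), L(P))
closed_drift = A + B @ Theta
closed_diff = C + D @ Theta
is_stabilizing = (2.0 * closed_drift + closed_diff**2 < 0
                  and np.all(np.linalg.eigvalsh(N(P)) >= 0))

print(f"P = {P:.4f}, Theta = {Theta}, stabilizing: {is_stabilizing}")
```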

Once a stabilizing $P$ is found, the feedback gain is

$$\Theta^* = -N(P)^{-1} L(P)^T,$$

and the affine term is obtained by solving the infinite-horizon linear BSDE (here $b(\cdot)$, $q(\cdot)$, and $p(\cdot)$ denote the inhomogeneous terms in the dynamics and cost when the problem data include affine perturbations; in the homogeneous formulation above they vanish, in which case $\eta \equiv 0$, $\zeta \equiv 0$, and $v^* \equiv 0$)

$$d\eta(t) = -\big\{ [A^T - \Theta^{*T} B^T]\,\eta + [C^T - \Theta^{*T} D^T]\,\zeta + P b(t) + q(t) \big\}\,dt + \zeta(t)\,dW(t),$$

and setting

$$v^*(t) = -N(P)^{-1}\big[B^T \eta(t) + D^T \zeta(t) + p(t)\big].$$

The value function has the representation

$$V(x) = x^T P x + 2\,\mathbb E[\eta(0)^T x] + \mathbb E \int_0^\infty \Big( \big\langle (R + D^T P D)\, v^*(t), v^*(t)\big\rangle - \big\langle B^T\eta + D^T\zeta + p,\; N(P)^{-1}[B^T\eta + D^T\zeta + p]\big\rangle \Big)\, dt.$$

The unique solvability of the infinite-horizon BSDE is ensured under $L^2$-stability of $[A, C]$, by classical energy estimates.
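For reference, a commonly used equivalent characterization of mean-square ($L^2$) stability of the uncontrolled pair $[A, C]$ is the stochastic Lyapunov condition:

$$[A, C] \ \text{is } L^2\text{-stable} \quad \Longleftrightarrow \quad \exists\, P \succ 0 \ \text{such that} \ P A + A^T P + C^T P C \prec 0.$$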

The central equivalence established is:

  • (i) The problem admits a closed-loop saddle point if and only if the algebraic Riccati equation has a stabilizing solution.
  • (ii) In that case, the feedback law $u^*(t)=\Theta^* X^*(t) + v^*(t)$ is a closed-loop saddle point, where $X^*$ evolves under the closed-loop SDE.

This solution is unique under the $L^2$-stability and regularity hypotheses.

4. Existence and Comparative Aspects

Several comparisons with related formulations are worth noting:

  • For games with deterministic coefficients and finite horizon, the associated Riccati equation is time-varying and its solution yields finite-horizon saddle laws and value functions (Sun, 2020).
  • In the mean-field extension, zero-sum infinite-horizon games require the solvability of coupled generalized algebraic Riccati equations with a static stabilizing solution; the feedback law then depends on both the individual state and its mean (Li et al., 2020).
  • In the Markovian regime-switching case, the Riccati system is replaced by coupled matrix equations indexed by the regime, but similar stabilizing and feedback synthesis principles apply (Li et al., 11 Sep 2025, Wu et al., 23 Aug 2024).
  • In finite-delay and Volterra integral equation extensions, explicit FBSVIE characterization and operator inequalities yield saddle solutions (Wang et al., 2010).

A saddle point may not exist if the Riccati equation lacks regular (sign-definite) solutions, if $L^2$-stabilizability fails, or if the cost is not coercive in the controls (Sun et al., 2014).

5. Algorithmic and Practical Aspects

Direct computation of the stabilizing $P$ is critical. In more general settings (multi-input, multi-noise), the ARE is matrix-valued and may be high-dimensional. Algorithmic solvers have been developed, e.g., dual-layer iterative defect-correction schemes in which each iteration solves a sequence of single-player AREs to approximate the saddle solution (Wang, 3 Nov 2025). These methods are numerically validated for moderate problem sizes, achieving rapid convergence even in multidimensional and multi-noise cases.

Feedback gains can be computed once $P$ is determined, leading to explicit controllers:

$$u_i^*(t) = \Theta_i^* X^*(t) + v_i^*(t), \qquad i=1,2.$$

Resource requirements are dominated by the matrix Riccati and BSDE solvers, scaling polynomially with the state and control dimensions.
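The defect-correction scheme of (Wang, 3 Nov 2025) is not reproduced here; instead, the following sketch illustrates the degenerate special case $C = D = 0$, $S = 0$, $R_{12} = 0$, where, under the block conventions above, the game ARE reduces to a standard continuous-time ARE with stacked input matrix $B = (B_1, B_2)$ and block weight $R$, solvable with an off-the-shelf routine. All matrix values are assumptions chosen for illustration.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Illustrative data (assumptions) for the degenerate case C = D = 0, S = 0, R12 = 0.
A = np.array([[0.5, 0.2],
              [0.0, 0.3]])          # open-loop unstable drift
B1 = np.array([[1.0], [0.0]])
B2 = np.array([[0.0], [1.0]])
Q = np.eye(2)
R11 = np.array([[1.0]])
R22 = np.array([[-5.0]])            # maximizer's weight in the cost is negative

# Stacked blocks, following the definitions above.
B = np.hstack([B1, B2])
R = np.block([[R11, np.zeros((1, 1))],
              [np.zeros((1, 1)), -R22]])

# With C = D = 0 the ARE becomes PA + A^T P + Q - P B R^{-1} B^T P = 0,
# which a standard continuous-time ARE solver handles directly.
P = solve_continuous_are(A, B, Q, R)

# Combined gain Theta = -R^{-1} B^T P, split into the two players' gains.
Theta = -np.linalg.solve(R, B.T @ P)
Theta1, Theta2 = Theta[:1, :], Theta[1:, :]

# With no diffusion, L^2-stability reduces to A + B Theta being Hurwitz.
closed_loop_eigs = np.linalg.eigvals(A + B @ Theta)
print("P =\n", P)
print("Theta1 =", Theta1, " Theta2 =", Theta2)
print("closed-loop eigenvalues:", closed_loop_eigs)
```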

6. Extensions: Structure and Regime-Switching

The ZSLQ-SDG framework extends in several directions:

  • Mean-field games involve state/control averages in dynamics and costs, requiring coupled Riccati equations and mean-field FBSDE analysis (Li et al., 2020).
  • Regime-switching games introduce a finite-state Markov chain modulating the system coefficients, leading to systems of coupled Riccati equations and feedbacks indexed by the regime (Li et al., 11 Sep 2025, Wu et al., 23 Aug 2024, Wu et al., 3 Sep 2024).
  • Memory and delay systems are governed by Volterra (or delay) SDEs, for which the Riccati theory generalizes to operator- or integral-equation settings (Wang et al., 2010).
  • Turnpike properties: Over long or infinite time horizons, the equilibrium strategies and the value converge toward stationary (steady-state) limits, with exponential rates under suitable uniform convexity/concavity and stability conditions, enabling high-accuracy steady-state approximations for large $T$ (Sun et al., 4 Jun 2024, Li et al., 11 Sep 2025).

7. Applications and Significance

Zero-sum LQ stochastic differential games model competitive control in finance, power systems, distributed robotics, network security, and leader-follower (Stackelberg) interactions under uncertainty. Their tractability via Riccati equations and state-feedback synthesis underlies their widespread adoption as testbeds for more general game-theoretic and robust control developments. Recent results provide rigorous conditions for existence, uniqueness, and optimality of feedback Nash equilibria in broad generalizations, establish practical algorithms for high-dimensional cases, and clarify the link between infinite-horizon control and ergodicity/turnpike phenomena (Sun et al., 2014, Sun et al., 4 Jun 2024, Wang, 3 Nov 2025).

The structure and insights gained from ZSLQ-SDGs are foundational for inverse game design, controller synthesis under partial information or adversarial settings, and for investigating the impact of noise, non-coercivity, or memory on competitive dynamical systems.
