Mean-Field Hamilton-Jacobi-Bellman Equation

Updated 27 November 2025

Mean-field HJB equations are partial differential equations characterizing the value function for stochastic control problems where dynamics depend on the state distribution.
Their derivation relies on a dynamic programming principle on the Wasserstein space and on Lions differentiability, which together handle the infinite-dimensional, non-local structure of the problem.
Explicit linear-quadratic solutions demonstrate practical applications in finance and systemic risk, reducing the problem to coupled Riccati equations for optimal feedback controls.

A mean-field Hamilton-Jacobi-Bellman (HJB) equation is a partial differential equation characterizing the value function for stochastic optimal control problems in which the controlled system evolves according to McKean–Vlasov (mean-field) dynamics, i.e., where the drift, diffusion, or cost may depend on the distribution (law) of the state, and possibly also the control. This construct extends classical HJB theory to infinite-dimensional state spaces, typically the Wasserstein space of probability measures endowed with Lions differentiability. Mean-field HJB equations form the core of dynamic programming approaches for mean-field control and optimization, underpinning contemporary research in mean-field stochastic control, mean-field games, and large-population Markov decision processes.

1. Formulation of the Mean-Field Stochastic Control Problem

In the McKean–Vlasov framework, a controlled process $(X_t)_{t\in[0,T]}$ follows the stochastic differential equation

$dX_t = b(t, X_t, \mathbb{P}_{X_t}, \alpha_t)\,dt + \sigma(t, X_t, \mathbb{P}_{X_t}, \alpha_t)\,dW_t,$

where $b$ and $\sigma$ are Lipschitz functions in $(x, \mu, \alpha)$, $\alpha$ is an admissible (progressively measurable, square-integrable) control with values in a compact set $\mathcal{A}\subset\mathbb{R}^m$, and $\mathbb{P}_{X_t}$ is the law of $X_t$. The cost functional is

$J(\alpha) = \mathbb{E}\bigg[ \int_0^T f(t, X_t, \alpha_t, \mathbb{P}_{(X_t, \alpha_t)})\,dt + g(X_T, \mathbb{P}_{X_T}) \bigg],$

with suitable growth and regularity conditions on $f$ and $g$. The goal is to minimize $J(\alpha)$ over admissible controls $\alpha$.

For feedback controls of the form $\alpha_t = \upsilon(t, X_t, \mathbb{P}_{X_t})$ with $\upsilon$ Lipschitz, the flow of marginals $\mu_t = \mathbb{P}_{X_t}$ evolves deterministically in $\mathcal{P}_2(\mathbb{R}^d)$, the space of probability measures with finite second moment. The value function thus becomes a function $v(t,\mu)$ of time and measure: the infimum of the cost when starting at time $t$ from the marginal law $\mu$ (Pham et al., 2015).
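The deterministic flow of marginals can be approximated numerically by an interacting particle system, in which the law $\mathbb{P}_{X_t}$ is replaced by the empirical measure of $N$ particles. The following is a minimal sketch under stated assumptions, not a method taken from the cited work: the function name `simulate_mckean_vlasov`, the example coefficients (a drift pulling each particle toward the population mean, constant volatility, zero control), and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def simulate_mckean_vlasov(b, sigma, policy, x0, T=1.0, n_steps=100, seed=0):
    """Euler-Maruyama scheme for an N-particle approximation of
    dX = b(t, X, law, a) dt + sigma(t, X, law, a) dW  (scalar state).

    The law argument is represented by the current array of particles.
    Returns an array of shape (n_steps + 1, N) holding the particle paths.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.array(x0, dtype=float)
    path = np.empty((n_steps + 1, x.size))
    path[0] = x
    for k in range(n_steps):
        t = k * dt
        a = policy(t, x, x)  # feedback control a = v(t, X_t, law of X_t)
        dw = rng.normal(scale=np.sqrt(dt), size=x.size)
        x = x + b(t, x, x, a) * dt + sigma(t, x, x, a) * dw
        path[k + 1] = x
    return path

# Illustrative (assumed) coefficients: mean reversion toward the population mean.
KAPPA, VOL = 1.0, 0.3

def b_example(t, x, mu, a):
    return KAPPA * (mu.mean() - x) + a

def sigma_example(t, x, mu, a):
    return VOL

def zero_policy(t, x, mu):
    return np.zeros_like(x)

x0_example = np.random.default_rng(1).normal(0.5, 1.0, size=5000)
path_example = simulate_mckean_vlasov(b_example, sigma_example, zero_policy, x0_example)
print("mean at t=0 and t=T:", path_example[0].mean(), path_example[-1].mean())
print("variance at t=0 and t=T:", path_example[0].var(), path_example[-1].var())
```

With these example coefficients the population mean is approximately conserved while the variance contracts toward $\sigma^2/(2\kappa)$, which illustrates that the marginal law follows a deterministic path even though each particle is random.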

2. Dynamic Programming Principle and Bellman Equation on Wasserstein Space

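Under this reformulation, the cost depends on the state only through the deterministic flow $(\mu_s)_{s\in[t,T]}$ started from $\mu_t = \mu$, and the value function reads

$v(t,\mu) = \inf_{\upsilon} \bigg\{ \int_t^{T} F(s, \mu_s, \upsilon(s, \cdot, \mu_s))\,ds + G(\mu_T) \bigg\},$

where $F$ and $G$ are the mean-field extensions of the running and terminal cost:

$F(t, \mu, \upsilon) = \int_{\mathbb{R}^d} f(t, x, \upsilon(x), (\operatorname{Id}, \upsilon)_\#\mu)\,\mu(dx), \quad G(\mu) = \int_{\mathbb{R}^d} g(x, \mu)\,\mu(dx).$

The dynamic programming principle (DPP) then holds in the space of probability measures and takes a recursive form on $[0,T]\times \mathcal{P}_2(\mathbb{R}^d)$: for every $\theta\in[t,T]$,

$v(t,\mu) = \inf_{\upsilon} \bigg\{ \int_t^{\theta} F(s, \mu_s, \upsilon(s, \cdot, \mu_s))\,ds + v(\theta, \mu_\theta) \bigg\}.$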

Key to the analytic machinery is the notion of differentiability with respect to probability measures, as formalized by Lions. For a function $u: \mathcal{P}_2(\mathbb{R}^d)\to\mathbb{R}$, the "lift" to $L^2(\Omega;\mathbb{R}^d)$ is defined by $U(X) = u(\mathbb{P}_X)$. If $U$ is Fréchet-differentiable, its gradient at $X$ can be represented as $DU(X) = \partial_\mu u(\mathbb{P}_X)(X)$ for a function $\partial_\mu u(\mu):\mathbb{R}^d\to\mathbb{R}^d$, called the Lions derivative of $u$ at $\mu$. This derivative plays the role of the gradient in Itô's calculus along flows of measures on the Wasserstein space (Pham et al., 2015).
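A concrete case may help fix ideas. For $u(\mu) = \operatorname{Var}(\mu)$ in dimension one, the Lions derivative is $\partial_\mu u(\mu)(x) = 2(x - \bar{\mu})$ with $\bar{\mu} = \int x\,\mu(dx)$. On empirical measures $\mu^N = \frac{1}{N}\sum_i \delta_{x_i}$, the Lions derivative is related to ordinary partial derivatives by $\partial_{x_i} u(\mu^N) = \frac{1}{N}\,\partial_\mu u(\mu^N)(x_i)$. The sketch below checks this relation by finite differences; the function names are hypothetical and the choice of $u$ is purely illustrative.

```python
import numpy as np

def u_empirical(x):
    """u(mu) = Var(mu), evaluated at the empirical measure of the sample x."""
    return np.mean(x ** 2) - np.mean(x) ** 2

def lions_derivative_exact(x):
    """Closed form d_mu Var(mu)(x_i) = 2 (x_i - mean), at each particle."""
    return 2.0 * (x - np.mean(x))

def lions_derivative_fd(u, x, eps=1e-4):
    """Finite-difference estimate: N times the partial derivative of
    (x_1, ..., x_N) -> u(empirical measure) with respect to x_i."""
    n = x.size
    grad = np.empty(n)
    for i in range(n):
        e = np.zeros(n)
        e[i] = eps
        grad[i] = (u(x + e) - u(x - e)) / (2.0 * eps)  # central difference
    return n * grad

sample = np.random.default_rng(0).normal(size=50)
max_gap = np.max(np.abs(lions_derivative_fd(u_empirical, sample)
                        - lions_derivative_exact(sample)))
print("max discrepancy between finite differences and closed form:", max_gap)
```

Because $u$ is quadratic in the particle positions, the central difference is exact up to rounding, so the discrepancy is at the level of floating-point error.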

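Applying the extended Itô formula along the flow of marginals yields

$d\,u(t, \mu_t) = \partial_t u(t,\mu_t)\,dt + \mathbb{E}\big[\partial_\mu u(t,\mu_t)(X_t)\cdot b_t \big]\,dt + \frac{1}{2} \mathbb{E}\big[ \operatorname{Tr}( \partial_x \partial_\mu u(t,\mu_t)(X_t)\, \sigma_t \sigma_t^\top )\big]\,dt,$

where $b_t$ and $\sigma_t$ denote the drift and diffusion coefficients evaluated along the controlled process at time $t$. No martingale term appears because $t\mapsto\mu_t$ is deterministic. Substituting this chain rule into the DPP leads to the mean-field HJB equation.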

3. Mean-Field Hamilton-Jacobi-Bellman Equation: Structure and Interpretation

The resulting Bellman PDE on $[0,T)\times\mathcal{P}_2(\mathbb{R}^d)$ takes the form

$\partial_t v(t,\mu) + \inf_{\upsilon} \bigg\{ F(t,\mu,\upsilon) + \int_{\mathbb{R}^d} \langle \partial_\mu v(t,\mu)(x), b(t,x,\mu,\upsilon(x))\rangle\,\mu(dx) + \frac{1}{2} \int_{\mathbb{R}^d} \operatorname{Tr}\big( \partial_x \partial_\mu v(t,\mu)(x)\, \sigma \sigma^\top(t,x,\mu,\upsilon(x)) \big)\,\mu(dx) \bigg\} = 0,$

$v(T,\mu) = G(\mu).$

The Hamiltonian is given explicitly in terms of the drift, diffusion, and cost, with the infimum taken over feedback maps $\upsilon:\mathbb{R}^d\to\mathcal{A}$ at fixed $(t,\mu)$. The appearance of Lions derivatives reflects the infinite-dimensional geometry of $\mathcal{P}_2$, making the equation genuinely non-local and nonlinear in the measure argument (Pham et al., 2015).

4. Solution Concepts: Classical, Verification, and Viscosity

Existence and uniqueness of solutions depend on regularity:

Classical solution and Verification: If $w\in C^{1,2}$ solves the Bellman equation and the infimum is achieved by a Lipschitz feedback $\upsilon^*$, then $w = v$ and the associated closed-loop control is optimal. The proof proceeds by applying Itô's formula to $w(s, \mu_s)$ and using the PDE to show that $w$ bounds the cost of every admissible control from below, with equality achieved by the feedback minimizer (Pham et al., 2015).
Viscosity solutions: If smoothness fails, the equation is lifted to $L^2(\Omega;\mathbb{R}^d)$ , and a notion of viscosity solution is constructed via test functions that themselves are lifts from $\mathcal{P}_2$ . The value function $v$ is shown to satisfy the viscosity solution conditions, and a comparison principle holds for sub/supersolutions with suitable growth controls, implying uniqueness.

These results ensure the well-posedness of the mean-field Bellman equation in wide generality, even in the presence of measure dependence and degenerate diffusion.
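The verification argument can be illustrated numerically on a toy problem. The sketch below is not taken from the cited work and rests on the following assumptions: a scalar state with $dX_t = \alpha_t\,dt + \sigma\,dW_t$, running cost $q\operatorname{Var}(\mu_t) + \bar{q}\,\bar{\mu}_t^2 + r\,\mathbb{E}[\alpha_t^2]$, terminal cost $p\operatorname{Var}(\mu_T) + \bar{p}\,\bar{\mu}_T^2$, with $q = r = p = 1$, $\bar{q} = 4$, $\bar{p} = 2$, $\sigma = 0.5$, $T = 1$, and initial law $N(1, 0.25)$. For this choice, a direct computation with the mean-field HJB equation gives the smooth solution $w(t,\mu) = \operatorname{Var}(\mu) + 2\bar{\mu}^2 + \sigma^2 (T - t)$ and the minimizing feedback $\upsilon^*(x) = -(x - \bar{\mu}) - 2\bar{\mu}$, so that $w(0,\mu_0) = 2.5$. The hypothetical function `lq_cost` estimates the cost of the feedback family $\alpha = -a\,(x - \bar{\mu}) - \lambda\,\bar{\mu}$ with a particle system.

```python
import numpy as np

def lq_cost(a, lam, n_particles=20000, n_steps=200, T=1.0, sigma=0.5, seed=0):
    """Monte Carlo cost of the feedback alpha = -a (x - m) - lam * m,
    where m is the empirical mean of the particles (assumed toy model)."""
    q, q_bar, r, p, p_bar = 1.0, 4.0, 1.0, 1.0, 2.0
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = rng.normal(1.0, 0.5, size=n_particles)  # initial law N(1, 0.25)
    cost = 0.0
    for _ in range(n_steps):
        m = x.mean()
        alpha = -a * (x - m) - lam * m
        running = (q * np.mean((x - m) ** 2) + q_bar * m ** 2
                   + r * np.mean(alpha ** 2))
        cost += running * dt  # left-endpoint quadrature of the running cost
        x = x + alpha * dt + sigma * np.sqrt(dt) * rng.normal(size=n_particles)
    m = x.mean()
    return cost + p * np.mean((x - m) ** 2) + p_bar * m ** 2

# Compare the candidate optimum (a, lam) = (1, 2) with perturbed feedbacks.
for a, lam in [(1.0, 2.0), (0.0, 2.0), (3.0, 2.0), (1.0, 1.0), (1.0, 4.0)]:
    print(f"a={a}, lam={lam}: estimated cost {lq_cost(a, lam):.3f}")
```

Up to Monte Carlo and time-discretization error, the estimated cost at $(a,\lambda) = (1,2)$ should be close to the closed-form value and lower than the cost of the perturbed feedbacks, which is the content of the verification theorem in this special case.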

5. Linear-Quadratic Explicit Solutions and Applications

In the linear-quadratic (LQ) case, $b$ and $\sigma$ are affine in the state, the control, and the means of their laws, and the costs are quadratic:

$b(t,x,\mu,a) = b_0(t) + B(t)x + B_1(t)\bar{\mu} + C(t)a + C_1(t)\bar{\nu}, \quad \sigma(t,x,\mu,a) = \sigma_0(t) + D(t)x + D_1(t)\bar{\mu} + F(t)a + F_1(t)\bar{\nu},$

where $\nu$ denotes the law of the control, $\bar{\mu} = \int x\,\mu(dx)$, and $\bar{\nu} = \int a\,\nu(da)$. The value function admits a closed form as a quadratic function of $\mu$:

$v(t,\mu) = \operatorname{Var}(\mu)(A(t)) + \bar{\mu}^\top \Lambda(t)\,\bar{\mu} + y(t)^\top\bar{\mu} + x(t),$

where $\operatorname{Var}(\mu)(A) = \int (x-\bar{\mu})^\top A\,(x-\bar{\mu})\,\mu(dx)$ and the coefficients $A, \Lambda, y, x$ (the last being a scalar function of time, not the state variable) evolve according to coupled Riccati and linear ODEs.

These explicit solutions undergird applications including:

Mean-variance portfolio selection, recovering classical optimal investment formulas.
Systemic risk in inter-bank models, where the optimal borrowing-lending rates are obtained as affine functions of deviation from the population mean (Pham et al., 2015).
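To make the reduction to Riccati equations concrete, consider a scalar special case chosen here for illustration (it is not the general system of the cited work): $dX_t = \alpha_t\,dt + \sigma\,dW_t$ with running cost $q\operatorname{Var}(\mu) + \bar{q}\,\bar{\mu}^2 + r\,\mathbb{E}[\alpha^2]$ and terminal cost $p\operatorname{Var}(\mu) + \bar{p}\,\bar{\mu}^2$. Substituting the ansatz $v(t,\mu) = A(t)\operatorname{Var}(\mu) + \Lambda(t)\bar{\mu}^2 + \chi(t)$ into the mean-field HJB equation and minimizing pointwise gives $A' = A^2/r - q$, $\Lambda' = \Lambda^2/r - \bar{q}$, $\chi' = -\sigma^2 A$ with terminal data $A(T) = p$, $\Lambda(T) = \bar{p}$, $\chi(T) = 0$, and the feedback $\upsilon^*(t,x,\mu) = -\big(A(t)(x - \bar{\mu}) + \Lambda(t)\bar{\mu}\big)/r$. The names `solve_lq_riccati`, `lq_value`, and `chi` (which plays the role of the constant term written $x(t)$ above) are hypothetical.

```python
import numpy as np

def solve_lq_riccati(q, q_bar, r, p, p_bar, sigma, T, n_steps=2000):
    """Integrate the (assumed) scalar system backward from T with RK4:
        A'   = A^2 / r - q,         A(T)   = p
        Lam' = Lam^2 / r - q_bar,   Lam(T) = p_bar
        chi' = -sigma^2 * A,        chi(T) = 0
    Returns the time grid and an array with columns (A, Lam, chi);
    row i corresponds to time t_i = i * T / n_steps."""
    def rhs(z):
        A, Lam, _ = z
        return np.array([A ** 2 / r - q, Lam ** 2 / r - q_bar, -sigma ** 2 * A])

    h = T / n_steps
    z = np.array([p, p_bar, 0.0], dtype=float)
    rows = [z.copy()]
    for _ in range(n_steps):
        # One RK4 step of size -h, moving from time t to time t - h.
        k1 = rhs(z)
        k2 = rhs(z - 0.5 * h * k1)
        k3 = rhs(z - 0.5 * h * k2)
        k4 = rhs(z - h * k3)
        z = z - (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        rows.append(z.copy())
    return np.linspace(0.0, T, n_steps + 1), np.array(rows[::-1])

def lq_value(A, Lam, chi, mean, var):
    """Quadratic value function evaluated at a law with given mean and variance."""
    return A * var + Lam * mean ** 2 + chi

t_grid, coeffs = solve_lq_riccati(q=1.0, q_bar=4.0, r=1.0, p=1.0, p_bar=2.0,
                                  sigma=0.5, T=1.0)
A0, Lam0, chi0 = coeffs[0]
print("A(0), Lambda(0), chi(0):", A0, Lam0, chi0)
print("value at mean=1, var=0.25:", lq_value(A0, Lam0, chi0, mean=1.0, var=0.25))
```

The optimal feedback at time $t_i$ is then recovered from row $i$ of `coeffs`; it is affine in the deviation from the population mean, in line with the structure of the systemic-risk application described above.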

6. Open-Loop vs. Closed-Loop Controls and Equivalence

Under open-loop controls, where policies need not be feedback in $\mathbb{P}_{X_t}$, the DPP remains valid on $\mathcal{P}_2$ with a modified Hamiltonian. However, under mild integrability conditions, one can show that the infima coincide for open- and closed-loop formulations, so that the HJB equation and optimal values coincide in both settings. This equivalence is particularly robust in the LQ case (Pham et al., 2015).

7. Relation to Mean-Field Game Theory and Extensions

The mean-field HJB equation expresses the optimality condition in mean-field type control and is closely related to the equations arising in mean-field games and in certain large-system Markov decision frameworks. It connects to the mean-field game master equation, which describes the limit of Nash equilibria for large populations and is coupled with Fokker–Planck equations for the state law, as in the foundational theory developed by Lions and subsequent works (Pham et al., 2015, Bensoussan et al., 2014, Gast et al., 2010).

Extensions include:

Master equations coupling the value function with the law and individual state (mean-field “social optimization” and necessary conditions for $\epsilon$-person-by-person optimality) (Huang et al., 19 Aug 2025).
Infinite-dimensional PDEs arising in storage models or delayed systems, formulated in Hilbert or Banach spaces and linked analytically to the mean-field HJB structure (Bertucci et al., 2022, Fouque et al., 2018).
Weak/viscosity solutions in spaces of measures (e.g., via Fourier mode truncation) accommodating highly singular or non-convex data (Cecchin et al., 2022).

The rigorous study of mean-field HJB equations continues to drive both mathematical theory and applications in stochastic control, finance, systemic risk, and large-scale engineered or physical systems.