
Mean-Field Stochastic LQR Controller

Updated 11 December 2025
  • Mean-Field Stochastic LQR Controller is a control strategy that extends classical LQR by incorporating both individual-state and population-averaged interactions under stochastic uncertainty.
  • It employs a Lagrangian dual formulation to decouple mean and fluctuation dynamics, solving coupled Riccati equations for robust, scalable state-feedback synthesis.
  • Applications in microgrid frequency control and large-scale power networks demonstrate its ability to reduce overshoots and manage risk through variance-based constraints.

A mean-field stochastic linear quadratic regulator (MF-SLQR) controller generalizes the classical LQR approach to systems with both individual-state and mean-field (population-averaged) interactions under stochastic uncertainty, and further constrains state fluctuations to address risk. Such controllers are crucial in high-dimensional multi-agent networks, power grids, and large-scale coupled stochastic systems, particularly when low-probability, high-impact events must be systematically attenuated rather than averaged away as in the risk-neutral setting.

1. Problem Formulation: Dynamics, Cost, and Risk Constraint

Consider $n$ exchangeable agents indexed by $i$, with individual state $x_t^i \in \mathbb{R}^{d_x}$ and control $u_t^i \in \mathbb{R}^{d_u}$. Each agent's dynamics incorporate both local and mean-field coupling:

$$x_{t+1}^i = A\,x_t^i + B\,u_t^i + \bar{A}\,\bar{x}_t + \bar{B}\,\bar{u}_t + w_t^i,$$

where $\bar{x}_t = \frac{1}{n}\sum_{j=1}^n x_t^j$, $\bar{u}_t = \frac{1}{n}\sum_{j=1}^n u_t^j$, and $w_t^i$ is an i.i.d. zero-mean noise sequence. In the $n \to \infty$ mean-field limit, agent states decompose into fluctuation and mean components:

$$\begin{aligned} \tilde{x}^i_{t+1} &= A\,\tilde{x}_t^i + B\,\tilde{u}_t^i + \tilde{w}_t^i, \\ \bar{x}_{t+1} &= (A+\bar{A})\,\bar{x}_t + (B+\bar{B})\,\bar{u}_t + \bar{w}_t, \end{aligned}$$

where $\tilde{x}_t^i = x_t^i - \bar{x}_t$ and $\tilde{u}_t^i = u_t^i - \bar{u}_t$.
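
As a quick numerical sanity check (a minimal NumPy sketch; the matrices, dimensions, and population size are arbitrary illustrative choices, not from the paper), averaging the per-agent recursion over the population reproduces the mean recursion with the aggregated matrices $A+\bar{A}$ and $B+\bar{B}$ exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
n, dx, du = 50, 3, 2  # population size and dimensions (illustrative)

# Local and mean-field coupling matrices (arbitrary values for the check)
A, B = rng.normal(size=(dx, dx)), rng.normal(size=(dx, du))
Abar, Bbar = rng.normal(size=(dx, dx)), rng.normal(size=(dx, du))

x = rng.normal(size=(n, dx))   # agent states x_t^i
u = rng.normal(size=(n, du))   # agent controls u_t^i
w = rng.normal(size=(n, dx))   # i.i.d. zero-mean noise w_t^i
xbar, ubar = x.mean(axis=0), u.mean(axis=0)

# Per-agent update: x_{t+1}^i = A x_t^i + B u_t^i + Abar xbar_t + Bbar ubar_t + w_t^i
x_next = x @ A.T + u @ B.T + xbar @ Abar.T + ubar @ Bbar.T + w

# Averaging over i gives the mean recursion with (A + Abar), (B + Bbar)
xbar_next = (A + Abar) @ xbar + (B + Bbar) @ ubar + w.mean(axis=0)
assert np.allclose(x_next.mean(axis=0), xbar_next)
```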

The infinite-horizon average quadratic cost per agent (risk-neutral) is

$$J = \limsup_{T\to\infty} \frac{1}{T}\, \mathbb{E}\!\left[ \sum_{t=0}^{T-1} \left( (x_t^i)^\top Q x_t^i + (u_t^i)^\top R u_t^i + \bar{x}_t^\top \bar{Q} \bar{x}_t + \bar{u}_t^\top \bar{R} \bar{u}_t \right) \right].$$
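
For intuition, in the stable scalar case with $u_t \equiv 0$ the time-averaged quadratic cost converges to the stationary value $Q\,\sigma^2/(1-a^2)$, which a short Monte Carlo run reproduces (a minimal sketch; the scalar parameters are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
a, sigma, Q = 0.9, 1.0, 1.0   # stable scalar dynamics x_{t+1} = a x_t + w_t
T = 200_000

x, cost = 0.0, 0.0
for _ in range(T):
    cost += Q * x * x
    x = a * x + sigma * rng.normal()

J_hat = cost / T                        # empirical time-averaged cost
J_theory = Q * sigma**2 / (1 - a**2)    # stationary value of E[x^2] Q
assert abs(J_hat - J_theory) < 0.5
```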

To control rare but impactful fluctuations, a variance-type risk constraint is imposed. Define

$$d_t^i = \left( (x_t^i)^\top Q x_t^i - \mathbb{E}\left[(x_t^i)^\top Q x_t^i \mid h_t^i \right] \right)^2,$$

where $h_t^i$ denotes agent $i$'s history. The per-player time-averaged variance is

$$J_c = \frac{1}{n}\sum_{i=1}^n \limsup_{T\to\infty} \frac{1}{T}\, \mathbb{E}\left[ \sum_{t=0}^{T-1} d_t^i \right].$$

The MF-SLQR controller seeks to

$$\min_{\{u_t^i\}} \; J \quad \text{s.t. the agent dynamics above and} \quad J_c \leq \Gamma,$$

with risk budget $\Gamma > 0$ (Roudneshin et al., 2023).

2. Lagrangian Dual Formulation and Decomposition

A Lagrange multiplier $\lambda \geq 0$ combines the nominal and risk costs:

$$\mathcal{L} = J + \lambda (J_c - \Gamma).$$

Owing to the orthogonal mean–fluctuation decomposition, $\mathcal{L}$ decouples into two independent infinite-horizon LQR objectives:

$$\mathcal{L} = \underbrace{\limsup_{T \to \infty} \frac{1}{T} \mathbb{E} \left[ \sum_t \tilde{x}_t^{i\top} Q_\lambda \tilde{x}_t^i + \tilde{u}_t^{i\top} R_\lambda \tilde{u}_t^i \right]}_{\text{fluctuation-LQR}} + \underbrace{\limsup_{T \to \infty} \frac{1}{T} \mathbb{E} \left[ \sum_t \bar{x}_t^\top \mathcal{Q}_{\bar\lambda} \bar{x}_t + \bar{u}_t^\top \mathcal{R}_{\bar\lambda} \bar{u}_t \right]}_{\text{mean-LQR}},$$

where

$$Q_\lambda = Q + \frac{4\lambda}{n} Q M_2 Q, \quad \mathcal{Q}_{\bar\lambda} = Q + \bar{Q} + 4\lambda Q M_2 Q, \quad R_\lambda = R, \quad \mathcal{R}_{\bar\lambda} = R + \bar{R},$$

and $M_2 = \limsup_{T\to\infty} \frac{1}{T}\, \mathbb{E} \sum_{t=0}^{T-1}\left(x_t^\top Q x_t - \mathbb{E}[x_t^\top Q x_t]\right)^2$. The dependence on $\lambda$ drives risk sensitivity (Roudneshin et al., 2023).

3. Coupled Riccati Equations and Controller Synthesis

Both mean and fluctuation subsystems admit closed-form LQR solutions. The optimal multiplier value $\lambda = \lambda^*$ is enforced via a primal–dual algorithm.

Riccati Equations

$$\begin{aligned} P &= Q_\lambda + A^\top P A - A^\top P B (R + B^\top P B)^{-1} B^\top P A, \\ \mathcal{P} &= \mathcal{Q}_{\bar\lambda} + (A+\bar{A})^\top \mathcal{P} (A+\bar{A}) \\ &\qquad - (A+\bar{A})^\top \mathcal{P} (B+\bar{B}) \left[\mathcal{R}_{\bar\lambda} + (B+\bar{B})^\top \mathcal{P} (B+\bar{B})\right]^{-1} (B+\bar{B})^\top \mathcal{P} (A+\bar{A}). \end{aligned}$$

These equations admit a unique positive semidefinite stabilizing solution under standard MF-LQR stabilizability/detectability criteria (Roudneshin et al., 2023).
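
The fluctuation-side equation can be solved by plain fixed-point (value) iteration; the mean-side equation is the same recursion with $A+\bar{A}$, $B+\bar{B}$, $\mathcal{Q}_{\bar\lambda}$, $\mathcal{R}_{\bar\lambda}$. A minimal NumPy sketch with illustrative matrices (a production implementation would use a dedicated DARE solver):

```python
import numpy as np

def solve_dare(A, B, Q, R, iters=500):
    """Fixed-point iteration for P = Q + A'PA - A'PB (R + B'PB)^{-1} B'PA."""
    P = Q.copy()
    for _ in range(iters):
        G = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ A - A.T @ P @ B @ G
    return P

# Illustrative stabilizable system (not from the paper)
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
Q_lam = np.eye(2)        # Q_lambda: risk-modified state weight
R = np.array([[1.0]])

P = solve_dare(A, B, Q_lam, R)

# Verify P is a symmetric fixed point of the Riccati map
G = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
residual = Q_lam + A.T @ P @ A - A.T @ P @ B @ G - P
assert np.allclose(P, P.T)
assert np.linalg.norm(residual) < 1e-8
```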

Controller Law

The optimal state feedback is affine in the agent's own state and in the population mean:

$$u_t^{i,*} = -K\,x_t^i - (\bar{K} - K)\,\bar{x}_t + k_0,$$

with

$$K = (R + B^\top P B)^{-1} B^\top P A, \qquad \bar{K} = \left[\mathcal{R}_{\bar\lambda} + (B+\bar{B})^\top \mathcal{P} (B+\bar{B})\right]^{-1} (B+\bar{B})^\top \mathcal{P} (A+\bar{A}),$$

and $k_0$ arises if the noise is nonzero-mean in the dual formulation but is zero otherwise (Roudneshin et al., 2023). The feedback structure matches that of classical mean-field LQR, but the gain matrices internalize the variance penalty via the modified cost matrices.
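
The affine law can be checked algebraically: substituting $x_t^i = \tilde{x}_t^i + \bar{x}_t$ shows it is equivalent to applying $-K$ to the fluctuation and $-\bar{K}$ to the mean. A small NumPy sketch with hypothetical gain values and $k_0 = 0$:

```python
import numpy as np

rng = np.random.default_rng(1)
dx, du = 3, 2
K = rng.normal(size=(du, dx))      # fluctuation gain (hypothetical values)
Kbar = rng.normal(size=(du, dx))   # mean gain (hypothetical values)

x_i = rng.normal(size=dx)          # agent i's state
xbar = rng.normal(size=dx)         # population mean state
x_tilde = x_i - xbar               # fluctuation component

# Published form: u = -K x_i - (Kbar - K) xbar   (k0 = 0 for zero-mean noise)
u = -K @ x_i - (Kbar - K) @ xbar

# Separated form: -K on the fluctuation, -Kbar on the mean
u_sep = -K @ x_tilde - Kbar @ xbar
assert np.allclose(u, u_sep)
```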

4. Risk Constraint Enforcement and Computational Aspects

The optimal Lagrange multiplier $\lambda^*$ is computed via a primal–dual loop. At each iteration:

  1. Update $Q_\lambda$, $\mathcal{Q}_{\bar\lambda}$;
  2. Solve the two Riccati equations for $P$, $\mathcal{P}$;
  3. Form $K$, $\bar{K}$, $k_0$;
  4. Simulate the closed-loop dynamics and evaluate $J_c(\lambda)$;
  5. Update $\lambda$ by a projected subgradient step: $\lambda_{k+1} = [\lambda_k + \eta (J_c(\lambda_k) - \Gamma)]_+$.

Each iteration requires $O(d_x^3)$ operations (Riccati/Lyapunov equations) and is independent of the agent population $n$. The resulting gains do not depend on $n$, which ensures scalability (Roudneshin et al., 2023).

5. Structural Properties and Solution Comparison

  • Independence from Number of Players: All Riccati equations and controller gains depend only on the local and mean-field coupling matrices ($A, B, \bar{A}, \bar{B}$) and cost weights, not on $n$. This makes the approach viable for large-scale networks (Roudneshin et al., 2023).
  • Existence and Uniqueness: A unique stabilizing Riccati solution and a dual multiplier $\lambda^* \geq 0$ are guaranteed by standard LQR system-theoretic conditions and strong duality (Roudneshin et al., 2023).
  • Risk Parameter Influence: As $\Gamma$ is decreased (tighter variance constraint), $\lambda^*$ increases, which modifies $Q_\lambda$, leading to more conservative $K, \bar{K}$ (higher-magnitude feedback gains). This reduces the amplitude of risky excursions (e.g., overshoots) at the cost of a modestly increased average cost $J$ (Roudneshin et al., 2023).
  • Comparison to Risk-Neutral MF-LQR: Setting $\lambda = 0$ recovers the standard mean-field LQR controller, which ignores state variance (Roudneshin et al., 2023).

6. Practical Example and Performance Analysis

A microgrid frequency control scenario is used as a high-dimensional case study. Each area (agent) maintains a 4-dimensional state comprising local frequency, generation, tie-line flow, and the integral of the area control error. System matrices $A, \bar{A}, B$ are drawn from standard load frequency control (LFC) parameters, and quadratic costs $Q, R$ are assigned.

Table: Impact of Variance Constraint on Controller Behavior

| $\Gamma$ (risk tolerance) | $\lambda^*$ (dual) | Gain Magnitude | Overshoot | Average Cost $J$ |
|---------------------------|--------------------|----------------|-----------|------------------|
| High                      | $\approx 0$        | Baseline       | Large     | Baseline         |
| Moderate                  | Moderate           | Increased      | Reduced   | Slightly higher  |
| Low                       | Large              | Largest        | Minimal   | Highest          |

Reducing $\Gamma$ leads to an increased $\lambda^*$ and higher feedback gain magnitudes, yielding faster disturbance damping and less vulnerability to rare high-variance events (Roudneshin et al., 2023).

7. Generalizations and Methodological Significance

Risk-constrained MF-SLQR controllers extend classic LQR and mean-field LQR by systematically regulating not only expected cost but also rare-event risk, via variance-type constraints. The affine law and dual-based synthesis parallel but augment the classical mean–fluctuation separation and are compatible with standard Riccati-based implementation. Scalability and algorithmic simplicity are preserved, and the general methodology readily extends to other risk proxies and limit regimes (Roudneshin et al., 2023).

These contributions are directly relevant for modern applications requiring explicit robustness to stochastic volatility and population-level coupling, such as large-scale energy systems and networked autonomous agents.
