Closed-Loop Stochastic Optimal Control
- Closed-Loop Stochastic Optimal Control (CLS-OCP) is a framework that designs adaptive feedback policies to handle uncertainties in stochastic dynamical systems.
- It integrates Riccati-based and data-driven methodologies to guarantee recursive feasibility, robust performance, and closed-loop stability.
- Applications span LQG systems, delay and infinite-dimensional systems, and stochastic differential games, highlighting its versatility in complex control scenarios.
Closed-Loop Stochastic Optimal Control (CLS-OCP) refers to the systematic synthesis and analysis of feedback policies for controlling stochastic dynamical systems, in which the control at each time step can be adapted to the full information available, typically the observed state or output process. In the CLS-OCP framework, the controller dynamically responds to evolving uncertainties, yielding superior robustness, recursive feasibility, and performance guarantees compared to open-loop or non-adaptive schemes. The field sits at the interface of modern stochastic control theory, computational methods, and data-driven approaches, and admits rigorous solutions across model-based and data-driven, linear and nonlinear, finite- and infinite-dimensional, and mean-field and non-mean-field settings.
1. Core Formulation and Distinction of CLS-OCP
The mathematical formulation of CLS-OCP typically considers controlled dynamical systems on a filtered probability space $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{t \ge 0}, \mathbb{P})$ with state $X_t$ and control $u_t$, subject to stochastic disturbances, e.g.
$$dX_t = b(t, X_t, u_t)\,dt + \sigma(t, X_t, u_t)\,dW_t$$
for diffusion, or with jumps, delays, or regime-switching. The class of admissible controls comprises all feedback policies measurable w.r.t. the filtration generated by the observed process, i.e. $u_t = \pi(t, \mathcal{G}_t)$, where $\mathcal{G}_t$ denotes the available information at time $t$.
A critical distinction is drawn between closed-loop (feedback) and open-loop (non-adaptive) optimal control:
- Closed-loop controls are adapted to the history of observations, enabling dynamic response to stochasticity.
- Open-loop controls are predetermined functions or sequences, fixed in advance.
For Markovian settings and regular value functions, value equivalence often holds, but path-dependent costs or degenerate diffusion can lead to strict separation between CLS-OCP and open-loop OCP value functions, as demonstrated by Tsirelson-type counterexamples (Yong et al., 2020).
The formal CLS-OCP value function is
$$V(t, x) = \inf_{\pi \in \mathcal{A}} \mathbb{E}\left[\int_t^T f(s, X_s, \pi(s, X_s))\,ds + g(X_T) \,\middle|\, X_t = x\right],$$
where $\mathcal{A}$ denotes the set of admissible feedback laws.
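To make the closed-loop/open-loop gap concrete, the following minimal Python sketch simulates a scalar controlled diffusion by Euler–Maruyama and compares a feedback law with a control sequence fixed in advance, on the same noise realizations. The dynamics, cost weights, and the specific gain are illustrative assumptions, not taken from any cited paper.

```python
import numpy as np

# Illustrative scalar controlled SDE: dX = (a X + b u) dt + sigma dW
a, b, sigma = 0.5, 1.0, 0.3
T, N, M = 1.0, 100, 5000          # horizon, time steps, Monte Carlo paths
dt = T / N

def mc_cost(policy):
    """Euler-Maruyama rollout of the controlled SDE; returns a Monte
    Carlo estimate of E[ int_0^T (X^2 + u^2) dt + X_T^2 ]."""
    rng = np.random.default_rng(0)    # same noise for both policies
    X = np.full(M, 1.0)               # common initial state x0 = 1
    cost = np.zeros(M)
    for k in range(N):
        u = policy(k, X)
        cost += (X**2 + u**2) * dt
        X = X + (a * X + b * u) * dt + sigma * rng.normal(scale=np.sqrt(dt), size=M)
    return float((cost + X**2).mean())

# Closed-loop policy: a hand-picked stabilizing feedback gain
feedback = lambda k, X: -1.5 * X

# Open-loop policy: the same gain applied to the *mean* path, fixed in advance
xbar, u_ol = 1.0, np.zeros(N)
for k in range(N):
    u_ol[k] = -1.5 * xbar
    xbar += (a * xbar + b * u_ol[k]) * dt
open_loop = lambda k, X: u_ol[k]

print("closed-loop cost:", mc_cost(feedback))   # adapts to each noise path
print("open-loop cost:  ", mc_cost(open_loop))  # typically larger
```

Because the feedback policy reacts to each realized path, its Monte Carlo cost is typically strictly smaller; as noted above, in degenerate or path-dependent settings the gap can be strict even at the level of value functions.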
2. Feedback Synthesis: Riccati-Based and General Data-Driven Approaches
In finite-dimensional linear quadratic Gaussian (LQG) and associated classes, CLS-OCP admits explicit synthesis via dynamic programming, yielding feedback laws through the solution of Riccati equations. Consider the canonical linear SDE
$$dX_t = (A X_t + B u_t)\,dt + (C X_t + D u_t)\,dW_t,$$
with quadratic cost
$$J(u) = \mathbb{E}\left[\int_0^T \big(X_t^\top Q X_t + u_t^\top R u_t\big)\,dt + X_T^\top G X_T\right].$$
The optimal control law is then characterized by the linear feedback
$$u_t^* = -\big(R + D^\top P_t D\big)^{-1}\big(B^\top P_t + D^\top P_t C\big)\, X_t,$$
where $P_t$ solves the stochastic or deterministic Riccati equation (depending on the coefficients' adaptivity) (Sun et al., 2016, Sun et al., 2015, Lü, 2018, Sun et al., 2018). In the presence of jumps (Li et al., 2022), delay (Meng et al., 3 Oct 2025), mean-field coupling (Li et al., 2016, Song et al., 2023), regime-switching (Wu et al., 1 Mar 2024), or infinite-dimensional/boundary-controlled systems (Lü, 2018, Prohl et al., 18 Nov 2024), the Riccati equation is extended accordingly (e.g., to backward stochastic, Riccati–Volterra, or coupled matrix forms).
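For the deterministic-coefficient, infinite-horizon specialization with $C = D = 0$, the Riccati equation above reduces to the classical algebraic Riccati equation, which standard libraries solve directly. The following sketch uses scipy.linalg.solve_continuous_are on a placeholder double-integrator plant; it is a minimal deterministic illustration, not the general stochastic Riccati machinery of the cited papers.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Illustrative double-integrator plant with placeholder weights
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)           # state weight
R = np.array([[0.1]])   # control weight

# Solve the algebraic Riccati equation A'P + PA - P B R^{-1} B' P + Q = 0
P = solve_continuous_are(A, B, Q, R)

# Feedback gain for u = -K x, the deterministic specialization of u* above
K = np.linalg.solve(R, B.T @ P)
print("Riccati solution P:\n", P)
print("feedback gain K:", K)

# The closed-loop matrix A - BK should be Hurwitz (eigenvalues in C^-)
print("closed-loop eigenvalues:", np.linalg.eigvals(A - B @ K))
```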
Data-driven CLS-OCP (no explicit parametric model) leverages persistently exciting input–output and disturbance data and the stochastic extension of Willems' fundamental lemma, combined with polynomial chaos expansions (PCE). The predictive controller is formulated as a data-driven stochastic OCP, enforcing system constraints and cost entirely in terms of Hankel matrices built from recorded process histories (Pan et al., 2022, Pan et al., 2022). Recursive feasibility and closed-loop stability are ensured by interpolating the initial PCE condition between new measurements and prior open-loop predictions.
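The Hankel embedding underlying Willems' fundamental lemma is simple to construct. The sketch below builds a depth-$L$ Hankel matrix from a recorded signal and checks the standard persistency-of-excitation rank condition; the data, dimensions, and depth are placeholder assumptions.

```python
import numpy as np

def hankel(w: np.ndarray, L: int) -> np.ndarray:
    """Depth-L Hankel matrix of a signal w of shape (T, m):
    column j stacks the window w[j], w[j+1], ..., w[j+L-1]."""
    T, m = w.shape
    cols = T - L + 1
    return np.column_stack([w[j:j + L].reshape(-1) for j in range(cols)])

# Placeholder recorded input: T samples of an m-dimensional signal
rng = np.random.default_rng(1)
T, m, L = 50, 2, 6
u_data = rng.normal(size=(T, m))

H = hankel(u_data, L)                     # shape (L*m, T-L+1)
# Persistency of excitation of order L <=> full row rank of H
print("Hankel shape:", H.shape)
print("persistently exciting of order L:",
      np.linalg.matrix_rank(H) == L * m)
```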
3. Recursive Feasibility, Stability, and Performance Guarantees
CLS-OCP frameworks emphasize recursive feasibility and practical (mean-square) stability under explicit assumptions:
- If the data-driven or Riccati-based optimization is feasible at the initial time, the design ensures feasibility at all subsequent steps via shift-and-augment or backward induction arguments (Pan et al., 2022, Pan et al., 2022).
- Deterministic or stochastic terminal set assumptions, uniform positive-definiteness, and appropriate contraction rates yield closed-loop bounds on stage costs and asymptotic performance, e.g. an asymptotic average bound
$$\limsup_{N \to \infty} \frac{1}{N} \sum_{k=0}^{N-1} \mathbb{E}\big[\ell(x_k, u_k)\big] \le c,$$
with $c$ specified by the disturbance energy and the terminal weights (Pan et al., 2022, Pan et al., 2022).
For model predictive path integral (MPPI) control, under smoothness and noise-matching conditions, the estimator approximates the optimal closed-loop law with suboptimality of order $\mathcal{O}(\lambda)$ in the control and $\mathcal{O}(\lambda^2)$ in the value, where $\lambda$ is the exploration scale (Homburger et al., 28 Feb 2025).
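A minimal MPPI update helps make the role of $\lambda$ concrete: sampled input perturbations are scored by rollout cost and averaged with exponential weights $\propto \exp(-\mathrm{cost}/\lambda)$, so shrinking $\lambda$ concentrates the estimator on low-cost samples. The dynamics, cost, and hyperparameters below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def rollout_cost(x0, U):
    """Quadratic cost of one input sequence on an illustrative
    scalar system x_{k+1} = x_k + 0.1 * (x_k + u_k)."""
    x, c = x0, 0.0
    for u in U:
        c += x**2 + 0.1 * u**2
        x = x + 0.1 * (x + u)
    return c + x**2

def mppi_step(x0, U_nom, lam=1.0, K=512, sigma=0.5):
    """One MPPI update: perturb the nominal sequence, score rollouts,
    and return the exponentially weighted average sequence."""
    eps = rng.normal(scale=sigma, size=(K,) + U_nom.shape)
    costs = np.array([rollout_cost(x0, U_nom + e) for e in eps])
    w = np.exp(-(costs - costs.min()) / lam)   # shift for numerical stability
    w /= w.sum()
    return U_nom + np.tensordot(w, eps, axes=1)

# Receding-horizon use: apply the first input, then re-solve from the new state
U = mppi_step(x0=1.0, U_nom=np.zeros(20), lam=0.3)
print("first input to apply:", U[0])
```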
Convexity of the cost functional is necessary, and uniform convexity sufficient, for closed-loop solvability. When Riccati solutions exist and the weighting matrices admit (possibly indefinite) regularity, the resulting feedback law is optimal and uniquely determines the closed-loop solution (Sun et al., 2018, Sun et al., 2015, Lü, 2018).
4. Data-Driven and Stochastic Predictive CLS-OCP Schemes
Data-driven CLS-OCP departs from traditional model-based synthesis:
- Hankel-based embeddings of recorded trajectories eliminate the need for identification of parametric models (Pan et al., 2022).
- The control law operates over PCE coefficient trajectories, with chance constraints enforced via quantiles of PCE-predicted means and variances (see the sketch after this list).
- An interpolated initial condition (convex combination of measured and predicted) provides convexity in the decision variable and improved recursive feasibility.
- Comprehensive numerical studies (e.g., on aircraft pitch-dynamics) validate that, under uniformly exciting data and properly designed chance constraints, closed-loop output distributions concentrate around targets, and costs remain bounded by explicit theoretical limits (Pan et al., 2022).
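A common surrogate for the chance constraint $\mathbb{P}(y \le y_{\max}) \ge 1 - \varepsilon$ under PCE is a mean–variance tightening with a quantile back-off, as sketched below; the expansion coefficients and the Gaussian-style quantile are illustrative assumptions rather than the exact construction of the cited papers.

```python
import numpy as np
from scipy.stats import norm

def pce_moments(coeffs, basis_norms):
    """Mean and variance of a random variable from its PCE coefficients.
    coeffs[0] is the mean term; higher terms contribute to the variance
    weighted by the squared norms of the orthogonal basis polynomials."""
    mean = coeffs[0]
    var = np.sum(coeffs[1:] ** 2 * basis_norms[1:])
    return mean, var

def chance_constraint_ok(coeffs, basis_norms, y_max, eps=0.05):
    """Tightened surrogate for P(y <= y_max) >= 1 - eps:
    mean + q_{1-eps} * std <= y_max (Gaussian-style back-off)."""
    mean, var = pce_moments(coeffs, basis_norms)
    return mean + norm.ppf(1 - eps) * np.sqrt(var) <= y_max

# Hypothetical PCE of a predicted output in a 3-term Hermite basis
coeffs = np.array([0.8, 0.15, 0.05])   # placeholder expansion coefficients
norms = np.array([1.0, 1.0, 2.0])      # E[phi_i^2] = i! for Hermite basis
print(chance_constraint_ok(coeffs, norms, y_max=1.2))
```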
Stochastic MPC with online-optimized disturbance-feedback policies expands the class of admissible receding-horizon controls. Online optimization over affine policy classes maintains SOCP convexity, reduces conservatism relative to pre-fixed feedback approaches, and guarantees closed-loop satisfaction of chance constraints and average cost bounds (Bartos et al., 10 Feb 2025).
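Structurally, an affine disturbance-feedback policy over a horizon $N$ takes the form $u_k = v_k + \sum_{j<k} M_{k,j} w_j$, with causality enforced by a strictly lower-block-triangular gain. The sketch below merely rolls such a policy forward on placeholder data; in the cited scheme, $v$ and $M$ are the decision variables of a convex (SOCP) program, which is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(3)
N, n, m = 5, 2, 1                   # horizon, state dim, input dim

# Placeholder policy parameters: nominal inputs v and causal gains M
v = rng.normal(size=(N, m))
M = np.zeros((N, N, m, n))          # M[k, j] maps disturbance w_j into u_k
for k in range(N):
    for j in range(k):              # strictly lower triangular => causal
        M[k, j] = rng.normal(size=(m, n))

def rollout(x0, A, B, w):
    """Apply the affine disturbance-feedback policy along one
    disturbance realization w of shape (N, n)."""
    x, xs = x0, [x0]
    for k in range(N):
        u = v[k] + sum(M[k, j] @ w[j] for j in range(k))
        x = A @ x + B @ u + w[k]
        xs.append(x)
    return np.array(xs)

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
traj = rollout(np.zeros(n), A, B, rng.normal(scale=0.05, size=(N, n)))
print(traj)
```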
5. Extensions: Mean-Field, Regime-Switching, Infinite-Dimensional, and Delay Systems
CLS-OCP admits robust extensions:
- Mean-field systems: Feedback synthesis involves coupled generalized Riccati equations reflecting both local and mean-field interactions, together with adapted solutions to linear BSDE and ODE constraints (Li et al., 2016, Song et al., 2023); a simplified Riccati decomposition is sketched at the end of this section.
- Regime-switching and jumps: The controller is parameterized by Markovian regime, and coupled algebraic Riccati equations govern stability and performance (Wu et al., 1 Mar 2024, Li et al., 2022).
- Infinite-dimensional systems: Closed-loop feedback for controlled SPDEs relies on operator-valued Riccati equations, discretized in space-time for computational implementation (Prohl et al., 18 Nov 2024).
- State and control delay: Recent advances enable finite-dimensional feedback synthesis for delayed systems via reduction to stochastic Volterra integral equations, permitting causal, finite-dimensional feedback representations without infinite-dimensional lifts (Meng et al., 3 Oct 2025).
Across these extensions, the existence and uniqueness of solutions to the (possibly stochastic, coupled, or infinite-dimensional) Riccati equations remain the linchpin for closed-loop solvability.
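As a concrete instance of this linchpin role, a well-known structural fact for mean-field LQ problems is that they decouple along the deviation $X_t - \mathbb{E}[X_t]$ and the mean $\mathbb{E}[X_t]$, each governed by a Riccati equation with mean-field-augmented coefficients. The sketch below solves the two algebraic Riccati equations for an infinite-horizon, deterministic-coefficient specialization; all matrices are placeholders, and the coupled equations in the cited papers are more general.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Placeholder mean-field LQ data: dynamics A x + Abar E[x] + B u + Bbar E[u]
A, Abar = np.array([[0.0, 1.0], [-1.0, -0.5]]), 0.2 * np.eye(2)
B, Bbar = np.array([[0.0], [1.0]]), np.array([[0.0], [0.3]])
Q, Qbar = np.eye(2), 0.5 * np.eye(2)
R, Rbar = np.array([[1.0]]), np.array([[0.5]])

# Deviation process X - E[X]: standard ARE in (A, B, Q, R)
P = solve_continuous_are(A, B, Q, R)
K_dev = np.linalg.solve(R, B.T @ P)

# Mean process E[X]: ARE in the mean-field-augmented coefficients
P_mean = solve_continuous_are(A + Abar, B + Bbar, Q + Qbar, R + Rbar)
K_mean = np.linalg.solve(R + Rbar, (B + Bbar).T @ P_mean)

# Resulting feedback: u = -K_dev (x - E[x]) - K_mean E[x]
print("K_dev: ", K_dev)
print("K_mean:", K_mean)
```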
6. Game-Theoretic CLS-OCP and Learning-Driven Approaches
CLS-OCP concepts generalize to stochastic differential games:
- Stackelberg games: Closed-loop Stackelberg equilibria are characterized by Riccati equations and adapted BSDEs for both follower and leader dynamics (Li et al., 2021, Li et al., 2023).
- Mean-field games and potential games: For Markov potential games, necessary and sufficient gradient-alignment conditions reduce the computation of closed-loop Nash equilibria to a single global optimal control problem, efficiently solvable by policy gradient or deep RL techniques (Macua et al., 2018).
- Learning parametric closed-loop policies: Sophisticated parametric representations (e.g., deep neural networks) can approximate CLS-OCP policies in high-dimensional or nonconvex stochastic control scenarios; a minimal rollout-based sketch follows this list.
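As a minimal illustration of rollout-based policy learning, the sketch below runs zeroth-order (random-search) optimization of a linear feedback gain on a simulated stochastic linear system, in the spirit of policy-gradient methods for LQR; the problem data and hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
A = np.array([[0.9, 0.2], [0.0, 0.9]])
B = np.array([[0.0], [0.2]])

def avg_cost(K, horizon=50, paths=64, seed=0):
    """Monte Carlo average quadratic cost under the feedback u = -K x,
    with common random numbers so cost differences are low-variance."""
    local = np.random.default_rng(seed)
    X = local.normal(size=(paths, 2))
    total = 0.0
    for _ in range(horizon):
        U = -X @ K.T
        total += (X**2).sum() + (U**2).sum()
        X = X @ A.T + U @ B.T + 0.05 * local.normal(size=(paths, 2))
    return total / (horizon * paths)

# Zeroth-order policy search: two-point gradient estimate on the gain K
K, step, smooth = np.zeros((1, 2)), 0.05, 0.1
for _ in range(200):
    D = rng.normal(size=K.shape)
    g = (avg_cost(K + smooth * D) - avg_cost(K - smooth * D)) / (2 * smooth)
    K -= step * g * D
print("learned gain:", K, "avg cost:", avg_cost(K))
```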
CLS-OCP thus furnishes a unifying foundation for both classic model-based feedback synthesis and emerging data-driven, distributionally robust, and multi-agent optimal control in stochastic dynamical systems.