Density-Driven Optimal Control (D2OC)
- Density-Driven Optimal Control is a framework that lifts nonlinear, stochastic, and high-dimensional system dynamics into the space of probability densities using operator theory.
- It reformulates optimal control as a convex density program with linear PDE constraints and quadratic-over-linear cost functions to ensure global optimality.
- The method integrates finite-dimensional approximations and data-driven operator learning, enabling efficient and safe control synthesis across diverse applications.
Density-Driven Optimal Control (D2OC) is a class of optimal control methodologies that approach policy synthesis by lifting nonlinear, stochastic, or mean-field system dynamics into the space of probability densities, rather than optimizing directly over trajectories or controls. By leveraging linear operator theory, chiefly the Perron-Frobenius (P-F) and Koopman operators, D2OC provides a principled, convex, and often data-driven framework for controlled density evolution. It handles both pathwise and stationary objectives, safety constraints, and high-dimensional control synthesis, with algorithmic applicability to deterministic, stochastic, and hybrid dynamical systems.
1. Mathematical Foundations: System and Density Evolution
Density-Driven Optimal Control is fundamentally grounded in the evolution of probability densities governed by controlled dynamical systems. For a controlled diffusion process

$$dx_t = \big(f(x_t) + g(x_t)\,u_t\big)\,dt + \sigma(x_t)\,dW_t,$$

the state density $\rho(x,t)$ evolves according to the Fokker-Planck (forward Kolmogorov) partial differential equation (PDE)

$$\frac{\partial \rho}{\partial t} = -\nabla\!\cdot\!\big(F_k\,\rho\big) + \frac{1}{2}\sum_{i,j}\frac{\partial^2}{\partial x_i\,\partial x_j}\big[(\sigma\sigma^\top)_{ij}\,\rho\big],$$

where $F_k(x) = f(x) + g(x)k(x)$ for a feedback law $u = k(x)$. The infinitesimal Perron-Frobenius generator expresses this linear evolution on densities,

$$\mathcal{L}_{\mathrm{PF}}\,\rho = -\nabla\!\cdot\!\big(F_k\,\rho\big) + \frac{1}{2}\sum_{i,j}\frac{\partial^2}{\partial x_i\,\partial x_j}\big[(\sigma\sigma^\top)_{ij}\,\rho\big],$$

while the dual Koopman generator acts on observables $\varphi$ as

$$\mathcal{L}_{\mathrm{K}}\,\varphi = F_k\cdot\nabla\varphi + \frac{1}{2}\,\mathrm{tr}\!\big(\sigma\sigma^\top\,\nabla^2\varphi\big).$$

This operator-theoretic structure supports both the forward (density) and backward (value function) perspectives crucial to D2OC (Vaidya et al., 2022).
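As a concrete, purely illustrative instance of this forward density evolution (the drift $f(x) = -x$ and all parameters below are assumptions for the example, not taken from the cited papers), the following sketch propagates an ensemble of sample paths and checks that the empirical variance approaches the stationary Fokker-Planck solution, a Gaussian with variance $\sigma^2/2$:

```python
import numpy as np

# Euler-Maruyama simulation of dx = -x dt + sigma dW for an ensemble of
# particles; the Fokker-Planck stationary density for this system is a
# Gaussian with variance sigma^2 / 2.
rng = np.random.default_rng(0)
sigma, dt, steps, n = 0.5, 1e-2, 5_000, 5_000
x = rng.standard_normal(n)                  # ensemble of initial states
for _ in range(steps):
    x += -x * dt + sigma * np.sqrt(dt) * rng.standard_normal(n)

print(np.var(x))   # should approach sigma**2 / 2 = 0.125
```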
2. Convex Density Program: Infinite-Dimensional Formulation
D2OC reformulates the original stochastic optimal control problem (SOCP) as a convex optimization in the space of densities. For infinite-horizon running cost $q(x) + u^\top R\,u$ and initial density $\rho_0$, the expected cost functional is

$$J = \mathbb{E}_{\rho_0}\!\left[\int_0^\infty \big(q(x_t) + u_t^\top R\,u_t\big)\,dt\right] = \int_X \left[q(x)\,\rho(x) + \frac{\bar{\rho}(x)^\top R\,\bar{\rho}(x)}{\rho(x)}\right] dx.$$

Defining the flux variable $\bar{\rho}(x) = k(x)\,\rho(x)$, the stationary P-F (or Liouville) equation is

$$\nabla\!\cdot\!\big(f\rho + g\bar{\rho}\big) - \frac{1}{2}\sum_{i,j}\frac{\partial^2}{\partial x_i\,\partial x_j}\big[(\sigma\sigma^\top)_{ij}\,\rho\big] = \rho_0.$$

The convex program becomes

$$\min_{\rho \ge 0,\;\bar{\rho}} \int_X \left[q\,\rho + \frac{\bar{\rho}^\top R\,\bar{\rho}}{\rho}\right] dx \quad \text{subject to the stationary P-F equation above.}$$

This quadratic-over-linear control cost is jointly convex in $(\rho, \bar{\rho})$ for $\rho > 0$, and the PDE constraint is linear (Vaidya et al., 2022, Moyalan et al., 2022, Huang et al., 2020).
3. Data-Driven and Finite-Dimensional Approximation
Finite-dimensional approximation of the above infinite program is achieved by projecting densities and fluxes onto a dictionary of nonnegative basis functions $\{\psi_j\}_{j=1}^N$ (e.g. Gaussian radial basis functions or polynomials):

$$\rho(x) \approx \sum_{j=1}^{N} v_j\,\psi_j(x), \qquad \bar{\rho}(x) \approx \sum_{j=1}^{N} w_j\,\psi_j(x).$$

The infinitesimal P-F and Koopman operators are identified from data by methods such as extended dynamic mode decomposition (EDMD) or naturally structured DMD (NSDMD); operator learning exploits time-series data from uncontrolled and controlled system simulations. The discrete convex program then reads

$$\min_{v \ge 0,\;w}\; c^\top v + \sum_{j=1}^{N} \frac{w_j^\top R\,w_j}{v_j} \quad \text{s.t.} \quad A v + B w = b,$$

with $c_j = \int_X q(x)\,\psi_j(x)\,dx$ and $(A, B, b)$ assembled from the learned operator matrices and the discretized P-F constraint. This is solvable by standard convex solvers (e.g. CVX), with the feedback law recovered as $k(x) = \bar{\rho}(x)/\rho(x)$ (Vaidya et al., 2022, Moyalan et al., 2022, Huang et al., 2020).
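A minimal EDMD identification step can be sketched in a few lines of NumPy. Here the "dynamics" are the hypothetical linear map $x^+ = a x$ with monomial dictionary $\psi(x) = (x, x^2)$, chosen so the learned Koopman matrix has a known closed form:

```python
import numpy as np

# EDMD: fit K so that psi(x+) ≈ K psi(x) over snapshot pairs (X, Y).
rng = np.random.default_rng(1)
a = 0.9
X = rng.uniform(-1, 1, 500)                 # sampled states
Y = a * X                                   # one-step successors

psi = lambda x: np.stack([x, x**2])         # dictionary, shape (2, N)
PX, PY = psi(X), psi(Y)
K = PY @ PX.T @ np.linalg.pinv(PX @ PX.T)   # least-squares Koopman matrix
print(np.round(K, 3))   # ≈ [[0.9, 0.0], [0.0, 0.81]]
```

NSDMD additionally constrains the learned matrix to be nonnegative and mass-preserving, so that its adjoint maps densities to densities.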
4. Duality: Koopman-HJB Formulation and Policy Iteration
The density-driven convex program is dual to a value-function approach posed in the space of observables. The corresponding Hamilton-Jacobi-Bellman (HJB) PDE, with uncontrolled Koopman generator $\mathcal{L}_{\mathrm{K}}$, is

$$\mathcal{L}_{\mathrm{K}} V + q - \frac{1}{4}\,(\nabla V)^\top g\,R^{-1} g^\top \nabla V = 0.$$

At the optimum $u^* = -\tfrac{1}{2}R^{-1} g^\top \nabla V$, giving a closed-loop generator $\mathcal{L}_{\mathrm{K}}^{u^*}$. Policy iteration proceeds by alternately solving:
- Policy evaluation: solve the linear PDE $\mathcal{L}_{\mathrm{K}}^{u^{(i)}} V^{(i)} + q + (u^{(i)})^\top R\,u^{(i)} = 0$ for $V^{(i)}$ under the fixed policy $u^{(i)}$
- Policy improvement: update $u^{(i+1)} = -\tfrac{1}{2}R^{-1} g^\top \nabla V^{(i)}$
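For a scalar linear-quadratic problem this alternation can be carried out in closed form, since the value function is $V(x) = p x^2$ and policy evaluation reduces to a scalar Lyapunov equation. A self-contained sketch (this is the classical Kleinman iteration, shown only to make the evaluation/improvement loop concrete; the coefficients are arbitrary):

```python
# Dynamics x' = a x + b u, running cost q x^2 + r u^2, policy u = -k x.
a, b, q, r = 1.0, 1.0, 1.0, 1.0
k = 2.0                                      # initial stabilizing gain (b*k > a)
for _ in range(20):
    # Policy evaluation: Lyapunov equation 2(a - b k) p + q + r k^2 = 0
    p = (q + r * k**2) / (2 * (b * k - a))
    # Policy improvement: u = -(b p / r) x, i.e. k = b p / r
    k = b * p / r
print(p)   # converges to the Riccati solution 1 + sqrt(2) ≈ 2.414
```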
Koopman and P-F operators are adjoint, and under technical conditions the optimal flux and value function are related through convex duality (Vaidya et al., 2022).
5. Extensions: Constraints, Safety, and Dual Density-Driven Structures
The density-driven approach natively accommodates state and input constraints via linear or convex restrictions in density space:
- Hard state constraints: $\rho(x) = 0$ for $x$ in the unsafe (obstacle) set $X_u$, enforcing obstacle avoidance
- Traversability or safety budgets: linear inequalities of the form $\int_X c(x)\,\rho(x)\,dx \le \gamma$
Maximum-entropy variants add differential entropy regularization, producing Gaussian control policies and connecting D2OC to Schrödinger Bridges—entropy-optimal interpolating processes between marginals (Ito et al., 2022). Extensions to hybrid jump-diffusions, mean-field limits, and PDE-constrained swarm control rely on generalized Chapman-Kolmogorov or Fokker-Planck type equations, with first-order optimality conditions derived via infinite-dimensional Pontryagin or minimum principle frameworks (Bakshi et al., 2020, Sinigaglia et al., 2021).
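The Schrödinger-bridge connection has a simple discrete analogue: among all couplings of two marginals, pick the one closest in relative entropy to a reference Gaussian kernel, computable by Sinkhorn iteration. A small sketch on a 1-D grid (the grid, marginals, and kernel width are all illustrative choices, not from the cited papers):

```python
import numpy as np

# Discrete Schrödinger bridge via Sinkhorn: scale a Gaussian prior kernel K
# by diagonal factors u, v until the coupling pi = diag(u) K diag(v) has the
# prescribed marginals mu (initial density) and nu (target density).
n = 50
x = np.linspace(-2, 2, n)
mu = np.exp(-(x + 1)**2); mu /= mu.sum()
nu = np.exp(-(x - 1)**2); nu /= nu.sum()
K = np.exp(-(x[:, None] - x[None, :])**2 / 2.0)   # prior transition kernel

u, v = np.ones(n), np.ones(n)
for _ in range(2000):                              # Sinkhorn fixed point
    u = mu / (K @ v)
    v = nu / (K.T @ u)
pi = u[:, None] * K * v[None, :]

print(np.abs(pi.sum(axis=1) - mu).max())           # row-marginal mismatch -> 0
```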
6. Convergence, Global Optimality, and Practical Algorithms
Convexity of the lifted density-control cost ensures global optimality of the computed control law within the chosen function space. As the number of basis functions increases and operator approximations improve with data, the solution converges to the infinite-dimensional optimum (Vaidya et al., 2022, Moyalan et al., 2022). Standard convex programming complexity applies, dominated by the size of the resulting quadratic program. For high-dimensional or nonlinear systems, neural-network parameterizations and automatic differentiation enable particle-based saddle-point solvers that bypass state-space gridding (Ma et al., 2023).
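The basis-refinement claim can be illustrated by projecting a fixed target density onto Gaussian radial-basis dictionaries of increasing size: the least-squares error shrinks as the dictionary grows (the target shape and RBF widths below are arbitrary choices for the example):

```python
import numpy as np

# L2 projection of a smooth density-like target onto n Gaussian RBFs; the
# approximation error decreases as the basis is refined.
x = np.linspace(-3, 3, 400)
target = np.exp(-x**2) * (1 + 0.5 * np.sin(3 * x))

def rbf_error(n):
    centers = np.linspace(-3, 3, n)
    Psi = np.exp(-(x[:, None] - centers[None, :])**2)   # basis matrix
    coef, *_ = np.linalg.lstsq(Psi, target, rcond=None)
    return float(np.linalg.norm(Psi @ coef - target))

errs = [rbf_error(n) for n in (3, 6, 12, 24)]
print(errs)   # error shrinks with basis size
```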
7. Applications and Numerical Demonstrations
D2OC frameworks have been validated on a range of systems:
- Nonlinear polynomial systems matching analytic HJB feedback (Moyalan et al., 2022)
- Navigating Dubins car models on off-road terrains with traversability and obstacle constraints (Moyalan et al., 2022)
- Large-scale particle swarms with boundary actuators via PDE-constrained nonlinear optimal control (Sinigaglia et al., 2021)
- Stochastic jump-diffusions with ensemble control (Bakshi et al., 2020)
- Schrödinger bridge and MaxEnt steering in discrete-time linear systems (Ito et al., 2022)
- Safe controller synthesis with distributional constraints for adaptive cruise control (Chen et al., 2019)
Each application exploits the ability to encode distributional performance, uncertainty, hard constraints, and scalability via the convex density-driven lifting, yielding significant advantages over traditional trajectory-based or purely value-function-based approaches.
References
- "Data-Driven Stochastic Optimal Control using Linear Transfer Operators" (Vaidya et al., 2022)
- "Maximum entropy optimal density control of discrete-time linear systems and Schrödinger bridges" (Ito et al., 2022)
- "Density control of large-scale particles swarm through PDE-constrained optimization" (Sinigaglia et al., 2021)
- "Data-Driven Convex Approach to Off-road Navigation via Linear Transfer Operators" (Moyalan et al., 2022)
- "Data-Driven Optimal Control via Linear Transfer Operators: A Convex Approach" (Moyalan et al., 2022)
- "A Convex Approach to Data-driven Optimal Control via Perron-Frobenius and Koopman Operators" (Huang et al., 2020)
- "Open-loop Deterministic Density Control of Marked Jump Diffusions" (Bakshi et al., 2020)
- "Optimal Safe Controller Synthesis: A Density Function Approach" (Chen et al., 2019)
- "High-dimensional Optimal Density Control with Wasserstein Metric Matching" (Ma et al., 2023)