- The paper presents a framework for analyzing fully nonlinear stochastic control problems using distribution-level dynamics.
- It reformulates the Fokker-Planck equation into a deterministic continuity equation that connects control processes with measure transport.
- The work derives Pontryagin-type conditions to address high-dimensional optimal stopping under terminal distribution constraints.
Introduction and Context
This work presents an analytic framework for stochastic optimal control with fully nonlinear dynamics, subject to optimal stopping and terminal distribution constraints. The focus is on deterministic, distribution-level representations of how the marginals evolve under state-independent diffusions, bridging stochastic control theory, measure-theoretic optimal transport, and calculus of variations on probability spaces. The paper strengthens the theoretical basis for solving high-dimensional stochastic control and stopping problems with distributional objectives, and develops a methodology based on transformations of the Fokker-Planck equation.
The context includes Schrödinger Bridge Problems (SBP), Optimal Transport (OT), and Skorokhod Embedding Problems (SEP). Previous literature provides solutions for fixed-horizon SBP (including numerous links with OT [book_ge3]), and classical and modern results on SEP (stochastic processes with imposed stopping distributions). However, the paper addresses the broader class of nonlinear, potentially high-dimensional stochastic optimal control and stopping problems where the control objective is at the level of the entire evolving state PDF (as in swarm robotics, aero-guidance, and finance).
The dynamical system under consideration is an Itô diffusion given by:
$$dx_t = f(t, x_t, u_t)\,dt + \sigma_t\,dw_t$$
with state $x_t$, control $u_t$, and diffusion coefficient $\sigma_t$ (state-independent, uniformly elliptic for the main developments).
The main control problem is to design the feedback law $u_t$ and the stopping rule (determined by a space-time boundary $g$ or an initial-state dependent stopping time function $\tau_m(x_0)$), so as to minimize a cost functional of the form:
$$\mathbb{E}\left[\int_0^{\tau_g} L(t, x_t, u_t)\,dt + \Phi(\tau_g, x_{\tau_g})\right]$$
with potential terminal distribution (or other) constraints. The core innovation is to analyze and solve this problem not at the trajectory level, but at the level of the law of the process, i.e., in the space of (sub-)probability densities ('density steering').
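To make the trajectory-level problem concrete, the following is a minimal Monte Carlo sketch of the stopped cost functional, using Euler-Maruyama time stepping. The drift, feedback law, running cost, terminal cost, and stopping boundary are all hypothetical choices made for illustration; none of them comes from the paper.

```python
import numpy as np

def simulate_stopped_cost(n_paths=20_000, dt=1e-2, horizon=1.0, sigma=0.5, seed=0):
    """Monte Carlo estimate of E[int_0^tau L dt + Phi(tau, x_tau)] for a toy 1-D problem.

    Every modelling choice below is an illustrative assumption, not the paper's setup.
    """
    rng = np.random.default_rng(seed)
    f = lambda t, x, u: -x**3 + u           # hypothetical nonlinear drift
    policy = lambda t, x: -x                # hypothetical feedback law u_t = -x_t
    running = lambda t, x, u: 0.5 * u**2    # hypothetical running cost L
    terminal = lambda t, x: x**2            # hypothetical terminal cost Phi
    boundary = lambda t: 1.0 - 0.5 * t      # hypothetical space-time boundary g(t)

    x = rng.normal(0.0, 0.3, size=n_paths)  # x_0 ~ N(0, 0.3^2)
    cost = np.zeros(n_paths)
    tau = np.full(n_paths, horizon)         # paths that never hit g stop at the horizon
    alive = np.abs(x) < boundary(0.0)
    tau[~alive] = 0.0                       # paths starting outside the boundary stop at once

    n_steps = int(round(horizon / dt))
    for k in range(n_steps):
        t = k * dt
        u = policy(t, x)
        dw = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        # Accumulate running cost and advance only the paths that are still alive.
        cost += np.where(alive, running(t, x, u) * dt, 0.0)
        x = np.where(alive, x + f(t, x, u) * dt + sigma * dw, x)
        # Stop a path the first time |x| reaches the boundary.
        hit = alive & (np.abs(x) >= boundary(t + dt))
        tau[hit] = t + dt
        alive &= ~hit

    cost += terminal(tau, x)                # x is frozen at its stopped value
    return cost.mean(), tau

mean_cost, tau = simulate_stopped_cost()
print(f"estimated cost: {mean_cost:.4f}, mean stopping time: {tau.mean():.3f}")
```

This estimator works path by path, which is exactly the viewpoint the paper moves away from: the density-steering formulation replaces the ensemble of sample paths by the evolution of their law.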
Main Theoretical Contributions
Distributional Dynamic Programming and sHJB
The paper constructs a dynamic programming equation on the space of probability measures, specifically a distributional stochastic Hamilton-Jacobi-Bellman (sHJB) PDE, using variational derivatives (Lions derivatives) and weak/viscosity solutions [vis_comp]. It shows that, for a fixed stopping boundary, the value function at the distributional level decomposes as the expectation of the corresponding trajectory-wise value functions:
$$\bar{V}^g(t, \mu) = \mathbb{E}_{x \sim \mu}\left[V^g(t, x)\right]$$
The sHJB equation for $\bar{V}^g$ is characterized by a second-order operator involving the drift, the diffusive terms, and the associated cost; critically, this distributional form aligns the stochastic dynamic programming principle with measure-theoretic optimal transport.
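The decomposition of the distributional value function as an expectation of pointwise values can be checked numerically on a toy case. The sketch below assumes an uncontrolled diffusion $dx = \sigma\,dw$, running cost $x^2$, no terminal cost, and a fixed horizon standing in for the stopping boundary; these choices, and the function names, are illustrative assumptions rather than the paper's setting. In this case $V(t,x) = x^2 (T-t) + \tfrac{1}{2}\sigma^2 (T-t)^2$ in closed form.

```python
import numpy as np

SIGMA, HORIZON = 0.7, 1.0  # hypothetical diffusion level and fixed horizon

def pointwise_value(t, x):
    """Closed-form V(t, x) for dx = SIGMA dw with running cost x^2 and no terminal cost."""
    rem = HORIZON - t
    return x**2 * rem + 0.5 * SIGMA**2 * rem**2

def distributional_value_mc(t, m, s, n_paths=100_000, n_steps=100, seed=0):
    """Propagate the law directly: start from x_t ~ N(m, s^2) and integrate E[x^2] in time."""
    rng = np.random.default_rng(seed)
    dt = (HORIZON - t) / n_steps
    x = rng.normal(m, s, size=n_paths)
    total = 0.0
    for _ in range(n_steps):
        total += np.mean(x**2) * dt
        x = x + SIGMA * np.sqrt(dt) * rng.normal(size=n_paths)
    return total

t, m, s = 0.2, 0.5, 0.4
samples = np.random.default_rng(1).normal(m, s, size=100_000)
averaged = pointwise_value(t, samples).mean()   # E_{x ~ mu}[V(t, x)]
simulated = distributional_value_mc(t, m, s)    # value computed from the evolving law
exact = (m**2 + s**2) * (HORIZON - t) + 0.5 * SIGMA**2 * (HORIZON - t)**2
print(averaged, simulated, exact)
```

The three numbers should agree up to Monte Carlo and time-discretization error, which illustrates the identity in the simplest setting without touching the controlled or stopped case.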
A key technical transition is the reformulation of the (controlled) Fokker-Planck equation as a deterministic continuity equation driven by a "score-corrected" velocity field:
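$$\partial_t \rho_t + \nabla \cdot \left(\rho_t\, v_t\right) = 0, \qquad v_t(x) = f(t, x, u_t) - \tfrac{1}{2}\,\sigma_t \sigma_t^\top \nabla_x \log \rho_t(x)$$
where $\rho_t$ denotes the marginal density of $x_t$.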
This transformation (which makes explicit the role of the Stein score, as in [song2020score]) allows the stochastic evolution of PDFs under the SDE to be reproduced by deterministic transport on the space of laws. Importantly, the characteristics of the continuity equation reproduce the marginal law of the Itô diffusion but do not realize its sample paths, which reflects the nonlocal, measure-dependent nature of the dynamics.
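A short derivation shows where the score term comes from. It is sketched here for a state-independent diffusion, writing $\rho_t$ for the marginal density and $D_t = \sigma_t \sigma_t^\top$; both symbols are notation introduced for this illustration.

```latex
\begin{aligned}
\partial_t \rho_t
  &= -\nabla\cdot\big(f\,\rho_t\big) + \tfrac{1}{2}\,\nabla\cdot\big(D_t \nabla \rho_t\big)
  && \text{(Fokker-Planck, } D_t \text{ independent of } x\text{)} \\
  &= -\nabla\cdot\big(f\,\rho_t\big) + \tfrac{1}{2}\,\nabla\cdot\big(\rho_t\, D_t \nabla \log \rho_t\big)
  && \text{(since } \nabla \rho_t = \rho_t \nabla \log \rho_t\text{)} \\
  &= -\nabla\cdot\Big(\rho_t \big[\, f - \tfrac{1}{2}\, D_t \nabla \log \rho_t \,\big]\Big).
\end{aligned}
```

The bracketed term is the velocity field of a continuity equation, so the second-order diffusion term has been absorbed into a first-order transport term that depends on the density itself.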
The paper further formalizes the 'reverse' transformation, showing that classic Itô calculus can be retrieved through this Fokker-Planck transformation.
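The claim that the deterministic characteristics match the marginals of the Itô diffusion without reproducing its sample paths can be illustrated numerically. The sketch below assumes an Ornstein-Uhlenbeck process with a Gaussian initial law, chosen only because its score is available in closed form; the parameter values are arbitrary.

```python
import numpy as np

theta, sigma, s0 = 1.0, 0.8, 1.5        # hypothetical OU rate, noise level, initial std
T, n_steps, n = 1.0, 1000, 50_000
dt = T / n_steps

def var_exact(t):
    """Variance of x_t for dx = -theta x dt + sigma dw with x_0 ~ N(0, s0^2)."""
    decay = np.exp(-2.0 * theta * t)
    return s0**2 * decay + sigma**2 / (2.0 * theta) * (1.0 - decay)

rng = np.random.default_rng(0)
x0 = rng.normal(0.0, s0, size=n)
x_sde = x0.copy()
x_ode = x0.copy()
for k in range(n_steps):
    t = k * dt
    # Stochastic path: Euler-Maruyama step of the Ito SDE.
    x_sde += -theta * x_sde * dt + sigma * np.sqrt(dt) * rng.normal(size=n)
    # Deterministic characteristic: drift minus half the diffusion times the score.
    score = -x_ode / var_exact(t)        # grad log of the N(0, var_exact(t)) density
    x_ode += (-theta * x_ode - 0.5 * sigma**2 * score) * dt

var_sde, var_ode, var_ref = x_sde.var(), x_ode.var(), var_exact(T)
corr_sde = np.corrcoef(x_sde, x0)[0, 1]  # noisy paths decorrelate from x_0
corr_ode = np.corrcoef(x_ode, x0)[0, 1]  # the flow is a deterministic map of x_0
print(var_sde, var_ode, var_ref, corr_sde, corr_ode)
```

Both ensembles end with approximately the same variance, yet the deterministic particles remain perfectly correlated with their initial positions while the stochastic ones do not. The laws agree at each time; the paths do not.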
Equivalence and Stein-type Identities
Through a systematic application of Stein-type integration-by-parts identities and variational calculus in Wasserstein space [wass_first_order, stein1], the paper establishes that the deterministic, density-level optimal control problem, with distributional constraints and optimal stopping, inherits the same functional structure as the original stochastic (Itô) system, in particular the same sHJB equation. This yields a rigorous equivalence between controlled stochastic processes and deterministic density evolution under measure-valued controls.
Distributionally-Constrained Optimal Stopping—First-Order Conditions
Building on the deterministic reformulation, a measure-space optimal control problem with state-dependent stopping ('terminal-time assignment') and terminal distribution constraints is constructed. By developing the variational theory on probability path space, under suitable regularity and compatibility conditions, a system of first-order necessary (Pontryagin-type) conditions is derived. These conditions characterize:
- Stationarity of the Hamiltonian with respect to control.
- Adjoint (co-state) backward equations, including both classical and nonlocal, diffusive correction terms.
- Transversality on the (potentially state-dependent) terminal boundary.
- Satisfaction of terminal probability distribution constraints.
Notably, the framework cohesively supports both common and general (initial-condition dependent) stopping mechanisms, covering classical, state-dependent exit time problems, and free boundary formulations.
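For orientation, the four ingredients listed above mirror the classical finite-dimensional Pontryagin system for a deterministic problem with free terminal time. The block below states that classical analog only, under the assumption of an unconstrained interior control; it is not the paper's measure-space system, which additionally carries nonlocal diffusive corrections and a terminal distribution constraint.

```latex
\begin{aligned}
&\text{Problem:} && \min_{u,\,T}\ \int_0^T L(t,x,u)\,dt + \Phi\big(T, x(T)\big),
   \qquad \dot{x} = f(t,x,u), \quad x(0) = x_0 \\
&\text{Hamiltonian:} && H(t,x,u,p) = L(t,x,u) + p^\top f(t,x,u) \\
&\text{Stationarity:} && \partial_u H\big(t, x^*, u^*, p\big) = 0 \\
&\text{Adjoint:} && \dot{p} = -\,\partial_x H\big(t, x^*, u^*, p\big) \\
&\text{Transversality:} && p(T) = \partial_x \Phi\big(T, x^*(T)\big),
   \qquad H\big|_{t=T} + \partial_t \Phi\big(T, x^*(T)\big) = 0
\end{aligned}
```

The last condition is the one that determines the free terminal time; in the paper's setting its counterpart is imposed on the (potentially state-dependent) terminal boundary.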
Numerical and Methodological Implications
While the paper remains focused on the theoretical architecture, the deterministic measure-based reformulation points toward computationally tractable numerical routines, particularly in high-dimensional settings. Recasting the original stochastic problem as (possibly neural-parametrized) deterministic flows on the space of probability densities opens up the possibility of applying modern semi-Lagrangian, score-matching, and physics-informed learning techniques to such optimal control/stopping problems [numeric2, song2020score, numeric3].
For instance, the separation between characteristic and distributional representations can be exploited in sampling-based algorithms, and the variational analysis provides explicit gradients amenable to optimization-based solver design.
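As a rough sketch of what a sampling-based routine could look like, the code below transports particles along the score-corrected velocity field while estimating the score from the ensemble itself. The Gaussian fit used for the score is a simple stand-in for a learned score model and is an assumption of this example, as are the function names and the linear test drift; the paper does not prescribe this scheme.

```python
import numpy as np

def gaussian_score(x):
    """Score of a 1-D Gaussian fitted to the particles: -(x - mean) / var."""
    return -(x - x.mean()) / x.var()

def particle_flow(x0, drift, sigma, horizon=1.0, n_steps=500):
    """Explicit Euler integration of dx/dt = drift(t, x) - 0.5 * sigma^2 * score(x)."""
    x = x0.copy()
    dt = horizon / n_steps
    for k in range(n_steps):
        velocity = drift(k * dt, x) - 0.5 * sigma**2 * gaussian_score(x)
        x = x + velocity * dt
    return x

# Hypothetical test case: linear drift, for which the law stays Gaussian.
theta, sigma, m0, s0 = 1.0, 0.8, 1.0, 1.5
rng = np.random.default_rng(0)
x0 = rng.normal(m0, s0, size=20_000)
xT = particle_flow(x0, lambda t, x: -theta * x, sigma)

# Reference mean and variance of the corresponding Ito diffusion at time 1.
mean_ref = m0 * np.exp(-theta)
var_ref = s0**2 * np.exp(-2 * theta) + sigma**2 / (2 * theta) * (1 - np.exp(-2 * theta))
print(xT.mean(), mean_ref, xT.var(), var_ref)
```

No random increments are drawn after initialization, so the only sources of error are the finite ensemble, the time step, and the quality of the score estimate. For non-Gaussian laws the Gaussian fit would have to be replaced by a richer estimator.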
Theoretical and Practical Implications
The results provide a rigorous analytic foundation for a class of control problems with explicit distributional objectives, which are increasingly prevalent in robotics, aerospace, and mathematical finance (e.g., terminal wealth shaping, distribution-constrained stopping). The Fokker-Planck transformation formalism also strengthens the theoretical link between stochastic control and optimal transport, clarifying how mechanisms like score-based modeling and gradient flows in Wasserstein space interact with classical diffusive control.
The derived Pontryagin-like conditions in measure space set a template for further study on duality, regularity, and well-posedness in high-dimensional stochastic control with optimal stopping, and may be leveraged for the analysis of neural PDE solvers, high-dimensional SBP, and reach-avoid or safe-exploration reinforcement learning.
Notable Claims
- The deterministic reformulation via the Fokker-Planck/score field yields value functions and dynamic programming equations that are provably equivalent (in the viscosity/distributional sense) with the original stochastic (Itô) optimal control/stopping problem.
- First-order stationarity systems are rigorously derived for the measure-valued optimization, capturing both classical and nonlocal diffusive terms, subsuming prior work based solely on trajectory-wise or expectation-based terminal constraints.
Future Directions
The analytic results motivate several directions:
- Development of scalable sampling-based and physics-informed solvers for high-dimensional distribution steering with free terminal time, building on neural score models, measure flows, and adjoint-based gradients.
- Extension to non-uniformly elliptic or degenerate diffusion cases, general terminal cost structures, and singular (e.g., mass-killing or measure-killing) dynamics.
- Applications to mean-field games, probabilistic motion planning, stochastic reach-avoid, and interactive multi-agent reinforcement learning with stopping rules.
Conclusion
This work delivers a rigorous theoretical framework for the deterministic, measure-theoretic analysis of nonlinear stochastic optimal control and stopping problems, unifying concepts from stochastic control, optimal transport, and measure-valued variational analysis. The systematic use of the Fokker-Planck transformation, distributional sHJB, and variational calculus establishes a new, analytically tractable route for both theoretical investigation and computational solution of distribution-constrained control problems with optimal stopping.
References
- "Nonlinear Stochastic Optimal Control and Optimal Stopping using the Fokker-Planck Transformation" (2604.12153).
- C. Villani, Optimal Transport: Old and New, [book_ge3].
- M. Talbi, N. Touzi, J. Zhang, "Viscosity Solutions for Obstacle Problems on Wasserstein Space", [vis_comp].
- Y. Song et al., "Score-Based Generative Modeling through Stochastic Differential Equations", [song2020score].