Nonlinear Stochastic Optimal Control and Optimal Stopping using the Fokker-Planck Transformation

Published 14 Apr 2026 in math.OC | (2604.12153v1)

Abstract: In this paper, we develop a theoretical framework for nonlinear stochastic optimal control problems with optimal stopping by establishing a density-based deterministic representation of the underlying diffusion. For state-independent diffusion, we rewrite the controlled Fokker-Planck equation as a continuity equation driven by a score-corrected velocity field, yielding a deterministic characteristic dynamics that reproduces the marginal law of the stochastic system. Leveraging Stein-type identities, we show that the associated distributional dynamic programming equation admits the same second-order differential operator as the distributional stochastic Hamilton-Jacobi-Bellman formulation. Building on this representation, we formulate an optimal control problem with state-dependent terminal-time assignment and terminal distributional constraints and derive the first-order necessary conditions using variational analysis. We present the conditions both for a common terminal time and for the general case of state-dependent stopping.


Authors (4)

Summary

The paper presents a framework for analyzing fully nonlinear stochastic control problems using distribution-level dynamics.
It reformulates the Fokker-Planck equation into a deterministic continuity equation that connects control processes with measure transport.
The work derives Pontryagin-type conditions to address high-dimensional optimal stopping under terminal distribution constraints.

Nonlinear Stochastic Optimal Control and Optimal Stopping via the Fokker-Planck Transformation

Introduction and Context

This work presents an analytic framework for stochastic optimal control problems with fully nonlinear dynamics, subject to optimal stopping and terminal distribution constraints. The focus is on deterministic, distribution-level representations of how the marginal laws of state-independent diffusions evolve, bridging concepts from stochastic control theory, measure-theoretic optimal transport, and the calculus of variations on probability spaces. The paper develops the theoretical basis for high-dimensional stochastic control and stopping problems with distributional objectives, using a methodology built on transformations of the Fokker-Planck equation.

The context includes Schrödinger Bridge Problems (SBP), Optimal Transport (OT), and Skorokhod Embedding Problems (SEP). Previous literature provides solutions for fixed-horizon SBP (including numerous links with OT [book_ge3]), as well as classical and modern results on SEP (stopping a stochastic process so that the stopped state has a prescribed distribution). The paper instead addresses the broader class of nonlinear, potentially high-dimensional stochastic optimal control and stopping problems in which the control objective is posed on the entire evolving probability density of the state (as in swarm robotics, aerospace guidance, and finance).

Problem Formulation

The dynamical system under consideration is an Itô diffusion given by:

$dx_t = f(t, x_t, u_t)dt + \sigma_t d w_t$

with state $x_t$, control $u_t$, and diffusion coefficient $\sigma_t$ (state-independent and uniformly elliptic in the main developments).

The main control problem is to design the feedback law $u_t$ and the stopping rule (determined by a space-time boundary $g$ or by a stopping-time function $\tau_m(x_0)$ of the initial state) so as to minimize a cost functional of the form:

$\mathbb{E} \left[ \int_0^{\tau_g} L(t, x_t, u_t) dt + \Phi(\tau_g, x_{\tau_g}) \right]$

Here $\tau_g$ denotes the stopping time induced by the boundary $g$, and the problem may carry terminal distribution (or other) constraints. The core innovation is to analyze and solve this problem not at the trajectory level, but at the level of the law of the process, i.e., in the space of (sub-)probability densities ('density steering').
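As a trajectory-level baseline for the cost above, the following is a minimal sketch of a Monte Carlo estimate using Euler-Maruyama time stepping. It is not taken from the paper: the function name `simulate_cost`, the callables `policy` and `stop_region`, the scalar (isotropic) `sigma(t)`, and the truncation at `t_max` are all illustrative assumptions, and stopping is only detected on the time grid.

```python
import numpy as np

def simulate_cost(f, L, Phi, policy, stop_region, sigma, x0, t_max, dt, rng):
    """Monte Carlo estimate of E[int_0^tau L dt + Phi(tau, x_tau)] by Euler-Maruyama.

    All names are illustrative. x0 has shape (n_paths, dim); sigma(t) returns a
    scalar (isotropic, state-independent noise). A path stops the first time
    stop_region(t, x) is True on the time grid, or at t_max.
    """
    x = np.array(x0, dtype=float)
    n_paths, dim = x.shape
    running = np.zeros(n_paths)            # accumulated running cost per path
    tau = np.full(n_paths, float(t_max))   # stopping time per path
    alive = np.ones(n_paths, dtype=bool)
    for k in range(int(round(t_max / dt))):
        t = k * dt
        hit = alive & stop_region(t, x)    # paths entering the stopping set now
        tau[hit] = t
        alive &= ~hit
        if not alive.any():
            break
        u = policy(t, x)
        running[alive] += L(t, x, u)[alive] * dt
        dw = rng.normal(scale=np.sqrt(dt), size=(n_paths, dim))
        x[alive] += f(t, x, u)[alive] * dt + sigma(t) * dw[alive]
    total = running + Phi(tau, x)          # stopped paths keep their frozen state
    return total.mean(), tau

# Hypothetical usage: drift f = u, quadratic control cost, stop near the origin.
rng = np.random.default_rng(0)
x0 = rng.normal(size=(2000, 1))
cost, tau = simulate_cost(
    f=lambda t, x, u: u,
    L=lambda t, x, u: 0.5 * np.sum(u**2, axis=1),
    Phi=lambda tau, x: np.zeros(len(x)),
    policy=lambda t, x: -x,
    stop_region=lambda t, x: np.abs(x[:, 0]) < 0.05,
    sigma=lambda t: 0.3,
    x0=x0, t_max=2.0, dt=0.01, rng=rng,
)
print(cost, tau.mean())
```

An estimator of this kind works path by path. The density-level formulation discussed in this summary instead treats the law of $x_t$ as the state of the optimization.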

Main Theoretical Contributions

Distributional Dynamic Programming and sHJB

The paper constructs a dynamic programming equation, specifically a distributional stochastic Hamilton-Jacobi-Bellman (sHJB) PDE, on the space of probability measures, using variational derivatives (Lions derivatives) and weak/viscosity solutions [vis_comp]. It is shown that the value function at the distributional level, for a fixed stopping boundary, decomposes as the expectation over the corresponding trajectory-wise value functions:

$\bar V^g(t, \mu) = \mathbb{E}_{x \sim \mu}[V^g(t, x)]$

The sHJB equation for $\bar V^g$ is characterized by a second-order operator involving the drift, the diffusion term, and the associated cost. Critically, this distributional form aligns the stochastic dynamic programming principle with measure-theoretic optimal transport.

Fokker-Planck Transformation and Deterministic Reformulation

A key technical transition is the reformulation of the (controlled) Fokker-Planck equation as a deterministic continuity equation driven by a "score-corrected" velocity field:

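$\partial_t \rho_t + \nabla \cdot \left( \rho_t \left[ f(t, x, u_t) - \tfrac{1}{2} \sigma_t \sigma_t^\top \nabla \log \rho_t \right] \right) = 0$

where $\rho_t$ denotes the density of $x_t$ and $\nabla \log \rho_t$ is its score.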

This transformation (which codifies the role of the Stein score, as in [song2020score]) enables the stochastic evolution of PDFs under the SDE to be recapitulated via deterministic transport on the law space. Importantly, the characteristics of the continuity equation reproduce the marginal law of the Itô diffusion, but do not realize its sample paths, underlining the nonlocal, measure-dependent dynamics.
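The claim that the characteristics reproduce the marginal law can be checked numerically in a case where the score is known in closed form. The sketch below is an illustration and not the paper's construction. It assumes a one-dimensional, uncontrolled Ornstein-Uhlenbeck model $dx_t = -a x_t dt + s\, dw_t$ with Gaussian initial law, for which the marginal is $N(m(t), v(t))$ and the score is $-(x - m(t))/v(t)$, and it assumes the score-corrected velocity takes the standard form $f - \tfrac{1}{2} s^2 \nabla \log \rho_t$. The names `ou_marginal` and `flow_particles` are hypothetical.

```python
import numpy as np

def ou_marginal(t, a, s, m0, v0):
    """Closed-form mean and variance of dx = -a x dt + s dw with x_0 ~ N(m0, v0)."""
    decay = np.exp(-2.0 * a * t)
    mean = m0 * np.exp(-a * t)
    var = v0 * decay + s**2 / (2.0 * a) * (1.0 - decay)
    return mean, var

def flow_particles(x0, a, s, m0, v0, t_end, n_steps):
    """Explicit Euler on the deterministic ODE dx/dt = -a x - 0.5 s^2 score(t, x)."""
    x = np.array(x0, dtype=float)
    dt = t_end / n_steps
    for k in range(n_steps):
        mean, var = ou_marginal(k * dt, a, s, m0, v0)
        score = -(x - mean) / var          # grad log density of N(mean, var)
        x = x + dt * (-a * x - 0.5 * s**2 * score)
    return x

# Assumed parameters, chosen only for illustration.
a, s, m0, v0, t_end = 1.0, 1.0, 2.0, 0.25, 1.0
rng = np.random.default_rng(0)
x0 = m0 + np.sqrt(v0) * rng.normal(size=20000)
xT = flow_particles(x0, a, s, m0, v0, t_end, n_steps=2000)
mT, vT = ou_marginal(t_end, a, s, m0, v0)
print("flow mean/var:    ", xT.mean(), xT.var())
print("analytic mean/var:", mT, vT)
```

No noise is injected after the initial draw, yet the empirical mean and variance of the transported particles track those of the diffusion. Each individual particle path is smooth, so it is not a sample path of the SDE.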

The paper further formalizes the 'reverse' transformation, showing that classic Itô calculus can be retrieved through this Fokker-Planck transformation.

Equivalence and Stein-type Identities

Via a systematic application of Stein-type integration-by-parts identities and variational calculus in Wasserstein space [wass_first_order, stein1], it is established that the deterministic, density-level optimal control problem, with distributional constraints and optimal stopping, inherits the same functional structure as the original stochastic (Itô) system; in particular, its dynamic programming equation carries the same second-order differential operator as the distributional sHJB equation. This establishes an equivalence between the controlled stochastic process and the deterministic density evolution at the level of marginal laws and dynamic programming equations.
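The mechanism by which a first-order transport term can reproduce a second-order operator can be sketched in a few lines. The derivation below is a standard calculation under stated assumptions and not a transcription of the paper's proof: $\rho_t$ is taken to be smooth, positive, and decaying at infinity, $\varphi$ is a smooth test function, $\sigma_t$ does not depend on $x$, and the velocity $v$ is assumed to have the score-corrected form shown in the first line.

```latex
% Assumed score-corrected velocity field
v(t,x) = f(t,x,u_t) - \tfrac{1}{2}\,\sigma_t\sigma_t^\top \nabla\log\rho_t(x)

% Stein-type identity: integrate by parts, using \rho\,\nabla\log\rho = \nabla\rho
\int \rho_t \,(\sigma_t\sigma_t^\top\nabla\log\rho_t)\cdot\nabla\varphi \,dx
  = \int (\sigma_t\sigma_t^\top\nabla\rho_t)\cdot\nabla\varphi \,dx
  = -\int \rho_t \,\mathrm{tr}\!\left(\sigma_t\sigma_t^\top\nabla^2\varphi\right) dx

% Along the continuity equation \partial_t\rho_t + \nabla\cdot(\rho_t v) = 0
\frac{d}{dt}\int \varphi\,\rho_t\,dx
  = \int \rho_t\, v\cdot\nabla\varphi \,dx
  = \int \rho_t \left[ f\cdot\nabla\varphi
      + \tfrac{1}{2}\,\mathrm{tr}\!\left(\sigma_t\sigma_t^\top\nabla^2\varphi\right) \right] dx
```

The bracket in the last line is the generator of the Itô diffusion applied to $\varphi$. This is why the deterministic transport and the stochastic system share the same second-order operator when tested against smooth functions.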

Distributionally-Constrained Optimal Stopping—First-Order Conditions

Building on the deterministic reformulation, a measure-space optimal control problem with state-dependent stopping ('terminal-time assignment') and terminal distribution constraints is constructed. By developing the variational theory on probability path space, under suitable regularity and compatibility conditions, a system of first-order necessary (Pontryagin-type) conditions is derived. These conditions characterize:

Stationarity of the Hamiltonian with respect to control.
Adjoint (co-state) backward equations, including both classical and nonlocal, diffusive correction terms.
Transversality on the (potentially state-dependent) terminal boundary.
Satisfaction of terminal probability distribution constraints.
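For orientation, the four items above mirror the classical finite-dimensional Pontryagin conditions with free terminal time. The block below states that classical analogue only. It is not the paper's measure-space system, which additionally carries nonlocal diffusive correction terms and multipliers for the terminal distribution constraint. The setting assumed here is a deterministic problem with dynamics $\dot x = f(t, x, u)$, free terminal time $T$, and no terminal state constraint.

```latex
% Classical analogue: minimize \int_0^T L(t,x,u)\,dt + \Phi(T, x(T)), with T free
H(t,x,u,p) = L(t,x,u) + p^\top f(t,x,u)

% Stationarity of the Hamiltonian in the control
\partial_u H(t, x^*, u^*, p) = 0

% Backward adjoint (co-state) equation
\dot p = -\partial_x H(t, x^*, u^*, p)

% Terminal condition on the co-state (no terminal state constraint)
p(T) = \partial_x \Phi(T, x^*(T))

% Transversality for the free terminal time
H(T, x^*(T), u^*(T), p(T)) + \partial_t \Phi(T, x^*(T)) = 0
```

When a terminal constraint is imposed, the co-state terminal condition acquires a multiplier term, which is the finite-dimensional counterpart of the fourth item in the list.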

Notably, the framework cohesively supports both common and general (initial-condition dependent) stopping mechanisms, covering classical, state-dependent exit time problems, and free boundary formulations.

Numerical and Methodological Implications

While the paper remains focused on the theoretical architecture, the deterministic measure-based reformulation presents pathways toward computationally tractable numerical routines, particularly for high-dimensional settings. Recasting the original stochastic problem as (possibly neural-parametrized) deterministic flows on probability density space opens up the possibility of applying modern semi-Lagrangian, score-matching, and physics-informed learning techniques to such optimal control/stopping problems [numeric2, song2020score, numeric3].

For instance, the separation between characteristic and distributional representations can be exploited in sampling-based algorithms, and the variational analysis provides explicit gradients amenable to optimization-based solver design.
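In a sampling-based scheme the score is generally not available in closed form and has to be estimated from particles. The paper does not prescribe an estimator. The sketch below uses the score of a Gaussian kernel density estimate as a simple stand-in for a learned (e.g. score-matching) model. The function name `kde_score` and the fixed bandwidth `h` are assumptions made for illustration, and the estimate is biased by the kernel smoothing.

```python
import numpy as np

def kde_score(x_query, particles, h):
    """Score (grad log density) of a Gaussian KDE built on `particles`.

    x_query: (m, d) evaluation points, particles: (n, d), h: bandwidth.
    Returns an (m, d) array. Illustrative stand-in for a learned score model.
    """
    diff = particles[None, :, :] - x_query[:, None, :]    # (m, n, d): x_j - x
    logw = -0.5 * np.sum(diff**2, axis=-1) / h**2         # log kernel weights
    logw -= logw.max(axis=1, keepdims=True)               # stabilise the softmax
    w = np.exp(logw)
    w /= w.sum(axis=1, keepdims=True)                     # normalised weights
    return np.einsum("mn,mnd->md", w, diff) / h**2

# Hypothetical check on standard normal samples, where the exact score is -x.
rng = np.random.default_rng(0)
pts = rng.normal(size=(20000, 1))
est = kde_score(np.array([[1.0]]), pts, h=0.5)
print(est[0, 0])  # compare with -1 / (1 + h**2), the score of the smoothed density
```

Plugging such an estimate into the score-corrected velocity gives a particle method whose cost scales with the number of particle pairs. This is one reason neural score models are attractive in high dimensions.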

Theoretical and Practical Implications

The results provide a rigorous analytic foundation for a class of control problems with explicit distributional objectives, which are increasingly prevalent in robotics, aerospace, and mathematical finance (e.g., terminal wealth shaping, distribution-constrained stopping). The Fokker-Planck transformation formalism also strengthens the theoretical link between stochastic control and optimal transport, clarifying how mechanisms like score-based modeling and gradient flows in Wasserstein space interact with classical diffusive control.

The derived Pontryagin-like conditions in measure space set a template for further study on duality, regularity, and well-posedness in high-dimensional stochastic control with optimal stopping, and may be leveraged for the analysis of neural PDE solvers, high-dimensional SBP, and reach-avoid or safe-exploration reinforcement learning.

Notable Claims

The deterministic reformulation via the Fokker-Planck/score field yields value functions and dynamic programming equations that are provably equivalent (in the viscosity/distributional sense) with the original stochastic (Itô) optimal control/stopping problem.
First-order stationarity systems are rigorously derived for the measure-valued optimization, capturing both classical and nonlocal diffusive terms, subsuming prior work based solely on trajectory-wise or expectation-based terminal constraints.

Future Directions

The analytic results motivate several directions:

Development of scalable sampling-based and physics-informed solvers for high-dimensional distribution steering with free terminal time, building on neural score models, measure flows, and adjoint-based gradients.
Extension to non-uniformly elliptic or degenerate diffusion cases, general terminal cost structures, and singular (e.g., mass-killing or measure-killing) dynamics.
Applications to mean-field games, probabilistic motion planning, stochastic reach-avoid, and interactive multi-agent reinforcement learning with stopping rules.

Conclusion

This work delivers a rigorous theoretical framework for the deterministic, measure-theoretic analysis of nonlinear stochastic optimal control and stopping problems, unifying concepts from stochastic control, optimal transport, and measure-valued variational analysis. The systematic use of the Fokker-Planck transformation, distributional sHJB, and variational calculus establishes a new, analytically tractable route for both theoretical investigation and computational solution of distribution-constrained control problems with optimal stopping.

References

"Nonlinear Stochastic Optimal Control and Optimal Stopping using the Fokker-Planck Transformation" (2604.12153).
C. Villani, Optimal Transport: Old and New, [book_ge3].
M. Talbi, N. Touzi, J. Zhang, "Viscosity Solutions for Obstacle Problems on Wasserstein Space", [vis_comp].
Y. Song et al., "Score-Based Generative Modeling through Stochastic Differential Equations", [song2020score].