Joint Stochastic Optimal Control and Stopping
- JCtrlOS is a framework that jointly optimizes continuous control policies and optimal stopping times by integrating intervention strategies with exit decisions.
- It employs local analysis using fundamental solution ratios to delineate regions of waiting versus action, guiding both singular control and free-boundary characterizations.
- Applications in finance, engineering, and aquaculture illustrate how simulation, machine learning, and dynamic programming methods enhance the tractability of complex JCtrlOS problems.
Joint Stochastic Optimal Control and Stopping (JCtrlOS) encompasses a class of problems in which a decision-maker seeks to jointly optimize over both a continuous-time control policy and the timing of a stopping (or exit) action. This framework generalizes classical stochastic control and optimal stopping by integrating control synthesis and exit decisions, leading to applications across finance, engineering, and operational research where both intervention strategy and time of action are critical. The JCtrlOS paradigm is realized in the stochastic Hamilton–Jacobi–Bellman variational inequality (HJB-VI), dynamic programming with hybrid value functions, and connections between control, stopping, and game-theoretic equilibrium problems.
1. Characterization via Local Ratios and Boundary Analysis
The structural analysis of JCtrlOS, especially in one-dimensional diffusions, relies on the identification of critical thresholds (boundaries) that demarcate regions of inaction, where it is optimal to wait or not to control, from regions where stopping or acting is optimal. The central finding in (Matomäki, 2013) is that the value and region structure for both optimal stopping and singular control are characterized, near boundaries, by two fundamental ratios, each involving the payoff function $g$ and the fundamental solutions $\psi$ (increasing) and $\varphi$ (decreasing) to the ODE
$$(\mathcal{A} - r)u(x) = 0,$$
where $\mathcal{A}$ is the infinitesimal generator of the diffusion and $r > 0$ is the discount rate.
- For optimal stopping near the lower boundary, the critical ratio is $g/\varphi$; its maximizer identifies the boundary, and the local value in the continuation region is $V(x) = \varphi(x)\,\sup_{y} g(y)/\varphi(y)$.
- For singular control with (e.g.) downward reflection, the key ratio is the analogous one built from the derivative of the payoff, and its maximizer yields the local controlled value in the same fashion.
- Analogous formulas hold near upper boundaries, with the increasing solution $\psi$ (or its derivative) in place of $\varphi$.
These ratios provide a local—but not necessarily global—characterization. Boundary regime changes are determined by maximizers of these ratios; multiple maximizers can imply “indifference intervals,” and unattainability issues may preclude some thresholds from being realized by the process.
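To make the ratio characterization concrete, the following sketch computes a lower stopping boundary for a geometric Brownian motion with payoff $g(x) = (K - x)^+$ by numerically maximizing $g/\varphi$, where $\varphi(x) = x^{\beta_-}$ is the decreasing fundamental solution. All parameter values are illustrative assumptions, not taken from (Matomäki, 2013).

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Illustrative GBM example: dX = mu*X dt + sigma*X dW, discount r, payoff g(x) = (K - x)^+.
mu, sigma, r, K = 0.02, 0.3, 0.05, 1.0

# The decreasing fundamental solution of (A - r)u = 0 is phi(x) = x**beta_minus,
# where beta_minus is the negative root of 0.5*sigma^2*b*(b - 1) + mu*b - r = 0.
a, b, c = 0.5 * sigma**2, mu - 0.5 * sigma**2, -r
beta_minus = (-b - np.sqrt(b**2 - 4 * a * c)) / (2 * a)

def neg_ratio(y):
    """Negative of the local ratio g(y)/phi(y), to be minimized."""
    return -(K - y) / y**beta_minus

# The maximizer of g/phi identifies the (lower) stopping boundary.
res = minimize_scalar(neg_ratio, bounds=(1e-6, K - 1e-6), method="bounded")
y_star = res.x

# Closed-form perpetual-put threshold for comparison: beta_minus*K/(beta_minus - 1).
print(f"ratio-maximizing threshold: {y_star:.4f}")
print(f"closed-form threshold:      {beta_minus * K / (beta_minus - 1):.4f}")

# Local value in the continuation region: V(x) = phi(x) * g(y*)/phi(y*).
x = 0.9
V = x**beta_minus * (K - y_star) / y_star**beta_minus
print(f"V({x}) = {max(V, K - x):.4f}")
```

The two printed thresholds agree up to the optimizer's tolerance, illustrating how the local ratio alone pins down the boundary in this simple one-sided case.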
2. Geometry of Inaction and Action Regions
The distinction between inaction (continuation) and action (stopping/control) regions is central in JCtrlOS. If the ratio $g/\varphi$ is strictly increasing at a point $x$, then $x$ lies in the continuation region; that is, waiting is strictly preferable to acting. The sets where $g/\varphi$ (or the analogous ratio for singular control) is maximized correspond to candidate thresholds for action. Key features include:
- Multiple local maxima yield regions where the agent is indifferent between stopping at any of several points.
- If the ratio equals a constant over an interval, the entire interval forms an indifference region.
These properties highlight the purely local nature of optimality: the global configuration of continuation and stopping regions is built from local ratio maximization.
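A toy numerical construction of such an indifference region, assuming a hypothetical decreasing fundamental solution $\varphi(x) = x^{\beta_-}$ and a payoff that coincides with a constant multiple of $\varphi$ beyond a crossing point, so that the ratio $g/\varphi$ is flat there:

```python
import numpy as np

# Toy illustration of an indifference interval.  Assume a hypothetical decreasing
# fundamental solution phi(x) = x**beta_minus and a payoff g that equals a constant
# multiple of phi beyond a crossing point: the ratio g/phi is then flat on that
# stretch, and every point of it maximizes the ratio.
beta_minus = -0.8
phi = lambda x: x**beta_minus
g = lambda x: np.minimum(0.3 * phi(x), 0.5)   # capped payoff (illustrative choice)

x = np.linspace(0.1, 1.0, 901)
ratio = g(x) / phi(x)
flat = np.isclose(ratio, ratio.max())
print(f"ratio is maximized on [{x[flat].min():.2f}, {x[flat].max():.2f}] (indifference interval)")
```

Every point of the flat stretch attains the supremum of the ratio, so stopping anywhere inside it is, locally, equally good.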
3. Singular Control, Reflecting and Repelling Boundaries
JCtrlOS includes both convex (classical) and non-convex cases. In convex one-sided singular control, the standard equivalence with optimal stopping is realized via differentiation of the control value with respect to the state or resource variable, and the “free boundary” is reflecting: the control acts to maintain the process inside a safe region, often encoded as a Skorokhod reflection (Angelis et al., 2014). The value of the control problem satisfies $\partial_c V(x, c) = u(x, c)$, where $u(\cdot, c)$ is the stopping problem value at parameter $c$, and the optimal policy is to minimally reflect the process at the threshold.
Non-convexities give rise to repelling free boundaries: here, crossing the threshold leads to instantaneous, bang-bang controls (e.g., immediate exhaustion of fuel), and the optimal stopping rule becomes “stop upon reaching $\hat{b}$,” where $\hat{b}$ denotes the repelling boundary. In such regimes, smooth fit typically fails, and the classical differential connection between singular stochastic control and stopping breaks down.
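The reflecting policy in the convex case is straightforward to simulate: intervene only when the process attempts to cross the threshold, and push it back by the minimal amount. The sketch below does this for an illustrative geometric Brownian motion and a hypothetical threshold $b$ (both assumptions, not taken from the cited work), tracking the cumulative singular control exerted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dynamics and a reflecting threshold b (in practice b would come
# from the free-boundary analysis of the associated stopping problem).
mu, sigma, x0, b = 0.05, 0.2, 1.0, 1.2
T, n = 1.0, 10_000
dt = T / n

x = x0
cumulative_control = 0.0   # total singular control exerted at the boundary
path = np.empty(n + 1)
path[0] = x0

for i in range(1, n + 1):
    # Uncontrolled Euler-Maruyama step.
    x += mu * x * dt + sigma * x * np.sqrt(dt) * rng.standard_normal()
    # Minimal downward push: act only when the threshold is crossed, and only
    # by the amount needed to return to it (Skorokhod reflection).
    push = max(x - b, 0.0)
    cumulative_control += push
    x -= push
    path[i] = x

print(f"terminal state: {path[-1]:.4f}, total control exerted: {cumulative_control:.4f}")
```

The cumulative push is the discretized analogue of the local time the Skorokhod-reflected process accumulates at the boundary.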
4. Classification, Simulation, and Learning Approaches
High-dimensional JCtrlOS problems motivate simulation-based, machine learning, and adaptive classification methodologies:
- The problem can be viewed as learning a classifier for the zero-level set of a timing (advantage) function, namely the difference between the expected continuation value and the immediate payoff, which distinguishes the stopping region from the continuation region (Gramacy et al., 2013); a simplified regression-based sketch of this view appears after this list.
- Adaptive simulation with dynamic trees, expected improvement (EI) sampling, and locally focused design allows substantial savings in simulation effort for complex, high-dimensional JCtrlOS problems (e.g., multi-asset option pricing).
- Reinforcement learning, randomized neural policies, and physics-informed neural networks (PINNs) have been used effectively in practical implementations. Algorithms such as the likelihood-ratio policy gradient for randomized stopping rules and PINN-based solvers for high-dimensional HJB-VIs (cf. American/swing option pricing) render previously intractable instances tractable (Deschatre et al., 2020; Kamm, 3 Oct 2025).
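As a minimal stand-in for this classification view, the sketch below prices a Bermudan put by regression Monte Carlo: a cubic-polynomial regression estimates the continuation value, and the sign of the resulting timing function labels each in-the-money state as “stop” or “continue.” The polynomial regressor replaces the dynamic-tree classifier and EI-driven design of (Gramacy et al., 2013), and all market parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Bermudan put under GBM: the sign of the timing function
# T = (continuation value) - (immediate payoff) classifies each state
# as "continue" (T > 0) or "stop" (T <= 0).
S0, K, r, sigma, T_mat, n_steps, n_paths = 1.0, 1.0, 0.05, 0.2, 1.0, 50, 20_000
dt = T_mat / n_steps
disc = np.exp(-r * dt)

# Simulate GBM paths.
z = rng.standard_normal((n_paths, n_steps))
log_paths = np.cumsum((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z, axis=1)
S = S0 * np.exp(np.hstack([np.zeros((n_paths, 1)), log_paths]))

payoff = lambda s: np.maximum(K - s, 0.0)
cash = payoff(S[:, -1])                       # pathwise value at maturity

for t in range(n_steps - 1, 0, -1):
    cash *= disc                              # discount one step back to time t
    itm = payoff(S[:, t]) > 0                 # regress only where exercise is relevant
    if itm.sum() < 10:
        continue
    # Polynomial regression estimate of the continuation value.
    coeffs = np.polyfit(S[itm, t], cash[itm], deg=3)
    cont_value = np.polyval(coeffs, S[itm, t])
    timing = cont_value - payoff(S[itm, t])   # timing (advantage) function
    stop_now = timing <= 0                    # classifier: stop where timing <= 0
    idx = np.where(itm)[0][stop_now]
    cash[idx] = payoff(S[idx, t])

price = disc * cash.mean()
print(f"estimated Bermudan put value: {price:.4f}")
```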
5. Dynamic Programming, Duality, and Variational Inequality Structures
The JCtrlOS paradigm is grounded in the theory of variational inequalities and backward dynamic programming:
- The value function must solve a variational inequality that couples the HJB (control) and free-boundary (stopping) properties: $\min\{\, r V - \sup_{a}[\mathcal{L}^{a} V + f(\cdot, a)],\; V - g \,\} = 0$, where $\mathcal{L}^{a}$ is the controlled generator, $f$ the running reward, and $g$ the stopping payoff (a minimal discretized sketch appears after this list).
- In singular control, the obstacle problem features the derivative of the control value function equated to the associated stopping value.
- Dual formulations and martingale transport connections characterize optimal policies as first hitting times of contact sets defined by value–cost equality, with the solution often giving the concave envelope of the cost function over measure space (Bayraktar et al., 2017, Ghoussoub et al., 2020).
- Existence and regularity are secured via viscosity solutions of the HJB variational inequality (also in the presence of expectation or chance constraints), or by passing to equivalent weak formulations in which martingale-problem and measurable-selection arguments enable dynamic programming under constraints (Bayraktar et al., 2023, Schmid et al., 2023).
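A minimal discretized sketch of the variational inequality above, assuming a one-dimensional diffusion with a small set of drift controls, a quadratic control cost, and a put-style stopping payoff (all illustrative choices): projected value iteration on an upwind Markov-chain approximation enforces the obstacle constraint $V \ge g$ at every sweep.

```python
import numpy as np

# Projected value iteration for a 1-D HJB variational inequality of the form
#   min{ rV - sup_a [ a V' + 0.5*sigma^2 V'' + f(x,a) ],  V - g } = 0,
# discretized as an upwind Markov-chain approximation.  Dynamics, rewards,
# and payoff below are illustrative assumptions.
r, sigma = 0.05, 0.3
controls = np.array([-1.0, 0.0, 1.0])            # admissible drift controls
f = lambda x, a: -0.1 * a**2                     # running reward: quadratic control cost
g = lambda x: np.maximum(1.0 - x, 0.0)           # stopping payoff (put-style obstacle)

x = np.linspace(-2.0, 2.0, 101)
h = x[1] - x[0]
dt = h**2 / (sigma**2 + np.abs(controls).max() * h)   # CFL-stable time step
disc = 1.0 / (1.0 + r * dt)

V = g(x)
for sweep in range(40_000):
    V_old = V
    best = np.full_like(x, -np.inf)
    for a in controls:
        # Upwind transition probabilities of the approximating chain.
        p_up = 0.5 * sigma**2 * dt / h**2 + max(a, 0.0) * dt / h
        p_dn = 0.5 * sigma**2 * dt / h**2 + max(-a, 0.0) * dt / h
        p_st = 1.0 - p_up - p_dn
        cont = V_old.copy()                      # freeze values at the grid edges
        cont[1:-1] = p_up * V_old[2:] + p_dn * V_old[:-2] + p_st * V_old[1:-1]
        best = np.maximum(best, disc * (f(x, a) * dt + cont))
    V = np.maximum(g(x), best)                   # project onto the obstacle (stopping side)
    if np.max(np.abs(V - V_old)) < 1e-7:
        break

stop = np.isclose(V, g(x))
print(f"stopped after {sweep + 1} sweeps; "
      f"stopping region covers {stop.mean():.0%} of the grid")
```

The projection step $V \leftarrow \max(g, \cdot)$ is what couples the control side (the inner maximization over $a$) with the stopping side (the obstacle) of the problem.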
6. Extensions: Games, Constraints, and Partial Information
JCtrlOS problems extend directly to games, expectation-constrained, and partial information scenarios:
- In two-player nonzero-sum settings, Nash equilibria in optimal stopping (using hitting times at boundaries) lift to Nash equilibria in singular control games, with differential links between the value functions. The Skorokhod reflection provides the constructive method for designing equilibrium policies (Angelis et al., 2016).
- When subject to expectation constraints, dynamic programming operates on extended state variables capturing accumulated costs so far, and the value function is constructed in an enlarged canonical space, leading to upper semi-analyticity and measurable selection in optimization (Bayraktar et al., 2023).
- Under partial information, equivalences between optimal stopping, randomized stopping, and singular control persist, but the boundaries and policies must be adapted to the available filtration, impacting the timing and execution of actions (Agram et al., 2018).
7. Applications and Practical Implications
The analytic and computational outcomes for JCtrlOS have demonstrated advantages in practical domains:
- In aquaculture operations, coupling optimal feeding rates (continuous control) with adaptive harvest timing (stopping) leads to substantially better economic performance than strategies that optimize either component in isolation. Both finite-difference HJB-VI solvers and PINN-based algorithms obtain accurate optimal policies for systems with up to five state variables, and PINN methodologies show promise for scaling to even higher-dimensional, realistic models (Kamm, 3 Oct 2025); a stylized sketch of such a coupled feeding/harvest decision appears after this list.
- In financial option pricing, stochastic portfolio optimization, and inventory/risk management, PINN-based and sequential design methods allow “zooming in” on the relevant decision boundaries and efficiently learning or computing optimal JCtrlOS policies, with orders-of-magnitude reductions in simulation effort compared to conventional methods.
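To illustrate the coupling of continuous control and stopping in this kind of application, the sketch below solves a deliberately stylized feeding/harvest problem by backward induction: at each decision date the grower either harvests (stopping payoff) or selects a feeding rate (control) and continues. The growth law, prices, and costs are invented for illustration and are not the model of (Kamm, 3 Oct 2025).

```python
import numpy as np

# Stylized joint feeding/harvest toy: all dynamics, prices, and costs below are
# illustrative assumptions, not the model of the cited aquaculture study.
T, n_steps = 2.0, 100                     # horizon (years), decision dates
dt = T / n_steps
r = 0.05                                  # discount rate
price, feed_cost = 5.0, 2.0               # sale price per kg, cost per unit feed
feeds = np.linspace(0.0, 1.0, 11)         # admissible feeding rates
w = np.linspace(0.1, 6.0, 300)            # biomass grid (kg)

def growth(w, a):
    """Deterministic, saturating, feed-dependent biomass growth (toy model)."""
    return w + (a * 1.5 * w * (1.0 - w / 6.0)) * dt

harvest = price * w                        # stopping payoff
V = harvest.copy()                         # terminal condition: must harvest at T

policy_stop = np.zeros((n_steps, w.size), dtype=bool)
for t in range(n_steps - 1, -1, -1):
    best_cont = np.full_like(w, -np.inf)
    for a in feeds:
        w_next = np.clip(growth(w, a), w[0], w[-1])
        cont = -feed_cost * a * dt + np.exp(-r * dt) * np.interp(w_next, w, V)
        best_cont = np.maximum(best_cont, cont)
    stop = harvest >= best_cont            # harvest when immediate value dominates
    policy_stop[t] = stop
    V = np.where(stop, harvest, best_cont)

# Smallest biomass at which harvesting is optimal at time 0 (if any).
idx = np.argmax(policy_stop[0]) if policy_stop[0].any() else None
print("harvest threshold at t=0:", f"{w[idx]:.2f} kg" if idx is not None else "never")
```

The inner maximization selects the feeding rate while the comparison with the harvest value decides stopping, so the backward sweep produces both the control policy and the harvest region at every date.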
In summary, the joint stochastic optimal control and stopping framework is underpinned by a local boundary analysis via fundamental ratios, variational inequalities, and dynamic programming, with explicit, often low-dimensional characterizations near critical thresholds and robust extensions to more complex settings in high dimensions, nonconvexity, games, and constraints. The theory facilitates effective numerical, machine-learning, and simulation-based solutions, directly impacting real-world domains requiring synthesis of both real-time intervention and adaptive exit timing.