Exit Time Relaxed Control Problems
- Exit time relaxed control problems are defined by optimizing a cost functional up to a system exit, using measure-valued relaxed controls to guarantee solution existence under nonconvex and singular conditions.
- They leverage Hamilton–Jacobi–Bellman equations and Pontryagin-type maximum principles to derive robust feedback laws for both deterministic and stochastic dynamical systems.
- Key applications include stabilization, safety verification, and reach-avoid games, with numerical approaches like direct HJB solvers and symbolic abstractions mitigating the curse of dimensionality.
Exit time relaxed control problems concern the synthesis, analysis, and numerical approximation of feedback control strategies for controlled dynamical systems, where the objective is to optimize a cost functional up to the (system-dependent) time at which the state exits a prescribed set. The introduction of relaxed controls—controls valued in probability measures over the admissible control set—allows for the existence of optimal solutions and provides analytic and numerical advantages in the presence of nonconvexity, singularities, and regularity issues. Exit time problems arise naturally in stabilization, safety verification, safe exploration, minimum-time synthesis, reach-avoid games, and stochastic reachability.
1. Mathematical Formulation of Exit Time Relaxed Control Problems
Consider a controlled system
$$\dot{x}(t) = f(x(t), u(t)), \qquad u(t) \in U,$$
where $U$ is compact and $f$ is continuous, locally Lipschitz in $x$, and convex in $u$ (Yegorov et al., 2019). The exit time from a closed set $G$ (or, more generally, a sublevel set of a candidate Lyapunov function) is
$$\tau(x_0, u(\cdot)) = \inf\{t \geq 0 : x(t) \notin G\}.$$
The value function for the exit-time optimal control problem is
$$v(x_0) = \inf_{u(\cdot)} \left[ \int_0^{\tau} \ell(x(t), u(t))\,dt + g(x(\tau)) \right],$$
where $\ell$ and $g$ are running and terminal costs.
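The exit-time cost functional can be evaluated along any simulated trajectory. The following minimal sketch (hypothetical scalar dynamics $\dot{x} = x + u$ on $G = [-1,1]$, Euler discretization, names of our own choosing) illustrates how the exit time and the accumulated cost arise together:

```python
import numpy as np

def exit_time_cost(f, ell, g, x0, u_seq, dt, in_domain, t_max=10.0):
    """Euler simulation of dx/dt = f(x, u); accumulate the running cost
    ell until the state exits the domain, then add the terminal cost g.
    Returns (exit_time, total_cost). Illustrative sketch only."""
    x, t, cost = np.array(x0, dtype=float), 0.0, 0.0
    for u in u_seq:
        if not in_domain(x) or t >= t_max:
            break
        cost += ell(x, u) * dt          # left-endpoint Riemann sum
        x = x + dt * f(x, u)
        t += dt
    return t, cost + g(x)

# Hypothetical example: unstable dynamics dx/dt = x + u, zero control,
# quadratic running cost, zero exit cost; started at x = 0.5 the state
# leaves G = [-1, 1] at roughly t = ln 2.
f = lambda x, u: x + u
ell = lambda x, u: x[0]**2 + u**2
g = lambda x: 0.0
in_domain = lambda x: abs(x[0]) < 1.0
tau, J = exit_time_cost(f, ell, g, [0.5], [0.0] * 1000,
                        dt=0.01, in_domain=in_domain)
```

With the uncontrolled drift, `tau` is close to $\ln 2 \approx 0.69$ and `J` approximates $\int_0^\tau x(t)^2\,dt \approx 0.375$.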
In the stochastic setting, the system dynamics are given by
$$dX_t = b(X_t, \alpha_t)\,dt + \sigma(X_t, \alpha_t)\,dW_t,$$
with exit time $\tau = \inf\{t \geq 0 : X_t \notin D\}$ for a bounded open domain $D$, and the objective
$$J(x, \alpha) = \mathbb{E}\left[ \int_0^{\tau} e^{-\rho t}\,\ell(X_t, \alpha_t)\,dt + e^{-\rho \tau}\,g(X_\tau) \right],$$
where $\rho \geq 0$ encodes optional discount factors (Reisinger et al., 2020).
Relaxed controls are measure-valued processes $(\mu_t)_{t \geq 0}$ with values in the set $\mathcal{P}(U)$ of probability measures on $U$, replacing deterministic actions by randomized policies. The (averaged) controlled dynamics become
$$\dot{x}(t) = \int_U f(x(t), u)\,\mu_t(du).$$
This extension ensures existence of optimal controls and, under convexity, recovers the classical (non-relaxed) formulation (Lou et al., 2013).
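A toy instance makes the averaging concrete. Over the nonconvex control set $U = \{-1, +1\}$ with dynamics $\dot{x} = u$, no classical control can hold the state still, but the fair relaxed mixture does (this is the classical chattering example; the code names are our own):

```python
import numpy as np

# A relaxed control over the finite set U = {-1, +1} is a probability
# vector (mu_minus, mu_plus); the averaged vector field is the
# corresponding convex combination of f(x, -1) and f(x, +1).
def averaged_field(f, x, U, mu):
    return sum(m * f(x, u) for u, m in zip(U, mu))

f = lambda x, u: np.array([u])   # dx/dt = u, bang-bang controls only
U = [-1.0, 1.0]
mu = [0.5, 0.5]                  # relaxed control: fair mixture
# The averaged drift vanishes, a behavior no single element of U can
# produce: this is why relaxation restores existence under nonconvexity.
v = averaged_field(f, np.array([0.0]), U, mu)
```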
2. Analysis: Hamilton–Jacobi–Bellman Equations and Pontryagin-Type Conditions
The value function for exit-time problems is characterized as the unique (viscosity) solution to a Hamilton–Jacobi–Bellman (HJB) PDE with exit-time boundary conditions. In deterministic continuous-time settings, for $x$ in the interior of $G$, the HJB equation is
$$\inf_{u \in U} \left\{ \nabla v(x) \cdot f(x, u) + \ell(x, u) \right\} = 0, \qquad v = g \ \text{on} \ \partial G$$
(Yegorov et al., 2019). For stochastic controlled diffusions, the HJB incorporates the second-order operator:
$$\inf_{\mu \in \mathcal{P}(U)} \left\{ \mathcal{L}^{\mu} v(x) + \int_U \ell(x, u)\,\mu(du) \right\} = \rho\, v(x), \qquad x \in D,$$
with boundary condition $v = g$ on $\partial D$ (Reisinger et al., 2020). Here, $\mathcal{L}^{\mu} v(x) = \int_U \big[\, b(x, u) \cdot \nabla v(x) + \tfrac{1}{2} \operatorname{tr}\big(\sigma \sigma^{\top}(x, u)\, \nabla^2 v(x)\big) \big]\,\mu(du)$ is the infinitesimal generator averaged over the relaxed control.
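For a one-dimensional minimum-exit-time problem the HJB fixed point can be solved by a short semi-Lagrangian value iteration. The sketch below (our own toy setup: $\dot{x} = u$, $u \in \{-1, +1\}$, running cost $1$, domain $(-1, 1)$, zero exit cost) converges to the exact solution $v(x) = 1 - |x|$:

```python
import numpy as np

# Semi-Lagrangian fixed point for v(x) = min_u { dt + v(x + dt*u) }
# on interior nodes, with v = 0 on the boundary (exit cost).  The
# step dt equals the grid spacing so each control move lands on a node.
xs = np.linspace(-1, 1, 201)
dt = xs[1] - xs[0]
v = np.zeros_like(xs)
for _ in range(500):
    v_new = v.copy()
    for i in range(1, len(xs) - 1):
        v_new[i] = dt + min(v[i - 1], v[i + 1])   # Bellman update
    v_new[0] = v_new[-1] = 0.0                    # boundary condition
    if np.max(np.abs(v_new - v)) < 1e-12:         # converged
        v = v_new
        break
    v = v_new
```

Information propagates inward from the boundary one layer per sweep, so roughly 100 sweeps suffice on this grid; `v` then matches $1 - |x|$ to machine precision.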
For non-smooth or singular systems, Pontryagin-type maximum principles can be established for relaxed controls. The existence of an optimal relaxed trajectory $x^*(\cdot)$ and adjoint $p(\cdot)$ is guaranteed, with
$$-\dot{p}(t) = \int_U \nabla_x H\big(x^*(t), u, p(t)\big)\,\mu_t(du),$$
and the support of $\mu_t$ concentrated on maximizers of the Hamiltonian $H(x, u, p) = p \cdot f(x, u) - \ell(x, u)$:
$$\operatorname{supp}(\mu_t) \subseteq \arg\max_{u \in U} H\big(x^*(t), u, p(t)\big).$$
In convex settings, optimal classical controls are recovered as extremal points of the relaxed (measure-valued) control.
3. Feedback Synthesis and Control Lyapunov Function Construction
A central application of exit-time control is the synthesis of global control Lyapunov functions (CLFs) for feedback stabilization. A local CLF $V_{\mathrm{loc}}$ and an associated feedback can be computed over a neighborhood of the target. By formulating an exit-time optimal control problem with respect to a sublevel set of $V_{\mathrm{loc}}$, the solution concatenates $V_{\mathrm{loc}}$ inside the local region with the exit-time value function outside, yielding a global CLF $V$ satisfying the requisite decrease condition
$$\inf_{u \in U} \nabla V(x) \cdot f(x, u) \leq -W(x)$$
for some positive definite $W$, over the domain of asymptotic null-controllability (Yegorov et al., 2019).
The feedback law is synthesized via
$$u^*(x) \in \arg\min_{u \in U} \left\{ \nabla V(x) \cdot f(x, u) + \ell(x, u) \right\},$$
exhibiting strict Lyapunov decrease along optimal closed-loop trajectories. In degenerate or stochastic settings, feedback relaxed controls (measurable selections of the optimal measure-valued policy) admit regularity and stability properties (e.g., Hölder/Lipschitz continuity with respect to data or regularization parameters) (Reisinger et al., 2020).
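A one-step greedy feedback of this argmin form is straightforward to evaluate on a control grid. The sketch below uses a hypothetical scalar example ($f(x,u) = u$, candidate CLF $V(x) = x^2$, cost $\ell = u^2$), for which the grid minimizer approaches $u^*(x) = -x$:

```python
import numpy as np

def greedy_feedback(x, gradV, f, ell, U_grid):
    """u*(x) = argmin over the control grid of gradV(x)*f(x,u) + ell(x,u).
    A brute-force sketch of the feedback synthesis rule."""
    vals = [gradV(x) * f(x, u) + ell(x, u) for u in U_grid]
    return U_grid[int(np.argmin(vals))]

u_star = greedy_feedback(0.7,
                         gradV=lambda x: 2 * x,      # V(x) = x^2
                         f=lambda x, u: u,           # dx/dt = u
                         ell=lambda x, u: u**2,
                         U_grid=np.linspace(-2, 2, 401))
```

Here the minimized expression is $2xu + u^2$, so `u_star` is `-0.7`, and the closed-loop derivative $\nabla V \cdot f = -2x^2$ is strictly negative away from the origin, as the decrease condition requires.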
If no explicit local CLF is available, the construction is adapted by considering a family of exit-time problems with small target balls, yielding practical CLFs and uniform convergence as the ball radius vanishes.
4. Existence, Regularity, and Stability Results
- Existence and Regularity: Under convexity and regularity hypotheses on dynamics and costs, exit-time optimal control problems admit (relaxed) optimal solutions, and the value functions are continuous or possess higher regularity (Reisinger et al., 2020; Yegorov et al., 2019). For nonconvex or singular systems, relaxed solutions exist and satisfy suitable maximum principles (Lou et al., 2013).
- Stability: Feedback and value functions for relaxed exit-time problems are Lipschitz stable with respect to parameter perturbations, with bounds on the deviation in value and synthesized feedback under model mismatch (Reisinger et al., 2020).
- Exploration–Exploitation Interpolation: Regularization via convex penalties (e.g., entropy or general mixing penalties) induces continuous feedback laws, interpolating between pure exploitation (as the penalty vanishes) and robust exploratory policies. Monotone convergence and recovery of pure strategies in the limit are rigorously established (Reisinger et al., 2020).
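The interpolation in the last point can be illustrated with a Gibbs/softmax relaxed policy over a finite control set (hypothetical control costs `Q`; an entropy penalty with temperature `lam` yields exactly this form):

```python
import numpy as np

def gibbs_policy(Q, lam):
    """Entropy-regularized relaxed policy: mu(u) proportional to
    exp(-Q(u)/lam).  Shifting by Q.min() avoids overflow."""
    w = np.exp(-(Q - Q.min()) / lam)
    return w / w.sum()

Q = np.array([1.0, 0.2, 0.7])           # hypothetical control costs
mu_explore = gibbs_policy(Q, lam=1.0)   # spread over all controls
mu_exploit = gibbs_policy(Q, lam=1e-3)  # nearly a point mass at argmin
```

As `lam` decreases, the mixture concentrates monotonically on the minimizer (pure exploitation); for larger `lam` the policy stays strictly positive on every control and varies continuously with `Q`, mirroring the continuity of regularized feedback laws.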
5. Numerical Approaches and Approximate Solution Schemes
The solution of exit-time relaxed control problems involves high-dimensional HJB PDEs or dynamic programming recursions, susceptible to the curse of dimensionality. Several computational strategies are employed:
- Direct Numerical Solution of HJB: For low dimensions, methods of characteristics or direct collocation (e.g., ACADO Toolkit) solve the exit-time HJB with boundary/terminal conditions (Yegorov et al., 2019).
- Symbolic (Finite-State) Abstraction: In discrete-time settings with continuous state/control, symbolic abstractions (finite covers of the state space and quantized inputs) yield finite minimax recursions whose solutions upper/lower bound the value function. Algorithmic schemes similar to Dijkstra's algorithm can compute these symbolic value functions and feedbacks, with guaranteed hypo-convergence as abstraction granularity improves (Reissig et al., 2017).
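A minimal instance of such a Dijkstra-style recursion on a hypothetical one-dimensional grid abstraction (unit stage costs, symmetric quantized moves, names of our own choosing) might look like:

```python
import heapq

def symbolic_value(n, exit_cells, moves):
    """Dijkstra-style label propagation on a finite abstraction: cells
    are states, quantized moves are edges with stage cost 1, and exit
    cells carry cost 0.  With symmetric moves, propagating labels
    outward from the exit cells gives the minimum exit cost per cell."""
    dist = {c: 0.0 for c in exit_cells}
    pq = [(0.0, c) for c in exit_cells]
    heapq.heapify(pq)
    while pq:
        d, c = heapq.heappop(pq)
        if d > dist.get(c, float("inf")):
            continue                      # stale queue entry
        for dc in moves:
            nb = c + dc
            if 0 <= nb < n and d + 1 < dist.get(nb, float("inf")):
                dist[nb] = d + 1
                heapq.heappush(pq, (d + 1, nb))
    return dist

# 11-cell grid whose two end cells are the exit set: the symbolic value
# at each cell is its hop distance to the nearest end.
vals = symbolic_value(11, exit_cells=[0, 10], moves=[-1, +1])
```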
| Numerical Method | Key Features | Reference |
|---|---|---|
| Collocation & characteristics | PDE-based direct value function computation | (Yegorov et al., 2019) |
| Symbolic abstraction | Grid-based finite-state synthesis, minimax recursion | (Reissig et al., 2017) |
| Policy iteration + TT format | SDEs, high dimensions, sample-based TT compression | (Fackeldey et al., 2020) |
- Tensor Train/Policy Iteration for SDEs: For stochastic exit-time problems, approximate policy iteration on polynomial function spaces with Tensor Train (TT) decomposition is effective. Least-squares Monte Carlo projections enable policy evaluation, and TT-format storage mitigates exponential complexity in moderate dimensions. Monte Carlo integration approximates value and gradient calculations efficiently, with demonstrated scalability to moderately high dimensions (Fackeldey et al., 2020).
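The least-squares Monte Carlo policy-evaluation step can be sketched in one dimension (the TT compression of the coefficient tensor, which matters only in higher dimensions, is omitted; the SDE, policy, and horizon below are hypothetical choices of our own):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_cost(x0, policy, dt=0.01, n_steps=200):
    """Euler-Maruyama rollout of dX = u dt + 0.1 dW under a fixed
    policy, accumulating a quadratic running cost.  Sketch only."""
    x, cost = x0, 0.0
    for _ in range(n_steps):
        u = policy(x)
        cost += (x**2 + u**2) * dt
        x += u * dt + 0.1 * np.sqrt(dt) * rng.standard_normal()
    return cost

# Policy evaluation by regression: simulate costs from random starting
# points, then project onto a cubic polynomial basis (least squares).
xs = rng.uniform(-1, 1, 500)
ys = np.array([sample_cost(x, policy=lambda x: -x) for x in xs])
basis = np.vander(xs, 4)
coef, *_ = np.linalg.lstsq(basis, ys, rcond=None)
v_hat = lambda x: np.vander(np.atleast_1d(x), 4) @ coef
```

Under the stabilizing policy $u = -x$ the realized cost is close to $x_0^2 (1 - e^{-4})$ plus a small noise contribution, so the fitted `v_hat` is approximately quadratic with `v_hat(1.0)` near 1.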
6. Theoretical and Practical Implications
Exit-time relaxed control theory rigorously bridges classical optimal control, measure-valued and randomized policies, and robust feedback synthesis under regularity, nonconvexity, and high-dimensional scenarios. Measure-valued control formulations (Young measures) provide general existence and regularity results even when classical (pointwise) optimal policies fail to exist or are not robust. Regularization by exploration rewards and entropy terms ensures continuity and practical implementability of feedback policies, explaining the robustness of entropy-regularized reinforcement learning heuristics observed empirically (Reisinger et al., 2020).
Constructed value and control functions serve dual roles: as Lyapunov certificates for stabilization and as templates for correct-by-construction symbolic or numerical controllers. The curse of dimensionality is addressed via grid/coarsening refinements, symbolic abstractions, or low-rank function representations; however, the curse of complexity (computational, not just memory) can persist as a limiting factor (Yegorov et al., 2019).
7. Extensions and Open Directions
- Infinite-Horizon and Minimax Formulations: Many results transfer directly to infinite-horizon, reach-avoid, minimum-time, or minimax settings. Non-smooth and singular systems can be handled via auxiliary or perturbed relaxed formulations, with the maximum principle holding at least for some optimal relaxed control (Lou et al., 2013).
- Robust and Adaptive Control: Stability and first-order sensitivity formulae for value and relaxed controls underpin robust and adaptive control design under model uncertainty (Reisinger et al., 2020).
- Algorithmic Enhancements: Multi-level Monte Carlo, control variates, sparse grids, kernel expansions, and neural-network-based policies serve as platforms for further computational improvements in high-dimensional or non-smooth domains (Fackeldey et al., 2020).
Exit time relaxed control theory synthesizes advanced functional analytic tools, stochastic process theory, optimization, and numerical analysis, constituting a foundational framework for modern robust, high-dimensional feedback control synthesis, analysis, and computation across deterministic, stochastic, and non-smooth systems.