Robust Action Governor (RAG)
- Robust Action Governor (RAG) is a supervisory scheme that enforces safety in uncertain control systems with non-convex state-input constraints.
- It uses online mixed-integer quadratic programming to adjust nominal controller actions, ensuring recursive feasibility and robust invariance.
- RAG computes robust invariant sets offline to manage uncertainties, bridging nominal control and safe execution in applications like reinforcement learning.
The Robust Action Governor (RAG) is an add-on supervisory scheme designed to enforce strict safety specifications for systems subject to uncertainties and hard state-input constraints, including non-convex constraints. Acting as an intermediary between a nominal controller (classical, model-based, or learned) and the plant, RAG modifies proposed control actions in real time so that safety requirements are satisfied robustly at every time step, with recursive feasibility of the correction itself. This covers both parametric and additive uncertainties prevalent in piecewise affine (PWA) and linear systems, as well as reinforcement learning (RL) scenarios where unsafe exploration would otherwise occur (Li et al., 2022, Li et al., 2022, Li et al., 2021).
1. System Model and Safety Constraints
RAG is formulated for discrete-time systems with state-dependent mode switching, commonly represented as PWA dynamics with uncertainties. The system state evolves as

$$x_{k+1} = A_{\sigma(x_k)}(\theta)\,x_k + B_{\sigma(x_k)}(\theta)\,u_k + w_k,$$

where $\sigma(x_k)$ indicates the active mode, $\theta \in \Theta$ parameterizes the matrices (with $\Theta$ a unit simplex), and $w_k \in W$ is the additive disturbance ($W$ a bounded polytope). The matrices $A_i(\theta) = \sum_j \theta_j A_i^{(j)}$ and $B_i(\theta) = \sum_j \theta_j B_i^{(j)}$ are convex combinations over vertices indexed by $j$.
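As a minimal sketch of these dynamics, the snippet below simulates one step of an uncertain PWA system with two modes, two vertex matrices per mode, and a box-bounded disturbance. All numerical values, the mode-switching rule, and the mode-independent input matrix are illustrative assumptions, not taken from the papers.

```python
import numpy as np

# Illustrative vertex matrices A_i^(j) for each mode i (assumed values).
A_vertices = {
    0: [np.array([[1.0, 0.10], [0.0, 1.00]]),
        np.array([[1.0, 0.12], [0.0, 0.95]])],
    1: [np.array([[1.0, 0.10], [-0.20, 1.00]]),
        np.array([[1.0, 0.12], [-0.25, 0.90]])],
}
B = np.array([[0.0], [0.1]])   # input matrix, assumed mode-independent here
W = 0.05                       # |w_k|_inf <= W (box disturbance bound)

def mode(x):
    """State-dependent mode switch (illustrative: sign of first state)."""
    return 0 if x[0] >= 0 else 1

def step(x, u, theta, w):
    """x_{k+1} = A_sigma(theta) x + B u + w, with A a convex combination."""
    A = sum(t * Aj for t, Aj in zip(theta, A_vertices[mode(x)]))
    return A @ x + (B @ u).ravel() + w

x = np.array([0.5, -0.1])
theta = np.array([0.7, 0.3])             # a point in the unit simplex
w = np.random.uniform(-W, W, size=2)     # admissible disturbance sample
x_next = step(x, np.array([0.2]), theta, w)
```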
Safety constraints are expressed as general non-convex, pointwise state constraints and polyhedral input constraints:
- $x_k \in X = \bigcup_j X_j$, with each $X_j$ polyhedral;
- $u_k \in U$, with $U$ a polytope.
This structure accommodates high expressiveness, including constraints that vary with state, mode, and operating region.
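The membership tests induced by this constraint structure can be sketched as follows, with each polyhedron in halfspace form $\{z : Hz \le h\}$; the specific sets are illustrative assumptions.

```python
import numpy as np

# X is a union of polyhedra (non-convex); U is a single polytope.
X_pieces = [
    (np.array([[1.0, 0.0], [-1.0, 0.0]]), np.array([2.0, 0.5])),  # -0.5 <= x1 <= 2
    (np.array([[0.0, 1.0], [0.0, -1.0]]), np.array([1.0, 3.0])),  # -3 <= x2 <= 1
]
U_H, U_h = np.array([[1.0], [-1.0]]), np.array([1.0, 1.0])        # |u| <= 1

def in_polyhedron(z, H, h, tol=1e-9):
    return bool(np.all(H @ z <= h + tol))

def in_X(x):
    # Non-convex membership: x is safe if it lies in ANY piece of the union.
    return any(in_polyhedron(x, H, h) for H, h in X_pieces)

def in_U(u):
    return in_polyhedron(u, U_H, U_h)
```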
2. Robust Control-Invariant Sets and the RAG Principle
At the core of RAG is the robust maximal control-invariant set (the "viability kernel") $C_\infty$, defined for the uncertain, possibly switching, dynamics as

$$C_\infty = \{x_0 \in X : \exists\, u_k \in U \text{ for all } k \ge 0 \text{ such that } x_k \in X \text{ for all } k, \text{ for every admissible } \theta \in \Theta \text{ and } w_k \in W\}.$$

Offline, $C_\infty$ is approximated by a decreasing sequence of sets $\{\Omega_k\}$:
- $\Omega_0 = X$;
- $\Omega_{k+1} = \{x \in \Omega_k : \exists\, u \in U \text{ s.t. } A_{\sigma(x)}(\theta)x + B_{\sigma(x)}(\theta)u + w \in \Omega_k \ \forall\, \theta \in \Theta,\ w \in W\}$.
Each $\Omega_k$ is a (possibly non-convex) union of polyhedra, enabling precise accommodation of complex constraints and uncertainties (Li et al., 2022).
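As a minimal illustration of this decreasing-set recursion, the sketch below computes a gridded inner approximation of the robust control-invariant set for a scalar uncertain system. The dynamics, bounds, and grid resolution are illustrative assumptions; the papers' exact method uses polyhedral set operations instead of gridding.

```python
import numpy as np

# Scalar system x+ = a(theta) x + b u + w with a in [0.9, 1.1],
# |u| <= 1, |w| <= 0.1, and state constraint |x| <= 2 (all assumed values).
a_lo, a_hi = 0.9, 1.1
b, W, X_MAX = 1.0, 0.1, 2.0
xs = np.linspace(-X_MAX, X_MAX, 401)        # state grid
us = np.linspace(-1.0, 1.0, 41)             # input grid

def robustly_stays(x, u, lo, hi):
    """Worst-case successor interval must land inside [lo, hi]."""
    nexts = np.array([a_lo * x, a_hi * x]) + b * u
    return (nexts.min() - W >= lo) and (nexts.max() + W <= hi)

# Since every Omega_k here is an interval, it suffices to track endpoints.
lo, hi = -X_MAX, X_MAX                      # Omega_0 = X
for _ in range(50):                         # iterate the recursion to a fixed point
    keep = [x for x in xs if lo <= x <= hi and
            any(robustly_stays(x, u, lo, hi) for u in us)]
    new_lo, new_hi = min(keep), max(keep)
    if (new_lo, new_hi) == (lo, hi):
        break
    lo, hi = new_lo, new_hi
# [lo, hi] approximates the robust control-invariant interval
```

For this particular system the full constraint interval is already robustly invariant, so the recursion converges immediately; shrinking the input bound or enlarging the disturbance makes the sequence strictly decrease.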
3. Online Optimization: Mixed-Integer Quadratic Programming Formulation
At runtime, RAG operates as a filter between the nominal controller and the actuator. Given the nominal action $\hat{u}_k$, at each time step $k$ it solves

$$u_k = \arg\min_{u \in U} \; (u - \hat{u}_k)^\top Q \,(u - \hat{u}_k)$$

subject to

$$A_{\sigma(x_k)}^{(j)} x_k + B_{\sigma(x_k)}^{(j)} u \in \Omega \ominus W \quad \text{for every vertex } j,$$

where $Q$ is a positive definite weighting matrix, $\Omega$ is the safe (robustly invariant) set, and $\ominus$ denotes the Pontryagin difference. Since $\Omega$ is non-convex (a union of polyhedra), integer variables and big-M constraints are introduced, resulting in a mixed-integer quadratic program (MIQP). This MIQP is solved efficiently online (typical runtime: 15-30 ms per step on modern CPUs) using solvers such as Gurobi or SCIP (Li et al., 2022, Li et al., 2021).
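For intuition, the filter can be sketched for a scalar input by enumerating the pieces of the non-convex target set and solving a one-dimensional QP (a projection) per piece; this brute-force enumeration is equivalent to the big-M MIQP for this toy setup. All dynamics and sets below are illustrative assumptions.

```python
import numpy as np

# Scalar uncertain dynamics x+ = a x + b u + w, a in [a_lo, a_hi], |w| <= W.
a_lo, a_hi, b, W = 0.9, 1.1, 1.0, 0.1
U_MIN, U_MAX = -1.0, 1.0
Omega_pieces = [(-2.0, -0.5), (0.5, 2.0)]   # union of intervals, standing in
                                            # for a union of polyhedra

def filter_action(x, u_nom):
    """Return the admissible u closest to u_nom, or None if infeasible."""
    best = None
    for lo, hi in Omega_pieces:
        # Robust containment a*x + b*u + w in [lo, hi] for all a, w yields
        # an interval of admissible u, tightened by the disturbance bound W
        # (the interval analogue of the Pontryagin difference Omega - W).
        u_lo = (lo + W - min(a_lo * x, a_hi * x)) / b
        u_hi = (hi - W - max(a_lo * x, a_hi * x)) / b
        u_lo, u_hi = max(u_lo, U_MIN), min(u_hi, U_MAX)
        if u_lo > u_hi:
            continue                          # this piece is unreachable
        u = min(max(u_nom, u_lo), u_hi)       # projection = 1-D QP solution
        if best is None or abs(u - u_nom) < abs(best - u_nom):
            best = u
    return best

u_safe = filter_action(x=1.0, u_nom=0.0)      # nominal action already safe
```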
4. Theoretical Properties and Robustness Guarantees
RAG ensures:
- Recursive Feasibility: If the MIQP is feasible at step $k$ with $x_k \in \Omega$, the resulting $u_k$ ensures $x_{k+1} \in \Omega$ for all allowable uncertainties; hence the MIQP remains feasible at all subsequent steps.
- Robust Safety: All closed-loop trajectories satisfy $x_k \in X$ for all $k \ge 0$, across all admissible realizations of $\theta$ and $w_k$.
- Safe Set Convergence: The set sequence $\{\Omega_k\}$ converges to a limit set, proven robustly invariant under mild compactness assumptions (Li et al., 2022, Li et al., 2022).
The supervisor's action corrections trade a margin of performance for guaranteed invariance, with larger uncertainties inducing more conservative feasible sets.
5. Integration with Safe Reinforcement Learning
RAG enables safe RL by robustly decoupling safety enforcement from exploratory policy learning. The process proceeds as follows:
- The RL agent observes the state $x_k$ and proposes a candidate action $\hat{u}_k$ (potentially through ε-greedy selection over its Q-function).
- RAG filters $\hat{u}_k$ to produce $u_k$ via the MIQP, ensuring one-step (and hence recursive) safety.
- The plant receives $u_k$, the reward is observed, and the RL agent updates its value function using the applied action $u_k$, so that learning remains consistent with the executed closed-loop behavior.
- All-time constraint violations are precluded throughout both exploration and exploitation phases (Li et al., 2022, Li et al., 2021).
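The loop above can be sketched with a tabular Q-learner on a scalar system. Here `filter_action` is an illustrative stand-in for the MIQP-based governor (it robustly keeps $|x_{k+1}| \le 2$ despite $|w| \le 0.05$); the dynamics, reward, and discretization are assumptions, not the papers' benchmark.

```python
import numpy as np

rng = np.random.default_rng(0)
actions = np.linspace(-1.0, 1.0, 5)
Q = np.zeros((21, len(actions)))                 # Q over a coarse state grid

def to_idx(x):
    return int(np.clip(round((x + 2.0) / 0.2), 0, 20))

def filter_action(x, u_hat):
    # Keep x + u in [-1.95, 1.95] so that any |w| <= 0.05 stays in [-2, 2].
    return float(np.clip(u_hat, -1.95 - x, 1.95 - x))

x, eps, alpha, gamma = 0.0, 0.2, 0.1, 0.95
for k in range(1000):
    i = to_idx(x)
    a = int(rng.integers(len(actions))) if rng.random() < eps \
        else int(np.argmax(Q[i]))
    u = filter_action(x, float(actions[a]))      # governor may override u_hat
    x_next = x + u + rng.uniform(-0.05, 0.05)    # x+ = x + u + w (assumed)
    r = -x_next ** 2                             # illustrative regulation reward
    a_app = int(np.argmin(np.abs(actions - u)))  # update with applied action
    Q[i, a_app] += alpha * (r + gamma * Q[to_idx(x_next)].max() - Q[i, a_app])
    x = x_next
    assert abs(x) <= 2.0 + 1e-9                  # constraint holds throughout
```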
To further reduce online computational demands, an explicit safe policy can be obtained by imitation learning from RAG-filtered offline data, allowing near-instantaneous (e.g., 0.5 ms per step) action evaluation with minor safety approximation error.
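This distillation step can be sketched as follows: sample states, record governor-filtered actions, and fit a cheap function approximator (polynomial least squares here) for near-instant online evaluation. The `filter_action` stand-in, nominal policy, and feature choice are all illustrative assumptions.

```python
import numpy as np

def filter_action(x, u_hat):
    """Stand-in for the MIQP-based governor on a scalar toy system."""
    return float(np.clip(u_hat, -1.95 - x, 1.95 - x))

xs = np.linspace(-2.0, 2.0, 200)
u_nom = 0.5 * np.ones_like(xs)                       # assumed nominal policy
u_safe = np.array([filter_action(x, u) for x, u in zip(xs, u_nom)])

# Features [x^3, x^2, x, 1]; coefficients by linear least squares.
Phi = np.vander(xs, 4)
coef, *_ = np.linalg.lstsq(Phi, u_safe, rcond=None)

def explicit_policy(x):
    # Evaluating a fitted polynomial takes microseconds, versus solving an
    # MIQP online; the approximation error is the source of the minor
    # safety relaxation mentioned above.
    return float(np.vander(np.array([x]), 4) @ coef)
```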
6. Computational Approach
Offline Phase: The robust invariant set computations utilize polyhedral operations (Minkowski sums, Pontryagin differences, intersections, and projections) implemented in toolboxes such as MPT3. For PWA systems and non-convex $X$, vertex enumeration is employed to resolve the universal quantification over uncertainties.
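For axis-aligned boxes the two set operations named above reduce to interval arithmetic; the sketch below shows this reduction (general polyhedra require a toolbox such as MPT3). Boxes are represented as (lower, upper) corner pairs, an assumed encoding.

```python
import numpy as np

def minkowski_sum(A, B):
    """A (+) B = {a + b : a in A, b in B}, for boxes (lower, upper)."""
    return (A[0] + B[0], A[1] + B[1])

def pontryagin_diff(A, B):
    """A (-) B = {x : x + b in A for all b in B}: shrinks A by B."""
    lo, hi = A[0] - B[0], A[1] - B[1]
    if np.any(lo > hi):
        raise ValueError("difference is empty: B is too large for A")
    return (lo, hi)

X = (np.array([-2.0]), np.array([2.0]))      # state box
Wset = (np.array([-0.1]), np.array([0.1]))   # disturbance box
X_shrunk = pontryagin_diff(X, Wset)          # ([-1.9], [1.9])
```

Shrinking a target set by the disturbance box in exactly this way is what makes the online constraint on the nominal successor state robust to every admissible $w_k$.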
Online Phase: MIQP solution for action correction is performed at every step. For moderate system dimensions and sampling rates (e.g., ≤20 ms per MIQP), RAG is compatible with real-time operation in practical safety-critical control loops (Li et al., 2022, Li et al., 2021).
7. Example Applications and Performance
RAG has been evaluated on PWA models, such as a mass-spring-damper system with uncertain mass and adversarial disturbances. In soft-landing tasks with non-convex velocity-position safety regions and input force constraints, RAG achieves a violation rate of zero over 500 disturbance trials under adversarial injection, compared to frequent violations from a nominal RL controller.
In RL-driven adaptation scenarios (e.g., shifting system parameters $\theta$), RAG-based safe RL maintains zero constraint violations from the first episode, whereas conventional RL may require hundreds of episodes and still suffer occasional violations. When learned explicit safe policies are deployed via imitation learning, per-step compute time drops by more than 95% while tolerating only negligible or minor constraint relaxation (Li et al., 2022).
In automotive adaptive cruise control, RAG strictly enforces distance and actuator limits both during training and deployment, yielding zero violations and faster RL convergence—demonstrating the general applicability of the approach (Li et al., 2021).
References
- "Robust Action Governor for Uncertain Piecewise Affine Systems with Non-convex Constraints and Safe Reinforcement Learning" (Li et al., 2022)
- "Safe Control and Learning Using the Generalized Action Governor" (Li et al., 2022)
- "Safe Reinforcement Learning Using Robust Action Governor" (Li et al., 2021)