Robust Action Governor (RAG)

Updated 28 January 2026
  • Robust Action Governor (RAG) is a supervisory scheme that enforces safety in uncertain control systems with non-convex state-input constraints.
  • It uses online mixed-integer quadratic programming to adjust nominal controller actions, ensuring recursive feasibility and robust invariance.
  • RAG computes robust invariant sets offline to manage uncertainties, bridging nominal control and safe execution in applications like reinforcement learning.

The Robust Action Governor (RAG) is an add-on supervisory scheme that enforces strict safety specifications for systems subject to uncertainties and hard state-input constraints, including non-convex ones. Acting as an intermediary between a nominal controller (classical, model-based, or learned) and the plant, RAG modifies proposed control actions in real time to guarantee robust constraint satisfaction at all times, recursively. It accommodates both the parametric and additive uncertainties prevalent in piecewise affine (PWA) and linear systems, and it handles reinforcement learning (RL) settings where unsafe exploration would otherwise occur (Li et al., 2022, Li et al., 2022, Li et al., 2021).

1. System Model and Safety Constraints

RAG is formulated for discrete-time systems with state-dependent mode switching, commonly represented as PWA dynamics with uncertainties. The evolution of the system state takes the general form:

$$x_{k+1} = A_{\sigma(k)}(w^p_k)\,x_k + B_{\sigma(k)}(w^p_k)\,u_k + f_{\sigma(k)}(w^p_k) + E_{\sigma(k)}(w^p_k)\,w^a_k,$$

where $\sigma(k)$ indicates the active mode, $w^p_k \in W^p_{\sigma(k)}$ (an element of a unit simplex) parameterizes the matrices, and $w^a_k \in W^a_{\sigma(k)}$ is the additive disturbance (a bounded polytope). The matrices $A_q(w^p)$, $B_q(w^p)$, $f_q(w^p)$, $E_q(w^p)$ are convex combinations over vertices indexed by $j$.

Safety constraints are expressed as general non-convex, pointwise state constraints and polyhedral input constraints:

  • $x_k \in X := \bigcup_{i=1}^{r_0} X^i$, with each $X^i$ polyhedral;
  • $u_k \in U_{\sigma(k)}$, with each $U_q$ a polytope.

This structure accommodates high expressiveness, including constraints that vary with state, mode, and operating region.
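As a concrete illustration, the following sketch instantiates the uncertain dynamics and the union-of-polyhedra state constraint for a hypothetical two-state, single-mode system (the vertex matrices, box cells, and numbers are illustrative assumptions, not taken from the cited papers):

```python
import numpy as np

# Vertex matrices for the parametric uncertainty (hypothetical values); the
# actual A(w^p), B(w^p) are convex combinations of such vertices.
A_verts = [np.array([[1.0, 0.1], [0.0, 0.9]]),
           np.array([[1.0, 0.1], [0.0, 0.8]])]
B_verts = [np.array([[0.0], [0.1]]),
           np.array([[0.0], [0.12]])]

def step(x, u, wp, wa):
    """One step of x+ = A(w^p) x + B(w^p) u + E w^a; wp are simplex weights."""
    A = sum(w * Av for w, Av in zip(wp, A_verts))
    B = sum(w * Bv for w, Bv in zip(wp, B_verts))
    return A @ x + B @ u + np.array([0.0, 1.0]) * wa  # E w^a enters the velocity state

def in_union_of_boxes(x, boxes):
    """Non-convex state constraint X = union of axis-aligned boxes (lo, hi)."""
    return any(np.all(x >= lo) and np.all(x <= hi) for lo, hi in boxes)

X = [(np.array([-1.0, -1.0]), np.array([1.0, 1.0])),   # cell X^1
     (np.array([1.0, -0.2]), np.array([2.0, 0.2]))]    # cell X^2 (narrow corridor)

x_next = step(np.array([0.5, 0.0]), np.array([0.2]), wp=(0.7, 0.3), wa=0.01)
print(in_union_of_boxes(x_next, X))  # → True
```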

2. Robust Control-Invariant Sets and the RAG Principle

At the core of RAG is the robust maximal control-invariant set (the "viability kernel"), defined recursively for uncertain, possibly switching, dynamics. The invariant set $\Omega_\infty$ is given as:

$$\Omega_\infty = \left\{ x \in X \mid \exists u \in U_q : A_q(w^p)x + B_q(w^p)u + f_q(w^p) + E_q(w^p)w^a \in \Omega_\infty,\ \forall q,\ \forall w^p \in W^p_q,\ \forall w^a \in W^a_q \right\}$$

Offline, $\Omega_\infty$ is approximated by a decreasing sequence of sets:

  • $\Omega_0 = X$,
  • $\Omega_{k+1} = \mathrm{Proj}_x\{ (x, u) : x \in \Omega_k \cap P_q,\ u \in U_q,\ A_{q,j}x + B_{q,j}u + f_{q,j} + E_{q,j}W^a_q \subseteq \Omega_k,\ \forall j \}$.

Each $\Omega_k$ is a (possibly non-convex) union of polyhedra, enabling precise accommodation of complex constraints and uncertainties (Li et al., 2022).
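The fixed-point iteration can be sketched on a toy scalar instance, assuming dynamics $x^+ = a x + u + w$ with $a$ in an interval (parametric uncertainty), $|w| \le \bar w$ (additive disturbance), and interval sets in place of polyhedra; a grid scan stands in for the polyhedral projection (all numbers are illustrative assumptions):

```python
import numpy as np

def robustly_safe_input_exists(x, omega, U, a_range, wbar):
    """Is there u in U with a*x + u + w in omega for all a in a_range, |w| <= wbar?"""
    l, h = omega
    ax_lo = min(a_range[0] * x, a_range[1] * x)
    ax_hi = max(a_range[0] * x, a_range[1] * x)
    u_lo_needed = l - ax_lo + wbar    # u >= this keeps the worst case above l
    u_hi_allowed = h - ax_hi - wbar   # u <= this keeps the worst case below h
    return max(u_lo_needed, U[0]) <= min(u_hi_allowed, U[1])

def invariant_set(X, U, a_range, wbar, iters=50, grid=2001):
    """Iterate Omega_{k+1} = {x in Omega_k : a robustly safe input exists}."""
    omega = X
    for _ in range(iters):
        xs = np.linspace(*X, grid)
        feas = [x for x in xs if omega[0] <= x <= omega[1]
                and robustly_safe_input_exists(x, omega, U, a_range, wbar)]
        new = (min(feas), max(feas))
        if abs(new[0] - omega[0]) < 1e-9 and abs(new[1] - omega[1]) < 1e-9:
            break                     # fixed point reached: approximation of Omega_inf
        omega = new
    return omega

# Unstable a in [1.0, 1.2], |w| <= 0.1, X = [-1, 1], U = [-0.2, 0.2]:
# the iteration shrinks X down to roughly [-0.5, 0.5].
print(invariant_set((-1.0, 1.0), (-0.2, 0.2), (1.0, 1.2), 0.1))
```

The real computation replaces intervals with unions of polyhedra and the grid scan with exact projections, but the decreasing-sequence structure is the same.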

3. Online Optimization: Mixed-Integer Quadratic Programming Formulation

At runtime, RAG operates as a filter between the nominal controller and the actuator. At each timestep $k$, it solves:

$$u_k^* \in \arg\min_{u \in U_{\sigma(k)}} \; \|u - u_{\text{nom}}(k)\|_S^2$$

subject to

$$A_{\sigma(k),j}\,x_k + B_{\sigma(k),j}\,u + f_{\sigma(k),j} \in \left(\Omega_{k'}^i \ominus E_{\sigma(k),j} W^a_{\sigma(k)}\right) \quad \forall j,\ \text{for at least one cell } i,$$

where $S$ is a positive definite weighting matrix and $\ominus$ denotes the Pontryagin difference. Since $\Omega_{k'}$ is non-convex (a union of polyhedra), integer variables and big-M constraints are introduced, resulting in a mixed-integer quadratic program (MIQP). This MIQP is solved efficiently online (typical runtimes of 15-30 ms per step on modern CPUs) using solvers such as Gurobi or SCIP (Li et al., 2022, Li et al., 2021).
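For few cells, the disjunction can also be resolved by enumeration instead of integer variables: solve the convex projection per cell and keep the best candidate. The sketch below assumes a hypothetical single-mode system with box cells and a scalar input, so each per-cell QP reduces to interval clipping; a production implementation would hand the big-M MIQP to Gurobi or SCIP:

```python
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # nominal vertex of one PWA mode (assumed)
B = np.array([0.005, 0.1])               # scalar input channel
wvec = np.array([0.0, 0.02])             # per-coordinate reach of E w^a
cells = [(np.array([-1.0, -1.0]), np.array([1.0, 1.0]))]  # safe set: one box cell

def rag_filter(u_nom, x, A, B, cells, wvec, U):
    """Return the u in U nearest u_nom (S = I) with A x + B u in some shrunk cell."""
    best_u, best_cost = None, np.inf
    y = A @ x
    for lo, hi in cells:
        lo_s, hi_s = lo + wvec, hi - wvec           # box ominus disturbance box
        u_min, u_max, ok = U[0], U[1], True
        for i in range(len(y)):
            if abs(B[i]) > 1e-12:                   # this coordinate constrains u
                a, b = (lo_s[i] - y[i]) / B[i], (hi_s[i] - y[i]) / B[i]
                u_min, u_max = max(u_min, min(a, b)), min(u_max, max(a, b))
            elif not lo_s[i] <= y[i] <= hi_s[i]:    # u cannot influence it
                ok = False
        if ok and u_min <= u_max:
            u = min(max(u_nom, u_min), u_max)       # 1-D projection = clipping
            if (u - u_nom) ** 2 < best_cost:
                best_u, best_cost = u, (u - u_nom) ** 2
    return best_u                                   # None: filter infeasible at x

# Near the boundary, an aggressive nominal action gets trimmed (to about 1.8 here).
print(rag_filter(2.0, np.array([0.9, 0.8]), A, B, cells, wvec, (-2.0, 2.0)))
```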

4. Theoretical Properties and Robustness Guarantees

RAG ensures:

  • Recursive Feasibility: If the MIQP is feasible at step $k$ with $x_k \in \Omega_{k'}$, the resulting $u_k$ ensures $x_{k+1} \in \Omega_{k'}$ for all allowable uncertainties; hence the MIQP remains feasible at all subsequent steps.
  • Robust Safety: All trajectories satisfy $x_k \in X$ for all $k$, across all admissible realizations of $w^p_k$ and $w^a_k$.
  • Safe Set Convergence: The decreasing sequence $\Omega_{k+1} \subseteq \Omega_k$ converges to $\Omega_\infty$, which is proven robustly invariant under mild compactness assumptions (Li et al., 2022, Li et al., 2022).

The supervisor's action corrections trade a margin of performance for guaranteed invariance, with larger uncertainties inducing more conservative feasible sets.

5. Integration with Safe Reinforcement Learning

RAG enables safe RL by robustly decoupling safety enforcement from exploratory policy learning. The process proceeds as follows:

  1. The RL agent observes $x_k$ and proposes a candidate action $u_{\text{nom}}(k)$ (e.g., via ε-greedy selection over its Q-function).
  2. RAG filters $u_{\text{nom}}(k)$ through the MIQP to produce $u^*_k$, ensuring one-step (and hence recursive) safety.
  3. The plant receives $u^*_k$, the reward $R_k$ is observed, and the RL agent updates its value function as if $u_{\text{nom}}(k)$ had been applied, leaving learning undisturbed.
  4. Constraint violations are precluded at all times, throughout both exploration and exploitation (Li et al., 2022, Li et al., 2021).

To further reduce online computational demands, an explicit safe policy u^=π^(x)\hat{u} = \hat{\pi}(x) can be obtained by imitation learning from RAG-filtered offline data, allowing near-instantaneous (e.g., 0.5 ms per step) action evaluation with minor safety approximation error.
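The training pattern above can be sketched as follows, with a hypothetical 1D plant, an interval safe set, and a clamp-style filter standing in for the MIQP of Section 3 (all dynamics and numbers are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
actions = np.array([-1.0, 0.0, 1.0])
Q = np.zeros((21, len(actions)))      # tabular Q over discretized x in [-1, 1]

def idx(x):
    return int(np.clip(round((x + 1) * 10), 0, 20))

def plant(x, u):
    return np.clip(0.95 * x + 0.1 * u + rng.uniform(-0.02, 0.02), -2, 2)

def rag(x, u):
    """Stand-in filter: clamp u so the worst-case next state stays in [-0.8, 0.8]."""
    lo = (-0.8 - 0.95 * x + 0.02) / 0.1
    hi = (0.8 - 0.95 * x - 0.02) / 0.1
    return float(np.clip(u, lo, hi))

x = 0.0
for _ in range(2000):
    a = rng.integers(3) if rng.random() < 0.1 else int(np.argmax(Q[idx(x)]))
    u_nom = actions[a]                         # step 1: agent proposes u_nom
    u = rag(x, u_nom)                          # step 2: filtered action is applied...
    x_next = plant(x, u)
    r = -x_next ** 2                           # hypothetical reward
    # step 3: ...but the agent updates as if u_nom had been taken
    Q[idx(x), a] += 0.1 * (r + 0.95 * Q[idx(x_next)].max() - Q[idx(x), a])
    assert -0.8 <= x_next <= 0.8               # step 4: no violation at any step
    x = x_next
```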

6. Computational Approach

Offline Phase: The robust invariant set computations utilize polyhedral operations—Minkowski sums, Pontryagin differences, intersections, and projections—implemented in toolboxes such as MPT3. For PWA systems and non-convex $X$, vertex enumeration is employed for universal quantification over uncertainties.
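Of these polyhedral operations, the Pontryagin difference has a particularly simple form for axis-aligned boxes: each interval shrinks by the disturbance's reach on that coordinate. A minimal sketch (illustrative sets, not the toolbox API):

```python
import numpy as np

def box_pontryagin_diff(lo, hi, d_lo, d_hi):
    """Pontryagin difference of box [lo, hi] and disturbance box [d_lo, d_hi]:
    the set of points y such that y + d stays in [lo, hi] for every admissible d."""
    new_lo, new_hi = lo - d_lo, hi - d_hi
    if np.any(new_lo > new_hi):
        raise ValueError("disturbance too large: the difference is empty")
    return new_lo, new_hi

lo, hi = box_pontryagin_diff(np.array([-1.0, -1.0]), np.array([1.0, 1.0]),
                             np.array([-0.1, -0.2]), np.array([0.1, 0.2]))
print(lo, hi)  # → [-0.9 -0.8] [0.9 0.8]
```

For general polytopes the same operation requires halfspace support computations, which is what MPT3 provides.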

Online Phase: The MIQP for action correction is solved at every step. For moderate system dimensions and sampling periods (e.g., MIQP solve times of ≤20 ms), RAG is compatible with real-time operation in practical safety-critical control loops (Li et al., 2022, Li et al., 2021).

7. Example Applications and Performance

RAG has been evaluated on PWA models, such as a mass-spring-damper system with uncertain mass and adversarial disturbances. In soft-landing tasks with non-convex velocity-position safety regions and input force constraints, RAG achieves a violation rate of zero over 500 disturbance trials under adversarial injection, compared to frequent violations from a nominal RL controller.

In RL-driven adaptation scenarios (e.g., shifting system parameters $(m, d)$), RAG-based safe RL maintains zero constraint violations from the first episode, whereas conventional RL may require hundreds of episodes and still suffer occasional violations. Deploying an explicit safe policy learned by imitation reduces per-step compute time by more than 95% while tolerating only negligible or minor constraint relaxation (Li et al., 2022).

In automotive adaptive cruise control, RAG strictly enforces distance and actuator limits both during training and deployment, yielding zero violations and faster RL convergence—demonstrating the general applicability of the approach (Li et al., 2021).


References

  • "Robust Action Governor for Uncertain Piecewise Affine Systems with Non-convex Constraints and Safe Reinforcement Learning" (Li et al., 2022)
  • "Safe Control and Learning Using the Generalized Action Governor" (Li et al., 2022)
  • "Safe Reinforcement Learning Using Robust Action Governor" (Li et al., 2021)
