Safety Filtering
- Safety filtering is the process of vetting control actions, using formal safety specifications and real-time interventions, to ensure that systems remain within certified safe sets.
- Modern methodologies employ control barrier functions, Hamilton–Jacobi reachability, and robust optimization to minimally modify actions while guaranteeing safety under uncertainty.
- Integration with reinforcement learning and simulation-based approaches enhances system robustness, ensuring sample-efficient training and minimal safety violations in practical applications.
Safety filtering is the algorithmic process by which control inputs, decisions, or outputs—whether generated by human agents, autonomous controllers, or learned policies—are vetted, modified, or overridden in real time to ensure compliance with formal safety specifications. Safety filters impose hard or probabilistic guarantees that the system will remain within a certified safe set, even amid uncertainty, perception limits, or adversarial conditions. Modern realizations span control barrier functions, Hamilton–Jacobi reachability, distributionally robust optimization, model-predictive safety, and data-driven approaches. Safety filtering is operationalized as an intervention layer between an unverified or performance-oriented "primary" controller and the plant or environment, minimally altering decisions only when necessary to guarantee safety.
1. Formal Definitions and Filtering Logic
Safety filtering can be formalized across continuous, discrete, deterministic, and stochastic systems:
- Run Time Assurance (RTA): At each timestep, a monitor evaluates the proposed control action $u_{\text{des}}$ given the current state $x$. If $u_{\text{des}}$ maintains safety, it is executed; otherwise, a pre-verified backup controller takes over (Hobbs et al., 2021).
- Safety-Critical MDPs: In safety-critical Markov decision processes (SC-MDPs), a filter maps any proposed action into the maximal set of safe actions, yielding a filtered policy that never violates categorical safety constraints (Oh et al., 20 Oct 2025).
- Control Barrier Function (CBF) Filters: CBFs encode forward invariance via inequalities such as $\dot h(x,u) + \alpha(h(x)) \ge 0$, where $h(x) \ge 0$ defines the safe set and $\alpha$ is an extended class-$\mathcal{K}$ function. A safety filter solves a quadratic program projecting the proposed action onto the set of feasible (safe) actions (Hobbs et al., 2021, Smaili et al., 18 Dec 2025).
- Hamilton-Jacobi (HJ) Reachability Filters: Based on the value function $V(x)$ solving an HJI variational inequality, the filter restricts actions to those for which $\nabla V(x)^{\top} f(x,u) \ge 0$ as the state approaches the zero level set of $V$, ensuring infinitesimal step-wise safety (Borquez et al., 2023).
Filtering logic may be either switching (hard override to backup policy or minimal control) or projection-based (solving optimization to minimally alter control, often via QPs).
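To make the projection logic concrete, the following minimal sketch implements a CBF-QP filter for single-integrator dynamics with one circular obstacle; the dynamics, barrier function, and gain are illustrative assumptions, not the construction of any cited paper. With a single affine constraint the QP has a closed-form solution, and a backup action stands in for the switching branch when the constraint degenerates.

```python
import numpy as np

def cbf_qp_filter(x, u_des, center, radius, alpha=1.0, u_backup=None):
    """Projection-based safety filter for xdot = u with barrier
    h(x) = ||x - center||^2 - radius^2. The CBF condition
    grad h(x) . u >= -alpha * h(x) is one affine constraint a^T u >= b,
    so the QP  min_u ||u - u_des||^2  s.t.  a^T u >= b  is solved in
    closed form by projecting u_des onto the constraint halfspace."""
    h = np.dot(x - center, x - center) - radius ** 2   # barrier value
    a = 2.0 * (x - center)                             # grad h(x)
    b = -alpha * h                                     # constraint RHS

    if np.dot(a, u_des) >= b:          # nominal action already safe
        return u_des
    if np.dot(a, a) < 1e-9:            # degenerate gradient: hard switch
        return u_backup if u_backup is not None else np.zeros_like(u_des)
    # minimal modification: project onto {u : a^T u >= b}
    return u_des + (b - np.dot(a, u_des)) / np.dot(a, a) * a

# Usage: the nominal controller drives straight toward the obstacle;
# the filter deflects it just enough to satisfy the barrier condition.
x = np.array([-1.5, 0.1])
u_safe = cbf_qp_filter(x, np.array([1.0, 0.0]), center=np.zeros(2), radius=1.0)
print(u_safe)
```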
2. Key Methodologies and Design Principles
Contemporary safety filters employ diverse methodologies:
- Explicit and Implicit CBF-QP Filtering: Explicit filters solve QPs at each timestep; implicit filters use reachability or backup simulations to characterize safe operational regions (Hobbs et al., 2021, Hailemichael et al., 2023).
- Distributionally Robust Optimization (DRO): Motion planning under uncertainty uses sample-based predictions and constructs safe halfspaces via DRO with Conditional Value-at-Risk (CVaR) metrics. For each obstacle, safety is enforced by bounding the worst-case risk in a Wasserstein-ball ambiguity set (Safaoui et al., 2023); a sample-based CVaR sketch follows the table below.
- Spectral Safety Filtering: EigenSafe learns spectral certificates (dominant eigenfunctions of the Bellman safety operator), using threshold-switching between reference and backup policies; it is suited for stochastic systems where classical reachability collapses the safe set (Jang et al., 22 Sep 2025). A toy eigenpair computation follows this list.
- Physics-based Simulation Filters: Manipulation under parameter uncertainty combines dense nominal rollouts with sparse, parallelized evaluation at critical states, using generalized factor-of-safety metrics and Monte Carlo integration for risk assessment (Johansson et al., 16 Sep 2025).
- Perception-limited and Smooth Filtering: With limited sensing, filters gate the activation of safety constraints, employing differentiable perception gates or penalty-based relaxation to yield smooth, high-order compatible safety actions (Smaili et al., 18 Dec 2025).
- Latent-space Filtering: For high-dimensional observations (e.g., images), filters operate in learned latent spaces, adapting HJ reachability and CBF concepts; the quality of the latent margin function (smoothness, calibration) critically affects filter efficacy (Nakamura et al., 23 Nov 2025).
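As a toy illustration of the spectral idea, the sketch below builds a safety operator for a five-state Markov chain with an absorbing failure state (all transition probabilities are made-up numbers) and extracts its dominant eigenpair by power iteration; the eigenvalue approximates the asymptotic per-step survival rate, and the eigenfunction ranks states for threshold-switching.

```python
import numpy as np

# Hypothetical 5-state chain; state 4 is an absorbing failure state.
P = np.array([
    [0.90, 0.10, 0.00, 0.00, 0.00],
    [0.10, 0.80, 0.10, 0.00, 0.00],
    [0.00, 0.10, 0.70, 0.15, 0.05],
    [0.00, 0.00, 0.20, 0.60, 0.20],
    [0.00, 0.00, 0.00, 0.00, 1.00],
])
safe = np.array([1.0, 1.0, 1.0, 1.0, 0.0])   # indicator of non-failure states

def safety_op(v):
    """(T v)(s) = 1{s safe} * E[v(s') | s]: one application of the
    Bellman safety operator under the chain's fixed policy."""
    return safe * (P @ v)

v, lam = np.ones(5), 1.0
for _ in range(500):                  # power iteration on T
    w = safety_op(v)
    lam = np.linalg.norm(w)
    v = w / lam

print(lam, v)  # per-step survival rate and spectral safety certificate
# A switching filter would invoke the backup policy whenever v[s] drops
# below a chosen threshold.
```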
Table: Core Filtering Schemes (Editor’s term)
| Filtering Paradigm | Core Guarantee | Representative Equation/Logic |
|---|---|---|
| CBF-QP (explicit) | Forward invariance | $\min_u \lVert u - u_{\text{des}} \rVert^2$ s.t. $\dot h(x,u) + \alpha(h(x)) \ge 0$ |
| HJ Reachability | Maximal safe set | $V(x) \ge 0$; intervene near the boundary, keeping $\nabla V(x)^{\top} f(x,u) \ge 0$ |
| DRO-based (CVaR) | Prob. risk threshold | $\sup_{\mathbb{P} \in \mathcal{P}_W} \mathrm{CVaR}_{\alpha}^{\mathbb{P}}[\text{constraint violation}] \le \delta$ |
| Spectral (EigenSafe) | Safety probability | dominant eigenpair $(\lambda, \phi)$ of the Bellman safety operator; switch when $\phi(x)$ falls below a threshold |
| Physics-based | Parametric FOS | generalized factor-of-safety via Monte Carlo integration over parameter uncertainty |
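The CVaR row can be grounded with a sample-based check: the sketch below (risk level, budget, and the Gaussian obstacle samples are all illustrative assumptions) estimates the empirical CVaR of penetration past a candidate separating halfspace and accepts the plan only if the worst-tail risk stays within budget. The Wasserstein-ball robustification and LP reformulation of (Safaoui et al., 2023) would replace this plug-in estimate with a distributionally robust one.

```python
import numpy as np

def empirical_cvar(losses, alpha=0.1):
    """CVaR_alpha: mean of the worst alpha-fraction of sampled losses."""
    worst_k = max(1, int(np.ceil(alpha * len(losses))))
    return np.sort(losses)[-worst_k:].mean()

def cvar_halfspace_safe(obstacle_samples, normal, offset, alpha=0.1, budget=0.0):
    """Keep a plan only if the empirical CVaR of penetration past the
    separating halfspace {y : normal . y <= offset} is within budget."""
    penetration = obstacle_samples @ normal - offset   # > 0 means violation
    return empirical_cvar(penetration, alpha) <= budget

# Usage: 500 sampled obstacle predictions vs. one candidate halfspace.
rng = np.random.default_rng(0)
samples = rng.normal(loc=[2.0, 0.0], scale=0.3, size=(500, 2))
print(cvar_halfspace_safe(samples, normal=np.array([1.0, 0.0]), offset=3.0))
```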
3. Integration with Learning-Based Policies and RL
Safety filters are increasingly intertwined with reinforcement learning and learned controllers:
- Plug-and-Play Models: Model-free filters learned via Q-learning use a safety Q-function to filter nominal actions via thresholding, without requiring system models; theoretical results guarantee forward invariance under optimality (Sue et al., 2024). A minimal thresholding sketch follows this list.
- Training-Time Safety Filtering: Embedding the safety filter at training—not only at deployment—enables the RL policy to adapt to the certified filter, improving sample efficiency, reducing chattering, and maintaining hard safety guarantees (Bejarano et al., 2024).
- Permissive Filtering: A theorem established in (Oh et al., 20 Oct 2025) proves that a maximally permissive safety filter allows RL agents to achieve the same asymptotic performance as unconstrained learning, provided all unsafe actions are simply overridden or projected.
- CBF-RL and LatentCBF: Enforcing formal CBF constraints in RL rollouts or in learned latent spaces leads to policies that internalize safety, enabling deployment without online filters and supporting high-dimensional and visuomotor tasks (Yang et al., 16 Oct 2025, Nakamura et al., 23 Nov 2025).
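A minimal version of the plug-and-play thresholding idea is sketched below; the tabular safety Q-values, the threshold, and the nearest-action tie-breaking are illustrative stand-ins for the learned quantities and guarantees in (Sue et al., 2024).

```python
import numpy as np

def q_threshold_filter(q_safe_row, u_des_idx, threshold=0.9, backup_idx=None):
    """q_safe_row[a] estimates the probability of remaining safe after
    taking discrete action a in the current state. Pass the nominal
    action if it clears the threshold; otherwise minimally intervene."""
    if q_safe_row[u_des_idx] >= threshold:
        return u_des_idx                     # nominal action certified
    admissible = np.flatnonzero(q_safe_row >= threshold)
    if admissible.size > 0:
        # among safe actions, stay as close to the nominal one as possible
        return int(admissible[np.argmin(np.abs(admissible - u_des_idx))])
    # nothing clears the threshold: fall back to the safest action known
    return backup_idx if backup_idx is not None else int(np.argmax(q_safe_row))

# Usage: five discrete actions; the nominal choice (index 4) is unsafe.
q_row = np.array([0.99, 0.97, 0.95, 0.80, 0.40])
print(q_threshold_filter(q_row, u_des_idx=4))   # -> 2, nearest safe action
```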
4. Extensions: Perception, Uncertainty, and Semantics
Modern safety filters adapt to real-world perception constraints, environmental uncertainty, and semantic grounding:
- Poisson-based Safety Functions: Solving Poisson’s equation on an occupancy map yields globally smooth safety sets and gradient fields used for CBF construction and online filtering; performance is validated on cluttered hardware (Bahati et al., 11 May 2025). A grid-based sketch follows this list.
- Path-Consistent Filtering: For diffusion policies, path-consistent braking uses trajectory-based reachability checks, ensuring safe deployment without “warping” off policy-consistent paths (Römer et al., 9 Nov 2025).
- Language-Conditioned Filtering: Safety constraints derived from natural language are parsed via LLMs into machine-readable specifications, grounded with perception modules and enforced in real time via MPC-based safety filters (Feng et al., 8 Nov 2025).
- Content Safety in LLMs: In LLMs, safety filtering spans input and output stages, leveraging classifiers, adversarial detectors, and context-aware moderation systems to dramatically reduce jailbreak attack success rates (Xin et al., 30 Dec 2025). CultureGuard extends content safety to multilingual and culturally distinct datasets and filters via a hierarchical, adaptation+translation+filtering pipeline (Joshi et al., 3 Aug 2025).
- Manipulation Under Uncertainty: Physics-based safety filters leverage high-fidelity simulation and sparse MC evaluation to robustly filter actions in uncertain environments, with a scalable pipeline amenable to real-world robotic manipulation (Johansson et al., 16 Sep 2025).
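To give a feel for the Poisson-based construction, the sketch below solves a discrete Poisson equation on a small occupancy grid by Jacobi iteration; the grid, source strength, and iteration count are assumptions for illustration rather than the scheme of (Bahati et al., 11 May 2025). Occupied cells and the outer boundary are clamped to zero, so the converged field is smooth, vanishes on obstacles, and its gradient can feed a CBF-style filter.

```python
import numpy as np

def poisson_safety_field(occupancy, source=1.0, iters=2000):
    """Jacobi iteration for -laplacian(v) = source on free cells with
    v = 0 on occupied cells and the outer boundary (unit grid spacing)."""
    v = np.zeros(occupancy.shape, dtype=float)
    free = ~occupancy
    for _ in range(iters):
        nbr = (np.roll(v, 1, 0) + np.roll(v, -1, 0) +
               np.roll(v, 1, 1) + np.roll(v, -1, 1))
        v = np.where(free, 0.25 * (nbr + source), 0.0)
        v[0, :] = v[-1, :] = 0.0       # clamp outer boundary
        v[:, 0] = v[:, -1] = 0.0
    return v

# Usage: 32x32 map with one square obstacle.
occ = np.zeros((32, 32), dtype=bool)
occ[12:20, 12:20] = True
field = poisson_safety_field(occ)
gy, gx = np.gradient(field)            # gradient field for filtering
print(field.max(), field[16, 16])      # positive in free space, 0 on obstacle
```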
5. Theoretical Guarantees and Empirical Validation
Safety filtering frameworks offer strong mathematical guarantees and exhibit robust empirical performance:
- Forward Invariance: Classical CBFs and penalty-based smooth filters guarantee that the system state remains inside the safe set for all time via Nagumo's theorem and comparison-lemma arguments (Smaili et al., 18 Dec 2025); a one-line version of the argument follows this list.
- Probabilistic Guarantees: DRO-based filters provide explicit risk bounds (CVaR, Wasserstein-ball) on collision probability, with LP reformulations enabling real-time tractability (Safaoui et al., 2023).
- Spectral Safety Probability: EigenSafe certifies the asymptotic safety probability via dominant operator eigenpairs, supporting stochastic processes and adaptive thresholding (Jang et al., 22 Sep 2025).
- Sample Efficiency and Performance: RL agents trained with online safety filtering learn certified behaviors more efficiently, avoid training-time and deployment violations, and match or exceed the performance of unconstrained RL (Bejarano et al., 2024, Oh et al., 20 Oct 2025, Yang et al., 16 Oct 2025).
- Empirical Benchmarks: Across tasks—adaptive cruise control (Hailemichael et al., 2023), high-dimensional manipulation (Nakamura et al., 23 Nov 2025), quadruped and humanoid navigation (Bahati et al., 11 May 2025, Yang et al., 16 Oct 2025)—safety filters yield zero or near-zero violation rates, minimal control corrections, and competitive or superior task completion rates.
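For completeness, the comparison-lemma argument behind the forward-invariance claim can be stated in one line (standard material, restated here rather than quoted from any single reference):

```latex
h(x(0)) \ge 0,\quad \dot h(x(t)) \ge -\alpha\big(h(x(t))\big)
\;\Longrightarrow\; h(x(t)) \ge y(t)\ \ \forall t \ge 0,
\quad \text{where } \dot y = -\alpha(y),\; y(0) = h(x(0)),
```

and since $y \equiv 0$ is an equilibrium of the scalar comparison ODE, $y(t) \ge 0$ for all time, hence $h(x(t)) \ge 0$ and the set $\{x : h(x) \ge 0\}$ is forward invariant.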
6. Limitations, Open Challenges, and Future Directions
Safety filtering is subject to domain-specific constraints and ongoing research avenues:
- Approximation Quality: Model-free Q-function or learned latent margin functions require sufficient training and regularization to guarantee safety; suboptimal functions can admit violations (Sue et al., 2024, Nakamura et al., 23 Nov 2025).
- Smoothness vs. Reactivity: Classical barrier-based filters may exhibit nonsmooth switching; smooth perception-gate and penalty-based relaxations resolve this but may trade off reactivity near constraint boundaries (Smaili et al., 18 Dec 2025).
- Multilingual and Semantic Filtering: As LLM deployments in safety-critical settings proliferate, explicit attention to cultural context, cross-lingual adaptation, and semantic disambiguation is necessary to avoid high false-positive/negative rates (Xin et al., 30 Dec 2025, Joshi et al., 3 Aug 2025).
- Safe Exploration vs. Conservatism: Over-conservative filters can hamper performance; permissive filters require accurate safe set identification, especially in high-dimensional or unknown environments (Oh et al., 20 Oct 2025).
- Data Coverage and Uncertainty: In physics-based and spectral filters, comprehensive exploration near safety boundaries and robust uncertainty quantification are essential for accuracy and practical performance (Johansson et al., 16 Sep 2025, Jang et al., 22 Sep 2025).
Future work targets adaptive filter tuning, hardware deployments, semantic understanding, and scalable filtering in complex, stochastic, and partially observable domains (Safaoui et al., 2023, Smaili et al., 18 Dec 2025, Joshi et al., 3 Aug 2025, Nakamura et al., 23 Nov 2025). There is ongoing interest in integrating multi-turn conversational history in LLM safety, data-driven robustification in RL-safe sets, and algorithmic harmonization between performance and formal safety guarantees.