Safe Control using Learned Safety Filters and Adaptive Conformal Inference

Published 20 Apr 2026 in eess.SY, cs.LG, and cs.RO | (2604.18482v1)

Abstract: Safety filters have been shown to be effective tools to ensure the safety of control systems with unsafe nominal policies. To address scalability challenges in traditional synthesis methods, learning-based approaches have been proposed for designing safety filters for systems with high-dimensional state and control spaces. However, the inevitable errors in the decisions of these models raise concerns about their reliability and the safety guarantees they offer. This paper presents Adaptive Conformal Filtering (ACoFi), a method that combines learned Hamilton-Jacobi reachability-based safety filters with adaptive conformal inference. Under ACoFi, the filter dynamically adjusts its switching criteria based on the observed errors in its predictions of the safety of actions. The range of possible safety values of the nominal policy's output is used to quantify uncertainty in safety assessment. The filter switches from the nominal policy to the learned safe one when that range suggests it might be unsafe. We show that ACoFi guarantees that the rate of incorrectly quantifying uncertainty in the predicted safety of the nominal policy is asymptotically upper bounded by a user-defined parameter. This gives a soft safety guarantee rather than a hard safety guarantee. We evaluate ACoFi in a Dubins car simulation and a Safety Gymnasium environment, empirically demonstrating that it significantly outperforms the baseline method that uses a fixed switching threshold by achieving higher learned safety values and fewer safety violations, especially in out-of-distribution scenarios.


Authors (3)

Summary

The paper proposes Adaptive Conformal Filtering (ACoFi), blending learned HJ reachability safety filters with real-time adaptive conformal inference to manage uncertainty.
The method dynamically adjusts the safety threshold based on observed prediction errors, targeting state-constraint satisfaction while balancing task performance against conservatism.
Empirical evaluations in Dubins car and Safety Gymnasium tasks demonstrate significant reduction in unsafe steps and improved adaptability under distribution shifts.

Safe Control with Learned Safety Filters and Adaptive Conformal Inference: A Technical Overview

Introduction

Reliable deployment of high-dimensional control systems in safety-critical applications requires robust assurance mechanisms under epistemic and aleatoric uncertainty. The work "Safe Control using Learned Safety Filters and Adaptive Conformal Inference" (2604.18482) develops Adaptive Conformal Filtering (ACoFi), a synthesis of learned Hamilton-Jacobi (HJ) reachability-based safety filters and statistically adaptive conformal inference, to address the limits of existing fixed-threshold safety filters and provide probabilistic safety guarantees over the deployment horizon.

Background

Safety Filters and the Limits of Fixed Thresholds

Safety filters based on HJ reachability value functions or Control Barrier Functions (CBFs) adjust nominal (potentially unsafe) actions to enforce state constraint satisfaction. In data-driven settings, these safety value functions are trained over the latent state representation. However, the learned $V_\theta$ may have high predictive error, particularly under distribution shifts, making fixed-threshold heuristics for filter switching unreliable and prone to over- or under-conservatism. This unreliability compounds in high-dimensional latent perception settings.

Existing methods treat the threshold as a hyperparameter, often without accounting for region-dependent generalization errors, and thus offer limited or ad hoc safety confidence.

Conformal Prediction and Adaptive Inference

Conformal prediction provides formal, finite-sample statistical guarantees on miscoverage rates for black-box predictors by constructing calibrated confidence regions. For non-exchangeable, temporally dependent data (as in control tasks), Adaptive Conformal Inference (ACI) calibrates coverage online, updating its quantile based on observed errors to guarantee that the long-run empirical miscoverage rate does not exceed a specified $\alpha$.
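The online calibration described above can be sketched as follows. This is a minimal illustration of the standard ACI update (the working miscoverage level moves toward or away from the target depending on whether the last score was covered), not the paper's code. The function name `aci_run`, the `step_size` and `warmup` parameters, and the use of a plain empirical quantile over the full score history are assumptions made for the example.

```python
import numpy as np

def aci_run(scores, alpha=0.1, step_size=0.05, warmup=20):
    """Run ACI over a stream of nonconformity scores.

    Returns the per-step miscoverage indicators (1.0 when the new score
    exceeded the quantile computed from the history before it).
    """
    alpha_t = alpha                      # working miscoverage level
    history = list(scores[:warmup])      # scores used to seed the quantile
    errs = []
    for s in scores[warmup:]:
        # alpha_t may drift outside [0, 1]; ACI then uses an infinite
        # (always cover) or minus-infinite (never cover) quantile.
        if alpha_t <= 0.0:
            q_t = np.inf
        elif alpha_t >= 1.0:
            q_t = -np.inf
        else:
            q_t = float(np.quantile(history, 1.0 - alpha_t))
        err = 1.0 if s > q_t else 0.0
        # Miscoverage lowers alpha_t (wider regions next time);
        # coverage raises it (tighter regions next time).
        alpha_t = alpha_t + step_size * (alpha - err)
        history.append(s)
        errs.append(err)
    return np.array(errs)
```

Feeding this a score stream whose distribution shifts midway still keeps the average of the returned indicators close to `alpha`, which is the property that makes ACI suitable for temporally dependent control data.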

Methodology: Adaptive Conformal Filtering (ACoFi)

ACoFi combines a black-box, learned HJ safety filter with real-time, history-dependent calibration using ACI.

Switching Policy

At each time $t$, let $y_t$ be the latent state, and assume a nominal policy $\pi^\mathit{task}$ and a backup HJ-based safety policy $\pi^\mathit{safe}_\theta$. The Q-value $Q_\theta(y_t, u_t)$ estimates the system's safety when executing $u_t$ at $y_t$. The system collects the realized target $R_t$ (which blends the classification boundary, future value function estimate, and discount), and computes a conformal score that measures how far the prediction $Q_\theta(y_t, u_t)$ deviates from $R_t$.
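The exact form of $R_t$ is not given in this overview. As an assumption, a common choice in reachability-based reinforcement learning is the discounted safety backup below. Here $\ell$ is a hypothetical symbol for the learned safety margin whose zero level plays the role of the classification boundary, $V_\theta$ is the learned value function, and $\gamma \in [0, 1)$ is the discount factor:

$$R_t = (1-\gamma)\,\ell(y_t) + \gamma\,\min\{\ell(y_t),\; V_\theta(y_{t+1})\}$$

Under this form, the target can only be evaluated after $y_{t+1}$ has been observed, which is why the score is computed from realized transitions rather than ahead of time.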

At runtime, the ACoFi policy computes a quantile of the score history, at a level set by ACI, and forms an adaptive safety threshold that combines this quantile with the latent representation of the classification boundary.

If the predicted safety value of the nominal action exceeds the (adaptively updated) threshold, the nominal action is executed.
Otherwise, control switches to the backup safety policy.

The ACI-driven quantile is dynamically updated based on the conformal error feedback, so the filter becomes more conservative in regions with observed mispredictions and less conservative when the predictor matches the observed dynamics.
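The switching logic above can be illustrated with a small sketch. Everything specific here is an assumption rather than the paper's definition: the class name `AdaptiveFilter`, the one-sided score `q_pred - realized` (how much the model overestimated safety), the rule of subtracting the score quantile from the predicted value before comparing it with a scalar `boundary`, and the choice to trust the model until the first feedback arrives.

```python
import numpy as np

class AdaptiveFilter:
    """Illustrative ACI-calibrated switch; not the paper's implementation."""

    def __init__(self, alpha=0.1, step_size=0.05, boundary=0.0):
        self.alpha = alpha          # user-specified target error rate
        self.alpha_t = alpha        # working level adapted by ACI
        self.step_size = step_size  # ACI step size (assumed name)
        self.boundary = boundary    # safety value separating safe from unsafe
        self.scores = []            # history of overestimation scores

    def margin(self):
        """Quantile of past overestimation errors at level 1 - alpha_t."""
        if not self.scores:
            return 0.0              # no feedback yet: behave like a fixed threshold
        if self.alpha_t <= 0.0:
            return np.inf           # always fall back to the backup policy
        if self.alpha_t >= 1.0:
            return -np.inf          # always accept the nominal action
        return float(np.quantile(self.scores, 1.0 - self.alpha_t))

    def select(self, q_nominal):
        """Pick 'nominal' if the calibrated lower bound clears the boundary."""
        return "nominal" if q_nominal - self.margin() > self.boundary else "backup"

    def update(self, q_pred, realized):
        """Feed back the realized target for the last prediction."""
        margin = self.margin()
        score = q_pred - realized   # positive when safety was overestimated
        err = 1.0 if score > margin else 0.0
        self.alpha_t += self.step_size * (self.alpha - err)
        self.scores.append(score)
```

After a run of overestimated predictions, `select` rejects nominal actions that a fixed threshold at `boundary` would have accepted, which is the adaptive conservatism the text describes.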

Guarantees

ACoFi inherits and formalizes asymptotic guarantees from ACI. When applied in closed loop:

The empirical rate at which the system incorrectly quantifies the safety of nominal actions (i.e., the realized target contradicts the predicted safety even though the filter did not intervene) is asymptotically upper bounded by the user-specified $\alpha$.
The threshold adapts to observed epistemic error, allowing for tighter task performance without sacrificing specified statistical safety confidence.
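For reference, the standard ACI analysis yields a bound of the following form, which holds for any score sequence. Whether ACoFi's theorem uses exactly these constants is not stated in this overview, so the expression is shown as background rather than as the paper's result. Here $\mathrm{err}_t \in \{0, 1\}$ is the miscoverage indicator at step $t$, $\alpha_1$ is the initial working level, and $\eta > 0$ is the ACI step size (symbol chosen for this example):

$$\left| \frac{1}{T} \sum_{t=1}^{T} \mathrm{err}_t - \alpha \right| \le \frac{\max\{\alpha_1,\, 1 - \alpha_1\} + \eta}{T\,\eta}$$

The right-hand side vanishes as $T \to \infty$, which is the sense in which the guarantee is asymptotic: the bound says nothing about any individual step.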

This delivers statistical (soft) guarantees in contrast to hard safety, but provides a direct calibration interface and is robust to changing error modes during deployment.

Empirical Evaluation

ACoFi is evaluated in two tasks: a Dubins car obstacle-avoidance scenario and the high-dimensional Safety Gymnasium CarGoal environment, using vision-based perception and latent dynamics modeling.

Dubins Car Task

The environment consists of a Dubins vehicle with stochastic velocity and steering disturbances at runtime to simulate OOD perturbations. The HJ reachability value function is learned over a latent visual space (using DINO-WM), and ACoFi modulates policy switching based on observed error patterns in OOD deployment.
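A minimal sketch of a disturbed Dubins car step is given below to make the setup concrete. The kinematics are the standard Dubins model; the Gaussian noise model, the Euler discretization, and all parameter names and default values (`speed`, `dt`, `speed_noise`, `steer_noise`) are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def dubins_step(state, steer, rng, speed=1.0, dt=0.05,
                speed_noise=0.0, steer_noise=0.0):
    """Advance a Dubins car by one Euler step with optional disturbances.

    state is (x, y, heading); steer is the commanded turn rate in rad/s.
    """
    x, y, theta = state
    # Additive Gaussian disturbances on speed and turn rate (assumed model).
    v = speed + rng.normal(0.0, speed_noise)
    w = steer + rng.normal(0.0, steer_noise)
    x_next = x + dt * v * np.cos(theta)
    y_next = y + dt * v * np.sin(theta)
    # Wrap the heading into [-pi, pi).
    theta_next = (theta + dt * w + np.pi) % (2.0 * np.pi) - np.pi
    return np.array([x_next, y_next, theta_next])
```

Setting both noise scales to zero recovers the nominal dynamics, so the same function can generate in-distribution and perturbed rollouts by changing only the noise arguments.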

(Figure 1)

Figure 1: The Dubins car environment, showing the red vehicle, blue obstacles, and green goal zone.

Numerical results show:

ACoFi maintains a higher minimal realized safety value and substantially reduces the fraction of unsafe steps relative to fixed-threshold switching, especially under strong dynamics disturbance.
The reliance on the backup safety policy is commensurate with OOD severity and decreases in-distribution, quantifying the adaptivity of ACoFi.
Fixed threshold methods fail to calibrate to state-dependent error and are both less safe and less sample-efficient.
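The quantities compared in these results can be computed from a rollout log along the following lines. This is a sketch under stated assumptions: the function name `rollout_metrics`, the log format (one realized safety value and one backup flag per step), and the convention that a step counts as unsafe when its safety value is at or below `boundary` are all hypothetical.

```python
import numpy as np

def rollout_metrics(safety_values, used_backup, boundary=0.0):
    """Summarize one rollout: worst safety value, unsafe rate, backup rate."""
    safety_values = np.asarray(safety_values, dtype=float)
    used_backup = np.asarray(used_backup, dtype=bool)
    return {
        # Lowest realized safety value along the trajectory.
        "min_safety": float(safety_values.min()),
        # Fraction of steps at or below the boundary (assumed convention).
        "unsafe_fraction": float(np.mean(safety_values <= boundary)),
        # Fraction of steps on which the backup policy was executed.
        "backup_fraction": float(used_backup.mean()),
    }
```

Comparing `backup_fraction` across disturbance levels is one way to check the adaptivity claim, since it should rise with disturbance severity and fall in-distribution.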

(Figure 2)

Figure 2: Trajectories for Dubins car agents under the most severe disturbance scenario, comparing fixed-threshold and ACoFi switching (green). Circle markers depict moments where the adaptive threshold forces a switch.

Safety Gymnasium CarGoal Task

This task demonstrates ACoFi's behavior on a high-dimensional vision-based RL environment, where HJ value functions are only approximate and OOD generalization error is severe.

ACoFi yields up to 7$\times$ fewer safety violations compared to fixed-threshold switching at the same nominal performance level for realistic $\alpha$.
The trade-off between increased conservatism (i.e., safe control invocations) and reduced total unsafe events can be directly tuned via $\alpha$.

Theoretical and Practical Implications

The principal contribution is a statistically principled, deployment-time adaptive safety filter that calibrates conservatism based on observed errors, eliminating hand-tuned thresholds and offering a tractable protocol for practitioners to manage real-world safety-vs-performance trade-offs under epistemic uncertainty.

The formalization links statistical conformal inference to safety-critical decision making and extends applicability beyond IID data to non-stationary sequential control, assuming ground-truth target values can be observed for calibration.

While only guaranteeing soft asymptotic safety, ACoFi can be layered atop existing safety architectures, mitigating the unreliability from distribution shift and model inaccuracy that typifies end-to-end visually-based control.

Future Directions

Future work may analyze extensions to:

Multi-step prediction error calibration, potentially incorporating time-to-event safety horizons.
Integration with ensemble-based uncertainty quantification and data-driven worst-case reasoning.
Continuous-time system adaptation and extension to multi-agent safety games.
Tighter coupling of the calibration process to alternative perception and dynamics models.

Conclusion

ACoFi constitutes an adaptive and statistically calibrated framework for learned safety filtering in high-dimensional and uncertain environments, giving practitioners a mechanism to manage safety constraints in the presence of unmodeled errors and distribution shift. The adaptive quantile-based thresholding mechanism sets a new standard for safety assurance protocols in data-driven robotics and control, representing a significant step towards robust, real-world-safe intelligent agents.
