- The paper proposes Adaptive Conformal Filtering (ACoFi), blending learned HJ reachability safety filters with real-time adaptive conformal inference to manage uncertainty.
- The methodology dynamically adjusts safety thresholds based on observed predictive errors, ensuring state constraints while balancing task performance and conservatism.
- Empirical evaluations on Dubins car and Safety Gymnasium tasks demonstrate a significant reduction in unsafe steps and improved adaptability under distribution shift.
Introduction
Reliable deployment of high-dimensional control systems in safety-critical applications requires robust assurance mechanisms under epistemic and aleatoric uncertainty. The work "Safe Control using Learned Safety Filters and Adaptive Conformal Inference" (2604.18482) develops Adaptive Conformal Filtering (ACoFi), a synthesis of learned Hamilton-Jacobi (HJ) reachability-based safety filters and statistically adaptive conformal inference, to address the limits of existing fixed-threshold safety filters and provide probabilistic safety guarantees over the deployment horizon.
Background
Safety Filters and the Limits of Fixed Thresholds
Safety filters based on HJ reachability value functions or Control Barrier Functions (CBFs) adjust nominal (potentially unsafe) actions to enforce state constraint satisfaction. In data-driven settings, these safety value functions are trained over the latent state representation. However, the learned $V_\theta$ may have high predictive error, particularly under distribution shifts, making fixed-threshold heuristics for filter switching unreliable and prone to over- or under-conservatism. This unreliability compounds in high-dimensional latent perception settings.
Existing methods treat the threshold as a hyperparameter, often without accounting for region-dependent generalization errors, and thus offer limited or ad hoc safety confidence.
Conformal prediction provides formal, finite-sample statistical guarantees on miscoverage rates for black-box predictors by constructing calibrated confidence regions. For non-exchangeable, temporally dependent data (as in control tasks), Adaptive Conformal Inference (ACI) calibrates coverage online, updating its quantile based on observed errors to guarantee that the long-run empirical miscoverage rate does not exceed a specified α.
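The online calibration described above can be sketched with the standard ACI update from the conformal inference literature, in which the working miscoverage level is nudged after every observation. This is a minimal sketch, not ACoFi's implementation: the function names, the step size `gamma`, the warm-up length, and the synthetic score stream are all assumptions made for illustration.

```python
import numpy as np


def aci_step(alpha_t, err_t, alpha_target, gamma):
    """One ACI update: alpha_t rises after a covered step (err_t = 0), falls after a miss."""
    return alpha_t + gamma * (alpha_target - err_t)


def conformal_quantile(scores, alpha_t):
    """Empirical (1 - alpha_t) quantile of past scores, with ACI's edge cases."""
    if alpha_t <= 0.0:
        return np.inf    # cover everything
    if alpha_t >= 1.0:
        return -np.inf   # cover nothing
    return float(np.quantile(scores, 1.0 - alpha_t))


def run_aci(scores, alpha_target=0.1, gamma=0.01, warmup=50):
    """Run ACI over a score stream and return the per-step miscoverage indicators."""
    alpha_t = alpha_target
    errs = []
    for t in range(warmup, len(scores)):
        q_t = conformal_quantile(scores[:t], alpha_t)
        err_t = float(scores[t] > q_t)  # 1 if the new score falls outside the region
        errs.append(err_t)
        alpha_t = aci_step(alpha_t, err_t, alpha_target, gamma)
    return np.array(errs)


rng = np.random.default_rng(0)
# Hypothetical score stream whose scale doubles halfway through (a distribution shift).
scores = np.concatenate([np.abs(rng.normal(0, 1, 1000)),
                         np.abs(rng.normal(0, 2, 1000))])
errs = run_aci(scores)
print(f"empirical miscoverage: {errs.mean():.3f}")
```

A larger `gamma` reacts faster to shifts at the cost of a noisier quantile; the long-run miscoverage stays close to the target either way.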
ACoFi combines a black-box, learned HJ safety filter with real-time, history-dependent calibration using ACI.
Switching Policy
At each time $t$, let $y_t$ be the latent state, $\pi^{\text{task}}$ a nominal policy, and $\pi_\theta^{\text{safe}}$ a backup HJ-based safety policy. The Q-value $Q_\theta(y_t, u_t)$ estimates the system's safety when executing $u_t$ at $y_t$. The system collects the realized target $R_t$ (which blends the classification boundary, the future value function estimate, and the discount) and computes a conformal score from the gap between $Q_\theta(y_t, u_t)$ and $R_t$.
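The exact form of the realized target and of the score is not given here. One plausible reading, shown purely as an assumption, is the discounted safety Bellman backup commonly used when learning HJ value functions, paired with a one-sided score that is positive only when the prediction overstated safety. The names `realized_target`, `conformal_score`, `margin`, `next_value`, and `discount` are hypothetical.

```python
def realized_target(margin, next_value, discount=0.95):
    """Discounted safety Bellman backup (assumed form, not confirmed by the source).

    margin: signed safety margin at the current latent state (positive = safe side).
    next_value: value estimate at the next latent state.
    """
    return (1.0 - discount) * margin + discount * min(margin, next_value)


def conformal_score(q_pred, target):
    """Assumed score: amount by which the prediction overstated safety, else 0."""
    return max(q_pred - target, 0.0)


r = realized_target(margin=0.4, next_value=0.1, discount=0.9)
print(r, conformal_score(0.5, r))
```

With this choice, an under-estimate of safety contributes a zero score, so only optimistic errors push the filter toward more conservative behavior.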
At runtime, the ACoFi policy takes a quantile of the score history, at the level currently set by ACI, and combines it with the classification boundary (expressed in the latent representation) to form an adaptive safety threshold.
- If the predicted safety value of the nominal action, $Q_\theta(y_t, u_t)$, exceeds the (adaptively updated) threshold, the nominal action is executed.
- Otherwise, control switches to the backup safety policy.
The ACI-driven quantile is updated online from conformal error feedback, so the filter becomes more conservative in regions with observed mispredictions and less conservative when the predictor matches the observed dynamics.
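The switching rule and its feedback loop can be sketched as a small class. This is an illustration under stated assumptions rather than the paper's implementation: the class `AdaptiveSwitch`, its parameters, the additive threshold (boundary plus score quantile), and the one-sided score are all hypothetical choices.

```python
from collections import deque

import numpy as np


class AdaptiveSwitch:
    """Hypothetical ACoFi-style switch; names and the additive threshold are assumptions."""

    def __init__(self, boundary=0.0, alpha=0.1, gamma=0.05, history=200):
        self.boundary = boundary      # safety level separating safe from unsafe
        self.alpha = alpha            # target miscoverage rate
        self.alpha_t = alpha          # ACI's running miscoverage level
        self.gamma = gamma            # ACI step size
        self.scores = deque(maxlen=history)

    def threshold(self):
        """Boundary plus the (1 - alpha_t) quantile of past scores."""
        if not self.scores or self.alpha_t <= 0.0:
            return np.inf             # no evidence yet, or maximal caution: always back up
        if self.alpha_t >= 1.0:
            return self.boundary
        return self.boundary + float(np.quantile(list(self.scores), 1.0 - self.alpha_t))

    def select(self, q_nominal, u_nominal, u_backup):
        """Run the nominal action only if its predicted safety clears the threshold."""
        return u_nominal if q_nominal > self.threshold() else u_backup

    def update(self, q_pred, target):
        """Feed back the realized target and apply the ACI update."""
        score = max(q_pred - target, 0.0)          # assumed score: overestimated safety
        margin = self.threshold() - self.boundary  # quantile in force before this step
        err = float(score > margin)
        self.alpha_t += self.gamma * (self.alpha - err)
        self.scores.append(score)


sw = AdaptiveSwitch()
first = sw.select(0.5, "nominal", "backup")        # no history yet
for _ in range(50):
    sw.update(q_pred=0.3, target=0.3)              # accurate predictions
after_accurate = sw.select(0.5, "nominal", "backup")
for _ in range(10):
    sw.update(q_pred=0.5, target=0.0)              # safety overestimated by 0.5
after_misses = sw.select(0.4, "nominal", "backup")
print(first, after_accurate, after_misses)
```

In this toy run the switch falls back before it has any history, admits the nominal action after a stretch of accurate predictions, and rejects a lower-valued nominal action once repeated overestimates have raised the threshold.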
Guarantees
ACoFi inherits and formalizes asymptotic guarantees from ACI. When applied in closed loop:
- The empirical rate at which the system incorrectly quantifies the safety of nominal actions (i.e., the realized outcome is unsafe even though the filter did not intervene) is upper bounded by the user-specified α.
- The threshold adapts to observed epistemic error, allowing for tighter task performance without sacrificing specified statistical safety confidence.
This delivers statistical (soft) guarantees in contrast to hard safety, but provides a direct calibration interface and is robust to changing error modes during deployment.
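For reference, the standard ACI update and its finite-horizon bound from the conformal inference literature are written out below. The symbols $\mathrm{err}_t$ (the miscoverage indicator at step $t$), $\alpha_1$ (the initial level), and $\gamma$ (the step size) are notation from that literature rather than from the text above, and whether ACoFi's closed-loop statement takes exactly this form is an assumption.

```latex
\alpha_{t+1} = \alpha_t + \gamma\,(\alpha - \mathrm{err}_t),
\qquad
\left|\frac{1}{T}\sum_{t=1}^{T}\mathrm{err}_t - \alpha\right|
\le \frac{\max\{\alpha_1,\, 1-\alpha_1\} + \gamma}{\gamma T}.
```

The right-hand side vanishes as $T$ grows, which is the sense in which the guarantee is asymptotic: it constrains the long-run error frequency, not any individual step.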
Empirical Evaluation
ACoFi is evaluated in two tasks: a Dubins car obstacle-avoidance scenario and the high-dimensional Safety Gymnasium CarGoal environment, using vision-based perception and latent dynamics modeling.
Dubins Car Task
The environment consists of a Dubins vehicle with stochastic velocity and steering disturbances at runtime to simulate OOD perturbations. The HJ reachability value function is learned over a latent visual space (using DINO-WM), and ACoFi modulates policy switching based on observed error patterns in OOD deployment.
(Figure 1)
Figure 1: The Dubins car environment, showing the red vehicle, blue obstacles, and green goal zone.
Numerical results show:
- ACoFi maintains a higher minimal realized safety value and substantially reduces the fraction of unsafe steps relative to fixed-threshold switching, especially under strong dynamics disturbance.
- The reliance on the backup safety policy is commensurate with OOD severity and decreases in-distribution, quantifying the adaptivity of ACoFi.
- Fixed threshold methods fail to calibrate to state-dependent error and are both less safe and less sample-efficient.
(Figure 2)
Figure 2: Trajectories for Dubins car agents under the most severe disturbance scenario, comparing fixed-threshold switching and ACoFi switching (green). Circle markers indicate the moments where the adaptive threshold forces a switch.
Safety Gymnasium CarGoal Task
This task demonstrates ACoFi's behavior on a high-dimensional vision-based RL environment, where HJ value functions are only approximate and OOD generalization error is severe.
- ACoFi yields up to 7× fewer safety violations than fixed-threshold switching at the same nominal performance level for realistic α.
- The trade-off between increased conservatism (i.e., more invocations of the safe controller) and fewer total unsafe events can be tuned directly via α.
Theoretical and Practical Implications
The principal contribution is a statistically principled, deployment-time adaptive safety filter that calibrates conservatism based on observed errors, eliminating hand-tuned thresholds and offering a tractable protocol for practitioners to manage real-world safety-vs-performance trade-offs under epistemic uncertainty.
The formalization links statistical conformal inference to safety-critical decision making and extends applicability beyond IID data to non-stationary sequential control, assuming ground-truth target values can be observed for calibration.
Although it guarantees only soft, asymptotic safety, ACoFi can be layered atop existing safety architectures, mitigating the unreliability from distribution shift and model inaccuracy that typifies end-to-end vision-based control.
Future Directions
Future work may analyze extensions to:
- Multi-step prediction error calibration, potentially incorporating time-to-event safety horizons.
- Integration with ensemble-based uncertainty quantification and data-driven worst-case reasoning.
- Continuous-time system adaptation and extension to multi-agent safety games.
- Tighter coupling of the calibration process to alternative perception and dynamics models.
Conclusion
ACoFi constitutes an adaptive and statistically calibrated framework for learned safety filtering in high-dimensional and uncertain environments, giving practitioners a mechanism to manage safety constraints in the presence of unmodeled errors and distribution shift. The adaptive quantile-based thresholding mechanism sets a new standard for safety assurance protocols in data-driven robotics and control, representing a significant step toward robust, real-world-safe intelligent agents.