V-OCBF: Value-Guided Offline Control Barrier Functions

Updated 18 December 2025

The paper introduces V-OCBF, which extracts safety certificates by deriving control barrier functions from offline value functions using neural parameterization.
It employs expectile-based loss and Lipschitz regularization to prevent overestimation in out-of-distribution regions and to ensure robust safety guarantees.
The approach synthesizes safe controllers via quadratic programs, demonstrating reduced safety-violation rates and improved reliability in high-dimensional control tasks.

Value-Guided Offline Control Barrier Functions (V-OCBF) formalize a framework for learning safety certificates and synthesizing safe controllers using only offline data without access to system dynamics or online rollouts. V-OCBFs enable the construction of neural control barrier functions (CBFs) that propagate and verify state-wise forward invariance guarantees based on value-function principles, extending classic CBF theory to settings with limited model information and substantial offline data. These methods are particularly effective in safety-critical autonomous systems where strict hard-constraint satisfaction is necessary and online experimentation is impractical or unsafe (Tayal et al., 11 Dec 2025, Tan et al., 2023, Tabbara et al., 1 May 2025, Choi et al., 2021, Almubarak et al., 2021).

1. Mathematical Foundation and Problem Formulation

The V-OCBF framework presumes safety-critical control-affine systems governed by

$\dot{x}(t)=f(x(t))+g(x(t))u(t),$

with $x\in\mathbb{R}^n$ , $u\in U \subset\mathbb{R}^m$ . Safety is encoded via a continuously differentiable barrier $h:\mathbb{R}^n\to\mathbb{R}$ , defining the safe set

$\mathcal{C} = \{x\in\mathbb{R}^n : h(x)\geq0 \}.$

A function $h$ is a control barrier function if for all $x\in\mathcal{C}$ , there exists $u\in U$ such that

$L_f h(x) + L_g h(x)u + \alpha(h(x)) \geq 0,$

where $L_f h(x)=\nabla h(x)^{\top}f(x)$ , $L_g h(x)=\nabla h(x)^{\top}g(x)$ , and $\alpha$ is a class- $\mathcal{K}$ function. Satisfaction of this CBF condition by a feedback law ensures forward invariance of $\mathcal{C}$ .
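As a minimal sketch of how the CBF condition above can be checked numerically, the snippet below evaluates the left-hand side $L_f h(x) + L_g h(x)u + \alpha(h(x))$ for a scalar system. The function name `cbf_residual` and the example system ($\dot{x}=u$, $h(x)=1-x^2$, $\alpha(s)=s$) are illustrative assumptions, not taken from the cited papers.

```python
def cbf_residual(x, u, f, g, h, grad_h, alpha):
    """Return L_f h(x) + L_g h(x) * u + alpha(h(x)) for a scalar system."""
    lf_h = grad_h(x) * f(x)   # Lie derivative of h along f
    lg_h = grad_h(x) * g(x)   # Lie derivative of h along g
    return lf_h + lg_h * u + alpha(h(x))


# Hypothetical example: x_dot = u, safe set {x : 1 - x^2 >= 0}, alpha(s) = s.
f = lambda x: 0.0
g = lambda x: 1.0
h = lambda x: 1.0 - x * x
grad_h = lambda x: -2.0 * x
alpha = lambda s: s

inward = cbf_residual(0.5, -1.0, f, g, h, grad_h, alpha)   # control pointing into the safe set
outward = cbf_residual(0.5, 1.0, f, g, h, grad_h, alpha)   # control pointing toward the boundary
print(inward >= 0.0, outward >= 0.0)
```

In this toy setting the control $u=-1$ satisfies the condition at $x=0.5$ while $u=1$ violates it, which is the kind of pointwise test a safety filter performs.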

V-OCBFs incorporate value-function principles by embedding safety penalties into an infinite-horizon cost functional: $J(u;x_0)=\int_0^\infty\left\{\ell(x(t),u(t))+\rho\|\max\{0,-h(x(t))\}\|^2\right\}dt,$ with $\ell\geq 0$ the nominal stage cost and $\rho>0$ the safety-violation penalty (Tayal et al., 11 Dec 2025, Cohen et al., 2020, Almubarak et al., 2021).
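To make the role of the penalty term concrete, the following sketch approximates the cost functional above by a finite Riemann sum over a sampled trajectory. The function name `penalized_cost`, the step size `dt`, the quadratic stage cost, and the example barrier are assumptions chosen for illustration; the infinite horizon is truncated to the samples provided.

```python
def penalized_cost(states, controls, dt, stage_cost, h, rho):
    """Riemann-sum approximation of the safety-penalized cost J(u; x0)."""
    total = 0.0
    for x, u in zip(states, controls):
        violation = max(0.0, -h(x))            # positive only when h(x) < 0
        total += (stage_cost(x, u) + rho * violation ** 2) * dt
    return total


# Hypothetical scalar example: h(x) = 1 - x^2, stage cost u^2.
h = lambda x: 1.0 - x * x
stage_cost = lambda x, u: u * u

cost = penalized_cost(states=[0.0, 1.5], controls=[1.0, 0.0],
                      dt=0.1, stage_cost=stage_cost, h=h, rho=10.0)
print(cost)
```

The second sample lies outside the safe set, so almost all of the accumulated cost comes from the $\rho$-weighted violation term rather than from the nominal stage cost.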

2. Value-Function–Driven Barrier Construction

V-OCBF methods use offline-learned value functions to induce barrier functions, drawing on the connection between safety-optimal value functions and CBFs. In discrete time, for any offline RL-trained $V$ , the candidate barrier

$h(x) = V(x) - R$

(with $R$ a suitably chosen threshold) can act as a CBF if it satisfies two central conditions:

$h(x)<0$ for all $x$ in the known unsafe set;
$\forall x$ with $h(x)\geq 0$ , $\sup_{u} h(f(x,u))\geq (1-\alpha)h(x)$ for some $\alpha\in(0,1]$ .

This Bellman-style one-step guarantee enables direct extraction of CBF-like certificates from Q-learning or policy evaluation outputs, requiring only dataset-supported state transitions (Tan et al., 2023, Tayal et al., 11 Dec 2025).
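The two conditions above can be checked empirically on a finite dataset. The sketch below assumes a tabular value function and replaces the supremum over $u$ with a maximum over next states actually observed in the data; the function name `check_barrier_conditions` and the toy values are hypothetical.

```python
def check_barrier_conditions(h, unsafe_states, transitions, alpha):
    """Check the two candidate-barrier conditions on dataset-supported states.

    transitions maps each state to the list of next states observed from it.
    """
    # Condition 1: h is negative on every known unsafe state.
    cond1 = all(h(x) < 0 for x in unsafe_states)
    # Condition 2: from each state with h >= 0, some observed successor
    # keeps h above (1 - alpha) * h(x).
    cond2 = all(
        max(h(x_next) for x_next in nexts) >= (1 - alpha) * h(x)
        for x, nexts in transitions.items()
        if h(x) >= 0 and nexts
    )
    return cond1, cond2


# Hypothetical tabular value function and threshold R.
V = {0: 1.0, 1: 0.8, 2: 0.5, 3: 0.1}
R = 0.3
h = lambda x: V[x] - R

good = check_barrier_conditions(h, {3}, {0: [0, 1], 1: [0, 2], 2: [1, 3], 3: [3]}, alpha=0.5)
bad = check_barrier_conditions(h, {3}, {0: [0, 1], 1: [0, 2], 2: [3], 3: [3]}, alpha=0.5)
print(good, bad)
```

In the second call, state 2 has only an unsafe observed successor, so the one-step condition fails even though the unsafe-set condition still holds.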

The barrier is updated recursively via finite-difference or empirical Bellman updates: $\hat{h}(x) \leftarrow \max_{u\in \mathcal{D}(x)} \mathbb{E}_{x'}[h(x')] + \text{regularization},$ where the data-driven action support $\mathcal{D}(x)$ restricts the maximization to actions observed in the offline dataset (Tayal et al., 11 Dec 2025).
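A tabular sketch of one sweep of this update is given below. It assumes discrete states and actions, estimates the expectation by averaging over observed next states, and treats the regularization as a constant `reg` supplied by the caller; these simplifications and the name `barrier_sweep` are illustrative assumptions rather than the procedure of the cited work.

```python
from collections import defaultdict


def barrier_sweep(h, dataset, reg=0.0):
    """One synchronous sweep of h(x) <- max_{u in D(x)} mean_{x'} h(x') + reg."""
    # Group observed next-state barrier values by (state, action).
    grouped = defaultdict(lambda: defaultdict(list))
    for x, u, x_next in dataset:
        grouped[x][u].append(h[x_next])
    updated = dict(h)  # states without data keep their old value
    for x, by_action in grouped.items():
        # Maximize only over actions that appear in the dataset at x.
        updated[x] = max(sum(v) / len(v) for v in by_action.values()) + reg
    return updated


# Hypothetical three-state dataset of (x, u, x') tuples.
h0 = {"a": 0.0, "b": 1.0, "c": -1.0}
data = [("a", "left", "c"), ("a", "right", "b"), ("a", "right", "a"), ("b", "stay", "b")]
h1 = barrier_sweep(h0, data)
print(h1)
```

State "c" never appears as a source state in the data, so its value is left untouched; actions that were never observed at a state never enter the maximization.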

3. Model-Free Neural Barrier Learning and OOD Robustness

A distinguishing feature of V-OCBF is the model-free neural parameterization of the barrier $h_\phi$ : $h_\phi(x) = \text{NN}_\phi(x),$ trained to satisfy barrier conditions using only offline transition tuples $(x,u,x')$ . V-OCBF circumvents the need for known dynamics $f$ and $g$ by leveraging difference quotients or learned local predictors for the barrier evolution.

The loss incorporates an expectile-based term to prevent barrier overestimation on out-of-distribution (OOD) actions: $\mathcal{L}_{\text{expectile}} = \mathbb{E}_{(x,u,x')} \left[\rho_\tau\left(h_\phi(x) - (h_\phi(x') + c)\right)\right],$ where $\rho_\tau$ is the asymmetric expectile loss, $c$ encodes the required barrier growth, and $u$ is sampled only from actions represented in the dataset. This conservatism sharply improves safety reliability in OOD regions, a documented weakness of prior neural CBF learning (Tabbara et al., 1 May 2025, Tayal et al., 11 Dec 2025).
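The expectile term can be sketched as follows. The text does not spell out $\rho_\tau$, so this example assumes the standard asymmetric squared form $\rho_\tau(\delta)=|\tau-\mathbf{1}\{\delta<0\}|\,\delta^2$; the function name `expectile_loss` and the sample values are hypothetical, and plain lists of barrier values stand in for network outputs.

```python
def expectile_loss(h_x, h_next, c, tau):
    """Mean asymmetric squared loss on residuals h(x) - (h(x') + c).

    Assumes rho_tau(d) = |tau - 1{d < 0}| * d^2 (standard expectile form).
    """
    total = 0.0
    for hx, hn in zip(h_x, h_next):
        d = hx - (hn + c)
        weight = (1.0 - tau) if d < 0 else tau   # asymmetric weighting by sign
        total += weight * d * d
    return total / len(h_x)


# Hypothetical barrier values on two transitions.
h_x = [0.5, 0.2]
h_next = [0.4, 0.6]

asym = expectile_loss(h_x, h_next, c=0.0, tau=0.9)
sym = expectile_loss(h_x, h_next, c=0.0, tau=0.5)
print(asym, sym)
```

With $\tau=0.5$ the loss reduces to half the mean squared residual; moving $\tau$ away from $0.5$ weights positive and negative residuals differently, which is the mechanism used to bias the learned barrier.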

Neural barrier models are also regularized by Lipschitz and data consistency terms. The empirical loss includes penalties designed to depress $h_\phi$ on OOD next states reachable by randomly sampled actions, preventing unsafe or unsupported generalization (Tabbara et al., 1 May 2025).

4. Safe Controller Synthesis via Quadratic Programs

At deployment, the learned barrier $h_\phi$ serves as a real-time safety filter within a quadratic program (QP): $u^* =\arg\min_{u\in U} \|u-u_{\text{ref}}(x)\|^2 \quad \text{s.t.} \quad L_f h_\phi(x) + L_g h_\phi(x) u + \alpha(h_\phi(x))\geq 0,$ where $u_{\text{ref}}(x)$ is an unconstrained reference controller (e.g., RL or model-based policy). This online QP ensures safety constraint feasibility at every step, yielding forward invariance with minimal task performance degradation (Tayal et al., 11 Dec 2025, Almubarak et al., 2021, Choi et al., 2021).
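For a single affine CBF constraint and no input bounds, this QP has a closed-form solution: keep $u_{\text{ref}}$ if it already satisfies the constraint, otherwise project it onto the constraint boundary. The sketch below assumes $U=\mathbb{R}^m$ and that the values of $L_f h_\phi(x)$, $L_g h_\phi(x)$ and $\alpha(h_\phi(x))$ are supplied by the caller (for example from learned estimates); the function name `cbf_qp_filter` is hypothetical. With input bounds or several constraints a general QP solver would be needed instead.

```python
def cbf_qp_filter(u_ref, lf_h, lg_h, alpha_h):
    """Closed-form solution of min ||u - u_ref||^2 s.t. lf_h + lg_h . u + alpha_h >= 0.

    Assumes an unbounded input set, so only the single CBF constraint is active.
    """
    slack = lf_h + alpha_h + sum(b * u for b, u in zip(lg_h, u_ref))
    if slack >= 0.0:
        return list(u_ref)                       # reference control is already safe
    norm_sq = sum(b * b for b in lg_h)
    if norm_sq == 0.0:
        raise ValueError("CBF constraint is infeasible: L_g h is zero")
    scale = slack / norm_sq
    # Project u_ref onto the hyperplane lf_h + lg_h . u + alpha_h = 0.
    return [u - scale * b for u, b in zip(u_ref, lg_h)]


# Hypothetical two-input example.
u_safe = cbf_qp_filter(u_ref=[1.0, 0.0], lf_h=0.0, lg_h=[-1.0, 0.0], alpha_h=0.5)
u_same = cbf_qp_filter(u_ref=[0.2, 0.0], lf_h=0.0, lg_h=[-1.0, 0.0], alpha_h=0.5)
print(u_safe, u_same)
```

The first call modifies the reference control just enough to make the constraint hold with equality; the second leaves it unchanged because the reference is already admissible.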

Controllers synthesized by V-OCBF consistently reduce empirical safety-violation rates compared to baseline CBF, minimum-norm, or unconstrained controllers, and maintain competitive nominal task costs (Tayal et al., 11 Dec 2025, Almubarak et al., 2021). These properties hold across continuous-control benchmarks and high-dimensional state representations.

5. Verification and Empirical Evaluation

Formal analysis of the V-OCBF method demonstrates that, under mild assumptions on value approximation error, one obtains high-probability guarantees that the learned $h_\phi$ meets CBF conditions on the data manifold (Tan et al., 2023). Validity and coverage are quantified via metrics:

Validity: $m_{\text{valid}}(h) = \mathbb{E}_x[\rho_1(x)\cdot\rho_2(x)]$ , with indicator functions encoding the two core CBF criteria.
Coverage: $m_{\text{cov}}(h) = \mathbb{E}_x\left[\mathbf{1}{ \{h(x)\ge 0\} }\right]$ .

Empirical results show that as $m_{\text{valid}}\to 1$ , $m_{\text{cov}}$ may decrease, corresponding to a validity-coverage trade-off: stricter validity shrinks the declared safe set but increases confidence in forward invariance (Tan et al., 2023).
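Both metrics can be estimated by averaging over sampled states. In the sketch below, $\rho_1$ is taken to be 1 unless an unsafe state is declared safe, and $\rho_2$ is taken to be 1 unless a state with $h(x)\geq 0$ fails the one-step condition; this reading of the two indicators, the function name `validity_and_coverage`, and the toy data are assumptions made for illustration.

```python
def validity_and_coverage(states, h, is_unsafe, one_step_ok):
    """Monte Carlo estimates of m_valid and m_cov over sampled states."""
    valid = 0
    covered = 0
    for x in states:
        rho1 = 0 if (is_unsafe(x) and h(x) >= 0) else 1   # unsafe states must have h < 0
        rho2 = 1 if (h(x) < 0 or one_step_ok(x)) else 0   # one-step condition inside the safe set
        valid += rho1 * rho2
        covered += 1 if h(x) >= 0 else 0
    n = len(states)
    return valid / n, covered / n


# Hypothetical four-state example.
h_table = {0: 0.7, 1: 0.5, 2: 0.2, 3: -0.2}
m_valid, m_cov = validity_and_coverage(
    states=[0, 1, 2, 3],
    h=lambda x: h_table[x],
    is_unsafe=lambda x: x == 3,
    one_step_ok=lambda x: x != 2,   # state 2 fails the one-step condition
)
print(m_valid, m_cov)
```

Raising the threshold so that state 2 falls outside the declared safe set would restore full validity at the price of lower coverage, which mirrors the trade-off described above.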

Comparative studies indicate that V-OCBF methods, especially with expectile/OOD regularization, achieve lower safety-violation rates and higher robustness than unconstrained RL, hand-designed CBF filters, or earlier neural CBF learning baselines (Tabbara et al., 1 May 2025, Tayal et al., 11 Dec 2025).

6. Relation to Existing Methods

Value-guided safety certificate extraction situates V-OCBFs at the intersection of classic HJ reachability, reinforcement learning, and modern neural approximation. Whereas traditional CBF approaches require manual construction of $h$ or explicit models $f,g$ , and HJ reachability solves PDEs at substantial computational cost, V-OCBF achieves comparable forward invariance via neural, data-driven synthesis from potentially high-dimensional, partially observed datasets (Choi et al., 2021, Tan et al., 2023).

Contrast with "Conservative Control Barrier Functions" (CCBF) shows both methods use regularization to depress the barrier off the data manifold, but V-OCBF's recursion and expectile loss specifically enforce safety propagation along the dataset support, ensuring OOD robustness without excessive conservatism (Tabbara et al., 1 May 2025, Tayal et al., 11 Dec 2025).

7. Limitations and Scalability

Key limitations stem from the reliance on dataset coverage: if the offline data do not adequately represent the region of operational relevance, the learned barrier may be unreliable. While Lipschitz and OOD-penalty terms mitigate this, V-OCBFs do not guarantee worst-case safety off-manifold in the absence of sufficient data (Tabbara et al., 1 May 2025). Nonetheless, experiments demonstrate a strong empirical safety profile and scalability to challenging control domains (e.g., MuJoCo locomotion, high-dimensional visual inputs) (Tayal et al., 11 Dec 2025, Tan et al., 2023).

The approach does not require online fine-tuning or interaction, making it suitable for safety-critical settings where exploration is infeasible. A plausible implication is that further improvements may focus on joint active data selection, tighter OOD detection, and integration with certified Lipschitz neural architectures.

References

"V-OCBF: Learning Safety Filters from Offline Data via Value-Guided Offline Control Barrier Functions" (Tayal et al., 11 Dec 2025)
"Value Functions are Control Barrier Functions: Verification of Safe Policies using Control Theory" (Tan et al., 2023)
"Learning Neural Control Barrier Functions from Offline Data with Conservatism" (Tabbara et al., 1 May 2025)
"Robust Control Barrier-Value Functions for Safety-Critical Control" (Choi et al., 2021)
"HJB Based Optimal Safe Control Using Control Barrier Functions" (Almubarak et al., 2021)
"Approximate Optimal Control for Safety-Critical Systems with Control Barrier Functions" (Cohen et al., 2020)