Correct-by-Construction Control Learning
- Correct-by-construction control learning is a framework that integrates formal verification, reachability analysis, and barrier functions to ensure controllers meet safety and temporal logic specifications.
- By embedding verification checks and differentiable safety layers into the optimization process, the approach guarantees that control policies remain safe throughout both training and deployment.
- Empirical validations in applications like robotics and autonomous driving demonstrate significant improvements in safety, efficiency, and reliability over traditional design-then-verify methods.
Correct-by-construction control learning denotes a family of controller synthesis frameworks in which safety and/or temporal logic specifications are enforced during the learning or optimization of control policies, with formal guarantees preserved throughout both training and deployment. In these paradigms, formal verification, reachability analysis, or constraint satisfaction are systematically integrated into the policy update process, so that any synthesized controller is “correct” with respect to the designated specifications by construction, not merely by post-hoc empirical validation. This approach fundamentally contrasts with classical “design-then-verify,” providing provable safety, reach-avoid, or temporal-logic satisfaction properties for complex systems—including those with neural network (NN)-parametrized policies, hybrid or stochastic dynamics, and high-dimensional state spaces (Wang et al., 2021, Yang et al., 2022, Tabbara et al., 1 May 2025, Chen et al., 2017, Dai et al., 2023, Liu et al., 7 Dec 2025, Badings et al., 2023, Badings et al., 2021).
1. Formal Problem Formulations and Foundational Guarantees
Correct-by-construction control learning frameworks formalize the control problem by explicitly encoding system dynamics, safety sets, and (potentially) reachability or logic-based task objectives.
A prototypical reach-avoid property is: let $S$ be the safe set, $G$ the goal set, and $T$ the horizon. A trajectory $x_0, x_1, \ldots, x_T$ satisfies reach-avoid if:
- $x_t \in S$ for all $t \in \{0, \ldots, T\}$ (avoid),
- $x_T \in G$ (reach).
System dynamics are typically of the form $x_{t+1} = f(x_t, u_t)$ or, in continuous time, $\dot{x} = f(x, u)$. The control law (policy) $u = \pi_\theta(x)$ is parametrized by $\theta$ (classically structured or neural).
Correctness-by-construction is realized by incorporating formal verification into policy optimization, so that any parameter found by the procedure is guaranteed—in the sense of reachability, invariance, or logic satisfaction—to satisfy the specification on the modelled (and possibly uncertain/stochastic) system (Wang et al., 2021, Badings et al., 2023, Dai et al., 2023, Chen et al., 2017). In the stochastic or partially observed setting, properties are interpreted probabilistically, e.g., as satisfying the specification with probability at least a specified threshold under all system-realization and noise scenarios (Badings et al., 2023, Badings et al., 2021).
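Concretely, the discrete-time reach-avoid property can be checked on a sampled trajectory; the sets and horizon below are illustrative placeholders, not taken from the cited works:

```python
# Minimal reach-avoid check for a sampled discrete-time trajectory.
# The safe set S and goal set G below are illustrative assumptions.

def satisfies_reach_avoid(trajectory, in_safe, in_goal):
    """True if every state stays in S and the final state lies in G."""
    return all(in_safe(x) for x in trajectory) and in_goal(trajectory[-1])

# Example: 1-D states, S = [-2, 2], G = [0.9, 1.1], horizon T = 3.
in_safe = lambda x: -2.0 <= x <= 2.0
in_goal = lambda x: 0.9 <= x <= 1.1

print(satisfies_reach_avoid([0.0, 0.5, 0.8, 1.0], in_safe, in_goal))  # True
print(satisfies_reach_avoid([0.0, 2.5, 0.8, 1.0], in_safe, in_goal))  # False: leaves S
```

Verification-integrated learning replaces this per-trajectory check with a set-valued computation that covers all initial conditions at once.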
2. Verification-Integrated Learning: Design-While-Verify Methods
In contrast to post-hoc verification, design-while-verify strategies embed formal property checks into the iterative process of parameter learning or controller design. Wang et al. (Wang et al., 2021) introduce a closed-loop “verification-in-the-loop” approach where, at each parameter update, forward reachable sets of the closed-loop system are computed with respect to the candidate controller. Performance metrics (e.g., a safety-margin distance $d_{\text{safe}}(\theta)$ and a goal-proximity distance $d_{\text{goal}}(\theta)$) are constructed from these reachable sets, and the learning objective is to minimize $d_{\text{goal}}(\theta)$ subject to the safety-margin constraint. A finite-difference gradient or other derivative-free optimization is used, since the objective may be non-differentiable due to the embedding of a verifier. Critically, termination at a parameter vector satisfying the constraints ensures—via the formal reachability/over-approximation logic of the verifier—that the resulting controller certifiably enforces the reach-avoid property for all initial conditions in the specified initial set (Wang et al., 2021).
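The structure of such a loop can be sketched on a toy scalar system, where an exact interval computation stands in for the formal reachability verifier; the dynamics, sets, metrics, and step sizes are all assumptions for illustration:

```python
# Toy "design-while-verify" loop: a stand-in verifier computes reachable
# intervals of the scalar closed loop x_{t+1} = x + u, u = -theta * x,
# and a finite-difference update adjusts theta until the goal metric is
# certified. All sets, metrics, and the verifier itself are illustrative.

def reach_interval(theta, x0=(-1.0, 1.0), T=10):
    """Exact reachable interval for the linear closed loop (toy verifier)."""
    lo, hi = x0
    for _ in range(T):
        a = 1.0 - theta  # closed loop: x_{t+1} = (1 - theta) * x_t
        lo, hi = min(a * lo, a * hi), max(a * lo, a * hi)
    return lo, hi

def d_goal(theta):
    """Distance of the final reachable interval from the goal {|x| <= 0.1}."""
    lo, hi = reach_interval(theta)
    return max(abs(lo), abs(hi))

theta, eps, lr = 0.1, 1e-4, 0.05
for _ in range(200):
    if d_goal(theta) <= 0.1:  # verifier certifies the reach property
        break
    grad = (d_goal(theta + eps) - d_goal(theta - eps)) / (2 * eps)
    theta -= lr * grad

print(round(theta, 2), d_goal(theta) <= 0.1)
```

Because the metric passes through the verifier, only derivative-free (here, finite-difference) updates are used, mirroring the scheme described above.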
3. Safety Guarantees via Control Barrier Functions and Differentiable Layers
Control barrier functions (CBFs) are a central formalism for correct-by-construction safety in continuous-time systems. A scalar function $h$ is a CBF for a safe set $\mathcal{C} = \{x : h(x) \ge 0\}$ if, for all $x \in \mathcal{C}$, there exists $u$ such that $\dot{h}(x, u) \ge -\alpha(h(x))$,
where $\alpha$ is a class-$\mathcal{K}$ function. Any policy $\pi$ with $\dot{h}(x, \pi(x)) \ge -\alpha(h(x))$ for all $x \in \mathcal{C}$ ensures forward invariance of $\mathcal{C}$ (Yang et al., 2022, Tabbara et al., 1 May 2025, Dai et al., 2023).
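A minimal numerical illustration of forward invariance, for an assumed scalar system $\dot{x} = u$ with safe set $\{x : 1 - x \ge 0\}$ and a linear class-$\mathcal{K}$ function:

```python
# Forward-invariance demo for a scalar CBF: dynamics x' = u, safe set
# C = {x : h(x) = 1 - x >= 0}, class-K function alpha(h) = k * h.
# The nominal controller pushes toward the boundary; the CBF filter
# clips u so that h_dot = -u >= -alpha(h). Illustrative toy example.

def cbf_filter(x, u_nom, k=1.0):
    h = 1.0 - x                     # barrier value
    return min(u_nom, k * h)        # enforce -u >= -k*h, i.e. u <= k*h

x, dt = 0.0, 0.01
for _ in range(1000):
    u = cbf_filter(x, u_nom=1.0)    # nominal control u_nom = 1 (unsafe alone)
    x += dt * u                     # Euler step
    assert 1.0 - x >= 0.0           # safe set stays forward invariant

print(round(x, 5))                  # approaches, but never exceeds, x = 1
```

The state converges toward the boundary of the safe set without crossing it, which is exactly the forward-invariance guarantee stated above.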
Modern correct-by-construction frameworks embed CBF constraints as differentiable layers within neural network policies, either by:
- Projection-based safety layers (NN-diff-QP): A quadratic program projects the unconstrained (nominal) control onto the admissible set $\{u : \dot{h}(x, u) \ge -\alpha(h(x))\}$, with differentiability guaranteed by the KKT conditions (Yang et al., 2022).
- Set-theoretic (gauge) parametrizations (NN-gauge): A neural network generates control parameters in a normalized space, mapped via the Minkowski gauge to the admissible control set, retaining end-to-end differentiability (Yang et al., 2022).
In both constructions, all outputs throughout the learning process remain within the admissible control set, and safety is maintained during both training and policy deployment. The learning objective can balance nominal control tasks (e.g., trajectory tracking, reward maximization) with strict CBF-constrained safety.
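For a single affine CBF constraint, the safety-layer QP admits a closed-form projection whose Jacobian (the quantity backpropagated through in NN-diff-QP-style layers) follows directly from the KKT conditions. A sketch under these simplifying assumptions:

```python
import numpy as np

# Single-constraint safety-layer QP: project the nominal control u_nom onto
# {u : a @ u <= b} (one affine CBF constraint). With one constraint the QP
# has a closed form, and its Jacobian w.r.t. u_nom (used for end-to-end
# training) follows from the KKT conditions. Illustrative sketch only.

def qp_safety_layer(u_nom, a, b):
    """argmin_u ||u - u_nom||^2  s.t.  a @ u <= b, plus d u*/d u_nom."""
    slack = a @ u_nom - b
    n = len(u_nom)
    if slack <= 0.0:                       # constraint inactive: identity map
        return u_nom, np.eye(n)
    aa = a / (a @ a)
    u_star = u_nom - slack * aa            # Euclidean projection onto halfspace
    jac = np.eye(n) - np.outer(aa, a)      # KKT-derived Jacobian (active case)
    return u_star, jac

a, b = np.array([1.0, 0.0]), 0.5
u_safe, J = qp_safety_layer(np.array([2.0, 1.0]), a, b)
print(u_safe)                              # first coordinate clipped to 0.5
```

The piecewise-linear Jacobian (identity when the constraint is inactive, a projection matrix when active) is what makes such layers trainable end to end.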
4. Data-Driven and Robust Learning under Uncertainty
Correct-by-construction learning frameworks have been extended to operate under uncertainty in the system dynamics, the safe set boundary, or the data distribution support.
- Learning under model uncertainty: Episodic refinement schemes jointly learn both a CBF $h$ and a dynamics model $\hat{f}$, starting from a conservative, handcrafted initial barrier and a nominal model. Loss functions are structured to enforce CBF constraints on safe samples, penalize unsafe/violation transitions, and maintain classification margins relative to constraint-set boundaries. Theoretical results establish bounds on forward invariance under learned model error, guaranteeing that the refined CBF preserves safety under true dynamics as long as error remains within the enforced margin (Dai et al., 2023).
- Offline safe learning from data: Conservative control barrier functions (CCBFs) penalize barrier-value inflation on out-of-distribution (OOD) samples to mitigate unreliable generalization. The learning objective, inspired by Conservative Q-learning, blends in-distribution empirical risk with explicit OOD suppression, ensuring the resulting filter remains correct-by-construction for all encountered states during deployment while minimally interfering with performance (Tabbara et al., 1 May 2025).
- Supervisor-based architectures: Machine learning-derived controllers can be integrated beneath a CBF-QP filtering layer—a process in which unsafe outputs are “projected” minimally to safety by solving a small QP at each time step, ensuring real-time enforcement of the correct-by-construction guarantee (Chen et al., 2017).
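The conservative (CCBF-style) training objective can be sketched as follows; the linear barrier model, hinge losses, margin, and weight `lam` are illustrative assumptions rather than the published formulation:

```python
import numpy as np

# Sketch of a conservative CBF training loss in the spirit of the CCBF idea:
# standard hinge terms on labeled in-distribution data, plus a CQL-style
# penalty pushing barrier values down on out-of-distribution (OOD) samples.
# Names, weights, and the linear barrier model h_w(x) = w @ x are assumed.

def ccbf_loss(w, x_safe, x_unsafe, x_ood, margin=0.1, lam=1.0):
    h = lambda X: X @ w                                      # linear barrier
    l_safe = np.maximum(0.0, margin - h(x_safe)).mean()      # want h >= margin on safe data
    l_unsafe = np.maximum(0.0, margin + h(x_unsafe)).mean()  # want h <= -margin on unsafe data
    l_ood = np.maximum(0.0, h(x_ood)).mean()                 # suppress optimistic h on OOD
    return l_safe + l_unsafe + lam * l_ood

x_safe = np.array([[1.0], [2.0]])
x_unsafe = np.array([[-1.0], [-2.0]])
x_ood = np.array([[5.0]])
print(ccbf_loss(np.array([1.0]), x_safe, x_unsafe, x_ood))
```

Here a barrier that classifies the in-distribution data perfectly still pays for optimistic values on OOD states, which is the conservatism mechanism described above.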
5. Temporal Logic, High-Order Constraints, and Differentiable Constraint Embedding
Recent advancements generalize correct-by-construction learning to temporal-logic (especially STL) specifications via high-order control barrier functions (HOCBFs):
- HOCBFs for temporal logic: A time-varying barrier $b(x, t)$ with relative degree $m$ enforces STL predicates via a recursive cascade of class-$\mathcal{K}$ functions, $\psi_0 = b$ and $\psi_i = \dot{\psi}_{i-1} + \alpha_i(\psi_{i-1})$. Collections of such HOCBFs encode atomic predicates and their temporal operators, with time-varying offsets and trainable gain parameters derived systematically from the STL parse tree (Liu et al., 7 Dec 2025).
- Differentiable Quadratic Programming: “BarrierNet” architectures embed all HOCBF constraints into a differentiable QP solved at each control cycle, permitting backpropagation end-to-end through the QP’s KKT system. This allows reinforcement learning–style optimization over expected robustness of STL satisfaction, while maintaining strict feasibility and safety via the always-active QP constraints (Liu et al., 7 Dec 2025).
- Robustness metrics: Unified robustness measures aggregate STL robustness, input constraint slack, and QP feasibility into a single scalar objective for learning. This construction ensures that any controller achieving nonnegative robustness is provably correct for the combined temporal-logic and feasibility specifications.
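One simple instantiation of such a unified measure is a minimum over the component margins, which preserves the soundness direction (a nonnegative score implies every component specification holds); the exact aggregation used in the cited work may differ:

```python
# Unified robustness score aggregating STL robustness, input-constraint
# slacks, and a QP feasibility margin into one scalar. Taking the minimum
# is a sound choice: the score is nonnegative only if every component
# margin is. The names and composition here are illustrative assumptions.

def unified_robustness(stl_rho, input_slacks, qp_feas_margin):
    return min([stl_rho, qp_feas_margin, *input_slacks])

print(unified_robustness(0.3, [0.1, 0.5], 0.2))   # binding term: input slack 0.1
print(unified_robustness(0.3, [0.1], -0.05))      # negative: QP infeasibility detected
```

A learner maximizing this scalar therefore cannot trade STL satisfaction against input-constraint or feasibility violations.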
6. Stochastic, Uncertain, and Partially Observable Systems via Formal Abstractions
Correct-by-construction approaches also target stochastic, uncertain, and partially observable systems via abstraction-based methods:
- Interval Markov Decision Processes (iMDPs): A robust abstraction—partitioning the continuous state space into regions—represents the stochastic dynamics as an iMDP with interval transition probabilities, accounting for noise, parameter uncertainties, or measurement imprecision (Badings et al., 2023, Badings et al., 2021). Correct-by-construction controllers are synthesized via robust value iteration against these interval models, optimizing for worst-case probabilities of specification satisfaction.
- Policy refinement: The iMDP’s discrete policy is mapped back into a continuous state-feedback controller on the original system. Formal correctness theorems guarantee that the derived closed-loop system upholds the required probability of reachability or invariance against unsafe sets (Badings et al., 2023, Badings et al., 2021).
- Handling partial observability: State beliefs maintained by Kalman filtering are abstracted and the control policy is computed to act on the current belief mean, with state-uncertainty buffers ensuring that the formal property is preserved up to a quantifiable lower bound, taking into account process and measurement noise (Badings et al., 2021).
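The robust backup at the heart of iMDP synthesis solves a small inner problem: within the transition-probability intervals, an adversary picks the distribution least favorable to the specification. A toy sketch (the model and intervals are invented for illustration):

```python
# Robust value iteration on a toy interval MDP (iMDP): each transition has
# a probability interval [lo, hi]; the backup evaluates the worst-case
# distribution consistent with the intervals (adversary minimizes the
# reach probability). Model and intervals are illustrative assumptions.

def worst_case_expectation(values, intervals):
    """min_p sum_i p_i * v_i  s.t.  lo_i <= p_i <= hi_i,  sum_i p_i = 1."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    p = [lo for lo, _ in intervals]
    budget = 1.0 - sum(p)
    for i in order:                     # give remaining mass to low-value states
        add = min(intervals[i][1] - p[i], budget)
        p[i] += add
        budget -= add
    return sum(pi * vi for pi, vi in zip(p, values))

# Toy: states {0: goal, 1: live, 2: unsafe}; one action from state 1.
intervals = [(0.6, 0.8), (0.0, 0.2), (0.1, 0.3)]   # to goal / stay / unsafe
V = [1.0, 0.0, 0.0]                                # value = prob. of reaching goal
for _ in range(50):                                # robust value iteration on state 1
    V[1] = worst_case_expectation(V, intervals)
print(round(V[1], 3))                              # 0.667: worst-case reach probability
```

The controller synthesized against this lower bound is correct for every concrete MDP whose transition probabilities lie in the intervals, which is the formal guarantee described above.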
7. Experimental Validation and Empirical Performance
Multiple case studies across diverse nonlinear and linear domains demonstrate the efficacy and computational viability of correct-by-construction control learning architectures.
- Adaptive Cruise Control (ACC) and nonlinear oscillators: Verification-in-the-loop approaches converge to formally safe and goal-reaching controllers using up to two orders of magnitude fewer iterations than standard RL baselines, with 100% verified safety and goal achievement over all initial states (Wang et al., 2021, Yang et al., 2022).
- Articulated truck lateral control: A supervised learner distilled from a CBF-constrained trajectory library, when integrated with an online CBF-QP supervisor, incurs near-zero safety interventions while matching or exceeding the performance of classic LQR+CBF baselines in high-fidelity simulation (Chen et al., 2017).
- High-dimensional navigation and Safety Gym tasks: Conservative CBF filters learned entirely offline maintain near-zero empirical violation rates on both in-distribution and out-of-distribution states, matching or closely approximating the performance of non-conservative or non-certified methods (Tabbara et al., 1 May 2025).
- STL-based robotic tasks: Feasibility-aware BarrierNet controllers universally achieved STL satisfaction and QP feasibility, outperforming both fixed-parameter and unconstrained neural policy baselines in complex multi-objective environments (Liu et al., 7 Dec 2025).
- Stochastic reach-avoid with partial observability: iMDP abstraction methods enable correct-by-construction reach-avoid synthesis for LTI systems of up to six dimensions, significantly outperforming sampling-based planners in safety-critical regimes (Badings et al., 2021, Badings et al., 2023).
In all cases, the key empirical finding is that integrating verifiability and constraint satisfaction directly into the learning loop produces controllers with provable and certifiable safety, reach-avoid, or temporal logic guarantees, a property unattainable through post-hoc verification or unconstrained learning approaches.