Backward Reachability Curriculum (BaRC)

Updated 11 August 2025

Backward Reachability Curriculum (BaRC) is an approach that computes the set of initial states from which a target can be reached, providing a foundation for enhanced learning and safety verification in dynamic systems.
BaRC improves sample efficiency and safety certification by systematically growing and segmenting the learning process for high-dimensional and complex systems.
It is applied in neural feedback control, reinforcement learning, and distributed verification to enable scalable controller synthesis and robust safety guarantees.

A Backward Reachability Curriculum (BaRC) is an approach for analyzing, synthesizing, or accelerating learning in dynamical systems, control, and reinforcement learning (RL) that builds on backward reachability: computing or exploiting the set of states from which a target set (e.g., a goal, unsafe, or desirable region) can be reached. BaRC systematically “grows,” “summarizes,” or “segments” the learning or verification process backward from the target, which can improve sample efficiency, safety certification, and curriculum design in high-dimensional, complex, or infinite-state systems. BaRC paradigms appear in safety analysis for neural feedback loops, scalable controller synthesis, curriculum generation in robotics and RL, and distributed verification of large-scale systems.

1. Foundations of Backward Reachability

Backward reachability, sometimes termed controllability in control theory, is the problem of determining the set of initial states from which a given dynamical system can reach a target set within a prescribed time, possibly under specified control policies and disturbances. These backward reachable sets (BRSs) are fundamental both for safety verification (identifying all states that inevitably lead to unsafe behavior) and for controller synthesis (finding initial states from which desired objectives are attainable).

Mathematically, for continuous or discrete systems, the BRS at time $t$ for a target set $\mathcal{T}$ is often defined as:

$\mathrm{BRS}_{-t}(\mathcal{T}) = \{ x_0 : \exists \text{admissible input } u(\cdot), \ x(t;x_0, u) \in \mathcal{T} \}$

For safety applications, a minimal BRS includes all states from which, regardless of control, the system must reach the hazard; for controller synthesis, a maximal BRS includes all states from which there exists a control input that drives the system into the goal despite worst-case disturbances (Wetzlinger et al., 2023).
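The difference between the "for every input" and "there exists an input" readings can be made concrete on a finite toy system. The sketch below is illustrative only: the six-state chain, the input set, and the helper names (`step`, `pre_exists`, `pre_forall`, `brs`) are assumptions made for this example and are not taken from the cited works. It ignores disturbances and computes exact-time BRSs by applying a one-step predecessor operator to the target repeatedly.

```python
STATES = range(6)   # toy state space {0, ..., 5} (assumed)
INPUTS = (1, 2)     # admissible inputs (assumed)


def step(x, u):
    """Toy dynamics: move right by u, saturating at the last state."""
    return min(x + u, 5)


def pre_exists(target):
    """States from which SOME input reaches `target` in one step."""
    return {x for x in STATES if any(step(x, u) in target for u in INPUTS)}


def pre_forall(target):
    """States from which EVERY input reaches `target` in one step."""
    return {x for x in STATES if all(step(x, u) in target for u in INPUTS)}


def brs(target, steps, pre):
    """Apply the predecessor operator `pre` backward `steps` times."""
    current = set(target)
    for _ in range(steps):
        current = pre(current)
    return current


maximal = brs({5}, 2, pre_exists)   # some input sequence reaches state 5
minimal = brs({5}, 2, pre_forall)   # every input sequence reaches state 5
print(sorted(maximal), sorted(minimal))
```

In this toy system the two-step "for every input" set is {3, 4, 5} and the "there exists an input" set is {1, 2, 3, 4, 5}, so the former is contained in the latter, as the definitions require.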

Backward reachability analysis is central to domains such as hybrid systems verification, nonlinear and uncertain control, and reinforcement learning with sparse rewards.

2. Algorithmic Paradigms and Set Representations

Classical methods for computing BRSs either grid the state space and solve Hamilton–Jacobi(–Isaacs) equations or represent sets with polytopes, ellipsoids, or zonotopes; state explosion limits their scalability. Modern approaches focus on set propagation, symbolic methods, and convex relaxations.

Set Propagation Techniques (LTI/Nonlinear): Polynomial complexity algorithms propagate polytopic, zonotopic, or constrained zonotopic sets backward through linear or linearized system dynamics. For example, constrained zonotopes allow efficient representation and computation of affine transformations, intersections, and Minkowski sums/differences (Yang et al., 2022, Wetzlinger et al., 2023).
Approximate and Exact BRS Computation: Conservative under-approximations may be computed in polynomial time (e.g., via a two-step Minkowski difference for constrained zonotopes), while exact representations can require exponential resources (Yang et al., 2022, Zhang et al., 2023).
Hybrid Zonotopes: For neural feedback systems, hybrid zonotopes exploit binary generators to exactly encode the nonlinear and piecewise-affine structure created by ReLU activations, enabling precise BRS computation via closed forms and MILP (Zhang et al., 2023).
Backward Curriculum in RL: Rather than computing exact sets, BaRC in RL/robotics generates a curriculum by initializing learning near the goal and expanding backward using approximate BRSs computed from simplified dynamics models, so that new start states are dynamically plausible and support efficient learning (Ivanovic et al., 2018).
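As a minimal sketch of backward set propagation with plain zonotopes, consider a disturbance-free linear system $x^+ = Ax + Bu$ with invertible $A$ and inputs in a zonotope $\mathcal{U}$. The one-step "there exists an input" BRS of a zonotopic target $\mathcal{T}$ is then $A^{-1}(\mathcal{T} \oplus (-B\,\mathcal{U}))$, which needs only a Minkowski sum and a linear map. The `Zonotope` class, the helper names, and the numerical values below are assumptions for illustration; the constrained-zonotope Minkowski difference needed to handle disturbances is not shown.

```python
from dataclasses import dataclass


def matvec(M, v):
    """Matrix-vector product on nested lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def inv2(M):
    """Inverse of a 2x2 matrix (assumed invertible)."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


@dataclass
class Zonotope:
    center: list
    generators: list   # one vector per generator

    def linear_map(self, M):
        return Zonotope(matvec(M, self.center),
                        [matvec(M, g) for g in self.generators])

    def minkowski_sum(self, other):
        center = [a + b for a, b in zip(self.center, other.center)]
        return Zonotope(center, self.generators + other.generators)

    def support(self, d):
        """Maximum of d . x over the zonotope."""
        dot = lambda p, q: sum(a * b for a, b in zip(p, q))
        return dot(d, self.center) + sum(abs(dot(d, g)) for g in self.generators)


def one_step_brs(A, B, target, inputs):
    """{x : exists u in inputs with A x + B u in target}, for invertible 2x2 A."""
    neg_B = [[-b for b in row] for row in B]
    return target.minkowski_sum(inputs.linear_map(neg_B)).linear_map(inv2(A))


A = [[1.0, 1.0], [0.0, 1.0]]                             # toy double integrator
B = [[0.5], [1.0]]
target = Zonotope([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])  # unit box
inputs = Zonotope([0.0], [[1.0]])                        # u in [-1, 1]
brs = one_step_brs(A, B, target, inputs)
print(len(brs.generators), brs.support([1.0, 0.0]), brs.support([0.0, 1.0]))
```

Each backward step adds the input generators to the set, so the generator count grows linearly with the horizon; practical implementations usually apply order reduction, which is omitted here.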

The following table contrasts key set representations:

Representation	Advantages	Limitations
Polytopes	Exact for linear systems, amenable to LP/MILP	May explode in number of facets
(Constrained) Zonotopes	Efficient affine ops, scalable	Plain zonotopes are conservative for general polytopes
Hybrid Zonotopes	Exact for piecewise-affine NNs, closed forms	Binary generator count may grow

3. Backward Reachability Curricula in Reinforcement Learning and Control

The concept of a backward curriculum has been influential in overcoming exploration and sample complexity bottlenecks in RL and robotics:

Reverse Curriculum Generation (RCG): RCG and BaRC begin training near the goal and produce new start states by expanding outward from it, via Brownian rollouts in RCG and via BRS expansion in BaRC. Both focus updates where learning is most effective by exploiting a moving set of “good starts” (Florensa et al., 2017, Ivanovic et al., 2018).
Reward Signal Amplification: By reversing trajectories (Ko, 2022) or expanding demonstration-based curricula backward (Srinivasan et al., 2019), agents receive denser or more frequent reward signals, which speeds up early-stage learning.
Goal-conditioned RL sans Rewards: Backward model-based trajectory engineering, shortest path filtering, and imitation learning can yield goal-reaching policies purely from the induced backward reachability structure, even without explicit reward signals (Höftmann et al., 2023).

Practical implementations typically iterate between expanding the starting set (either via BRSs or random walks from “good” or goal states), updating the policy through standard RL or imitation learning techniques, and adapting the start-state distribution as performance improves.
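The expand, train, and adapt loop described above can be sketched on a toy problem. Everything in the block below is an assumption made for illustration: the eight-state chain, tabular Q-learning with learning rate 1 as the "standard RL" step, the one-step predecessor set used in place of an approximate BRS, and the names `backward_step`, `run_episode`, `evaluate`, and `barc_sketch`. It is not the procedure of any cited paper, only a minimal instance of the same pattern.

```python
import random

N_STATES, GOAL = 8, 7      # 1-D chain; reward 1 only on entering GOAL
ACTIONS = (-1, +1)         # index 0 = left, index 1 = right
GAMMA, HORIZON = 0.9, 20


def step(s, a):
    """Deterministic toy dynamics, clamped to the chain."""
    return min(max(s + ACTIONS[a], 0), N_STATES - 1)


def backward_step(states):
    """One-step backward reachable set of `states` (stand-in for a BRS)."""
    return {s for s in range(N_STATES) for a in (0, 1) if step(s, a) in states}


def greedy(Q, s):
    return 1 if Q[s][1] > Q[s][0] else 0


def run_episode(Q, s, rng, eps=0.2):
    """One training episode of tabular Q-learning from start state s."""
    for _ in range(HORIZON):
        explore = Q[s][0] == Q[s][1] or rng.random() < eps
        a = rng.randrange(2) if explore else greedy(Q, s)
        s2 = step(s, a)
        reward = 1.0 if s2 == GOAL else 0.0
        Q[s][a] = reward + GAMMA * max(Q[s2])   # learning rate 1
        if s2 == GOAL:
            return
        s = s2


def evaluate(Q, s):
    """Does the greedy policy reach GOAL from s within the horizon?"""
    for _ in range(HORIZON):
        s = step(s, greedy(Q, s))
        if s == GOAL:
            return True
    return False


def barc_sketch(seed=0, episodes=50, threshold=0.8, max_rounds=200):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    starts = backward_step({GOAL}) - {GOAL}     # begin next to the goal
    history = [set(starts)]
    for _ in range(max_rounds):
        for _ in range(episodes):               # train on the current starts
            run_episode(Q, rng.choice(sorted(starts)), rng)
        good = {s for s in starts if evaluate(Q, s)}
        if len(good) / len(starts) < threshold:
            continue                            # not ready to expand yet
        expanded = (starts | backward_step(good)) - {GOAL}
        if expanded == starts and good == starts:
            break                               # nothing left to add
        if expanded != starts:
            starts = expanded
            history.append(set(starts))
    return Q, history


Q, history = barc_sketch()
print([sorted(stage) for stage in history])
```

Expanding only from starts the current policy already solves keeps each new start one step away from a state with a learned value, which is what makes the sparse reward reachable at every stage.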

4. Safety Certification and Neural Feedback Loops

Backward reachability techniques have become a cornerstone for safety verification in systems controlled by neural network (NN) policies:

Backprojection (BP) Sets: For a neural feedback loop, the BP set comprises the states from which a NN-controlled system could be driven into an unsafe set. Tools such as CROWN produce affine relaxations of the (potentially non-invertible) NN controller, enabling over-approximation of BP sets through LP/MILP (Rober et al., 2022, Rober et al., 2022).
Domain Refinement and Polytope Representation: Iterative refinement of the relaxation domain, by successively shrinking and aligning with polytopic boundaries, can dramatically tighten BP set over-approximations, allowing stronger safety certificates (Everett et al., 2022).
Exact Certification with Hybrid Zonotopes: For linear plants with ReLU-activated NNs, the exact BRS can be encoded as a hybrid zonotope. Safety queries reduce to checking MILP feasibility for intersection between BRS and initial sets (Zhang et al., 2023). This framework is both sound and complete relative to the expressiveness of the hybrid zonotope.

These advances allow certification for high-dimensional nonlinear and “data-driven” controllers, including ground robots and aerial vehicles, extending classical reachability results to learned systems.
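A one-dimensional toy example illustrates both the over-approximation of a BP set and the effect of refining the relaxation domain. The sketch makes several substitutions that should be read as assumptions: plain interval bounds stand in for CROWN's affine relaxations, enumeration over uniform cells stands in for the LP/MILP step, and uniform partitioning stands in for guided refinement. The plant, the one-neuron controller, and the function names are hypothetical.

```python
def relu(z):
    return max(z, 0.0)


def controller(x):
    """Toy one-neuron ReLU policy (hypothetical)."""
    return -relu(x)


def closed_loop(x):
    """Toy plant x+ = 2x + u with u = controller(x)."""
    return 2.0 * x + controller(x)


def next_state_bounds(lo, hi):
    """Interval bounds on x+ over the cell [lo, hi].

    relu is monotone, so controller(x) lies in [-relu(hi), -relu(lo)].
    Treating x and u as independent is what makes the bound conservative.
    """
    return 2.0 * lo - relu(hi), 2.0 * hi - relu(lo)


def bp_cells(n_cells, domain=(-4.0, 4.0), unsafe=(1.0, 2.0)):
    """Cells whose bounded one-step image intersects the unsafe interval."""
    width = (domain[1] - domain[0]) / n_cells
    flagged = []
    for i in range(n_cells):
        lo = domain[0] + i * width
        hi = lo + width
        nxt_lo, nxt_hi = next_state_bounds(lo, hi)
        if nxt_lo <= unsafe[1] and nxt_hi >= unsafe[0]:
            flagged.append((lo, hi))
    return flagged


for n in (1, 8, 64):
    cells = bp_cells(n)
    print(n, "cells -> union", (cells[0][0], cells[-1][1]))
```

For this system the true one-step BP set of the unsafe interval [1, 2] is [1, 2] itself. The flagged region shrinks from [-4, 4] with one cell to [0, 4] with 8 cells and [0.75, 2.25] with 64 cells, always containing the true set, which mirrors how domain refinement tightens a sound over-approximation.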

5. Distributed and Scalable BRS Computation

Scaling backward reachability analysis to large networked systems requires distributed and structure-exploiting approaches. Distributed BaRC decomposes global reachability into local agent problems:

Set Projection and Extrusion Operators: Each subsystem computes its own local reachable set, and global coupling is managed via set-theoretic projections (onto local sub-coordinates) and extrusions (lifting local sets to global space). The intersection of appropriately extruded local sets forms the global BRS (Liuzza et al., 2021).
Algorithmic Decentralization: Each node solves its local reachability subproblem, sharing minimal boundary information with neighbors. Exactness is theoretically guaranteed if local problems are solved precisely, irrespective of the constraint structure or the numerical solver type.
Solver-agnostic Framework: The distributed BaRC can incorporate any local reachability computation method, whether PDE-based, sampling-based, ellipsoidal, or optimization-based. This makes it applicable to complex power networks, multi-robot systems, and large-scale process networks.

This distributed methodology avoids central bottlenecks and is particularly advantageous for very high-dimensional, coupled nonlinear systems.
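The projection and extrusion operators are easiest to see in the simplest possible case: two fully decoupled subsystems on a small grid with a box-shaped target. The block below is a sketch under those assumptions (the grid, dynamics, and function names are all hypothetical) and deliberately omits the coupling and neighbor communication that the distributed setting is designed to handle. It checks that intersecting the extruded local BRSs reproduces a brute-force centralized computation.

```python
from itertools import product

GRID = range(5)    # each subsystem lives on {0, ..., 4} (assumed)
INPUTS = (0, 1)    # each subsystem's admissible inputs (assumed)


def local_step(x, u):
    """Toy local dynamics: move right by u, saturating at 4."""
    return min(x + u, 4)


def local_pre(local_target):
    """Local one-step BRS: some input reaches the local target."""
    return {x for x in GRID
            if any(local_step(x, u) in local_target for u in INPUTS)}


def project(global_set, idx):
    """Project a set of joint states onto coordinate `idx`."""
    return {x[idx] for x in global_set}


def extrude(local_set, idx, dim=2):
    """Lift a local set to the joint space, leaving other coordinates free."""
    return {x for x in product(GRID, repeat=dim) if x[idx] in local_set}


def global_pre(global_target):
    """Centralized reference: brute force over joint states and inputs."""
    return {
        x for x in product(GRID, repeat=2)
        if any(tuple(local_step(xi, ui) for xi, ui in zip(x, u)) in global_target
               for u in product(INPUTS, repeat=2))
    }


target = set(product({4}, {2, 3}))   # box target T1 x T2
local_sets = [local_pre(project(target, i)) for i in range(2)]
distributed = extrude(local_sets[0], 0) & extrude(local_sets[1], 1)
print(sorted(distributed) == sorted(global_pre(target)))
```

The local computations touch 5 states each, while the centralized one enumerates all 25 joint states and 4 joint inputs; that gap grows exponentially with the number of subsystems.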

6. Applications, Impact, and Future Directions

BaRC strategies have enabled new levels of scalability, efficiency, and safety in various applications:

Control Synthesis: Inner-approximation of BRSs provides provably correct start sets for which a constructed controller, via polynomial SOS or IQC methods, guarantees target reaching while respecting disturbances and actuation limits (Yin et al., 2019, Yin et al., 2020).
Safety-Critical Systems: Practical applications span collision avoidance for ground/aerial robots, multi-agent coordination, and formal verification for embedded and learning-based controllers (Rober et al., 2022, Rober et al., 2022, Everett et al., 2022).
Curriculum Learning in RL: Dramatic improvements in sample efficiency and stability have been shown in simulated and real-world continuous robotic tasks as well as in discrete instruction-following environments (Ivanovic et al., 2018, Srinivasan et al., 2019).
Scalability: Polynomial-time set propagation enables analysis of systems with over 100 dimensions, well beyond what grid-based reachability can handle (Wetzlinger et al., 2023).
Verification for Neural Feedback: Backward reachability techniques for neural network controllers represent a significant advance over forward reachability in terms of conservativeness and computational practicality (Zhang et al., 2023).

Potential future research directions include refinement of abstract domains for constraint-based reachability (Gotlieb et al., 2013), integration of feedback control and adaptive methods into set-propagation frameworks (Wetzlinger et al., 2023), extension of certification techniques to stochastic and hybrid systems, and further fusion with data-driven and learning-based controller synthesis.

7. Theoretical and Practical Considerations

Abstraction, Consistency, and Over-Approximation: The core challenge in BaRC is the tradeoff between computational tractability and precision. Over-approximation may introduce false positives, which calls for sophisticated abstraction and widening techniques, e.g., in polyhedral and interval domains (Gotlieb et al., 2013), as well as domain refinement, e.g., DRIP (Everett et al., 2022).
Algorithmic Efficiency: LP/MILP and SOS frameworks enable practical computation in high dimensions, particularly with advances like polynomial-sized constrained zonotope difference approximation (Yang et al., 2022), iterative set-propagation (Wetzlinger et al., 2023), and symbolic LP/MILP formulations for neural feedback systems (Rober et al., 2022, Zhang et al., 2023).
Uniformity and Partial Information: In multi-agent systems with partial observability, backward construction of strategies enforces uniformity constraints by splitting moves into non-conflicting subsets (Busard et al., 2017).
Curriculum Scheduling: Dynamic adjustment of initial condition regions based on agent or policy performance allows adaptive, efficient expansion of the curriculum in BaRC settings for RL and control (Ivanovic et al., 2018).

Collectively, these considerations highlight BaRC’s role as a unifying paradigm, blending reachability, control, optimization, verification, and learning, with broad applicability and rigorous foundations across disciplinary boundaries.