Manifold-Constrained HJR Learning Framework
- The framework is a manifold-constrained Hamilton-Jacobi reachability approach that blends geometric task constraints with deep neural value function approximation.
- It uses physics-informed neural networks to approximate the value function on constrained state spaces, supporting real-time, safety-aware multi-agent planning.
- Experiments on UR5 manipulation tasks report success rates of 57–85% and collision rates of at most 4%, with decentralized trajectory optimization taking about 0.1–0.2 s per agent per planning step.
Manifold-constrained Hamilton-Jacobi reachability (HJR) learning frameworks address the problem of ensuring safe multi-agent motion planning (MAMP) under task-induced manifold constraints. Such constraints arise in robotics tasks where agents must not only avoid collisions but also maintain feasible states dictated by task geometry (e.g., carrying objects upright, manipulating objects with dual arms, maintaining end-effector orientation, or crossing doorways with alignment requirements). The HaMMAR framework provides a scalable, decentralized approach that integrates Hamilton-Jacobi reachability analysis on equality-constrained manifolds with deep learning-based value function approximation, enabling real-time multi-agent planning without requiring prior knowledge of other agents’ policies.
1. Mathematical Formulation and Manifold-Constrained HJR
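In the unconstrained HJ reachability setting, system dynamics are described by
$$\dot{x} = f(x, u, d), \qquad u \in \mathcal{U}, \quad d \in \mathcal{D},$$
and the value function
$$V(x,t) = \inf_{d(\cdot)} \sup_{u(\cdot)} \min_{\tau \in [t,T]} \ell\big(\xi_{x,t}^{u,d}(\tau)\big)$$
describes worst-case outcomes in reach-avoid differential games. Here $\xi_{x,t}^{u,d}$ is the trajectory starting from $x$ at time $t$, and $\ell$ is nonpositive exactly on the unsafe (e.g., collision) set. The resulting Isaacs-type PDE takes the form
$$\partial_t V(x,t) + \min\big\{0,\; H\big(x, \nabla_x V(x,t)\big)\big\} = 0, \qquad H(x,p) = \max_{u \in \mathcal{U}} \min_{d \in \mathcal{D}} p^\top f(x,u,d),$$
with $V(x,T) = \ell(x)$.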
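As a worked example, assume (for illustration only) single-integrator dynamics $f(x,u,d) = u + d$ with norm-ball input sets $\|u\| \le \bar u$ and $\|d\| \le \bar d$. The maximization and minimization then decouple, and the Cauchy–Schwarz inequality evaluates each term:

```latex
\begin{aligned}
H(x,p) &= \max_{\|u\| \le \bar u}\; \min_{\|d\| \le \bar d}\; p^\top (u + d) \\
       &= \max_{\|u\| \le \bar u} p^\top u \;+\; \min_{\|d\| \le \bar d} p^\top d \\
       &= \bar u\,\|p\| - \bar d\,\|p\| \;=\; (\bar u - \bar d)\,\|p\|.
\end{aligned}
```

The extremizers are $u^* = \bar u\, p/\|p\|$ and $d^* = -\bar d\, p/\|p\|$ for $p \neq 0$. If $\bar d > \bar u$, then $H < 0$ wherever $p \neq 0$, so $\partial_t V = -\min\{0, H\} > 0$ there and the sublevel set $\{V \le 0\}$ grows as the PDE is integrated backward in time.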
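When the state must remain on a manifold $\mathcal{M} = \{x \in \mathbb{R}^n : h(x) = 0\}$ for some smooth $h: \mathbb{R}^n \to \mathbb{R}^m$, constraints are incorporated via tangent-bundle dynamics. The tangent space at $x \in \mathcal{M}$ is
$$T_x\mathcal{M} = \{v \in \mathbb{R}^n : J_h(x)\,v = 0\}, \qquad J_h(x) = \frac{\partial h}{\partial x}(x) \in \mathbb{R}^{m \times n}.$$
The constrained HJR PDE on the manifold, as formalized in equation (3) of the cited work, is
$$\partial_t V(x,t) + \min\big\{0,\; H_{\mathcal{M}}\big(x, \nabla_x V(x,t)\big)\big\} = 0, \quad x \in \mathcal{M}, \qquad H_{\mathcal{M}}(x,p) = \max_{u \in \mathcal{U}} \min_{d \in \mathcal{D}} p^\top f(x,u,d),$$
subject to $f(x,u,d) \in T_x\mathcal{M}$ and $V(x,T) = \ell(x)$. The Hamiltonian is equivalently
$$H_{\mathcal{M}}(x,p) = \max_{u \in \mathcal{U}} \min_{d \in \mathcal{D}} \big(P(x)\,p\big)^\top f(x,u,d),$$
where $P(x)$ is the orthogonal projector onto $T_x\mathcal{M}$. Because $P(x)$ is symmetric and $P(x) f = f$ for tangent $f$, only the tangential gradient $P(x)\nabla_x V$ matters. For velocity-controlled systems with $\dot{x} = u + d$, $\|u\| \le \bar{u}$, $\|d\| \le \bar{d}$, the projector onto $T_x\mathcal{M}$ is
$$P(x) = I_n - J_h(x)^\top \big(J_h(x)\,J_h(x)^\top\big)^{-1} J_h(x),$$
leading to closed-form expressions (for $P(x)\,p \neq 0$):
$$H_{\mathcal{M}}(x,p) = (\bar{u} - \bar{d})\,\|P(x)\,p\|, \qquad u^* = \bar{u}\,\frac{P(x)\,p}{\|P(x)\,p\|}, \qquad d^* = -\bar{d}\,\frac{P(x)\,p}{\|P(x)\,p\|}.$$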
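The projector and closed-form expressions can be checked numerically. The NumPy sketch below assumes the velocity-controlled model $\dot x = u + d$ with norm-ball bounds and uses a hypothetical circle constraint $h(x) = \|x\|^2 - r^2$ (the 2D particle case); `tangent_projector`, `closed_form_game`, and `circle_jacobian` are illustrative names, not an API from the cited work.

```python
import numpy as np

def tangent_projector(J):
    """Orthogonal projector onto T_x M = null(J): P = I - J^T (J J^T)^{-1} J.
    J is the (m, n) constraint Jacobian, assumed to have full row rank."""
    J = np.atleast_2d(J)
    n = J.shape[1]
    # Solve (J J^T) Y = J rather than forming the inverse explicitly.
    return np.eye(n) - J.T @ np.linalg.solve(J @ J.T, J)

def closed_form_game(p, J, u_max, d_max, eps=1e-12):
    """Manifold Hamiltonian and optimal inputs for xdot = u + d with
    ||u|| <= u_max and ||d|| <= d_max (norm-ball bounds are assumed)."""
    g = tangent_projector(J) @ p          # tangential component P(x) p
    g_norm = np.linalg.norm(g)
    H = (u_max - d_max) * g_norm
    if g_norm < eps:                      # no tangential gradient: inputs do not affect H
        return H, np.zeros_like(p), np.zeros_like(p)
    direction = g / g_norm
    return H, u_max * direction, -d_max * direction

def circle_jacobian(x):
    """Jacobian of the hypothetical circle constraint h(x) = ||x||^2 - r^2."""
    return 2.0 * np.asarray(x, dtype=float).reshape(1, -1)

x = np.array([1.0, 0.0])                  # a point on the unit circle
p = np.array([0.5, 2.0])                  # hypothetical gradient of V at x
H, u_star, d_star = closed_form_game(p, circle_jacobian(x), u_max=1.0, d_max=0.5)
# H == 1.0, u_star == [0, 1], d_star == [0, -0.5]: the radial part of p is discarded.
```

At $x = (1, 0)$ the projector removes the radial component $p_1$, so both optimal inputs point along the circle, which is exactly the tangency the constrained PDE requires.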
2. Value Function Approximation via Manifold PINNs
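The value function is approximated by a feed-forward multilayer perceptron (MLP) $V_\theta(x,t)$ with parameters $\theta$. Because training cost grows with the number of sampled points rather than with the size of a state-space grid, this neural approximation scales to dimensions where classical grid-based solvers, whose memory and compute grow exponentially with the state dimension, become intractable.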
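Loss is defined following DeepReach principles for the network $V_\theta$, combining a terminal loss
$$\mathcal{L}_T(\theta) = \frac{1}{N}\sum_{k=1}^{N} \big|V_\theta(x_k, T) - \ell(x_k)\big|$$
and a PDE-residual loss
$$\mathcal{L}_{\mathrm{PDE}}(\theta) = \frac{1}{N}\sum_{k=1}^{N} \Big|\partial_t V_\theta(x_k, t_k) + \min\big\{0,\; H_{\mathcal{M}}\big(x_k, \nabla_x V_\theta(x_k, t_k)\big)\big\}\Big|.$$
The total minibatch loss is
$$\mathcal{L}(\theta) = \mathcal{L}_T(\theta) + \lambda\,\mathcal{L}_{\mathrm{PDE}}(\theta), \qquad \lambda > 0,$$
where the samples $(x_k, t_k)$ are drawn uniformly in time, $t_k \in [0, T]$, and on-manifold in state space, $x_k \in \mathcal{M}$ (e.g., via rejection sampling and Newton projection).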
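The sampling and loss terms can be sketched in NumPy. This is an illustrative sketch under stated assumptions, not the cited implementation: central finite differences stand in for automatic differentiation, the Hamiltonian is the velocity-controlled closed form with assumed norm bounds, and "rejection sampling and Newton projection" is interpreted (an assumption) as projecting uniform box samples onto $\mathcal{M}$ and rejecting any that fail to converge. `V` can be any callable $V(x,t)$; in practice it would be $V_\theta$. All function names are hypothetical.

```python
import numpy as np

def tangent_projector(J):
    """P = I - J^T (J J^T)^{-1} J for a full-row-rank (m, n) Jacobian J."""
    J = np.atleast_2d(J)
    return np.eye(J.shape[1]) - J.T @ np.linalg.solve(J @ J.T, J)

def newton_project(x, h, jac, iters=50, tol=1e-12):
    """Project x onto {h = 0} with minimum-norm Gauss-Newton steps
    x <- x - J^T (J J^T)^{-1} h(x)."""
    for _ in range(iters):
        r = np.atleast_1d(h(x))
        if np.linalg.norm(r) < tol:
            break
        J = np.atleast_2d(jac(x))
        try:
            x = x - J.T @ np.linalg.solve(J @ J.T, r)
        except np.linalg.LinAlgError:   # singular Jacobian: leave x for rejection
            break
    return x

def sample_on_manifold(n_samples, h, jac, lo, hi, rng, accept_tol=1e-8):
    """Project uniform box samples onto M and reject failed projections.
    The resulting density on M is not exactly uniform."""
    pts = []
    while len(pts) < n_samples:
        x = newton_project(rng.uniform(lo, hi), h, jac)
        if np.linalg.norm(np.atleast_1d(h(x))) < accept_tol:
            pts.append(x)
    return np.array(pts)

def fd_derivatives(V, x, t, eps=1e-5):
    """Central differences for dV/dt and grad_x V (a stand-in for autodiff)."""
    dVdt = (V(x, t + eps) - V(x, t - eps)) / (2 * eps)
    grad = np.array([(V(x + eps * e, t) - V(x - eps * e, t)) / (2 * eps)
                     for e in np.eye(x.size)])
    return dVdt, grad

def deepreach_loss(V, ell, jac, xs, ts, T, u_max, d_max, lam=1.0):
    """L_T + lam * L_PDE using the closed-form velocity-control Hamiltonian."""
    loss_T = np.mean([abs(V(x, T) - ell(x)) for x in xs])
    residuals = []
    for x, t in zip(xs, ts):
        dVdt, p = fd_derivatives(V, x, t)
        H = (u_max - d_max) * np.linalg.norm(tangent_projector(jac(x)) @ p)
        residuals.append(abs(dVdt + min(0.0, H)))
    return loss_T + lam * np.mean(residuals)

# Hypothetical 2-D example on the unit circle.
def h(x):
    return x @ x - 1.0

def jac(x):
    return 2.0 * x.reshape(1, -1)

def ell(x):
    return x[0] - 0.5              # nonpositive on the hypothetical unsafe arc x[0] <= 0.5

rng = np.random.default_rng(0)
T = 1.0
xs = sample_on_manifold(64, h, jac, lo=[-2.0, -2.0], hi=[2.0, 2.0], rng=rng)
ts = rng.uniform(0.0, T, size=len(xs))
# With u_max == d_max the Hamiltonian vanishes, so V(x, t) = ell(x) has zero loss.
loss = deepreach_loss(lambda x, t: ell(x), ell, jac, xs, ts, T, u_max=1.0, d_max=1.0)
```

The zero-loss case is a quick sanity check: when $\bar u = \bar d$ the Hamiltonian is identically zero, so the terminal function itself solves the PDE, and any offset from $\ell$ shows up entirely in the terminal term.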
The MLP architecture consists of four hidden layers of 128 units each with tanh activations, yielding approximately 70k parameters. Training uses Adam (learning rate as specified in the cited work), a batch size of 1024, and about 2000 epochs (∼6 h on an RTX 4070 GPU).
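A NumPy sketch of a tanh MLP with this shape follows, assuming the time $t$ is appended to the state as an extra input and Glorot-uniform initialization (both illustrative choices, not confirmed by the source). A smooth activation such as tanh is a common choice here because the PDE residual differentiates the network. The parameter count depends on the input dimension (state plus time), so `count_params` is a way to check a particular configuration.

```python
import numpy as np

def init_mlp(in_dim, hidden=(128, 128, 128, 128), out_dim=1, seed=0):
    """Glorot-uniform weights and zero biases for a tanh MLP V_theta(x, t)."""
    rng = np.random.default_rng(seed)
    sizes = [in_dim, *hidden, out_dim]
    params = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        limit = np.sqrt(6.0 / (fan_in + fan_out))
        W = rng.uniform(-limit, limit, size=(fan_in, fan_out))
        params.append((W, np.zeros(fan_out)))
    return params

def mlp_value(params, x, t):
    """Evaluate V_theta for a batch: x has shape (B, n), t has shape (B,)."""
    z = np.concatenate([x, t[:, None]], axis=1)   # time appended as an extra input
    for W, b in params[:-1]:
        z = np.tanh(z @ W + b)                     # smooth hidden activations
    W, b = params[-1]
    return (z @ W + b)[:, 0]                       # linear scalar output

def count_params(params):
    """Total number of weights and biases."""
    return sum(W.size + b.size for W, b in params)

# Example: a 2-D state plus time gives a 3-D input.
params = init_mlp(in_dim=3)
values = mlp_value(params, np.zeros((8, 2)), np.linspace(0.0, 1.0, 8))
```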
3. Decentralized Trajectory Optimization with Safety Level-Set Certification
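Once $V_\theta$ is trained, its zero sublevel set $\{x : V_\theta(x,t) \le 0\}$ approximates the backward-reachable tube of states from which safety cannot be guaranteed, and its complement is the safe region. Decentralized planning imposes $V_\theta(x_{ij}, t) \ge \delta$, where $x_{ij}$ is the joint state of agents $i$ and $j$ and $\delta \ge 0$ is a margin, as a pairwise safety constraint against each agent $j$.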
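A minimal sketch of the pairwise certification check follows, assuming predicted trajectories for the other agents are available and that $x_{ij}$ is formed by concatenating the two agents' states (a hypothetical layout). `toy_value` is a hypothetical distance-based stand-in for $V_\theta$, and the time argument `t_eval` is left to the caller because the mapping from planning steps to the learned time axis is not specified here.

```python
import numpy as np

def certify_plan(V, plan_i, predictions, t_eval=0.0, delta=0.0):
    """Check V(x_ij, t_eval) >= delta at every planned step against every other agent.

    plan_i:      (K, n) planned states of ego agent i.
    predictions: list of (K, n) predicted state sequences, one per agent j.
    Returns (certified, worst_value)."""
    worst = np.inf
    for k, x_i in enumerate(plan_i):
        for traj_j in predictions:
            x_ij = np.concatenate([x_i, traj_j[k]])   # hypothetical joint-state layout
            worst = min(worst, float(V(x_ij, t_eval)))
    return worst >= delta, worst

def toy_value(x_ij, t, radius=0.3):
    """Hypothetical stand-in for V_theta: inter-agent distance minus a radius (2-D agents)."""
    return np.linalg.norm(x_ij[:2] - x_ij[2:]) - radius

plan = np.linspace([0.0, 0.0], [1.0, 0.0], 5)        # ego moves along the x-axis
far = [np.tile([0.5, 1.0], (5, 1))]                  # agent j stays 1 unit away
near = [np.tile([0.5, 0.1], (5, 1))]                 # agent j sits next to the path
ok_far, _ = certify_plan(toy_value, plan, far, delta=0.1)    # certified
ok_near, _ = certify_plan(toy_value, plan, near, delta=0.1)  # rejected
```

Returning the worst value alongside the flag makes it easy to see how close a rejected plan came to satisfying the margin.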
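At each receding-horizon step, the control sequence $u_{0:N-1}$ is optimized over an $N$-step planning horizon to minimize a stage cost, subject to the discrete-time dynamics, the terminal task-manifold constraint $h(x_N) = 0$, and pairwise safety margins on the neural HJR value $V_\theta$. The worst-case disturbance for agent $j$ is predicted as
$$d_j^* = -\bar{d}\,\frac{P(x_{ij})\,\nabla_x V_\theta(x_{ij}, t)}{\big\|P(x_{ij})\,\nabla_x V_\theta(x_{ij}, t)\big\|},$$
where $x_{ij}$ is the joint state of agents $i$ and $j$, $P$ is the orthogonal projector onto the tangent space of the task manifold, and $\bar{d}$ bounds the disturbance magnitude.
The resulting nonlinear program is solved in real time (0.1–0.2 s per agent) with IPOPT. If the program is infeasible, the agent falls back on the closed-form optimal control $u^* = \bar{u}\,P\nabla_x V_\theta / \|P\nabla_x V_\theta\|$ derived from the value function, where $\bar{u}$ bounds the control magnitude.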
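The infeasibility fallback and the worst-case disturbance prediction can be sketched as below. `solve_nlp` is a hypothetical placeholder for the IPOPT-based program (assumed to return `None` when infeasible), and `P` and `grad_V` stand for the tangent projector and the gradient of $V_\theta$ at the current pairwise state, assumed to be computed elsewhere. Applying only the first control of the plan is the usual receding-horizon convention, assumed here.

```python
import numpy as np

def tangential_direction(P, grad_V, eps=1e-12):
    """Unit vector along P grad_V, or zeros where the tangential gradient vanishes."""
    g = P @ grad_V
    norm = np.linalg.norm(g)
    return np.zeros_like(g) if norm < eps else g / norm

def worst_case_disturbance(P, grad_V, d_max):
    """Predicted worst-case input of agent j: d* = -d_max * P grad_V / ||P grad_V||."""
    return -d_max * tangential_direction(P, grad_V)

def receding_horizon_step(solve_nlp, P, grad_V, u_max):
    """Apply the first control of the NLP solution; if the solver reports
    infeasibility (returns None), fall back to u* = u_max * P grad_V / ||P grad_V||."""
    u_plan = solve_nlp()
    if u_plan is not None:
        return np.asarray(u_plan)[0], "nlp"
    return u_max * tangential_direction(P, grad_V), "fallback"

# Example with an identity projector (no active manifold constraint).
P = np.eye(2)
grad_V = np.array([3.0, 4.0])
d_star = worst_case_disturbance(P, grad_V, d_max=0.5)            # approximately [-0.3, -0.4]
u, source = receding_horizon_step(lambda: None, P, grad_V, u_max=1.0)
```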
4. Theoretical Properties and Empirical Validation
Under standard smoothness assumptions (Lipschitz continuity of $f$ and $\ell$; full-row-rank $J_h(x)$ on $\mathcal{M}$), the manifold HJR PDE admits a unique continuous viscosity solution. The backward-reachable set is then exactly $\{x \in \mathcal{M} : V(x,t) \le 0\}$, and PINN-based learning approaches the true value uniformly as the PDE residuals decrease.
HaMMAR was empirically validated on several benchmarks:
- 2D Circle-Constrained Particle: Analytical backward-reachable set (BRS), with constrained HJR achieving ≈99.5% classification accuracy, ≈ 99.7%. Unconstrained HJR over-approximates the BRS ( 85–95%).
- UR5 Dual-Arm Object-Carrying (Fixed Orientation): HaMMAR success rate (SR) 85%, collision rate (CR) 0%, mean time 0.16 ± 0.15 s; No-HJR SR 68%, CR 32%; AtlasRRT(Centralized) SR 75%, CR 0%, mean time 7.6 s.
- UR5 Cup-Holding (Upright Constraint): HaMMAR SR 82%, CR 2%, time 0.10 s; AtlasRRT(Centralized) SR 60%, CR 1%, time 15 s.
- UR5 Doorway-Crossing (Alignment Constraints, ): HaMMAR SR 71%, CR 1%, time 0.14 s. No-HJR and AtlasRRT(Decentralized) SR 8–9%, CR 91–92%.
- Scaling to High-Dimensional Multi-Agent Settings: 5 UR5s, HaMMAR SR 57%, CR 4%, time 0.17 s; AtlasRRT(Centralized) failed to find a plan within 20 s.
For each experiment, results were averaged over 100 randomized trials, with performance metrics including success rate, collision rate, planning time, and path length.
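As a sketch of how such benchmark numbers can be computed, assume BRS membership is read off as $V \le 0$ at shared sample points and each trial is labeled with one hypothetical outcome string. The helpers below compute classification accuracy, intersection-over-union (IoU, included as an illustrative second set-agreement metric), and success/collision rates.

```python
import numpy as np

def brs_agreement(v_pred, v_true):
    """Agreement between BRS memberships {V <= 0} of a learned and a reference
    value function sampled at the same points. Returns (accuracy, iou)."""
    pred = np.asarray(v_pred) <= 0.0
    true = np.asarray(v_true) <= 0.0
    accuracy = float(np.mean(pred == true))
    union = int(np.logical_or(pred, true).sum())
    # Convention: two empty sets agree perfectly.
    iou = float(np.logical_and(pred, true).sum() / union) if union else 1.0
    return accuracy, iou

def trial_rates(outcomes):
    """Success and collision rates over trials labelled 'success', 'collision',
    or 'failure' (hypothetical labels; 'failure' covers timeouts and similar)."""
    n = len(outcomes)
    return outcomes.count("success") / n, outcomes.count("collision") / n

# An over-approximating prediction marks one extra point as inside the BRS.
acc, iou = brs_agreement([-1.0, -0.5, 0.2, 1.0], [-1.0, 0.3, 0.2, 1.0])
# acc == 0.75, iou == 0.5
```

Accuracy counts agreement on both sides of the boundary, whereas IoU only looks at the predicted and true sets themselves, so an over-approximation that adds unsafe-labeled points lowers IoU more sharply when the true BRS is small.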
5. Implementation Details, Scalability, and Applications
Key architectural and computational characteristics are summarized as follows:
| Component | Description | Notes |
|---|---|---|
| Value Function Net | 4×128 MLP, tanh, ~70k parameters | MLP for $V_\theta(x,t)$ |
| Training | Adam, batch size 1024, 2000 epochs, ~6 h (RTX 4070) | Handles 2D–30D problems |
| Online Planning | IPOPT, 0.1–0.2 s per agent per step | Real-time feasibility |
| Multi-Agent Scale | Up to 5 UR5 arms, 30+ dimensional states | On-manifold constraints |
HaMMAR generalizes across a range of high-DOF, multi-agent manipulation tasks, including object-carrying, cup-holding, and doorway navigation, each of which imposes geometric, task-induced manifold constraints. Neural approximation avoids the exponential growth in grid size that limits classical PDE solvers to low-dimensional problems. Because the decentralized formulation treats other agents as bounded worst-case disturbances, it does not require knowledge or prediction of their detailed control inputs, which suits a range of multi-agent manipulation deployments.
6. Conceptual Significance and Extensions
HaMMAR extends classical HJ reachability PDEs to settings with equality-constrained manifolds by rigorously incorporating tangent-bundle geometry and projector-enforced Hamiltonians. The decoupled, PINN-based solution approach enables task-feasible, safety-critical control synthesis in decentralized contexts. The embedding of learned, manifold-aware value functions as certifying constraints into trajectory optimization differentiates this framework from prior sample-based or unconstrained planners.
A plausible implication is improved feasibility and policy generalization for collaborative and adversarial robotics tasks involving complex operational constraints. The theoretical results characterize backward-reachable sets exactly through the true value function, which the learned approximation approaches as PDE residuals decrease. The empirical results (success rates of 57–85% and collision rates of at most 4% on the UR5 benchmarks, with planning times of 0.10–0.17 s) indicate that the approach remains computationally tractable on realistic, high-dimensional robots and manipulation objectives.