
Manifold-Constrained HJR Learning Framework

Updated 12 November 2025
  • The framework is a manifold-constrained Hamilton-Jacobi reachability approach that blends geometric task constraints with deep neural value function approximation.
  • It uses physics-informed neural networks to approximate the value function on constrained state spaces, ensuring real-time, safe multi-agent planning.
  • Empirical validations on UR5 manipulation tasks demonstrate high success rates and efficient decentralized trajectory optimization for complex multi-agent systems.

Manifold-constrained Hamilton-Jacobi reachability (HJR) learning frameworks address the problem of ensuring safe multi-agent motion planning (MAMP) under task-induced manifold constraints. Such constraints arise in robotics tasks where agents must not only avoid collisions but also maintain feasible states dictated by task geometry (e.g., carrying objects upright, manipulating objects with dual arms, maintaining end-effector orientation, or crossing doorways with alignment requirements). The HaMMAR framework provides a scalable, decentralized approach that integrates Hamilton-Jacobi reachability analysis on equality-constrained manifolds with deep learning-based value function approximation, enabling real-time multi-agent planning without requiring prior knowledge of other agents’ policies.

1. Mathematical Formulation and Manifold-Constrained HJR

In the unconstrained HJ reachability setting, system dynamics are described by

$$\dot x = f(x,u,w),\quad u\in\mathcal U,\ w\in\mathcal W,$$

and the value function

$$V(t,x)=\max_{u(\cdot)}\min_{w(\cdot)}\left\{\,\ell(x(T))+\int_t^T L(x(\tau),u(\tau))\,d\tau\right\}$$

describes worst-case outcomes in reach-avoid differential games. The resulting Isaacs-type PDE takes the form

$$\frac{\partial V}{\partial t}(t,x) + \max_{u\in\mathcal U}\min_{w\in\mathcal W}\nabla_x V(t,x) \cdot f(x,u,w) = 0,$$

with $V(T,x)=\ell(x)$.

When the state must remain on a manifold $\mathcal M = \{x \in \mathbb{R}^{n_d} \mid C(x)=0\}$ for some $C:\mathbb{R}^{n_d}\to\mathbb{R}^{n_c}$, constraints are incorporated via tangent-bundle dynamics. The tangent space at $x \in \mathcal M$ is

$$T_x\mathcal M = \{v \in \mathbb{R}^{n_d} \mid J_C(x)\, v = 0\}, \qquad J_C(x) = \frac{\partial C}{\partial x}(x).$$

The constrained HJR PDE on the manifold, as formalized in equation (3) of the cited work, is

$$\frac{\partial V}{\partial t}(t,x) + \min_{u\in\mathcal U}\big\{\langle \nabla_{\mathcal M} V(t,x),\, f(x,u)\rangle + \ell_{\mathrm{cost}}(x,u)\big\} = 0,$$

subject to $J_C(x)f(x,u) = 0$, $x \in \mathcal M$, and $V(T,x) = \ell(x)$. The Hamiltonian is equivalently

$$H_{\mathcal M}(t,x,p) = \min_{u \in \mathcal U} \left\{ p \cdot f(x,u) + \ell_{\mathrm{cost}}(x,u) \right\} \quad \text{s.t. } J_C(x) f(x,u) = 0,$$

where $p = \nabla_{\mathcal M} V$. For velocity-controlled systems with $\dot x = u$, $\|u\| \le \bar u$, and $C(x)=0$, the projector onto $T_x \mathcal M$ is

$$P(x) = I - J_C(x)^{\top} \big(J_C(x)J_C(x)^{\top}\big)^{-1} J_C(x),$$

leading to the closed-form expressions
$$H_{\mathcal M}(t,x,\nabla V) = -\bar u\, \big\|P(x)\nabla V\big\|, \qquad u^*(t,x) = -\bar u\, \frac{P(x)\nabla V(t,x)}{\big\|P(x)\nabla V(t,x)\big\|}.$$
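
These closed-form expressions translate directly into code. The following is a minimal NumPy sketch, not the reference implementation: it assumes the constraint Jacobian $J_C(x)$ and the value gradient $\nabla V(t,x)$ are already available as arrays, and the function names are hypothetical.

```python
import numpy as np

def tangent_projector(J_C: np.ndarray) -> np.ndarray:
    """P(x) = I - J_C^T (J_C J_C^T)^{-1} J_C, assuming J_C(x) has full row rank."""
    n = J_C.shape[1]
    return np.eye(n) - J_C.T @ np.linalg.solve(J_C @ J_C.T, J_C)

def constrained_hamiltonian(grad_V: np.ndarray, J_C: np.ndarray, u_bar: float) -> float:
    """H_M(t, x, grad V) = -u_bar * ||P(x) grad V|| for velocity-controlled dynamics."""
    return -u_bar * np.linalg.norm(tangent_projector(J_C) @ grad_V)

def optimal_velocity_control(grad_V: np.ndarray, J_C: np.ndarray, u_bar: float) -> np.ndarray:
    """u*(t, x) = -u_bar * P(x) grad V / ||P(x) grad V|| for x_dot = u, ||u|| <= u_bar."""
    pv = tangent_projector(J_C) @ grad_V
    norm = np.linalg.norm(pv)
    if norm < 1e-9:  # projected gradient (numerically) vanishes: return zero control
        return np.zeros_like(grad_V)
    return -u_bar * pv / norm
```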

2. Value Function Approximation via Manifold PINNs

The value function $V(t,x)$ is approximated by a feed-forward multilayer perceptron (MLP) denoted $V_\theta(t,x)$. This neural approximation enables high-dimensional scalability beyond classical grid-based solvers.

The loss is defined following DeepReach principles, combining a terminal loss

$$L_1(t,x;\theta) = \big|V_\theta(T,x) - \ell(x)\big|\, \mathbf{1}\{t=T\}$$

and a PDE-residual loss

$$L_2(t,x;\theta) = \left|\frac{\partial V_\theta}{\partial t}(t,x) + \min\big\{0,\, H_{\mathcal M}(t,x,\nabla_x V_\theta)\big\}\right|.$$

The total minibatch loss is

$$L(\theta) = \mathbb{E}_{(t,x)\sim\mathcal{D}} \big[ L_1(t,x;\theta) + \lambda\,L_2(t,x;\theta)\big],$$

where samples are drawn uniformly in time and on-manifold in state space (e.g., via rejection sampling and Newton projection).
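
One way to realize the on-manifold sampling above is uniform ambient sampling followed by a Gauss-Newton projection onto $\{x \mid C(x)=0\}$. The sketch below is a generic illustration of that idea, not the authors' routine; `C` and `jac_C` are assumed to be user-supplied callables for the constraint and its Jacobian.

```python
import numpy as np

def project_to_manifold(x0, C, jac_C, tol=1e-8, max_iter=50):
    """Gauss-Newton projection of an ambient point x0 onto {x : C(x) = 0}."""
    x = x0.copy()
    for _ in range(max_iter):
        c = C(x)
        if np.linalg.norm(c) < tol:
            return x, True
        J = jac_C(x)
        # Minimum-norm Newton step: dx = -J^T (J J^T)^{-1} C(x)
        x = x - J.T @ np.linalg.solve(J @ J.T, c)
    return x, False

def sample_on_manifold(n, dim, C, jac_C, lo=-1.0, hi=1.0, rng=None):
    """Draw ambient box samples and keep those that project onto the manifold within the box."""
    rng = np.random.default_rng() if rng is None else rng
    samples = []
    while len(samples) < n:
        x0 = rng.uniform(lo, hi, size=dim)
        x, ok = project_to_manifold(x0, C, jac_C)
        if ok and np.all((x >= lo) & (x <= hi)):  # rejection step for out-of-box projections
            samples.append(x)
    return np.asarray(samples)
```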

The MLP architecture consists of four hidden layers with 128 units per layer and tanh activations, yielding approximately 70k parameters. Training is performed using Adam with a learning rate of $1\times10^{-4}$ and batch size 1024, for about 2000 epochs (∼6 h training time on an RTX 4070 GPU).
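
For concreteness, a minimal PyTorch sketch of the value network and the loss above follows. It is an illustrative reconstruction rather than the authors' code: `manifold_hamiltonian` is a hypothetical callable implementing $H_{\mathcal M}(t,x,\nabla_x V_\theta)$, and only the stated hyperparameters (4×128 tanh MLP, terminal plus $\lambda$-weighted residual loss) are taken from the text.

```python
import torch
import torch.nn as nn

class ValueNet(nn.Module):
    """V_theta(t, x): four hidden layers of 128 units with tanh activations."""
    def __init__(self, state_dim: int, hidden: int = 128):
        super().__init__()
        layers, d = [], state_dim + 1            # network input is the pair (t, x)
        for _ in range(4):
            layers += [nn.Linear(d, hidden), nn.Tanh()]
            d = hidden
        layers.append(nn.Linear(d, 1))
        self.net = nn.Sequential(*layers)

    def forward(self, t, x):                     # t: (B, 1), x: (B, state_dim)
        return self.net(torch.cat([t, x], dim=-1)).squeeze(-1)

def deepreach_style_loss(model, t, x, terminal_l, manifold_hamiltonian, T, lam=1.0):
    """Minibatch loss E[L1 + lam * L2]: terminal mismatch plus manifold HJR PDE residual."""
    t = t.clone().requires_grad_(True)
    x = x.clone().requires_grad_(True)
    V = model(t, x)
    dV_dt, dV_dx = torch.autograd.grad(V.sum(), (t, x), create_graph=True)
    # L2: |dV/dt + min{0, H_M(t, x, grad_x V)}|
    residual = dV_dt.squeeze(-1) + torch.minimum(torch.zeros_like(V),
                                                 manifold_hamiltonian(t, x, dV_dx))
    L2 = residual.abs()
    # L1: |V_theta(T, x) - l(x)| applied only to samples drawn at the terminal time
    at_terminal = (t.squeeze(-1) == T).float()
    L1 = (model(torch.full_like(t, T), x) - terminal_l(x)).abs() * at_terminal
    return (L1 + lam * L2).mean()
```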

3. Decentralized Trajectory Optimization with Safety Level-Set Certification

Once $V_\theta$ is trained, the zero sublevel set $\{x^1 \mid V_\theta(\tau, x^{1}(t), x^{i}(t)) \leq 0 \}$ defines the backward-reachable safe tube. Decentralized planning imposes $V_\theta(\tau, x^{1}(t), x^{i}(t)) > \varepsilon$ as a pairwise safety margin against each agent $i$.

At each receding-horizon step, the control trajectory $u(\cdot)$ is optimized over the planning horizon $t_\mathrm{plan}$ to minimize a stage cost, while enforcing the following constraints: discrete-time dynamics, a terminal task-manifold constraint $C(x^{1}(t_\mathrm{plan}))=0$, and safety margins via the neural HJR value $V_\theta>\varepsilon$. The worst-case disturbance for agent $i$ is predicted as

$$d^i(t) = \arg\min_{d \in \mathcal D} \max_{u \in \mathcal U} \nabla_x V_\theta(\cdots) \cdot f(x,u,d).$$

The resulting nonlinear program is solved in real time (0.1–0.2 s per agent) using IPOPT. In cases of infeasibility, the system falls back on the closed-form optimal control from the value function.
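
The structure of the online step (safety certification along a candidate plan, then fallback on infeasibility) can be summarized in a short sketch. This is an outline under stated assumptions rather than the authors' planner: `V_theta` is the trained value network evaluated on a pair of agent states, `nlp_plan` stands in for an IPOPT solution, and the fallback control would come from the closed-form law of Section 1.

```python
import numpy as np

def min_safety_margin(V_theta, times, traj_self, trajs_others):
    """Smallest pairwise margin V_theta(tau, x^1(tau), x^i(tau)) along a candidate plan.

    The plan is certified safe against every neighbor i if the returned value exceeds eps.
    """
    margins = []
    for k, tau in enumerate(times):
        for traj_i in trajs_others:
            margins.append(V_theta(tau, traj_self[k], traj_i[k]))
    return min(margins)

def select_control(nlp_plan, fallback_control):
    """Apply the first control of a feasible NLP plan, else the closed-form safe control."""
    if nlp_plan is not None:                 # IPOPT returned a feasible trajectory
        return np.asarray(nlp_plan[0])
    return np.asarray(fallback_control)      # e.g. optimal_velocity_control(...) from Section 1
```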

4. Theoretical Properties and Empirical Validation

Under standard smoothness assumptions (Lipschitz continuity of $f$, $C$, and $\ell$; full-row-rank $J_C$), the manifold HJR PDE admits a unique continuous viscosity solution. The backward-reachable set is then exactly $\{x \mid V(t,x)\leq 0\}$, and PINN-based learning yields uniform closeness to the true value function as the PDE residual decreases.

HaMMAR was empirically validated on several benchmarks:

  • 2D Circle-Constrained Particle: The backward-reachable set (BRS) is known analytically; constrained HJR achieves ≈99.5% classification accuracy and $F_1$ ≈ 99.7%, while unconstrained HJR over-approximates the BRS ($F_1$ 85–95%).
  • UR5 Dual-Arm Object-Carrying (Fixed Orientation): HaMMAR success rate (SR) 85%, collision rate (CR) 0%, mean planning time 0.16 ± 0.15 s; No-HJR SR 68%, CR 32%; AtlasRRT (centralized) SR 75%, CR 0%, mean time 7.6 s.
  • UR5 Cup-Holding (Upright Constraint): HaMMAR SR 82%, CR 2%, time 0.10 s; AtlasRRT (centralized) SR 60%, CR 1%, time 15 s.
  • UR5 Doorway-Crossing (Alignment Constraints, $n_c=3$): HaMMAR SR 71%, CR 1%, time 0.14 s; No-HJR and AtlasRRT (decentralized) SR 8–9%, CR 91–92%.
  • Scaling to High-Dimensional Multi-Agent Settings: With 5 UR5s, HaMMAR SR 57%, CR 4%, time 0.17 s; AtlasRRT (centralized) failed to return a solution within 20 s.

For each experiment, results were averaged over 100 randomized trials, with performance metrics including success rate, collision rate, planning time, and path length.

5. Implementation Details, Scalability, and Applications

Key architectural and computational characteristics are summarized as follows:

| Component | Description | Notes |
| --- | --- | --- |
| Value Function Net | 4×128 MLP, tanh, ~70k parameters | MLP for $V_\theta$ |
| Training | Adam, 2000 epochs, learning rate $1\times10^{-4}$, ~6 h (RTX 4070) | Handles 2D–30D problems |
| Online Planning | IPOPT, 0.1–0.2 s per agent per step | Real-time feasibility |
| Multi-Agent Scale | Up to 5+ UR5 arms, 30+-dimensional states | On-manifold constraints |

HaMMAR generalizes across a wide range of high-DOF, multi-agent manipulation tasks, including object-carrying, cup-holding, and doorway navigation, all of which impose geometric and task-induced manifold constraints. Neural value-function approximation bypasses the curse of dimensionality traditionally associated with grid-based PDE solvers, and because the decentralized formulation requires no knowledge or prediction of other agents' detailed control inputs, the framework accommodates a range of multi-agent manipulation deployments.

6. Conceptual Significance and Extensions

HaMMAR extends classical HJ reachability PDEs to settings with equality-constrained manifolds by rigorously incorporating tangent-bundle geometry and projector-enforced Hamiltonians. The decoupled, PINN-based solution approach enables task-feasible, safety-critical control synthesis in decentralized contexts. The embedding of learned, manifold-aware value functions as certifying constraints into trajectory optimization differentiates this framework from prior sample-based or unconstrained planners.

A plausible implication is improved feasibility and policy generalization for collaborative and adversarial robotics tasks involving complex operational constraints. The theoretical foundation ensures consistency and guarantees on backward-reachable sets, while extensive empirical results demonstrate robust performance, scalability, and computational tractability on realistic, high-dimensional robots and manipulation objectives.
