Contact Trust Region (CTR) Framework
- The Contact Trust Region (CTR) framework is a suite of algorithmic and geometric methods that explicitly incorporate unilateral and non-smooth contact constraints for advanced optimization and decision-making.
- It combines local Taylor- or mirror-map approximations with specialized trust regions to maintain feasibility, supporting applications in dexterous robotic manipulation, nonlinear elasticity, and constrained reinforcement learning.
- Empirical results show CTR’s ability to achieve low tracking errors and efficient convergence in both online MPC and offline roadmap-based planning, outperforming standard methods in contact-rich scenarios.
The Contact Trust Region (CTR) framework encompasses a suite of algorithmic and geometric tools for solving contact-rich decision and optimization problems. CTR methodologies generalize classical trust region techniques by explicitly incorporating the unilateral and often non-smooth constraints that arise in physical and decision-theoretic contact. Applications span dexterous robotic manipulation, nonlinear elasticity with contact, and safety-constrained reinforcement learning. Distinct variants include the contact-aware trust region for convex trajectory optimization in manipulation (Suh et al., 4 May 2025), filter–trust-region methods for large deformation mechanics (Youett et al., 2017), and constraint-barrier trust region policy updates in reinforcement learning (Milosevic et al., 2024). CTR techniques combine local Taylor- or mirror-map-based modeling with contact-specific feasibility domains, resulting in scalable, globally convergent solutions.
1. Contact Trust Region in Dexterous Manipulation
The Contact Trust Region as developed in "Dexterous Contact-Rich Manipulation via the Contact Trust Region" (Suh et al., 4 May 2025) provides a principled local approximation for contact dynamics in manipulation tasks. The underlying model is quasidynamic, with the configuration governed by an SOCP, capturing both actuated ("robot") and unactuated ("object") coordinates. At each decision step, the system evolves according to:
- $q_{+} = f(q, u)$, where $u$ is the commanded robot position.
- Evolution is subject to velocity cone constraints enforcing nonpenetration and friction.
Rather than employing a standard ellipsoidal trust region, CTR defines the region of trust as the set of commanded positions $u$ for which first-order Taylor approximations of the next state and contact force are guaranteed to satisfy both primal (no penetration) and dual (friction cone) constraints:
- The linearized next state and contact forces must be feasible for all contacts.
Two variants are distinguished: strict CTR, which enforces both primal and dual feasibility, and Relaxed CTR (R-CTR), which relaxes the nonpenetration (primal) constraint for numerical performance while maintaining dual (force) feasibility in the linearized contact cone.
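The membership test described above can be sketched for a planar contact. This is a minimal illustration under stated assumptions, not the paper's implementation: the `LinearizedContact` layout, the function `in_ctr`, and the scalar tangential force are all hypothetical, and the perturbation `du` is measured from a nominal input.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class LinearizedContact:
    """First-order model of one planar contact (hypothetical layout)."""
    gap0: float                # nominal signed distance after the step
    gap_jac: Sequence[float]   # d(gap)/d(u)
    fn0: float                 # nominal normal force
    fn_jac: Sequence[float]    # d(fn)/d(u)
    ft0: float                 # nominal tangential force
    ft_jac: Sequence[float]    # d(ft)/d(u)
    mu: float                  # friction coefficient


def _affine(value0, jac, du):
    """Evaluate the first-order Taylor model value0 + jac . du."""
    return value0 + sum(j * d for j, d in zip(jac, du))


def in_ctr(contacts, du, relaxed=False, tol=1e-9):
    """Return True if the perturbation du keeps every linearized contact feasible.

    Strict variant: primal (gap >= 0) and dual (friction cone) must both hold.
    Relaxed variant: only the dual (force) condition is enforced.
    """
    for c in contacts:
        gap = _affine(c.gap0, c.gap_jac, du)
        fn = _affine(c.fn0, c.fn_jac, du)
        ft = _affine(c.ft0, c.ft_jac, du)
        dual_ok = fn >= -tol and abs(ft) <= c.mu * fn + tol
        primal_ok = gap >= -tol
        if not dual_ok:
            return False
        if not relaxed and not primal_ok:
            return False
    return True


# Example contact with made-up numbers, for illustration only.
contact = LinearizedContact(
    gap0=0.01, gap_jac=[-1.0],
    fn0=1.0, fn_jac=[0.0],
    ft0=0.0, ft_jac=[2.0],
    mu=0.5,
)
```

With these made-up numbers, a small perturbation such as `[0.005]` passes both variants, while `[0.1]` violates the linearized nonpenetration condition and is accepted only when `relaxed=True`.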
2. CTR-Based Local Model Predictive Control (MPC) Formulation
CTR is integrated into a local, finite-horizon MPC scheme by solving a convex trajectory optimization problem with dynamics linearized around a nominal trajectory. The optimization:
- Minimizes a cost that combines terminal deviation from the goal with a control-smoothness penalty.
- Subjects the first-order dynamics to the linearized contact model and R-CTR constraints at every time step.
- The resulting subproblem is a sequence of SOCP stages that enforce dual-cone feasibility (or both primal and dual in the strict variant).
Typically, 1–3 sequential convexification steps suffice to converge, thanks to the stability imparted by the contact-aware trust region.
3. Algorithmic Realization of CTR-MPC
CTR-MPC algorithms involve two principal routines:
- Offline trajectory optimization (CtrTrajOpt): Iteratively roll out the nonlinear contact model, solve the convex trust-region subproblem, and update the nominal trajectory.
- Online MPC loop: At each step, warm-start the trajectory and solve for an optimized local control sequence, applying only the first control, then advancing the system and repeating.
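The online loop can be illustrated with a toy receding-horizon controller. This is only a sketch under strong simplifications: the plant is a 1D integrator rather than a contact simulator, `plan_horizon` is a hypothetical stand-in for the convex trust-region subproblem, and warm-starting is omitted for brevity.

```python
def clip(value, lo, hi):
    """Clamp value to the interval [lo, hi]."""
    return max(lo, min(hi, value))


def plan_horizon(x, goal, horizon, u_max):
    """Stand-in for the convex subproblem: clipped steps toward the goal."""
    plan = []
    for _ in range(horizon):
        u = clip(goal - x, -u_max, u_max)
        plan.append(u)
        x = x + u  # predicted state under the toy model
    return plan


def step(x, u):
    """Toy plant: a 1D integrator (placeholder for the contact model)."""
    return x + u


def run_mpc(x0, goal, horizon=5, u_max=0.1, n_steps=20):
    """Receding-horizon loop: plan, apply only the first control, repeat."""
    x = x0
    history = [x]
    for _ in range(n_steps):
        plan = plan_horizon(x, goal, horizon, u_max)
        x = step(x, plan[0])  # discard the rest of the plan
        history.append(x)
    return history


history = run_mpc(0.0, 1.0)
```

The key structural point is that the full plan is recomputed at every step and only its first element is executed, which is what lets the controller react to model mismatch.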
An initial-guess heuristic, wherein a virtual torque is applied to establish contact, accelerates convergence when the system starts out-of-contact.
4. Global Planning via Roadmaps
While CTR-MPC provides high-fidelity local planning, it is complemented by a roadmap-based global strategy for contact-rich systems with complex mode transitions:
- Construct an offline graph whose nodes correspond to stable object-robot contact configurations.
- Edges encode feasible local transitions, determined by successful runs of CTR-MPC and collision-free arm repositioning.
- Online, connect the start and goal to nearest roadmap nodes using MPC, then execute the path yielded by shortest-path search.
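The online shortest-path query over such a roadmap can be sketched with Dijkstra's algorithm. The graph layout, node names, and edge costs below are hypothetical; in the setting described above, each edge would correspond to a transition validated offline by CTR-MPC.

```python
import heapq


def shortest_path(graph, start, goal):
    """Dijkstra search. graph maps node -> list of (neighbor, cost) pairs.

    Returns the node sequence from start to goal, or None if unreachable.
    """
    dist = {start: 0.0}
    prev = {}
    queue = [(0.0, start)]
    visited = set()
    while queue:
        d, node = heapq.heappop(queue)
        if node in visited:
            continue
        visited.add(node)
        if node == goal:
            break
        for nbr, cost in graph.get(node, []):
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(queue, (nd, nbr))
    if goal not in dist:
        return None
    # Walk predecessor links back from the goal.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]


# Hypothetical roadmap: nodes are stable grasps, edges are validated transitions.
roadmap = {
    "grasp_a": [("grasp_b", 1.0), ("grasp_c", 4.0)],
    "grasp_b": [("grasp_c", 1.0)],
    "grasp_c": [],
}
path = shortest_path(roadmap, "grasp_a", "grasp_c")
```

Here the two-hop route through `grasp_b` is cheaper than the direct edge, so it is the one returned. Each consecutive pair in the path would then be executed by the local controller.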
Roadmap construction for a high-DOF dexterous hand (e.g., AllegroHand, covering all 24 symmetries across 5 grasps and ≈100 edges) can be accomplished in under 10 minutes using CPU resources.
5. Empirical Performance in Manipulation
CTR-based planners have been evaluated both in simulation and hardware for planar (IiwaBimanual) and 3D (AllegroHand) systems:
- Achieved local tracking errors of 2 mm/2.1 mrad (R-CTR) and 3 mm/8.9 mrad (CTR) for IiwaBimanual, 4 mm/5 mrad for AllegroHand.
- R-CTR outperforms both standard ellipsoidal trust region baselines and strict CTR in mean/variance.
- Wall-clock time per MPC iteration ranges from 6 ms (R-CTR, IiwaBimanual) to 7 ms (CTR, AllegroHand).
- Roadmap-based global plans are constructed offline in 8 minutes, with online execution at several Hz using only CPU.
- Compared to RL-based approaches requiring thousands of GPU-hours for training, CTR offers comparable hardware dexterity with orders-of-magnitude lower compute (Suh et al., 4 May 2025).
6. Filter–Trust-Region Methods in Hyperelastic Contact
In numerical mechanics, the filter–trust-region framework (Youett et al., 2017) solves large deformation contact problems by augmenting the quadratic model with a filter that balances energy decrease and infeasibility reduction. Contact is imposed via mortar discretization, which ensures numerical stability and avoids spurious oscillations. The trust-region subproblem is a bound-constrained QP, solved rapidly using Truncated Nonsmooth Newton Multigrid (TNNMG) with a Monotone Multigrid correction for nonconvexities. Global convergence is guaranteed by filter-acceptance criteria and feasibility-restoration phases.
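To make the role of the filter concrete, the following sketch implements a generic textbook-style acceptance test, where each filter entry is a pair of energy `f` and infeasibility `h`. It is an assumption-laden illustration, not the specific criteria of Youett et al.: the function names, the margin `gamma`, and the exact form of the sufficient-decrease conditions are hypothetical.

```python
def acceptable(filter_entries, f_trial, h_trial, gamma=1e-5):
    """A trial point is acceptable if, against every stored (f_j, h_j),
    it sufficiently reduces either the infeasibility or the energy."""
    return all(
        h_trial <= (1.0 - gamma) * h_j or f_trial <= f_j - gamma * h_trial
        for f_j, h_j in filter_entries
    )


def add_to_filter(filter_entries, f_new, h_new):
    """Insert (f_new, h_new) and drop entries it dominates in both measures."""
    kept = [
        (f, h) for f, h in filter_entries
        if not (f_new <= f and h_new <= h)
    ]
    kept.append((f_new, h_new))
    return kept


# Made-up filter with a single entry: energy 10.0, infeasibility 1.0.
current_filter = [(10.0, 1.0)]
```

A trial step with lower energy but higher infeasibility, or the reverse, is accepted. A step that is worse in both measures is rejected, which in a trust-region scheme would typically trigger a radius reduction.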
7. Constrained Trust Region Policy Optimization in Reinforcement Learning
In constrained reinforcement learning, the C-TRPO algorithm (Milosevic et al., 2024) introduces a trust region whose shape is governed by a composite mirror map including barrier terms for safety constraints:
- The trust region consists only of policies strictly within the safety set, as the Bregman divergence blows up at constraint boundaries.
- At each policy update, a quadratic subproblem is solved with the reward gradient subject to the barrier-augmented trust region.
- The result is monotonic improvement in reward and drastic reduction in cumulative constraint violation compared to prior safe RL algorithms, while preserving computational efficiency.
C-TRPO recovers standard TRPO and Constrained Policy Optimization as special cases, offering safety invariance and global convergence to optimal safe policies.
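The way a barrier term keeps updates inside the safe set can be illustrated with the Bregman divergence of a log barrier applied to the constraint slack (cost budget minus expected cost). This is a sketch under assumptions: the choice of the log barrier, the additive combination with the KL term, and the names `beta` and `delta` are illustrative and may differ from the construction in Milosevic et al.

```python
import math


def log_barrier_bregman(slack_new, slack_old):
    """Bregman divergence of phi(x) = -log(x) between two slacks.

    D(x, y) = phi(x) - phi(y) - phi'(y) * (x - y) = x/y - 1 - log(x/y).
    It is zero when the slacks match and grows without bound as x -> 0.
    """
    if slack_old <= 0.0:
        raise ValueError("the current policy must be strictly safe")
    if slack_new <= 0.0:
        return math.inf  # candidate policy violates the constraint
    ratio = slack_new / slack_old
    return ratio - 1.0 - math.log(ratio)


def in_trust_region(kl, slack_new, slack_old, beta=1.0, delta=0.01):
    """Check the (assumed) barrier-augmented trust-region condition."""
    return kl + beta * log_barrier_bregman(slack_new, slack_old) <= delta
```

Because the divergence is infinite for non-positive slack, no candidate policy on or beyond the constraint boundary can satisfy the trust-region condition, regardless of how small its KL term is.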
References:
- Contact-rich manipulation: (Suh et al., 4 May 2025)
- Large deformation contact mechanics: (Youett et al., 2017)
- Constrained RL and information geometry: (Milosevic et al., 2024)