AVO Gradient Layer in Contact-Rich Manipulation
- AVO Gradient Layer is a differentiable, learned value function module integrated into trajectory optimizers that improves performance in tasks requiring contact mode switching.
- It uses an ensemble of multilayer perceptrons to compute value estimates and gradients at each planning step, guiding the optimizer toward lower predicted cost states.
- Empirical results show reduced median quaternion errors and drop rates, with a nearly 50% reduction in planning time during contact-rich manipulation tasks.
An AVO Gradient Layer refers to a differentiable, learned value function module integrated into trajectory optimizers to accelerate and improve performance in tasks requiring contact mode switching, such as dexterous multi-finger manipulation. In the context of Amortized Value Optimization (AVO), this layer predicts value-to-go at every planning step, enabling classical trajectory optimization algorithms to efficiently utilize future expected cost information, especially in the presence of discontinuities associated with contact events (Hung et al., 8 Oct 2025).
1. Mathematical Formulation of the Amortized Value Term
AVO augments classical trajectory optimization, which typically decomposes tasks into sub-tasks for independent optimization, by incorporating a learned value function into the objective across a finite planning horizon. For a system state $x_t$ at timestep $t$, including all relevant physical and auxiliary variables (e.g., hand joint angles, object pose, and task-specific parameters $\theta$), the per-segment objective is:

$$J = \sum_{t=0}^{T} \left[ \ell(x_t, u_t) + \frac{\alpha}{K} \sum_{k=1}^{K} V_k^{m}(x_t, \theta) + \beta \,\mathrm{Var}_k\!\left[ V_k^{m}(x_t, \theta) \right] \right]$$
where:
- $\ell(x_t, u_t)$: running cost,
- $V_k^{m}(x_t, \theta)$: learned value function for contact mode $m$ and ensemble member $k$,
- $K$: ensemble size,
- $\alpha$, $\beta$: scalar weights for the value mean and variance penalty.
This formulation generalizes classical “value at final state” approaches by attaching learned value terms at every stage, enabling cost shaping throughout the trajectory rather than only at terminal states. In the limiting case, setting $\alpha = \beta = 0$ for $t < T$ and applying the value term only at $t = T$ recovers the classical terminal-value form.
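As a concrete illustration, the per-stage augmented cost above can be sketched as follows; the weights `alpha` and `beta` are illustrative placeholders, not values from the paper:

```python
import statistics

def stage_cost(running_cost, ensemble_values, alpha=1.0, beta=0.5):
    # Augmented per-stage cost: running cost plus the weighted ensemble
    # mean and (population) variance of the learned value estimates.
    mu = statistics.fmean(ensemble_values)
    var = statistics.pvariance(ensemble_values)
    return running_cost + alpha * mu + beta * var

# Three ensemble members that agree closely incur only a small variance penalty.
c = stage_cost(2.0, [1.0, 1.2, 1.1], alpha=1.0, beta=0.5)
```

The variance term grows when ensemble members disagree, which is exactly the out-of-distribution penalty described in the next section.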
2. Architecture and Ensemble Training of the Value Layer
The AVO Gradient Layer is structured as an ensemble of multilayer perceptrons (MLPs), one per value function for each contact mode and ensemble member. Each network is a 2-layer MLP with:
- Input: $(x_t, m, \theta)$ (state, discrete phase index, task parameters),
- Hidden Layer: 32 units (turning) or 24 units (regrasp), ReLU activation,
- Output: scalar cost-to-go estimate.
Regularization uses weight decay on all layers, and the variance term in the objective encourages the ensemble to avoid overconfidence outside the training distribution. Training is supervised with an MSE loss against labels given by the clipped final cost, using the Adam optimizer and minibatches of size 256.
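A minimal sketch of such an ensemble follows. The input dimension, ensemble size, and Gaussian initialization are hypothetical placeholders; the 32 hidden units correspond to the turning-mode configuration stated above:

```python
import random

def init_mlp(n_in, n_hidden, rng):
    # One ensemble member: a 2-layer MLP with ReLU hidden units and scalar output.
    W1 = [[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.gauss(0.0, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    return (W1, b1, w2, b2)

def mlp_value(params, x):
    # V(x) = w2 . relu(W1 x + b1) + b2  (scalar cost-to-go estimate)
    W1, b1, w2, b2 = params
    h = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    return sum(w * hi for w, hi in zip(w2, h)) + b2

rng = random.Random(0)
# Hypothetical: 5 ensemble members, 32 hidden units, 4-dimensional input.
ensemble = [init_mlp(4, 32, rng) for _ in range(5)]
x = [0.1, -0.2, 0.3, 0.0]
values = [mlp_value(p, x) for p in ensemble]
```

In practice each contact mode $m$ would get its own such ensemble, trained on MSE to the clipped final cost as described above.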
3. Value Gradient Computation and Integration
Each value function is parameterized as:

$$V_k^{m}(x) = w_2^{\top}\, \sigma(W_1 x + b_1) + b_2$$

with $\sigma$ as the pointwise nonlinearity (ReLU). The gradient with respect to the input state is computed via standard backpropagation:

$$\nabla_x V_k^{m}(x) = W_1^{\top} \left( \sigma'(W_1 x + b_1) \odot w_2 \right)$$

This gradient is computed efficiently with autodiff frameworks during each trajectory optimization solver iteration. For each stage $t$, AVO performs a forward pass to obtain each $V_k^{m}(x_t)$, computes the ensemble mean $\mu_t$ and variance $\sigma_t^2$, and then executes a backward pass to obtain $\nabla_{x_t}\!\left(\alpha \mu_t + \beta \sigma_t^2\right)$ for the optimization updates.
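For a single ensemble member, the backward pass can be written out explicitly and sanity-checked against finite differences. This is a sketch with a hypothetical tiny network; a real implementation would rely on an autodiff framework as noted above:

```python
def mlp_value_and_grad(params, x):
    # Forward pass plus analytic input gradient:
    #   dV/dx = W1^T (relu'(W1 x + b1) * w2)
    W1, b1, w2, b2 = params
    pre = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]
    h = [max(0.0, p) for p in pre]
    V = sum(w * hi for w, hi in zip(w2, h)) + b2
    delta = [w if p > 0.0 else 0.0 for w, p in zip(w2, pre)]  # relu'(pre) * w2
    grad = [sum(d * W1[j][i] for j, d in enumerate(delta)) for i in range(len(x))]
    return V, grad

# Tiny fixed network: 2 hidden units, 2 inputs (illustrative weights).
params = ([[1.0, -2.0], [0.5, 0.5]], [0.1, -0.1], [2.0, -1.0], 0.3)
x = [0.4, 0.2]
V, grad = mlp_value_and_grad(params, x)

# Finite-difference check on the first input coordinate.
eps = 1e-6
Vp, _ = mlp_value_and_grad(params, [x[0] + eps, x[1]])
fd = (Vp - V) / eps
```

The finite-difference estimate `fd` should match `grad[0]` to within the step size, confirming the analytic backward pass.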
4. Solver Integration and Update Rule
The AVO value layer directly alters the update rule in gradient-based trajectory optimization algorithms. The instantaneous cost at time $t$ is:

$$c_t = \ell(x_t, u_t) + \alpha \mu_t + \beta \sigma_t^2$$

The update for the control variable $u_t$ under gradient descent is:

$$u_t \leftarrow u_t - \eta \, \nabla_{u_t} J$$

By the chain rule, the relevant gradients propagate from the value function through the system dynamics mapping $x_{t+1} = f(x_t, u_t)$:

$$\nabla_{u_t} V_k^{m}(x_{t+1}) = \left( \frac{\partial f}{\partial u_t} \right)^{\!\top} \nabla_{x_{t+1}} V_k^{m}(x_{t+1})$$
This mechanism provides a cost-shaping term at every stage, efficiently incorporating predictive information about downstream sub-task success.
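The chain-rule propagation can be illustrated with a stand-in linear dynamics model $x_{t+1} = A x_t + B u_t$ and a simple quadratic value surrogate. The actual contact dynamics and learned value functions are far richer; this sketch only demonstrates the update mechanics:

```python
def value(x):
    # Quadratic value surrogate: V(x) = x . x
    return sum(xi * xi for xi in x)

def grad_value(x):
    # dV/dx = 2x
    return [2.0 * xi for xi in x]

def step(x, u, A, B):
    # Stand-in linear dynamics: x' = A x + B u
    return [sum(a * xi for a, xi in zip(row_a, x)) +
            sum(b * ui for b, ui in zip(row_b, u))
            for row_a, row_b in zip(A, B)]

def update_control(x, u, A, B, lr=0.1):
    # Chain rule: grad_u V(x') = B^T grad_{x'} V(x'); one gradient-descent step.
    xp = step(x, u, A, B)
    g = grad_value(xp)
    grad_u = [sum(B[i][j] * g[i] for i in range(len(g))) for j in range(len(u))]
    return [ui - lr * gi for ui, gi in zip(u, grad_u)]

A = [[1.0, 0.1], [0.0, 1.0]]
B = [[0.0], [0.5]]
x, u = [1.0, -0.5], [0.2]
u_new = update_control(x, u, A, B)
```

A single step reduces the value of the successor state, which is the per-stage shaping effect the AVO layer contributes inside the full solver.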
5. Empirical Performance and Acceleration
The empirical evaluation of the AVO Gradient Layer on a screwdriver regrasping and turning task demonstrated the following results (Hung et al., 8 Oct 2025):
| Setting | Median Quaternion Error | Drop Rate |
|---|---|---|
| T.O. high-bgt | ≃21.1° | 4% |
| AVO high-bgt | ≃12.1° (−43%) | 4% |
| T.O. low-bgt | ≃22.4° | 8% |
| AVO low-bgt | ≃17.3° (−23%) | 4% (−50%) |
| T.O. hardware | ≃16.4° | 30% |
| AVO hardware low-bgt | ≃14.7° (−10.7%) | 10% |
| AVO hardware high-bgt | ≃12.7° (−22.3%) | 0% |
Here, “T.O.” denotes trajectory optimization without value shaping, “AVO” the full method, and “bgt” the per-action planning budget. Improvements in median error and drop rate are attributed to AVO’s guidance of the optimizer toward low-predicted-cost, in-distribution states at every timestep. Notably, AVO achieves these gains while roughly halving wall-clock per-action planning time, owing to accelerated convergence.
6. Context and Significance in Contact-Rich Manipulation
In multi-finger manipulation, contact mode switching (e.g., between rolling, sticking, sliding, and non-contact) introduces nonconvexities and sensitivity to initial conditions in trajectory optimization. Decomposition of manipulation tasks into separate mode-specific sub-tasks can lead to local minima and excessive computational cost, as it ignores future consequences of current decisions. The AVO Gradient Layer bridges this gap by exposing value-to-go at every step, thereby informing the optimizer of downstream task structure and enabling global improvements in solution quality and computational efficiency.
A plausible implication is that such value-shaped optimization frameworks could generalize to other domains where sequential decision problems with discontinuous cost surfaces arise, especially where classical methods stall due to myopic sub-task decomposition (Hung et al., 8 Oct 2025).