GOMP: Grasped Object Manifold Projection for Multimodal Imitation Learning of Manipulation

Published 3 Dec 2025 in cs.RO | (2512.03347v1)

Abstract: Imitation Learning (IL) holds great potential for learning repetitive manipulation tasks, such as those in industrial assembly. However, its effectiveness is often limited by insufficient trajectory precision due to compounding errors. In this paper, we introduce Grasped Object Manifold Projection (GOMP), an interactive method that mitigates these errors by constraining a non-rigidly grasped object to a lower-dimensional manifold. GOMP assumes a precise task in which a manipulator holds an object that may shift within the grasp in an observable manner and must be mated with a grounded part. Crucially, all GOMP enhancements are learned from the same expert dataset used to train the base IL policy, and are adjusted with an n-arm bandit-based interactive component. We propose a theoretical basis for GOMP's improvement upon the well-known compounding error bound in IL literature. We demonstrate the framework on four precise assembly tasks using tactile feedback, and note that the approach remains modality-agnostic. Data and videos are available at williamvdb.github.io/GOMPsite.


Summary

The paper presents the GOMP framework that integrates manifold projection with diffusion policy to reduce trajectory errors in robotic manipulation.
It employs a dynamic 7-arm bandit approach for optimal dimensionality selection, ensuring adaptability to varying task conditions.
Experimental evaluations demonstrate significant improvements over baseline methods in tasks like nut threading and peg insertion, enhancing precision.

Grasped Object Manifold Projection (GOMP) for Multimodal Imitation Learning

Introduction

The paper "GOMP: Grasped Object Manifold Projection for Multimodal Imitation Learning of Manipulation" introduces an interactive framework designed to improve the precision of imitation learning (IL) in robotic manipulation tasks. The framework targets the compounding errors common in IL methods by constraining a non-rigidly grasped object to a lower-dimensional manifold. Keeping the object within this task-relevant space reduces trajectory errors. The paper implements GOMP on top of a diffusion policy, using tactile and proprioceptive data, for tasks such as assembly where precision and adaptability are critical.

Figure 1: An overview of Grasped Object Manifold Projection as implemented in this paper, integrating vision-based tactile sensors and diffusion policy to constrain object behavior in task space.
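
For context on the compounding errors mentioned above, the classical behavior-cloning bound from the IL literature (Ross and Bagnell, 2010) can be written as below. This is the standard baseline result, not the refined bound the paper proposes, which this summary does not reproduce. The symbols are assumptions of this note: $J$ is the expected total cost over a horizon of $T$ steps, $\pi^{*}$ is the expert, $\hat{\pi}$ is the learned policy, and $\epsilon$ is the expected per-step imitation error under the expert's state distribution.

```latex
J(\hat{\pi}) \;\le\; J(\pi^{*}) + T^{2}\,\epsilon
```

The quadratic dependence on $T$ arises because a single mistake can move the learner into states the expert never visited, where further mistakes become more likely for the rest of the episode.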

Methodology

Task Manifold Projection

The GOMP framework utilizes Principal Geodesic Analysis (PGA) to identify a lower-dimensional manifold representative of the task-specific object trajectories. By projecting object states onto this learned task manifold, GOMP minimizes errors orthogonal to the tangent space of the manifold. This approach allows for precise control of non-rigidly grasped objects, which might otherwise deviate from desired trajectories during task execution.
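
The projection step can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes object poses have already been mapped to 6-D tangent-space vectors around a reference pose, in which case PGA reduces to ordinary PCA (often called linearized PGA). The function names `fit_tangent_pca` and `project`, and the synthetic demonstration data, are hypothetical.

```python
import numpy as np

def fit_tangent_pca(tangent_vectors, k):
    """Fit a k-dimensional affine subspace to tangent vectors (linearized PGA)."""
    X = np.asarray(tangent_vectors, dtype=float)
    mean = X.mean(axis=0)
    # SVD of the centered data; rows of vt are principal directions,
    # ordered by decreasing explained variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    basis = vt[:k]  # shape (k, d), orthonormal rows
    return mean, basis

def project(x, mean, basis):
    """Project x onto the affine subspace mean + span(basis rows)."""
    x = np.asarray(x, dtype=float)
    coords = basis @ (x - mean)      # coordinates within the subspace
    return mean + basis.T @ coords   # back to the ambient space

# Synthetic "demonstrations": 6-D points scattered along one axis (assumed data).
rng = np.random.default_rng(0)
direction = np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0])
t = rng.uniform(-1.0, 1.0, size=(200, 1))
demos = t * direction + 0.01 * rng.normal(size=(200, 6))

mean, basis = fit_tangent_pca(demos, k=1)
noisy = np.array([0.5, 0.2, -0.1, 0.0, 0.3, 0.0])
projected = project(noisy, mean, basis)  # off-axis components are suppressed
```

A full PGA treatment would compute an intrinsic mean on the pose manifold and use its exponential and logarithm maps; the linear version above only shows how components orthogonal to the learned subspace are removed.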

Interactive Dimensionality Selection

To refine the projection dimensionality, the framework employs a 7-arm bandit approach. This method dynamically selects the optimal number of dimensions for the task manifold based on rollout performance, updating projections in response to observed successes and failures. This interactive component ensures that the learning system remains adaptable to varied task conditions and dataset quality.
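
A hedged sketch of how such a selection loop could look follows. The summary does not state which bandit algorithm the paper uses, so Thompson sampling with Beta posteriors is swapped in here as an assumption. The class name `DimensionBandit`, the mapping of each arm to one candidate dimensionality, and the simulated success probabilities are all hypothetical.

```python
import random

class DimensionBandit:
    """Thompson-sampling bandit over candidate manifold dimensionalities."""

    def __init__(self, n_arms=7, seed=0):
        self.successes = [0] * n_arms
        self.failures = [0] * n_arms
        self.rng = random.Random(seed)

    def select(self):
        # Sample a plausible success rate per arm from its Beta posterior
        # (uniform Beta(1, 1) prior) and pick the arm with the largest sample.
        samples = [
            self.rng.betavariate(s + 1, f + 1)
            for s, f in zip(self.successes, self.failures)
        ]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, success):
        # Record the binary rollout outcome for the chosen arm.
        if success:
            self.successes[arm] += 1
        else:
            self.failures[arm] += 1

# Simulated rollouts: assumed (made-up) success probability for each arm.
true_success = [0.1, 0.2, 0.3, 0.9, 0.3, 0.2, 0.1]
env = random.Random(1)
bandit = DimensionBandit(n_arms=7, seed=0)
pulls = [0] * 7
for _ in range(1000):
    arm = bandit.select()
    success = env.random() < true_success[arm]
    bandit.update(arm, success)
    pulls[arm] += 1

best_arm = max(range(7), key=pulls.__getitem__)  # most frequently chosen arm
```

In a real deployment each call to `update` would follow a physical rollout, so the number of rounds would be far smaller than in this simulation.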

Integration with Diffusion Policy

The base IL policy is a Diffusion Policy (DP) model, a state-of-the-art approach to learning state-action mappings, trained on tactile shear-field and proprioceptive inputs. GOMP augments DP by correcting policy deviations via manifold projection, which adds stability and precision, especially when tactile feedback introduces additional uncertainty.
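
The effect of such a correction on accumulated drift can be shown with a toy rollout. This is a sketch under strong assumptions rather than the paper's controller: the "policy" is a stand-in that takes a nominal step along the manifold plus Gaussian error, the manifold is a fixed 1-D line in a 3-D state space, and the projection is applied directly to the state instead of being commanded through the robot. The names `rollout` and `off_manifold_error` are hypothetical.

```python
import numpy as np

def rollout(steps, noise_std, basis, use_projection, seed=0):
    """Simulate a noisy policy, optionally projecting the state each step."""
    rng = np.random.default_rng(seed)
    state = np.zeros(basis.shape[1])
    for _ in range(steps):
        # Stand-in for a learned policy: nominal motion along the manifold
        # plus an isotropic per-step error.
        action = 0.1 * basis[0] + noise_std * rng.normal(size=state.shape)
        state = state + action
        if use_projection:
            # Remove the component orthogonal to the manifold
            # (basis rows are assumed orthonormal).
            state = basis.T @ (basis @ state)
    return state

def off_manifold_error(state, basis):
    """Distance from the state to the linear subspace spanned by basis rows."""
    return float(np.linalg.norm(state - basis.T @ (basis @ state)))

basis = np.array([[1.0, 0.0, 0.0]])  # 1-D manifold: the x-axis
raw = rollout(200, 0.02, basis, use_projection=False)
corrected = rollout(200, 0.02, basis, use_projection=True)

err_raw = off_manifold_error(raw, basis)              # grows like a random walk
err_projected = off_manifold_error(corrected, basis)  # stays at zero
```

Errors along the manifold are untouched by the projection, which is why the choice of dimensionality matters: too few dimensions over-constrain the motion, while too many leave more drift uncorrected.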

Figure 2: Nut threading results indicating the comparative performance between GOMP and baseline DP as demonstrations increase, showing significant improvement with GOMP.

Experimental Evaluation

GOMP is evaluated on four precise assembly tasks: nut threading, peg insertion, USB insertion, and battery cover placement. Across these tasks, GOMP consistently outperforms the baseline DP method, especially as the number of demonstrations increases, suggesting greater robustness and precision in trajectory execution. All experiments use tactile feedback, although the paper notes that the approach remains modality-agnostic.

(Figures 5 to 8)

Figures 3-7 demonstrate improvements across various tasks with GOMP implementation, highlighting differential performance gains when using task-specific manifold constraints.

Discussion

GOMP addresses the challenge of trajectory stability in IL by imposing geometric constraints through manifold projection. Its interactive adjustment of task-space dimensionality is intended to let the system react to disturbances or environmental changes, reducing the likelihood of the irreversible errors seen in baseline policies. This approach could benefit industrial automation by offering a more flexible alternative to rigid and costly fixture-based systems.

The observed improvement in task performance suggests that manifold constraints can form an integral part of future IL applications. However, the requirement for accurate in-hand pose estimation, currently primarily achieved through tactile sensing, poses a limitation that may necessitate further research into alternative perception methods.

Conclusion

The GOMP framework uses geometry-based constraints to improve task precision without requiring additional demonstrations: all of its enhancements are learned from the same expert dataset used to train the base IL policy. The technique augments existing DP models and broadens the scope of IL in precise manipulation tasks. Future work may focus on improving perception accuracy and extending manifold projection to a broader range of manipulation actions, with the aim of more adaptive and efficient robotic systems in industrial settings.
