One-Shot Cross-Geometry Skill Transfer through Part Decomposition

Published 16 Apr 2026 in cs.RO | (2604.15455v1)

Abstract: Given a demonstration, a robot should be able to generalize a skill to any object it encounters, but existing approaches to skill transfer often fail to adapt to objects with unfamiliar shapes. Motivated by examples of improved transfer from compositional modeling, we propose a method for improving transfer by decomposing objects into their constituent semantic parts. We leverage data-efficient generative shape models to accurately transfer interaction points from the parts of a demonstration object to a novel object. We autonomously construct an objective to optimize the alignment of those points on skill-relevant object parts. Our method generalizes to a wider range of object geometries than existing work, and achieves successful one-shot transfer for a range of skills and objects from a single demonstration, in both simulated and real environments.


Authors (3)

Summary

The paper proposes a novel framework for one-shot skill transfer by decomposing objects into semantic parts to accommodate diverse geometries.
It employs local generative models and relational descriptors to accurately map interaction features, outperforming traditional whole-object methods.
Experimental results in both simulation and real-world tasks demonstrate significant improvements in data efficiency and robustness for robotic manipulation.

One-Shot Cross-Geometry Skill Transfer via Part Decomposition

Introduction

Skill transfer for robotic manipulation requires generalizing a skill demonstrated on a source object to target objects whose geometry may differ substantially. Previous approaches that rely on whole-object representations often fail on shapes outside the training distribution, particularly in few-shot regimes. "One-Shot Cross-Geometry Skill Transfer through Part Decomposition" (2604.15455) addresses this challenge by introducing a framework that decomposes objects into semantic parts. This compositional representation enables data-efficient cross-geometry skill transfer by combining generative modeling of part-level shape variation with the identification of task-relevant inter-part relationships.

Methodology

Problem Formalization

The framework assumes access to a single demonstration of a target skill, formalized by objects $A$ and $B$, their initial point clouds, and the transformation $T_{AB}$ that defines successful execution of the manipulation task. The transfer objective is to infer, without additional demonstrations, a transformation $T_{A'B'}$ for a novel object pair $A'$, $B'$ that satisfies the task condition $C(A', B', T_{A'B'}) = 0$.

Central to the method is the use of interaction features—geometrically grounded points or regions that define salient object affordances for manipulation. The transfer challenge is thus reduced to accurately identifying and mapping these interaction features between the demonstration and the novel objects under substantial intra-class shape variation.
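Once interaction points have been mapped onto the novel objects, a relative pose can be recovered by rigidly aligning corresponding points. The sketch below uses the standard Kabsch least-squares solution as an illustration; it is an assumption about how such an alignment could be computed, not the paper's stated optimizer, and the function name `rigid_align` is hypothetical.

```python
import numpy as np

def rigid_align(src, dst):
    """Least-squares rigid transform (R, t) such that R @ src[i] + t ~= dst[i].

    src, dst: (N, 3) arrays of corresponding interaction points (assumed given).
    Uses the Kabsch algorithm; this is an illustrative choice, not the paper's.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    # 3x3 cross-covariance of the centered point sets
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so the result is a rotation, not a reflection
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t
```

With exact correspondences and no noise, the recovered `(R, t)` reproduces the transform that generated `dst` from `src`; with noisy or warped points it gives the least-squares best fit.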

Parts-Based Decomposition and Warping

Rather than process objects monolithically, semantic segmentation (using models such as Segment Anything [kirillov2023segment]) decomposes each object into a set of known parts—e.g., for mugs, "cup" and "handle." For each part, the method trains a local generative shape model via PCA on pose-aligned point clouds, supporting inference of complete part geometry from partial observations and accurate transfer of interaction points using nonrigid registration techniques such as Coherent Point Drift.
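A per-part PCA shape model of this kind can be sketched as follows. The sketch assumes the training clouds for one part are already pose-aligned and in point-to-point correspondence (in the summary above, correspondence would come from a nonrigid registration step such as Coherent Point Drift, which is not implemented here). The function names are hypothetical.

```python
import numpy as np

def fit_part_pca(clouds, n_components):
    """Fit a linear shape model to one part.

    clouds: (M, N, 3) array of M training clouds with N corresponding points
            each (correspondence and pose alignment are assumed).
    Returns the mean shape (3N,) and the top principal directions (k, 3N).
    """
    X = np.asarray(clouds, dtype=float).reshape(len(clouds), -1)
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions of the centered data
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def encode(cloud, mean, components):
    """Project an (N, 3) cloud onto the latent shape space."""
    return components @ (np.asarray(cloud, dtype=float).reshape(-1) - mean)

def decode(latent, mean, components):
    """Reconstruct an (N, 3) cloud from a latent shape vector."""
    return (mean + latent @ components).reshape(-1, 3)
```

Because every decoded shape shares the ordering of the training clouds, the index of an interaction point on one part instance identifies the matching point on any other decoded instance, which is what makes point transfer straightforward in a model of this form.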

A critical refinement involves maintaining not only part-level shape features, but also relational descriptors that encode the canonical spatial relationships among parts (e.g., the alignment of a mug’s handle to its body). These descriptors act as a regularizer during optimization, resolving rotational ambiguities and symmetry-induced local minima, and resulting in more accurate pose and interaction point reconstructions.

Figure 1: Parts-based shape warping yields a significant improvement in skill transfer for manipulation across a diverse range of object geometries in the simulated mug-on-rack setting.

Identification and Transfer of Relevant Part Relationships

Not all part interactions are relevant for a given manipulation task. The approach automatically identifies skill-salient pairs $(m^*, n^*)$ across the two interacting objects by evaluating which subset of possible part relationships between source objects reconstructs the demonstration trajectory most accurately. These pairs are then used to construct the heuristic objective during transfer, optimizing the alignment of only those sub-portions of the objects that matter for task completion.
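The selection step can be illustrated with a small sketch. It assumes that, for each candidate part pair, some upstream procedure has already produced the pose obtained when only that pair is aligned; the sketch then keeps the pair whose pose best matches the demonstrated one. The error metric, the `rot_weight` parameter, and the function names are illustrative assumptions, not details from the paper.

```python
import numpy as np

def pose_error(T_pred, T_demo, rot_weight=0.1):
    """Translation distance plus weighted rotation angle between two 4x4 poses.

    rot_weight trades off radians against length units (an assumed choice).
    """
    dt = np.linalg.norm(T_pred[:3, 3] - T_demo[:3, 3])
    R_rel = T_pred[:3, :3].T @ T_demo[:3, :3]
    # Rotation angle from the trace, clipped for numerical safety
    cos_angle = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    return dt + rot_weight * np.arccos(cos_angle)

def select_part_pair(candidates, T_demo):
    """Pick the part pair whose reconstructed pose is closest to the demo.

    candidates: dict mapping (part_of_A, part_of_B) -> 4x4 pose reconstructed
                using only that pair (assumed to be computed elsewhere).
    """
    return min(candidates, key=lambda pair: pose_error(candidates[pair], T_demo))
```

For a mug-on-rack demonstration, for example, a candidate such as `("handle", "hook")` would be expected to score better than one involving the cup body, since the handle is what constrains the hanging pose.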

Figure 2: Part decomposition resolves error induced by changes in inter-part relationships—whole-object keypoint transfer fails, naive part decomposition improves but may still miss the optimal alignment, while recomposing relevant relationships enables correct skill execution.

Figure 3: Overview of the full pipeline for skill transfer via part decomposition, from decomposed perception and feature extraction to relational alignment and final pose transfer.

Relational Descriptor-Conditioned Reconstruction

The method incorporates adjacency-based descriptors and $z$-axis symmetry-breaking labels, facilitating context-sensitive matching between part instances. Chamfer distances are computed only among point pairs sharing these labels, and regularization penalties are introduced against unrealistic latent space configurations, promoting robust generalization with modest amounts of part-level data.
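A label-restricted Chamfer distance with a latent penalty can be sketched as below. The integer label encoding, the squared-norm penalty, and the `reg_weight` value are assumptions made for illustration; the paper's exact descriptor construction and regularizer are not specified in this summary.

```python
import numpy as np

def labeled_chamfer(P, P_labels, Q, Q_labels):
    """Symmetric Chamfer distance where points match only within the same label.

    P, Q: (N, 3) and (M, 3) point arrays; P_labels, Q_labels: per-point labels.
    Points whose label appears in only one cloud are ignored (a simplification).
    """
    P, Q = np.asarray(P, dtype=float), np.asarray(Q, dtype=float)
    P_labels, Q_labels = np.asarray(P_labels), np.asarray(Q_labels)
    total, count = 0.0, 0
    for label in np.intersect1d(P_labels, Q_labels):
        p = P[P_labels == label]
        q = Q[Q_labels == label]
        # Pairwise distances between same-label points, shape (|p|, |q|)
        d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)
        total += d.min(axis=1).sum() + d.min(axis=0).sum()
        count += len(p) + len(q)
    return total / count if count else float("inf")

def objective(P, P_labels, Q, Q_labels, latent, reg_weight=0.01):
    """Labeled Chamfer term plus a squared-norm penalty on the latent vector."""
    latent = np.asarray(latent, dtype=float)
    return labeled_chamfer(P, P_labels, Q, Q_labels) + reg_weight * float(latent @ latent)
```

Restricting matches by label means two clouds that coincide as unlabeled point sets can still incur a cost when their labels disagree, which is how such a term can penalize a flipped or rotated fit that a plain Chamfer distance would accept.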

Figure 4: Example of failure modes—whole object warping and parts-based warping without relational descriptors miss correct alignment, while relational descriptors enable accurate keypoint and shape matching.

Experimental Results

Simulation Benchmarks

The method is benchmarked on two tasks in PyBullet—placing a mug on a rack and placing a bowl on a mug. In both tasks and across a diverse set of geometries, the proposed approach achieves higher success rates than Interaction Warping (IW) [biza23oneshot] and relational Neural Descriptor Fields (R-NDF) [simeonov21ndf], with the most notable differences manifesting in cases with substantial part arrangement variations.

Quantitative Results

Mug on Rack: PSW: $0.78 \pm 0.02$, IW: [value missing], R-NDF: [value missing]
Bowl on Mug: PSW: [value missing], IW: [value missing], R-NDF: [value missing]

The method reliably accommodates drastically different object geometries from categories not present in the training set and with only a single demonstration, indicating significant data efficiency and robustness.

Real-World Robotic Manipulation

Robotic validation covers placing mugs on racks, bowls on mugs, and pre-pouring alignment from a teapot into a mug. The system employs multi-view RGB-D input, robust part segmentation, and interactive re-segmentation for error recovery. Even under challenging segmentation conditions and part-level occlusion, parts-based warping generalizes skill transfer with higher accuracy than whole-object methods. Cross-category generalization is demonstrated by successful transfer between teapots and watering cans sharing the same part set.

Figure 5: Qualitative and geometric transfer results on challenging objects; parts-based shape warping mitigates errors due to poor global alignment and irrelevant geometric features.

Theoretical and Practical Implications

This work shows that skill transfer via compositional modeling can be posed as an optimization over relevant part relationships, replacing globally invariant cost assumptions with structured, data-efficient local objectives. Practically, this results in a skill policy representation that scales gracefully with object geometric diversity and supports rapid generalization from minimal examples, a significant improvement over monolithic or descriptor field baselines that require much larger datasets and training regimes.

The decomposition offers a modular foundation for future work in multi-part object manipulation, autonomous discovery of shared part sets across categories, and potentially, open-world skill composition via relational policy graphs. While current methods rely on accurate part segmentation (limiting robustness under severe occlusions or segmentation failures), ongoing progress in open-vocabulary part segmentation [wei2023ovparts] may further improve reliability.

Limitations

Primary sources of failure are:

Erroneous part relationship identification from single demonstrations
Segmentation errors leading to incomplete part clouds or ambiguous interaction points
Symmetry-induced biases under sparse observations (not yet fully resolved by relational descriptors)

These limitations highlight fruitful directions for hierarchical relationship inference and more robust, context-sensitive segmentation pipelines.

Conclusion

This research demonstrates the benefits of compositional object representations for skill transfer in robotic manipulation. By combining part-based decomposition with relational shape warping, the method achieves one-shot transfer across highly variable object geometries in both simulation and real environments, and provides a framework that can be extended to future challenges in generalizable manipulation and open-world multi-object reasoning (2604.15455).
