Multi-Embodiment Grasping
- Multi-embodiment grasping is a framework that enables the synthesis, transfer, and execution of grasp strategies across varied robotic end-effectors.
- It leverages advanced methods such as morphology graphs, point-cloud encodings, and physics-aware optimizations to map grasp behaviors with minimal adaptation.
- This paradigm supports robust sim-to-real transfer, multi-modal grasping, and the development of universal manipulation agents in robotics.
Multi-embodiment grasping encompasses the synthesis, transfer, and execution of robust, efficient grasp behaviors across diverse robotic end-effectors, morphologies, and grasping modalities. The objective is to build generalizable models, representations, and control pipelines such that data, policies, or demonstrations specified for one embodiment (e.g., human hand, soft multimodal gripper, anthropomorphic or non-anthropomorphic robot hand) can be effectively mapped to many other embodiments, with minimal per-system adaptation. This paradigm is foundational for progressing toward universal manipulation agents, robust sim-to-real transfer, and enabling foundation models in robotic grasping.
1. Formal Definitions and Problem Scope
Multi-embodiment grasping generalizes the grasp synthesis problem from single, fixed end-effectors to heterogeneous collections of robotic hands, grippers, or hybrid end-effectors, each possibly exhibiting variable degrees of freedom, actuation modes, and geometries. The core task may be expressed as estimating, for a given object (typically specified by point cloud, mesh, or geometric features) and a given end-effector embodiment (specified by kinematic, geometric, or morphological description), a feasible and stable grasp configuration

g = (T, q) ∈ SE(3) × ℝ^d,

where T ∈ SE(3) is the palm (wrist) pose and q ∈ ℝ^d is the joint configuration, with d varying across embodiments.
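To make the product structure concrete, here is a minimal Python sketch that represents a grasp as the pair (T, q) and checks kinematic feasibility against joint limits. The dataclass, the limit vectors, and the parallel-jaw example are illustrative assumptions, not an API from any cited work.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Grasp:
    """g = (T, q) in SE(3) x R^d: palm pose plus joint configuration."""
    palm_pose: np.ndarray  # (4, 4) homogeneous transform T
    joints: np.ndarray     # (d,) joint vector q; d is embodiment-specific

def within_limits(g: Grasp, lower: np.ndarray, upper: np.ndarray) -> bool:
    """Kinematic feasibility: every joint inside its limit interval."""
    return bool(np.all(g.joints >= lower) and np.all(g.joints <= upper))

# The same representation covers a 2-DoF parallel-jaw gripper and a
# 16-DoF Allegro-style hand; only d and the limit vectors change.
g = Grasp(palm_pose=np.eye(4), joints=np.array([0.02, 0.02]))
print(within_limits(g, lower=np.zeros(2), upper=np.full(2, 0.04)))  # True
```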
In the context of data-driven methods, multi-embodiment further requires that models trained on data from one, several, or all embodiments generalize to unseen objects and, critically, to unseen end-effectors—with minimal or no retraining or retargeting (Attarian et al., 2023, Wei et al., 25 Dec 2024, Wu et al., 29 Sep 2025, Freiberg et al., 31 Oct 2025).
2. Representations for Embodiment and Object Geometry
Fundamental to generalization across embodiments is explicit encoding of both end-effector morphology and object geometry. Multiple strategies have been developed:
- Morphology Graphs and Embeddings: Embodiment is described as a kinematic graph G = (V, E), with nodes representing links (parameterized by physical dimensions and position) and edges representing joints (revolute/fixed, articulated via URDF or kinematic descriptions) (Wei et al., 25 Dec 2024, Zhang et al., 7 Oct 2025). Node and edge features are embedded via GCNs or Transformers to yield fixed-length morphological representations; a minimal encoding sketch follows this list.
- Point-Cloud Representations: Both object surfaces and gripper geometries are encoded as point clouds, optionally augmented with keypoints, normals, or mesh-derived features. These are processed using GCNs, PointNet++, or equivariant architectures (Attarian et al., 2023, Freiberg et al., 24 Oct 2024).
- Hybrid and Human-Inspired Contact Maps: Contact probability maps or human-like contact representations generated via CVAEs, as in CEDex (Wu et al., 29 Sep 2025), provide dense, semantics-rich priors for both anthropomorphic and arbitrary hands.
- Low-dimensional Articulation Spaces: Eigengrasp bases or linear subspaces derived from PCA on observed joint configurations encode the relevant manifold of feasible hand articulations, facilitating cross-embodiment mapping (Zhang et al., 7 Oct 2025).
- Physical Reachability Maps: For analytic approaches, a reachability map encodes the spatial volumes accessible by each surface region of the hand; “opposition spaces” specify valid contact pairs among arbitrarily chosen links (Yao et al., 2023).
- Soft/Hybrid Embodiments: Soft grippers and multimodal end-effectors are represented by actuation primitives and parameterized workspace models, as in systems supporting enveloping, suction, and hybrid modes (Liu et al., 2022).
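As referenced in the first item above, a morphology graph can be reduced to a fixed-length embedding with a few rounds of graph convolution. The numpy sketch below applies the standard normalized-adjacency GCN propagation rule to a toy four-link chain; the node features, layer sizes, and untrained random weights are illustrative assumptions only, not the architecture of any cited paper.

```python
import numpy as np

# Hypothetical toy morphology: palm plus a three-link finger, with per-node
# features [link_length_m, link_radius_m, is_fingertip].
node_feats = np.array([
    [0.10, 0.040, 0.0],  # palm
    [0.05, 0.010, 0.0],  # proximal link
    [0.04, 0.010, 0.0],  # middle link
    [0.03, 0.008, 1.0],  # fingertip
])
edges = [(0, 1), (1, 2), (2, 3)]  # revolute joints as undirected edges

# Symmetrically normalized adjacency with self-loops:
# A_hat = D^{-1/2} (A + I) D^{-1/2}, the standard GCN propagation operator.
n = node_feats.shape[0]
A = np.eye(n)
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
A_hat = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

# Two GCN layers (untrained random weights, illustration only), then mean
# pooling: the result is a fixed-length embedding regardless of link count,
# so hands with different DoFs map into the same representation space.
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(3, 16))
W2 = rng.normal(scale=0.5, size=(16, 16))
h = np.maximum(A_hat @ node_feats @ W1, 0.0)  # ReLU
h = np.maximum(A_hat @ h @ W2, 0.0)
morphology_embedding = h.mean(axis=0)         # shape (16,)
print(morphology_embedding.shape)
```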
3. Cross-Embodiment Mapping and Optimization
The central challenge is to translate grasp intent, contacts, or joint-space trajectories specified in a source embodiment (such as a human demonstrator or a canonical hand) to the space of target embodiments, accommodating discrepancies in DoFs, geometry, and actuation constraints:
- Automatic Embodiment Mapping: For demonstration-based approaches, human hand motion (e.g., in the MANO representation) is mapped to robot hand poses via joint-space optimization. The mapping minimizes Euclidean distances between virtual markers or contact points, regularized by joint-limit and, optionally, collision constraints; finger-wise IK solvers are typically implemented with SLSQP for real-time performance (Fabisch et al., 2022). A minimal retargeting sketch in this style appears after this list.
- Contact Alignment and Topological Merging: Contact distributions inferred for human hand parts are merged and reassigned to robot-specific parts using geometric heuristics or signed distance field (SDF)–based alignment, minimizing SDF loss between the predicted contacts and the robot surface (Wu et al., 29 Sep 2025).
- Physics-aware Grasp Optimization: Given target contacts, pose and joint configuration are refined via composite losses (contact, proximity, penetration, self-collision) using gradient descent under physics simulation, guaranteeing force-closure and avoiding collisions (Wu et al., 29 Sep 2025, Yao et al., 2023).
- Learning-based Geometry Matching: Techniques such as GeoMatch and GeoMatch++ use cross-attention between object and morphology embeddings, trained with supervised losses (contact likelihood, regression to ground-truth contacts). Autoregressive contact generation and architecture ablations quantify the contribution of morphology features to out-of-domain generalization (Attarian et al., 2023, Wei et al., 25 Dec 2024).
- Equivariant Flow and Diffusion Models: Conditional normalizing flows and score-based diffusion approaches parameterize the distribution over grasps in SE(3) × ℝ^n, where n is the gripper DoF. These models achieve data- and batch-efficient multi-embodiment training and can transfer to unseen embodiments with minimal or no adaptation (Freiberg et al., 31 Oct 2025, Freiberg et al., 24 Oct 2024).
- Reinforcement Learning and Demonstration Editing: RL-based pipelines such as DemoGrasp treat grasp refinement as a one-step MDP conditioned on editable demonstration trajectories. A universal policy can be adapted to any new hand by collecting a single demonstration, then optimizing the grasp with a simple binary success/collision reward, enabling strong sim-to-real and cross-hand generalization (Yuan et al., 26 Sep 2025).
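The retargeting sketch promised above: a toy planar two-link finger whose joints are optimized with SLSQP so that its fingertip matches a virtual marker from a demonstration, with joint limits imposed as box bounds. The link lengths, target marker, and rest-pose regularizer weight are illustrative assumptions in the spirit of (Fabisch et al., 2022), not that system's actual code.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical planar two-link finger; link lengths (meters) are illustrative.
L1, L2 = 0.05, 0.04

def fingertip(q: np.ndarray) -> np.ndarray:
    """Forward kinematics: fingertip position of a planar 2-joint finger."""
    return np.array([
        L1 * np.cos(q[0]) + L2 * np.cos(q[0] + q[1]),
        L1 * np.sin(q[0]) + L2 * np.sin(q[0] + q[1]),
    ])

target = np.array([0.06, 0.03])  # virtual marker taken from the demonstration

def objective(q: np.ndarray) -> float:
    # Marker-matching term plus a small rest-pose regularizer, mirroring the
    # joint-limit-regularized IK objective described in the list above.
    return float(np.sum((fingertip(q) - target) ** 2) + 1e-3 * np.sum(q ** 2))

res = minimize(objective, x0=np.array([0.3, 0.3]), method="SLSQP",
               bounds=[(0.0, np.pi / 2)] * 2)  # joint limits as box bounds
print(res.x, fingertip(res.x))
```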
4. Generalization, Transfer, and Empirical Benchmarks
Generalist multi-embodiment grasping methods are evaluated by their ability to extrapolate consistent, stable grasps to unseen end-effectors and objects. Benchmarks and metrics include grasp success rate, articulation diversity, and computational efficiency, with results reported in both simulation and real-robot contexts.
| Method | Out-of-domain Success (%) | Diversity (rad) | Models / Grippers Used |
|---|---|---|---|
| GeoMatch++ | 71.67 (mean, 3 grippers) | Up to 0.49 | 5 (EZGripper, Barrett, etc.) |
| CEDex | 88.7 (mean, 3 grippers) | 0.512 | 4 (Barrett, Allegro, Shadow) |
| Cross-Embodiment Eigengrasp | 91.9 (sim) | N/A | Shadow, Allegro, Barrett |
| Diffusion | 80.9–97.2 (gripper/scene) | N/A | 8 types (2–22 DoF) |
| DemoGrasp | 84.6 (mean, 6 grippers) | N/A | Parallel, DClaw, Shadow, etc. |
| Flow-based Agent | 66.1–97.0 (multi-embod.) | N/A | Franka, VX300s, DEX-EE, Allegro, Shadow |
Empirical results demonstrate that explicit morphology encoding and/or joint-aligned contact matching are key to transferring grasp policies to substantially different hand morphologies, especially high-DoF and non-anthropomorphic designs (Wei et al., 25 Dec 2024, Wu et al., 29 Sep 2025, Zhang et al., 7 Oct 2025, Freiberg et al., 31 Oct 2025). Including morphology in attention-based architectures increases out-of-distribution success by over 9% compared to point-cloud–only baselines (Wei et al., 25 Dec 2024). Physics-aware constraints and contact-based supervision yield further gains.
Generalization is also observed in few-shot settings; as few as 1,000 adaptation grasps suffice to fine-tune large models on a new hand and achieve >85% success on unseen objects (Zhang et al., 7 Oct 2025). Models trained on the CEDex dataset improve external baseline performance, confirming the benefit of large, diverse cross-embodiment data (Wu et al., 29 Sep 2025).
5. Hybrid Grasping and Multimodal Embodiments
Some systems directly integrate multiple grasp modalities (e.g., enveloping, suction) into a single soft or hybrid gripper. Here, the notion of embodiment extends to mode selection and sequencing:
- Multimodal End-Effectors: A soft gripper with layered compliant fingers and distributed suction cups supports enveloping, sucking, and hybrid actions (enveloping_then_sucking), leveraging tendon actuation and pressure sensors (Liu et al., 2022).
- Hierarchical Action Spaces: Hybrid grasping is formulated as a multistage MDP in which the policy selects primitives, sequences actions, and assigns their parameters to maximize grasping efficiency (objects grasped per action). Double DQN with reward shaping that favors dual-object actions achieves efficiency above 160% of single-mode approaches, in both simulation and sim-to-real experiments (Liu et al., 2022); a minimal sketch of the Double DQN target follows below.
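A minimal sketch of the Double DQN update used in such pipelines: the online network selects the next action while the target network evaluates it, reducing the overestimation bias of vanilla DQN. The primitive set, reward values, and toy Q-functions are illustrative assumptions, not the parameters of (Liu et al., 2022).

```python
import numpy as np

# Hypothetical primitive set for a hybrid gripper (indices = action ids).
PRIMITIVES = ["envelop", "suck", "envelop_then_suck"]

def double_dqn_target(r: float, s_next, q_online, q_target,
                      gamma: float = 0.99, done: bool = False) -> float:
    """Double DQN target: online net selects the action, target net evaluates.

    q_online, q_target: callables mapping a state to a (len(PRIMITIVES),)
    array of action values.
    """
    if done:
        return r
    a_star = int(np.argmax(q_online(s_next)))           # selection (online)
    return r + gamma * float(q_target(s_next)[a_star])  # evaluation (target)

def shaped_reward(num_objects: int, hybrid: bool) -> float:
    """Reward shaping favoring dual-object actions; values are illustrative."""
    return float(num_objects) + (0.5 if hybrid and num_objects >= 2 else 0.0)

# Toy usage with stand-in Q-functions.
q_on = lambda s: np.array([0.2, 0.5, 0.9])
q_tg = lambda s: np.array([0.1, 0.4, 0.8])
print(double_dqn_target(shaped_reward(2, hybrid=True), None, q_on, q_tg))
```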
6. Analytical and Optimization-Based Strategies
Analytical methods exploit the physical kinematics and redundancy of dexterous hands to synthesize human-like, multi-object grasps:
- Reachability and Opposition Spaces: By constructing detailed reachability maps for each hand surface region, arbitrary opposition pairs (not just fingertips) can be used for stable multi-object or multi-region grasping, enabling grasps unattainable by fingertip-dominated methods (Yao et al., 2023).
- Constrained Optimization for Multi-Object Grasping: Multi-object grasp synthesis is cast as constrained optimization with explicit kinematic, collision, and force-closure constraints, using a kinematic efficiency metric κ to minimize engaged DoFs and allow sequential addition of objects (Yao et al., 2023).
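To illustrate the force-closure constraint in its simplest form, the sketch below implements the standard planar test: each friction cone is discretized into its two edge forces, mapped to (fx, fy, τ) wrenches, and the grasp is force-closure if the origin lies strictly inside the convex hull of these primitive wrenches. The contact layout and friction coefficient are illustrative; the multi-object formulation and the κ metric of (Yao et al., 2023) are more involved.

```python
import numpy as np
from scipy.spatial import ConvexHull

def planar_wrenches(p, n, mu):
    """Primitive contact wrenches (fx, fy, tau) for a planar point contact.

    p: (2,) contact point, n: (2,) inward unit normal, mu: friction coeff.
    The friction cone is spanned by its two edge forces.
    """
    t = np.array([-n[1], n[0]])       # tangent direction
    edges = [n + mu * t, n - mu * t]  # friction-cone edge forces
    return [np.array([f[0], f[1], p[0] * f[1] - p[1] * f[0]]) for f in edges]

def force_closure(contacts, mu=0.5):
    """Force closure <=> origin strictly inside the hull of primitive wrenches."""
    W = np.array([w for p, n in contacts for w in planar_wrenches(p, n, mu)])
    try:
        hull = ConvexHull(W)
    except Exception:  # degenerate wrench set cannot achieve force closure
        return False
    # scipy facet equations satisfy normal @ x + offset <= 0 for interior
    # points; the origin is strictly interior iff every offset is negative.
    return bool(np.all(hull.equations[:, -1] < -1e-9))

# Antipodal pinch on a unit disk: two opposing contacts -> force closure.
contacts = [(np.array([1.0, 0.0]), np.array([-1.0, 0.0])),
            (np.array([-1.0, 0.0]), np.array([1.0, 0.0]))]
print(force_closure(contacts))  # True
```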
7. Limitations, Open Issues, and Future Directions
Despite considerable progress, several challenges remain:
- Failure Modes and Morphology Gaps: Transfer to extremely high-DoF embodiments (e.g., ShadowHand) may require per-hand finetuning, and physically non-anthropomorphic mappings (due to geometry or actuation mismatch) can degrade performance (Wei et al., 25 Dec 2024, Wu et al., 29 Sep 2025).
- Dependence on Predefined Keypoints and Mappings: Most frameworks require manual selection of gripper keypoints, explicit part correspondences, or rest-pose alignments. Automating these steps via learned or self-supervised correspondence models remains open (Attarian et al., 2023, Wu et al., 29 Sep 2025).
- Simulation-to-Real Transfer: While physics-aware and contact-driven pipelines yield high simulation performance, real-world characteristics—such as friction, compliance, and perception noise—are incompletely modeled (Wei et al., 25 Dec 2024, Freiberg et al., 24 Oct 2024).
- Integration of Dynamics and Tactile Feedback: Most methods are open-loop or operate on quasistatic contact models. Closed-loop control, force/tactile feedback integration, and dynamic in-hand manipulation remain largely unexplored (Wei et al., 25 Dec 2024, Zhang et al., 7 Oct 2025).
- Data Scaling and Model Efficiency: While large-scale datasets (e.g., 20M grasps, 25K scenes) and JAX-optimized, batched training accelerate progress, scaling to arbitrary new, even adversarial, morphologies may require further architectural innovations (e.g., adaptive graph transformers, foundation models) (Freiberg et al., 31 Oct 2025, Wu et al., 29 Sep 2025).
References
- (Fabisch et al., 2022) A Modular Approach to the Embodiment of Hand Motions from Human Demonstrations
- (Liu et al., 2022) Hybrid Robotic Grasping with a Soft Multimodal Gripper and a Deep Multistage Learning Scheme
- (Yao et al., 2023) Exploiting Kinematic Redundancy for Robotic Grasping of Multiple Objects
- (Attarian et al., 2023) Geometry Matching for Multi-Embodiment Grasping
- (Freiberg et al., 24 Oct 2024) Diffusion for Multi-Embodiment Grasping
- (Wei et al., 25 Dec 2024) GeoMatch++: Morphology Conditioned Geometry Matching for Multi-Embodiment Grasping
- (Yuan et al., 26 Sep 2025) DemoGrasp: Universal Dexterous Grasping from a Single Demonstration
- (Wu et al., 29 Sep 2025) CEDex: Cross-Embodiment Dexterous Grasp Generation at Scale from Human-like Contact Representations
- (Zhang et al., 7 Oct 2025) Cross-Embodiment Dexterous Hand Articulation Generation via Morphology-Aware Learning
- (Freiberg et al., 31 Oct 2025) Towards a Multi-Embodied Grasping Agent