GenDexGrasp: Generalizable Dexterous Grasping
- The paper introduces a unified GenDexGrasp framework that employs object-centric contact maps and generative models to synthesize diverse, stable grasp candidates for novel objects.
- It leverages physics-guided optimization and bilevel schemes to ensure grasp feasibility by enforcing constraints like kinematics, collision, and force closure.
- The approach demonstrates robust generalization across hand embodiments and tasks, achieving high success rates in both simulation and real-world evaluations.
Generalizable Dexterous Grasping (GenDexGrasp) refers to algorithmic frameworks enabling multi-finger robotic hands to generate physically feasible, diverse, and stable grasps for novel objects, scenes, and tasks, often under significant sensory and embodiment variation. The GenDexGrasp paradigm unifies a core set of advances in representation learning, contact modeling, generative modeling, sim-to-real robustness, and cross-task transfer, addressing dexterous grasp synthesis at the intersection of computer vision, geometric reasoning, physics-based optimization, and machine learning. Recent methods emphasize not only shape-level generalization but also semantic, task-level, and cross-embodiment transfer, supporting open-set, language- or vision-guided dexterous grasping on arbitrary objects and in unstructured environments.
1. Contact and Representation Abstractions
A defining property of GenDexGrasp approaches is the use of object-centric, hand-agnostic contact or affordance representations that decouple object geometry from hand embodiment. The seminal GenDexGrasp method introduces a dense contact map Ω(O, H) that assigns each object surface point v_o a contact value with respect to a hand configuration H. The value is computed from an "aligned distance" that weights the point-to-hand distance by its alignment with the object surface normal, which resolves side ambiguities on thin-shell objects (Li et al., 2022). Alternative approaches factor grasp representations into per-fingertip contact maps (GrainGrasp; Zhao et al., 15 May 2024) or infer local surface affordance fields conditioned on high-level intentions (AffordDexGrasp; 2503.07360). Contact-centric and affordance-based representations enable diverse sampling, support category-agnostic transfer, and facilitate subsequent hand fitting or trajectory optimization.
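To make the aligned-distance idea concrete, the sketch below computes a per-point contact map for an object point cloud. The functional form is an assumption modeled on the description above, not copied from Li et al. (2022): the distance from an object point v_o to a hand point v_h is scaled by exp(γ(1 − cos θ)), where θ is the angle between v_h − v_o and the outward normal n_o, so hand points on the wrong side of a thin shell appear far away. The function name `aligned_contact_map` and the parameters `gamma` and `beta` are hypothetical.

```python
import numpy as np

def aligned_contact_map(obj_pts, obj_normals, hand_pts, gamma=1.0, beta=1.0):
    """Contact value in (0, 1] for every object point, given sampled hand points.

    Assumed (illustrative) form:
        D(v_o, v_h) = exp(gamma * (1 - cos_theta)) * sqrt(||v_h - v_o||)
        D(v_o, H)   = min over hand points v_h of D(v_o, v_h)
        C(v_o, H)   = 1 - 2 * (sigmoid(beta * D(v_o, H)) - 0.5)
    cos_theta is the cosine between (v_h - v_o) and the unit outward normal n_o,
    so hand points behind the surface are penalized by up to exp(2 * gamma).
    """
    diff = hand_pts[None, :, :] - obj_pts[:, None, :]          # (N_obj, N_hand, 3)
    dist = np.linalg.norm(diff, axis=-1)                        # (N_obj, N_hand)
    direction = diff / np.maximum(dist[..., None], 1e-9)        # unit vectors v_o -> v_h
    cos_theta = np.einsum("nmk,nk->nm", direction, obj_normals)
    aligned = np.exp(gamma * (1.0 - cos_theta)) * np.sqrt(dist)
    d_min = aligned.min(axis=1)                                 # best hand point per v_o
    sigmoid = 1.0 / (1.0 + np.exp(-beta * d_min))
    return 1.0 - 2.0 * (sigmoid - 0.5)                          # 1 at contact, toward 0 far away

# Thin plate: two coincident points with opposite normals (top and bottom faces).
obj_pts = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
obj_normals = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, -1.0]])
hand_pts = np.array([[0.0, 0.0, 0.01]])                         # fingertip just above the plate
contact = aligned_contact_map(obj_pts, obj_normals, hand_pts)
print(contact)
```

With a plain Euclidean distance both faces would receive the same value; the normal-alignment factor gives the top face a clearly higher contact value than the bottom face, which is the side ambiguity the aligned distance is meant to resolve.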
Some frameworks advance further by integrating semantic part analysis (PartDexTOG, G-DexGrasp), LLMs (AffordDexGrasp), and cross-embodiment abstractions (CrossDex, AnyDexGrasp) (Wu et al., 18 May 2025, Jian et al., 25 Mar 2025, Yuan et al., 3 Oct 2024, Fang et al., 23 Feb 2025). The emphasis shifts from explicit joint configurations to SE(3)-equivariant, contact- and affordance-centric structures, often leveraged as inputs to generative or flow-based models.
| Representation Class | Example Methods & Features |
|---|---|
| Contact Map / Affinity Field | GenDexGrasp (Li et al., 2022), contact value per object point v_o |
| Fine-grained Fingertip Contact | GrainGrasp (Zhao et al., 15 May 2024), per-digit contact maps |
| Affordance Field (task-conditioned, generalizable) | AffordDexGrasp (2503.07360), language-conditioned object affordance |
| SE(3)-Equivariant Structure | GAGrasp (Zhong et al., 6 Mar 2025), geometric algebra encoding |
| Part/Segment-Level Priors | PartDexTOG (Wu et al., 18 May 2025), G-DexGrasp (Jian et al., 25 Mar 2025) |
2. Generative Models and Grasp Synthesis
Modern GenDexGrasp systems employ generative models, primarily conditional variational autoencoders (CVAEs) and conditional diffusion models, to efficiently sample high-dimensional hand-object contact configurations. The canonical pipeline uses a CVAE (as in GenDexGrasp and G-DexGrasp) conditioned on point clouds or part/affordance information to sample plausible, diverse contact maps or coarse grasp candidates (Li et al., 2022, Jian et al., 25 Mar 2025, Wu et al., 18 May 2025). Diffusion models further improve the diversity–quality trade-off and support stable denoising in SE(3)-equivariant spaces (GAGrasp, DexGraspNet 2.0, PartDexTOG) (Zhong et al., 6 Mar 2025, Zhang et al., 30 Oct 2024, Wu et al., 18 May 2025).
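The CVAE mechanics can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions: the decoder is a hypothetical single linear layer followed by a sigmoid (real systems use point-cloud networks), the condition is a fixed object embedding, and the prior is a standard normal. `reparameterize` and `kl_to_standard_normal` show the training-time pieces; `sample_contact_maps` shows inference, where different latent codes z yield different contact maps for the same object.

```python
import numpy as np

def reparameterize(mu, logvar, rng):
    """Training-time sample z = mu + sigma * eps, so gradients can reach mu and logvar."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), the CVAE regularization term."""
    return -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar))

def sample_contact_maps(obj_feature, dec_w, dec_b, num_samples, latent_dim, rng):
    """Inference: draw z from the prior, append the object condition, decode.

    dec_w has shape (latent_dim + cond_dim, num_object_points); this linear
    decoder is a hypothetical stand-in for a learned network.
    """
    z = rng.standard_normal((num_samples, latent_dim))
    cond = np.repeat(obj_feature[None, :], num_samples, axis=0)
    logits = np.concatenate([z, cond], axis=1) @ dec_w + dec_b
    return 1.0 / (1.0 + np.exp(-logits))  # one contact value in (0, 1) per object point

rng = np.random.default_rng(0)
num_points, cond_dim, latent_dim = 128, 16, 8
dec_w = 0.1 * rng.standard_normal((latent_dim + cond_dim, num_points))
dec_b = np.zeros(num_points)
obj_feature = rng.standard_normal(cond_dim)      # stand-in for a point-cloud embedding
maps = sample_contact_maps(obj_feature, dec_w, dec_b, 4, latent_dim, rng)
print(maps.shape)                                # four candidate contact maps
```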
A key design principle is the separation of latent generative sampling from hand-specific optimization: a contact or affordance map is sampled first, and the hand is then fitted to the object geometry to realize it (GenDexGrasp, GrainGrasp, UniDexGrasp) (Li et al., 2022, Zhao et al., 15 May 2024, Xu et al., 2023). Decomposing the hand pose into blocks (rotation, translation, and articulation) and adding geometric constraint layers improve both diversity and feasibility.
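A toy version of the sample-then-fit split is sketched below. It makes strong simplifying assumptions: the object is a unit sphere with an analytic signed distance function, the "hand" is a set of free fingertip points with no kinematic chain (real pipelines optimize wrist pose and joint angles through forward kinematics), and the fitting energy, a squared attraction to the highest-contact points plus a squared penetration penalty, is a hypothetical stand-in for the energies used in the cited methods. All function names are illustrative.

```python
import numpy as np

def sphere_sdf(points, radius=1.0):
    """Signed distance to a sphere centered at the origin (negative inside)."""
    return np.linalg.norm(points, axis=-1) - radius

def select_targets(obj_pts, contact, k):
    """Pick the k object points with the highest sampled contact values."""
    return obj_pts[np.argsort(contact)[-k:]]

def fit_fingertips(tips, targets, radius=1.0, w_pen=10.0, lr=0.05, steps=500):
    """Gradient descent on a hypothetical fitting energy
        E(p) = sum_k ||p_k - target_k||^2 + w_pen * sum_k max(0, -sdf(p_k))^2,
    pulling each fingertip to its target while penalizing penetration."""
    p = tips.astype(float).copy()
    for _ in range(steps):
        grad = 2.0 * (p - targets)                                  # attraction term
        depth = np.maximum(0.0, -sphere_sdf(p, radius))             # penetration depth
        normal = p / np.maximum(np.linalg.norm(p, axis=-1, keepdims=True), 1e-9)
        grad += -2.0 * w_pen * depth[:, None] * normal              # gradient of w_pen * depth^2
        p -= lr * grad
    return p

obj_pts = np.array([[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
contact = np.array([0.9, 0.8, 0.1, 0.2])          # stand-in for one sampled contact map
targets = select_targets(obj_pts, contact, k=2)
fitted = fit_fingertips(2.0 * targets, targets)    # fingertips start outside the object
```

Because the sampled map only fixes where contact should happen, the same `contact` array could be fitted by different hand models; only `fit_fingertips` would change.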
Bilevel optimization schemes (e.g., Wu et al., 2022) explicitly enforce physical constraints (collision avoidance, reachability, and frictional wrench closure) after generative sampling, yielding physically feasible dexterous grasps even when the generative proposals are multimodal.
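The bilevel structure can be illustrated with a deliberately small example. This is a simplified sketch, not the formulation of Wu et al. (2022): friction is ignored, the lower level finds nonnegative normal-force magnitudes (each at least `f_min`) that minimize the net wrench for a fixed set of contacts, and the upper level is a brute-force search over candidate contact subsets rather than a continuous optimization over hand configurations. All names are hypothetical.

```python
import itertools
import numpy as np

def normal_wrench_matrix(points, normals):
    """6 x K matrix whose columns are wrenches [n_k, p_k x n_k] of unit normal forces."""
    return np.stack([np.concatenate([n, np.cross(p, n)]) for p, n in zip(points, normals)], axis=1)

def inner_force_problem(W, f_min=1.0, steps=2000):
    """Lower level: min_f ||W f||^2 subject to f >= f_min (projected gradient descent)."""
    H = W.T @ W
    lr = 1.0 / (2.0 * np.linalg.eigvalsh(H).max() + 1e-9)   # step 1/L, L = 2 * lambda_max(W^T W)
    f = np.full(W.shape[1], f_min)
    for _ in range(steps):
        f = np.maximum(f_min, f - lr * 2.0 * H @ f)          # gradient step, then projection
    return float(np.linalg.norm(W @ f)), f

def outer_select_contacts(cand_pts, cand_normals, num_contacts):
    """Upper level: choose the contact subset whose optimal inner residual is smallest."""
    best = None
    for idx in itertools.combinations(range(len(cand_pts)), num_contacts):
        rows = list(idx)
        W = normal_wrench_matrix(cand_pts[rows], cand_normals[rows])
        residual, forces = inner_force_problem(W)
        if best is None or residual < best[0]:
            best = (residual, idx, forces)
    return best

cand_pts = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [-1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
cand_normals = -cand_pts                  # inward normals on a unit sphere
residual, idx, forces = outer_select_contacts(cand_pts, cand_normals, num_contacts=2)
print(idx, residual)                      # the antipodal pair balances exactly
```

The lower bound `f_min` rules out the trivial all-zero force solution, so the outer level must find contacts whose forces can actually cancel.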
3. Physical Feasibility and Grasp Quality
Guaranteeing the physical feasibility and stability of synthesized grasps is central to the GenDexGrasp paradigm. Typical constraint sets include kinematic reachability (via forward/inverse kinematics and sampled IK solutions), collision avoidance (signed-distance or penalty-based), and force closure (static equilibrium under Coulomb friction, with a polyhedral approximation of the friction cone) (Wu et al., 2022, Li et al., 2022, Zhao et al., 15 May 2024). Penetration, contact stability, and articulation constraints are enforced during the optimization or sampling steps.
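The polyhedral friction-cone approximation can be sketched as follows. Each Coulomb cone {f : ‖f_t‖ ≤ μ f_n} is replaced by m unit edge vectors on its boundary, so their nonnegative combinations form an inscribed pyramid; stacking the 6-D wrenches [f; p × f] of all edges gives the primitive wrench matrix used by many force-closure tests. The rank check at the end is only a necessary condition for force closure: a full test also requires the primitive wrenches to positively span the wrench space, which is typically checked with a linear program or a convex-hull computation. The function names, the edge count, and taking torques about the origin are assumptions.

```python
import numpy as np

def friction_cone_edges(normal, mu, num_edges=8):
    """Unit edge vectors of a pyramid inscribed in the Coulomb cone ||f_t|| <= mu * f_n."""
    n = normal / np.linalg.norm(normal)
    helper = np.array([1.0, 0.0, 0.0]) if abs(n[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    t1 = np.cross(n, helper)
    t1 /= np.linalg.norm(t1)
    t2 = np.cross(n, t1)                                       # (n, t1, t2) is orthonormal
    angles = 2.0 * np.pi * np.arange(num_edges) / num_edges
    edges = n + mu * (np.cos(angles)[:, None] * t1 + np.sin(angles)[:, None] * t2)
    return edges / np.linalg.norm(edges, axis=1, keepdims=True)

def primitive_wrenches(points, normals, mu, num_edges=8):
    """6 x (K * num_edges) matrix of wrenches [f; p x f], torques taken about the origin."""
    cols = [np.concatenate([f, np.cross(p, f)])
            for p, n in zip(points, normals)
            for f in friction_cone_edges(n, mu, num_edges)]
    return np.stack(cols, axis=1)

def in_friction_cone(force, normal, mu, tol=1e-9):
    """Coulomb condition: positive normal component and ||f_t|| <= mu * f_n."""
    n = normal / np.linalg.norm(normal)
    f_n = float(force @ n)
    f_t = force - f_n * n
    return bool(f_n > 0.0 and np.linalg.norm(f_t) <= mu * f_n + tol)

angles = np.deg2rad([0.0, 120.0, 240.0])
points = np.stack([np.cos(angles), np.sin(angles), np.zeros(3)], axis=1)  # contacts on the equator
normals = -points                                                        # inward normals
rank_friction = np.linalg.matrix_rank(primitive_wrenches(points, normals, mu=0.5))
rank_frictionless = np.linalg.matrix_rank(primitive_wrenches(points, normals, mu=0.0))
```

With friction the three contacts span all six wrench directions; without friction the normals of a sphere pass through its center, produce no torque, and the rank collapses.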
Physics-guided sampling with differentiable simulation (GAGrasp; Zhong et al., 6 Mar 2025) and bilevel optimization (GenDexGrasp; Wu et al., 2022) enforce physically feasible closure, with explicit metrics for constraint violation (force and torque residuals, fingertip-object distance, etc.). Task-oriented objectives (e.g., simulated displacement, role satisfaction, force-aware PD control) are incorporated in PartDexTOG, G-DexGrasp, and OmniDexGrasp for semantic or intent-grounded grasp behavior (Wu et al., 18 May 2025, Jian et al., 25 Mar 2025, Wei et al., 27 Oct 2025).
Experimental metrics such as penetration volume, maximum penetration depth, drop-test displacement, and success rate support systematic evaluation of grasp feasibility and transferability across objects and settings (Zhang et al., 30 Oct 2024, 2503.07360, Zhao et al., 15 May 2024).
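These metrics can be sketched with simple geometric proxies. Assumptions: the object is a sphere at the origin, hand links are approximated by spheres so that penetration volume can be estimated by Monte Carlo sampling, and the drop-test threshold is a hypothetical placeholder rather than a value from the cited benchmarks, which compute these quantities on meshes and in a physics simulator.

```python
import numpy as np

def max_penetration_depth(hand_pts, radius=1.0):
    """Deepest hand surface point inside a sphere object at the origin (0 if none)."""
    return float(np.maximum(0.0, radius - np.linalg.norm(hand_pts, axis=1)).max())

def penetration_volume_mc(link_centers, link_radii, radius=1.0, n=200_000, seed=0):
    """Monte Carlo estimate of hand-object intersection volume, with hand links
    approximated by spheres (a hypothetical proxy for the hand mesh)."""
    rng = np.random.default_rng(seed)
    samples = rng.uniform(-radius, radius, size=(n, 3))          # bounding cube of the object
    in_obj = np.linalg.norm(samples, axis=1) <= radius
    dist = np.linalg.norm(samples[:, None, :] - link_centers[None, :, :], axis=2)
    in_hand = (dist <= link_radii[None, :]).any(axis=1)
    return float((in_obj & in_hand).mean() * (2.0 * radius) ** 3)

def drop_test_success(pos_before, pos_after, threshold=0.02):
    """Success if the object moved less than `threshold` (hypothetical value, metres)."""
    return bool(np.linalg.norm(np.asarray(pos_after) - np.asarray(pos_before)) < threshold)

depth = max_penetration_depth(np.array([[0.0, 0.0, 0.9], [0.0, 0.0, 1.5]]))
volume = penetration_volume_mc(np.zeros((1, 3)), np.array([0.5]))  # link fully inside object
```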
4. Generalization Mechanisms and Embodiment Transfer
GenDexGrasp systems are distinguished by explicit design for generalization across object categories, hand embodiments, and task or semantic variations. Key advances include:
- Hand-Agnostic Pipelines: Models such as GenDexGrasp (Li et al., 2022), AnyDexGrasp (Fang et al., 23 Feb 2025), and CrossDex (Yuan et al., 3 Oct 2024) use intermediate representations (contact maps, eigengrasps, CGRs) enabling transfer between different hand kinematics without retraining.
- Semantic and Task Conditioning: AffordDexGrasp (2503.07360), PartDexTOG (Wu et al., 18 May 2025), G-DexGrasp (Jian et al., 25 Mar 2025), and OmniDexGrasp (Wei et al., 27 Oct 2025) incorporate open-set or language-conditioned goal representations (parts, affordances, grasp intention), which are mapped to grasp fields and used to guide synthesis on novel objects and tasks.
- Cross-Embodiment Policies: CrossDex trains a single vision-based policy that controls four hand morphologies with 80% success and generalizes zero-shot to two unseen hands (Yuan et al., 3 Oct 2024).
- Sim-to-Real Transfer: ClutterDexGrasp (Chen et al., 17 Jun 2025) and DexGraspNet 2.0 (Zhang et al., 30 Oct 2024) incorporate curriculum and domain randomization in simulation, robust point cloud/fusion strategies, and safety supervision, yielding zero-shot closed-loop or open-loop transfer to real-robot scenes with clutter at >83% success.
These mechanisms keep the generalization gap small in novel settings (e.g., a 3–5 percentage-point out-of-domain drop for GenDexGrasp (Li et al., 2022); zero-shot cross-hand transfer in CrossDex (Yuan et al., 3 Oct 2024); 90.7% real-world success in clutter for DexGraspNet 2.0 (Zhang et al., 30 Oct 2024)).
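Eigengrasps, one of the intermediate representations listed above, are commonly obtained by principal component analysis over a dataset of joint configurations. The sketch below fits such a basis with an SVD and maps joint vectors to and from low-dimensional coefficients. It is an illustration under assumptions: the data are synthetic, the function names are hypothetical, and the per-hand retargeting that a cross-embodiment method such as CrossDex would need to turn shared coefficients into each robot hand's joint angles is not shown.

```python
import numpy as np

def fit_eigengrasps(joint_configs, k):
    """PCA over a (num_grasps, num_joints) array: returns the mean and top-k basis rows."""
    mean = joint_configs.mean(axis=0)
    _, _, vt = np.linalg.svd(joint_configs - mean, full_matrices=False)
    return mean, vt[:k]                        # rows are orthonormal eigengrasp directions

def to_eigengrasp(q, mean, basis):
    """Project a joint vector onto low-dimensional eigengrasp coefficients."""
    return (q - mean) @ basis.T

def from_eigengrasp(coeffs, mean, basis):
    """Reconstruct joint angles from eigengrasp coefficients."""
    return mean + coeffs @ basis

rng = np.random.default_rng(0)
synergies = rng.standard_normal((2, 16))                # 2 hidden synergies over 16 joints
data = rng.standard_normal((200, 2)) @ synergies + 0.3   # synthetic grasp postures
mean, basis = fit_eigengrasps(data, k=2)
coeffs = to_eigengrasp(data[5], mean, basis)             # 16 joint angles -> 2 numbers
recon = from_eigengrasp(coeffs, mean, basis)
```

Because the synthetic postures lie exactly in a two-dimensional subspace, two coefficients reconstruct them without loss; real grasp data would leave a residual that shrinks as k grows.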
5. Systematic Evaluation and Benchmarking
Large-scale synthetic and real-robot experiments demonstrate the efficacy of GenDexGrasp paradigms. MultiDex, DexGraspNet 2.0, OakInk-shape, and open-set tabletop suites together support benchmarking on 10⁴–10⁶+ objects and grasp instances (Li et al., 2022, Zhang et al., 30 Oct 2024, Wu et al., 18 May 2025). Key evaluation axes include:
- Grasp success under rigid-body and simulated physical interaction (slip/drop test, force closure);
- Diversity (joint-angle std, number of distinct grasp types);
- Generalization gap (seen vs. unseen instances/categories/hands/tasks);
- Open-set and intent/part consistency (FID, R-Precision, perceptual score).
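Two of these axes reduce to short computations. The sketch assumes diversity is the per-joint standard deviation of joint angles across grasps, averaged over joints (one common reading of "joint-angle std", reported in radians), and that the generalization gap is the difference in mean success between seen and unseen splits, in percentage points. The function names are illustrative.

```python
import numpy as np

def joint_angle_diversity(joint_angles):
    """Average over joints of the per-joint std (radians) across a set of grasps.

    joint_angles: array of shape (num_grasps, num_joints).
    """
    return float(np.std(np.asarray(joint_angles), axis=0).mean())

def generalization_gap_pp(success_seen, success_unseen):
    """Seen-minus-unseen success rate in percentage points (inputs are 0/1 outcomes)."""
    return 100.0 * (float(np.mean(success_seen)) - float(np.mean(success_unseen)))

grasps = np.array([[0.0, 0.0], [0.2, 0.4]])
diversity = joint_angle_diversity(grasps)               # about 0.15 (mean of 0.1 and 0.2)
gap = generalization_gap_pp([1, 1, 1, 0], [1, 0, 1, 0])  # 25.0
```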
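Selected results as reported in the cited works (– means not reported). The numbers come from different benchmarks and protocols, so they are not directly comparable across rows: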
| Method | Success (%) | Diversity (rad) | Inference (s) | Open-set FID/Prec. | Comments |
|---|---|---|---|---|---|
| GenDexGrasp (Li et al., 2022) | 77.2 | 0.21 | 16.4 | – | High generalization across hands |
| CrossDex (Yuan et al., 3 Oct 2024) | 80 (train), 35 (zero-shot) | – | – | – | Multi-hand/unified policy |
| DexGraspNet 2.0 (Zhang et al., 30 Oct 2024) | 90.7 | – | <0.5 | – | Real-world, open-loop, clutter |
| AffordDexGrasp (2503.07360) | 45.1 (open-set) | – | – | FID 0.23/Top-1 0.48 | Language/part grounding |
| PartDexTOG (Wu et al., 18 May 2025) | – | 0.993 | – | P-FID 14.24 | Task/part-selected |
6. Limitations and Future Directions
Despite significant progress, GenDexGrasp methods face several open challenges:
- Sensitivity to mesh errors or missing surface normals, especially in physics-refined or contact-based pipelines.
- Real-time constraints: per-grasp iterative optimization (e.g., GrainGrasp, GenDexGrasp) increases inference time, limiting deployment in high-frequency control scenarios (Zhao et al., 15 May 2024, Li et al., 2022).
- Absence of tactile and force sensing in most current vision-only methods, which may reduce reliability in deformable or dynamic manipulations (Fang et al., 23 Feb 2025, Chen et al., 17 Jun 2025).
- Limited capacity for true open-world and functional grasping (multi-task, in-hand manipulation, bimanual action) due to bottlenecks in abstraction, policy capacity, and/or dataset diversity.
- Sim-to-real visual and dynamics gap, especially in highly cluttered, occluded, or transparent/specular object configurations (Zhang et al., 30 Oct 2024, Chen et al., 17 Jun 2025).
Emerging directions include tighter integration of tactile perception, direct learning of flexible policies from vision/affordance fields, end-to-end manipulation pipelines (beyond single grasping), adaptive refinement for closed-loop feedback, and extension to multi-tool, task-driven multi-agent or collaborative grasping (Zhong et al., 6 Mar 2025, Wu et al., 18 May 2025, Wei et al., 27 Oct 2025, Jian et al., 25 Mar 2025).
7. Summary of Major Contributions and Impact
GenDexGrasp frameworks provide a unified, generalizable approach to dexterous grasp synthesis, substantially expanding the working domain of multi-finger robotic hands in realistic environments. The core innovations (object-centric or semantic intermediate representations, high-capacity generative models, constraint-driven and physics-based optimization, and explicit sim-to-real adaptation) have set new benchmarks in grasp diversity, success rate, embodiment transfer, and open-set/semantic adaptation. These advances inform both the theoretical understanding and the real-world deployment of dexterous manipulators, and provide a reproducible comparative basis for ongoing research in robotic manipulation, contact modeling, and tactile-visual intelligence (Li et al., 2022, 2503.07360, Chen et al., 17 Jun 2025, Wu et al., 18 May 2025, Zhong et al., 6 Mar 2025, Zhang et al., 30 Oct 2024).