Normalized Grasp Space (NGS) Framework

Updated 6 January 2026

Normalized Grasp Space (NGS) is a standardized, size-independent framework that maps raw kinematic measurements into a dimensionless unit space for direct comparison of robotic grasp capabilities.
It integrates functional workspace sampling with deep learning-based 6-DoF grasp detection by normalizing gripper patches to achieve geometric invariance and efficient transfer across designs.
NGS makes hand benchmarks directly comparable and improves grasp detection accuracy: in the ablations reported below, AP on the Seen split rises from 66.71 to 75.20, and the resulting detector runs in real time with a compact model.

A Normalized Grasp Space (NGS) is a standardized, size-independent, and alignment-consistent framework for representing and benchmarking the grasping capabilities of robotic hands and grippers. The NGS concept connects functional workspace measurements, kinematic envelope characterizations, and modern deep-learning-based grasp detection, providing a unified metric and geometric substrate for cross-device comparison, learning, and efficient inference in robotic grasp planning. As developed in both optimization-based benchmarking (Morrow et al., 2021) and region-aware 6-DoF deep grasp detection (Chen et al., 2024), NGS underpins data consistency, benchmarking, backbone invariance, and generalization.

1. Formal Definition and Rationale

In the context of robotic hands and grippers, NGS provides a mapping from raw kinematic and spatial measurements to a dimensionless unit space, enabling direct quantitative comparison of the grasp workspaces across dissimilar hand designs and gripper scales. For articulated hands, NGS is constructed by specifying a functional grasp coordinate frame “about to grasp” a canonical object (typically a cylinder or sphere), and recording key measurements over representative hand configurations. This space is then normalized along “span” (gripper opening) and “depth” axes to yield a human-interpretable boundary in the unit square $[0,1]^2$ (Morrow et al., 2021).

For deep learning-based 6-DoF grasp detection, NGS is redefined as a local, grasp-centric 3D region around each candidate grasp, aligned and scaled relative to gripper opening width. Each patch is depth-adaptively extracted from sensory data or point clouds and normalized by the reference gripper width, ensuring geometric and scale invariance for all further learning and inference (Chen et al., 2024). This normalization is pivotal for efficient transfer across objects, gripper sizes, and clutter levels.

2. Construction of NGS for Robotic Hands

The measurement and normalization protocol for classical hand workspace NGS consists of four steps (Morrow et al., 2021):

Functional Grasp Coordinate System: Define span $(\hat s)$, depth $(\hat d)$, and width $(\hat w)$ axes with reference to the grasp pose on a canonical object. Span runs laterally through the grasp, depth is orthogonal to the palm plane, and width is vertical (table-normal).
Intrinsic Measurements: Determine hand-specific invariants:
- $S_{\max}$ : Absolute maximum span (fully open).
- $W_{\min}$ : Minimum graspable height (stable two-finger opposition).
- $W_{\max}$ : Maximum achievable height (hand open, resting).
Functional Grasp Workspace Sampling: For both precision and power grasps, measure span $(s_i)$ and depth $(d_i)$ at three hand configurations: fully open, fully closed, and one intermediate. For spherical grasps, also measure width.
Normalization: Compute

$\sigma = \frac{s-s_{\min}}{s_{\max}-s_{\min}}, \quad \delta = \frac{d}{d_{\max}}, \quad (\sigma,\delta)\in[0,1]^2$

The NGS $N_G$ is the set of such points under the interpolating grasp boundary curve.
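
The normalization step can be illustrated with a short Python sketch. The function name and the three raw samples are hypothetical; the limits $s_{\min}=10$, $s_{\max}=70$, and $d_{\max}=50$ mm are taken from the Model O (Cylindrical) row of the table in this section.

```python
def normalize_grasp_point(s, d, s_min, s_max, d_max):
    """Map a raw (span, depth) measurement in mm to (sigma, delta)."""
    if s_max <= s_min or d_max <= 0:
        raise ValueError("need s_max > s_min and d_max > 0")
    sigma = (s - s_min) / (s_max - s_min)  # normalized span
    delta = d / d_max                      # normalized depth
    return sigma, delta


# Hypothetical raw samples (span mm, depth mm) at three hand configurations:
# fully closed, intermediate, fully open.
samples = [(10.0, 50.0), (40.0, 35.0), (70.0, 20.0)]

normalized = [
    normalize_grasp_point(s, d, s_min=10.0, s_max=70.0, d_max=50.0)
    for s, d in samples
]
print(normalized)
```

Every sample inside the measured limits lands in the unit square, so boundary curves from hands of different absolute size can be drawn on the same axes.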

Hand workspaces, for both precision and power grasps, are plotted in $[0,1]^2$, allowing direct area-under-curve style comparison of depth maintenance and envelope across hand types. Example metrics for seven hand and grasp-type configurations are given below:

Hand	$s_{\min}$ (mm)	$s_{\max}$ (mm)	$d_{\max}$ (mm)
Model O (Cylindrical)	10	70	50
Model O (Spherical)	20	65	55
Barrett (3-finger)	15	85	60
T-42	12	80	52
Robotiq 2F-85	8	85	58
Kinova Jaco 2 (3f)	10	75	53
Human Right Hand	15	90	65

Each $N_G$ curve exposes relative workspace shape, pinch-vs-envelope tradeoff, and possible limitations of specific kinematics.
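
One way to turn the area-under-curve comparison into a single number is sketched below. It assumes a piecewise-linear boundary through the normalized samples and uses the trapezoidal rule; the source only specifies low-order interpolation, so this is an illustrative simplification. The function name and both sets of boundary points are hypothetical.

```python
def boundary_area(points):
    """Trapezoidal area under a piecewise-linear (sigma, delta) boundary."""
    pts = sorted(points)  # order samples by normalized span
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += 0.5 * (y0 + y1) * (x1 - x0)
    return area


# Hypothetical normalized boundaries for two hands.
hand_a = [(0.0, 1.0), (0.5, 0.7), (1.0, 0.4)]  # depth shrinks as the hand opens
hand_b = [(0.0, 0.6), (0.5, 0.9), (1.0, 1.0)]  # depth grows as the hand opens

print(boundary_area(hand_a), boundary_area(hand_b))
```

Because both axes are normalized, the result lies between 0 and 1 and can be compared across hands; it summarizes how well depth is maintained over the span range but hides the shape differences that the curves themselves show.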

3. NGS in Deep Grasp Detection and Region-Aware Networks

NGS for scene grasp detection is reformulated as a local, normalized, grasp-centric 3D patch:

For each candidate grasp center $\mathbf{p}_i = (x_i, y_i, z_i)$ and gripper width $w_{\text{gripper}}$, a region of fixed real-world size (typically $2w_{\text{gripper}}$ per side) is resampled from the RGBD observation or point cloud, after accounting for foreshortening and camera intrinsics. This yields a patch $\mathbf{P}_i^{\text{raw}}$.
XYZ locations are referenced to the patch center and normalized by $w_{\text{gripper}}$ such that all 3D points fall within a $[-1, 1]^3$ cube. Grasp labels $\mathbf{g}_j = (\mathbf{t}_j, \mathbf{R}_j, w_j)$ are mapped into this normalized frame as $\mathbf{g}_j^* = ((\mathbf{t}_j - \mathbf{p}_i)/w_{\text{gripper}}, \mathbf{R}_j, w_j/w_{\text{gripper}})$ (Chen et al., 2024).
The resulting representation, $(x^*,y^*,z^*,\theta,\beta,\gamma,w^*)$, is invariant to translation within the patch, equivariant to in-plane rotation, and scale-invariant.
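
The centering and scaling described above can be sketched in plain Python. This is a simplified illustration rather than the RNGNet implementation: it crops an axis-aligned cube of side $2w_{\text{gripper}}$ from a point list and skips the depth-adaptive image resampling. All function names and numeric values are hypothetical.

```python
def normalize_patch(points, center, w_gripper):
    """Crop a cube of side 2*w_gripper around center, then map it to [-1, 1]^3."""
    patch = []
    for point in points:
        offset = [p - c for p, c in zip(point, center)]
        if all(abs(o) <= w_gripper for o in offset):  # inside the cube
            patch.append(tuple(o / w_gripper for o in offset))
    return patch


def normalize_grasp(t, R, w, center, w_gripper):
    """Map a grasp label (t, R, w) into the normalized patch frame."""
    t_star = tuple((ti - ci) / w_gripper for ti, ci in zip(t, center))
    return t_star, R, w / w_gripper  # the rotation is left unchanged


# Hypothetical values in metres.
center = (0.10, 0.00, 0.50)
w_gripper = 0.08
points = [(0.14, 0.00, 0.50), (0.10, -0.04, 0.54), (0.30, 0.00, 0.50)]
identity = ((1, 0, 0), (0, 1, 0), (0, 0, 1))

patch = normalize_patch(points, center, w_gripper)
t_star, R_star, w_star = normalize_grasp(
    (0.12, 0.00, 0.50), identity, 0.06, center, w_gripper
)
print(patch, t_star, w_star)
```

The third point lies outside the cube and is dropped; the remaining coordinates and the grasp width become dimensionless fractions of the gripper width.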

In RNGNet, these normalized patches feed a coordinate-gated 2D-CNN for 6-DoF grasp prediction. Geometric properties of NGS decouple local grasp feasibility from global scene peculiarities, promoting generalizability and rendering heavy 3D backbones unnecessary.

4. Performance Impact and Generalization

NGS has a measurable effect in both benchmarking and deployment scenarios. In benchmarking ($N_G$ curve comparison), hand capability can be visualized and quantified across radically dissimilar designs. For learning-based detection, ablation studies on grasp detection find:

Without normalization: 66.71/51.66/26.97 AP (Seen/Similar/Novel splits)
With patch normalization: 69.93/53.15/28.71
Adding depth-adaptive patch extraction: 73.98/64.89/30.99
With full scale randomization: 75.20/66.62/32.38

Furthermore, RNGNet using NGS achieves 58.06 AP (RealSense) on GraspNet and runs at ≈ 50 FPS with only 3.7 M parameters (Chen et al., 2024). Qualitatively, NGS yields denser and more robust grasp proposals, higher physical success rates in cluttered real-world scenes, and direct transfer across objects and gripper sizes.
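
The per-step contributions implied by the ablation figures listed above can be read off with a little arithmetic. The sketch below only subtracts the reported AP values; the row labels and function names are illustrative.

```python
# AP values (Seen, Similar, Novel) copied from the ablation list above.
ablation = [
    ("no normalization", (66.71, 51.66, 26.97)),
    ("patch normalization", (69.93, 53.15, 28.71)),
    ("depth-adaptive", (73.98, 64.89, 30.99)),
    ("scale randomization", (75.20, 66.62, 32.38)),
]


def step_gains(rows):
    """AP gain of each configuration over the previous one."""
    gains = []
    for (_, prev), (name, cur) in zip(rows, rows[1:]):
        gains.append((name, tuple(round(c - p, 2) for c, p in zip(cur, prev))))
    return gains


def total_gain(rows):
    """AP gain of the last configuration over the first."""
    first, last = rows[0][1], rows[-1][1]
    return tuple(round(b - a, 2) for a, b in zip(first, last))


for name, gain in step_gains(ablation):
    print(name, gain)
print("total", total_gain(ablation))
```

The cumulative gain is 8.49/14.96/5.41 AP on the Seen/Similar/Novel splits, and the largest single step is the 11.74 AP improvement on the Similar split when depth-adaptive extraction is added.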

5. Algorithmic Workflow and Loss Functions

In NGS-centric learning pipelines, the forward pass comprises:

Extraction of depth-adaptive, size-normalized RGBD or point cloud patches centered on candidate grasp points.
Mapping all points and grasp parameters into the normalized local frame.
Prediction via a regional multi-grasp generator, attaching discrete rotation anchors and performing classification/regression over locations, angles, and widths.
Inverse mapping of predicted normalized grasps back to camera or world coordinates.
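
The last two steps can be sketched as follows. The inverse mapping simply undoes the centering and scaling by $w_{\text{gripper}}$. The angle decoding assumes a common anchor-plus-residual parameterization with equal-width bins over $[0, \pi)$, which is an assumption made for illustration and not necessarily the scheme used in RNGNet. Function names and numeric values are hypothetical.

```python
import math


def decode_angle(bin_index, residual, num_bins):
    """Anchor (bin centre) plus regressed residual, assuming bins over [0, pi)."""
    bin_width = math.pi / num_bins
    return (bin_index + 0.5) * bin_width + residual


def denormalize_grasp(t_star, R, w_star, center, w_gripper):
    """Map a normalized grasp back to camera coordinates."""
    t = tuple(ts * w_gripper + c for ts, c in zip(t_star, center))
    return t, R, w_star * w_gripper  # the rotation is left unchanged


# Hypothetical prediction in the normalized frame (lengths in metres after mapping).
identity = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
t, R, w = denormalize_grasp(
    (0.25, 0.0, 0.0), identity, 0.75, center=(0.10, 0.00, 0.50), w_gripper=0.08
)
theta = decode_angle(bin_index=3, residual=0.0, num_bins=6)
print(t, w, theta)
```

Because the forward mapping is affine in translation and linear in width, the inverse is exact up to floating-point error, so no information is lost by predicting in the normalized frame.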

Supervision is restricted to grasps within a normalized radius ($\|\mathbf{t}^*\| < 0.2$) of the patch center. The loss combines focal components for angle bins and smoothed-L1 terms for residual angles, translations, and width:

$L = L_{\theta_{\mathrm{cls}}} + a L_{\theta_{\mathrm{reg}}} + b L_{\gamma\beta} + c L_{t} + d L_{w}$

Here $a$, $b$, $c$, and $d$ are scalar weights on the regression terms; the weight $d$ is unrelated to the depth $d$ of Section 2.
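
A minimal single-sample sketch of this loss is given below, assuming the standard definitions of smooth-L1 and of multi-class focal loss over softmax probabilities. The dictionary layout, the default weights of 1.0, the focusing exponent, and all function names are hypothetical and are not taken from the source.

```python
import math


def smooth_l1(pred, target, knee=1.0):
    """Quadratic below the knee, linear above it."""
    diff = abs(pred - target)
    return 0.5 * diff * diff / knee if diff < knee else diff - 0.5 * knee


def focal_loss(probs, target_bin, focusing=2.0):
    """Focal loss on the probability assigned to the true angle bin."""
    p_t = max(probs[target_bin], 1e-12)  # guard against log(0)
    return -((1.0 - p_t) ** focusing) * math.log(p_t)


def in_supervision_radius(t_star, radius=0.2):
    """True when the normalized translation lies inside the supervised ball."""
    return math.sqrt(sum(c * c for c in t_star)) < radius


def grasp_loss(pred, target, a=1.0, b=1.0, c=1.0, d=1.0):
    """L = L_cls + a*L_reg + b*L_gamma_beta + c*L_t + d*L_w for one grasp."""
    l_cls = focal_loss(pred["theta_probs"], target["theta_bin"])
    l_reg = smooth_l1(pred["theta_res"], target["theta_res"])
    l_gb = (smooth_l1(pred["gamma"], target["gamma"])
            + smooth_l1(pred["beta"], target["beta"]))
    l_t = sum(smooth_l1(p, q) for p, q in zip(pred["t"], target["t"]))
    l_w = smooth_l1(pred["w"], target["w"])
    return l_cls + a * l_reg + b * l_gb + c * l_t + d * l_w


# Hypothetical target and prediction in the normalized frame.
target = {"theta_bin": 1, "theta_res": 0.0, "gamma": 0.0, "beta": 0.0,
          "t": (0.1, 0.0, 0.0), "w": 0.75}
pred = {"theta_probs": [0.25, 0.5, 0.25], "theta_res": 0.5, "gamma": 0.0,
        "beta": 0.0, "t": (0.1, 0.0, 0.0), "w": 0.75}

if in_supervision_radius(target["t"]):
    print(grasp_loss(pred, target))
```

In a training loop the radius check would act as a mask, so grasps far from the patch center contribute nothing to the loss.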

6. Relationship to Other Normalized Grasp Representations

NGS is structurally distinct from other normalized or unified grasp representations, such as the Unified Gripper Coordinate Space (UGCS) (Khargonkar et al., 2024). UGCS maps palm and finger points of a gripper to points on the unit sphere via spherical coordinates, creating point-to-point correspondences that transfer grasps across arbitrary grippers. NGS, in both its classical and learning-based forms, instead normalizes gripper workspace axes ($\hat{s}$, $\hat{d}$) or grasp-centric patches for local learning and benchmarking. Both frameworks seek gripper-agnostic invariance for improved transfer and comparison, but UGCS is optimized for synthesis and exact point-wise transfer, whereas NGS provides both benchmarking envelopes and a practical substrate for local, patch-based learning.

7. Assumptions, Limitations, and Scope

Key assumptions underlying NGS include:

Functional, not exhaustive: Only a finite set of poses is sampled for workspace characterization and interpolated with low-order fits. Non-convexities and underactuated workspace regions may be underexplored (Morrow et al., 2021).
Canonical object alignment: NGS relies on “as-if-grasping” judgments to orient and place the hand with respect to test objects; reproducibility may require supplementary imagery.
Contact normal constraints: Contact orientations are prescribed (e.g., 30° outward for precision, ≥80° for power), but friction and torque are not modeled explicitly.
Shape abstraction: NGS is best matched to objects approximating canonical primitives (cylinders, spheres); irregular items may require further refinement.
Width axis decoupling: For 2D $N_G$ , the width dimension is held constant or omitted except in fully 3D grasp analyses.

In deep learning applications, NGS normalization may be less effective if the real-world patch deviates strongly from geometric assumptions, e.g., when scene clutter or object geometry is extreme. Nonetheless, empirical evaluations indicate robust performance improvements and real-time operation with compact models (Chen et al., 2024).

NGS, in both its workspace-benchmarking and scene-detection manifestations, has become a central tool for the standardized measurement, comparison, and training of robotic grasping across heterogeneous mechanical designs, object populations, and observational regimes (Morrow et al., 2021; Chen et al., 2024).