Direct Registration of Raw Points

Updated 21 September 2025

Direct registration of raw points is a method to align point clouds without precomputed correspondences by integrating feature extraction with direct pose estimation.
It leverages learning-based frameworks, graph signal processing, and global geometric optimization to handle noise, occlusion, and partial overlap.
Applications in robotics, medical imaging, and AR demonstrate its superior speed, scalability, and accuracy over traditional ICP methods.

Direct registration of raw points refers to methodologies for aligning two or more point clouds without relying on explicit precomputed correspondences or multi-stage pipelines. These approaches integrate descriptor extraction, pose estimation, and verification in an end-to-end or otherwise direct fashion, often leveraging learning-based frameworks, equivariant neural architectures, optimization over global geometric criteria, or correspondence-free voting methods. This area encompasses both rigid and nonrigid registration, and is central to applications in robotics, medical imaging, autonomous navigation, and 3D reconstruction. The following sections provide a rigorous overview of key principles, algorithmic strategies, representative models, evaluation metrics, and practical considerations in direct registration of raw points.

1. Foundational Principles of Direct Registration

Direct registration methods operate on raw point sets $X = \{x_i\}$ and $Y = \{y_j\}$, aiming to find the spatial transformation (typically rigid, $(R, t) \in SE(3)$) that best aligns $X$ to $Y$ in the absence of pre-established correspondences. This category departs from classic Iterative Closest Point (ICP) approaches by:

Eschewing correspondence search in favor of global or descriptor-space optimization (Cheng et al., 31 Jan 2025, Zhang et al., 2024, Deng et al., 2021).
Integrating feature extraction and pose estimation, permitting either functional representation (e.g., via RKHS (Zhang et al., 2024)) or direct optimization over geometric loss functions.
Leveraging learning-based pipelines to extract equivariant features or dense descriptors on-the-fly, enabling registration even amidst noise, occlusion, or partial overlap (Deng et al., 2019, Dang et al., 2022, Huang et al., 2021).
Formulating the registration objective as discrepancy minimization (e.g., minimizing $d(X, Y) = \sum_{x_i \in X, y_j \in Y} L(x_i, y_j)$ for some pairwise loss $L$, which is equivalent to maximizing a similarity) or as function-space minimization (e.g., an RKHS norm (Zhang et al., 2024)).

The central tenet is joint (and often unsupervised) optimization that circumvents explicit pairing, providing robustness against ambiguity, noise, and data asymmetry.
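As a rough illustration of the pairwise objective $d(X, Y)$ above, the following sketch evaluates it for a candidate pose without ever pairing points. The choice of $L$ as a negative Gaussian kernel, the bandwidth `sigma`, and the helper names `pairwise_objective` and `rot_z` are assumptions made for this example and are not taken from the cited works.

```python
import numpy as np

def rot_z(theta):
    """Rotation by angle theta (radians) about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def pairwise_objective(X, Y, R, t, sigma=0.1):
    """d(RX + t, Y) = sum_ij L(R x_i + t, y_j), with L an assumed negative Gaussian kernel."""
    Xt = X @ R.T + t                               # apply the candidate pose to every x_i
    diff = Xt[:, None, :] - Y[None, :, :]          # (N, M, 3) differences over all pairs
    sq_dist = np.sum(diff ** 2, axis=-1)
    return -np.exp(-sq_dist / (2.0 * sigma ** 2)).sum()

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 3))
R_true, t_true = rot_z(0.5), np.array([0.2, -0.1, 0.3])
Y = X @ R_true.T + t_true                          # Y is a rigidly moved copy of X

d_true = pairwise_objective(X, Y, R_true, t_true)              # objective at the true pose
d_identity = pairwise_objective(X, Y, np.eye(3), np.zeros(3))  # objective at the identity pose
print(d_true < d_identity)
```

Because every pair contributes to the sum, the objective is defined for any pose and needs no nearest-neighbor assignment; the cost is the quadratic number of pairs, which practical methods reduce with truncation or tree structures.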

2. Algorithmic Strategies and Mathematical Formulation

Direct approaches span diverse algorithmic architectures, ranging from correspondence-free search and equivariant learning to graph signal-based alignment. Representative strategies include:

Semi-Exhaustive Search: DSES iterates over rotations $R$ (sampled from $SO(3)$), computing for each a translation $t^*$ by mode estimation across all paired displacements, and evaluates an inlier-maximizing objective (Cheng et al., 31 Jan 2025):

$t^* = \mathrm{mode} \left( \{ y_j - R x_i \mid x_i \in X,\ y_j \in Y \} \right)$

The final pose is then selected from the top-scoring candidate poses by minimizing an alignment metric.
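The mode-based translation estimate can be sketched as follows. This is a simplified illustration and not the DSES implementation: rotations are restricted to a grid about the z-axis, the mode is found with a voxel histogram of bin width `bin_size`, and the vote count in the winning bin stands in for the inlier objective. All function names and parameters here are hypothetical.

```python
import numpy as np

def rot_z(theta):
    """Rotation by angle theta (radians) about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def mode_translation(X, Y, R, bin_size=0.05):
    """Estimate t* as the mode of {y_j - R x_i} using a voxel histogram."""
    disp = (Y[None, :, :] - (X @ R.T)[:, None, :]).reshape(-1, 3)  # all N*M displacements
    keys = np.floor(disp / bin_size).astype(np.int64)              # voxel index per displacement
    _, inverse, counts = np.unique(keys, axis=0, return_inverse=True, return_counts=True)
    best_bin = np.argmax(counts)
    t = disp[inverse.ravel() == best_bin].mean(axis=0)             # refine inside the winning voxel
    return t, int(counts[best_bin])

def semi_exhaustive_search(X, Y, angles, bin_size=0.05):
    """Try every sampled rotation and keep the one whose translation mode gets the most votes."""
    best_votes, best_R, best_t = -1, None, None
    for theta in angles:
        R = rot_z(theta)
        t, votes = mode_translation(X, Y, R, bin_size)
        if votes > best_votes:
            best_votes, best_R, best_t = votes, R, t
    return best_votes, best_R, best_t

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(60, 3))
angles = np.linspace(0.0, 2.0 * np.pi, 72, endpoint=False)      # 5-degree grid (assumed)
R_true, t_true = rot_z(angles[7]), np.array([0.22, -0.13, 0.31])
Y = rng.permutation(X @ R_true.T + t_true)                       # shuffled, so index pairing is unknown

votes, R_est, t_est = semi_exhaustive_search(X, Y, angles)
print(votes, np.round(t_est, 3))
```

In this toy setup the true rotation lies exactly on the sampling grid. When it does not, the displacement votes spread over neighboring voxels, which is one way the sensitivity to discretization shows up in practice.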

RKHS Functional Optimization: Equivariant encoding maps points to steerable feature vectors $\phi(x)$ so that the cloud is represented as

$f_X(\cdot) = \sum_{x_i \in X} l_X(x_i)\, k(\cdot, x_i)$

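Here $l_X(x_i)$ denotes the feature attached to point $x_i$ and $k$ is the kernel of the RKHS $H$. Registration minimizes the squared RKHS distance between $f_X$ and the function of the transformed second cloud, written $Z$ here (the role played by $Y$ above):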

$d(f_X, f_{h(Z)}) = \| f_X - f_{h(Z)} \|_H^2$

with transformation $h=(R,t)$ applying equivariantly via $h\,{\cdot}\,(x \oplus f) = (R x + t) \oplus (R f)$ (Zhang et al., 2024).
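Expanding the squared norm gives three double sums of feature inner products weighted by kernel values, which the sketch below evaluates directly. It assumes a Gaussian kernel with bandwidth `sigma` and three-dimensional vector features, so that the action $R f$ is an ordinary rotation; in the cited work the features come from a learned equivariant encoder, which is replaced here by random unit vectors. All names are illustrative.

```python
import numpy as np

def gaussian_gram(A, B, sigma):
    """Kernel matrix k(a_i, b_j) for an assumed Gaussian kernel."""
    sq_dist = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def rkhs_inner(A, FA, B, FB, sigma):
    """<f_A, f_B>_H = sum_ij <l_A(a_i), l_B(b_j)> k(a_i, b_j)."""
    return float(np.sum((FA @ FB.T) * gaussian_gram(A, B, sigma)))

def rkhs_sq_distance(X, FX, Z, FZ, sigma=0.2):
    """||f_X - f_Z||_H^2 expanded into three inner products."""
    return (rkhs_inner(X, FX, X, FX, sigma)
            - 2.0 * rkhs_inner(X, FX, Z, FZ, sigma)
            + rkhs_inner(Z, FZ, Z, FZ, sigma))

def act(R, t, Z, FZ):
    """h . (z (+) f) = (R z + t) (+) (R f), applied row-wise."""
    return Z @ R.T + t, FZ @ R.T

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(50, 3))
FX = rng.normal(size=(50, 3))
FX /= np.linalg.norm(FX, axis=1, keepdims=True)     # stand-in for learned steerable features

c, s = np.cos(0.7), np.sin(0.7)
R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
t = np.array([0.3, -0.2, 0.1])
Z, FZ = (X - t) @ R, FX @ R                         # second cloud: X moved by the inverse of h

d_identity = rkhs_sq_distance(X, FX, Z, FZ)         # before alignment
d_aligned = rkhs_sq_distance(X, FX, *act(R, t, Z, FZ))  # after applying h to Z
print(d_identity, d_aligned)
```

The distance vanishes (up to rounding) once $h$ maps $Z$ and its features back onto $X$, and it is a smooth function of the pose, which is what makes gradient-based optimization over $(R, t)$ possible.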

Descriptor and Hypothesize-Verify Pipelines: Dual-branch networks extract both pose-invariant and pose-variant local descriptors; their difference encodes relative pose cues. Each correspondence generates a full pose hypothesis, which is verified by evaluating alignment quality and selecting the best (Deng et al., 2019).
Graph Signal Processing: Keypoints and point characteristics (e.g., normals, curvatures) are encoded via Haar-like graph filters; matching and outlier rejection use robust statistics, and global optimization comes via adaptive simulated annealing over a physically inspired dynamics (Mingyang et al., 2023).
Optimal Transport Matching: Keypoints and their descriptors are matched via doubly stochastic assignment (Sinkhorn algorithm) over dense attention graphs, with loss functions designed to maximize match margins (Dang et al., 2022).
Generative and GAN-based Models: Generators produce aligned point clouds, and downstream modules (e.g., parallel sample consensus) extract transformations from correspondences between input and generated clouds, with adversarial loss enforcing distributional matching (Huang et al., 2021).

Each approach is grounded in explicit mathematical objectives and enables evaluation or solution without relying on iterative matching cycles.
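For the optimal transport strategy, the doubly stochastic assignment can be approximated with plain Sinkhorn iterations, as in the minimal sketch below. It assumes equally sized keypoint sets, a squared Euclidean cost between unit-norm descriptors, and an entropic regularization `eps`; attention layers and any outlier handling used by the cited method are omitted, and the function name is hypothetical.

```python
import numpy as np

def sinkhorn(cost, eps=0.05, n_iters=200):
    """Alternate row and column scaling of exp(-cost / eps) toward a doubly stochastic matrix."""
    K = np.exp(-cost / eps)
    u = np.ones(cost.shape[0])
    v = np.ones(cost.shape[1])
    for _ in range(n_iters):
        u = 1.0 / (K @ v)        # scale rows so each sums to one
        v = 1.0 / (K.T @ u)      # scale columns so each sums to one
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(0)
n, dim = 30, 16
DX = rng.normal(size=(n, dim))
DX /= np.linalg.norm(DX, axis=1, keepdims=True)     # descriptors of keypoints in X
perm = rng.permutation(n)
DY = DX[perm] + 0.01 * rng.normal(size=(n, dim))    # same descriptors, shuffled and noisy

cost = np.sum((DX[:, None, :] - DY[None, :, :]) ** 2, axis=-1)
P = sinkhorn(cost)
matches = P.argmax(axis=0)                          # best x-index for every y-keypoint
print(np.array_equal(matches, perm))
```

The soft assignment matrix `P` is differentiable with respect to the descriptors, which is why this kind of matching can sit inside a network trained end to end.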

3. Feature Representation and Equivariance

Critical to direct approaches is the use of robust, discriminative, and equivariant feature representations.

Pose-Invariant vs. Pose-Variant Features: Architectures such as PPF-FoldNet (pose-invariant via point pair features) and PC-FoldNet (pose-variant via raw coordinates) facilitate decoupling of geometry and pose, with their descriptor difference isolating transformation cues (Deng et al., 2019).
Equivariant Encoding: Direct sum embedding (coordinate + steerable feature) allows equivariant transformation under $SE(3)$ group actions, ensuring learned representations commute with the registration transformation (Zhang et al., 2024).
Graph Attention and Dense Descriptor Aggregation: Attention mechanisms (self, cross, surface-aware) enrich local and contextual relationships, supporting invariance to density and facilitating robust matching (Trappolini et al., 2021, Dang et al., 2022, Gupta et al., 2023).
Physics-informed Features: Point response intensities and graph gradients encode geometric invariants, guiding simulated particle dynamics for registration (Mingyang et al., 2023).

Feature design thus enables bypassing explicit reference frames and yields direct, data-driven predictions for pose estimation.
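The distinction between pose-invariant and pose-variant inputs can be checked numerically. The sketch below uses the point pair feature in its commonly used four-component form (distance plus three angles), which is an assumption about the exact encoding; the learned descriptors built on top of it are not reproduced here.

```python
import numpy as np

def angle(a, b):
    """Angle in radians between two vectors."""
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.arccos(np.clip(cos, -1.0, 1.0))

def point_pair_feature(p1, n1, p2, n2):
    """(||d||, angle(n1, d), angle(n2, d), angle(n1, n2)) with d = p2 - p1."""
    d = p2 - p1
    return np.array([np.linalg.norm(d), angle(n1, d), angle(n2, d), angle(n1, n2)])

rng = np.random.default_rng(0)
p1, p2 = rng.normal(size=3), rng.normal(size=3)     # two points
n1, n2 = rng.normal(size=3), rng.normal(size=3)     # their (unnormalized) normals

# Random rigid motion: orthogonal factor of a random matrix, forced to det = +1.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(Q) < 0:
    Q[:, 0] *= -1.0
t = np.array([0.5, -1.0, 2.0])

ppf_before = point_pair_feature(p1, n1, p2, n2)
ppf_after = point_pair_feature(Q @ p1 + t, Q @ n1, Q @ p2 + t, Q @ n2)
print(np.allclose(ppf_before, ppf_after))           # invariant branch: unchanged
print(np.allclose(p1, Q @ p1 + t))                  # raw coordinates: changed
```

Raw coordinates change with the pose while the point pair feature does not, so a descriptor computed from the former carries pose information that one computed from the latter lacks.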

4. Robustness, Speed, and Scalability

Direct registration methodologies are engineered to maximize robustness and computational efficiency:

Runtime Optimization: Hierarchical filtering (e.g., inlier count pre-selection), mode-finding for translation, and massive GPU parallelization yield real-time or near-real-time performance on large ($\sim$250K) raw clouds (Cheng et al., 31 Jan 2025, Ali et al., 2021).
Robustness to Outliers and Partial Overlap: Kernel-induced metrics (e.g., RKHS), global alignment criteria (e.g., line intersection metric (Deng et al., 2021)), and outlier rejection via statistics (MAD, X84 principle (Mingyang et al., 2023)) confer pronounced robustness.
Generalization: Unsupervised and correspondence-free models exhibit domain-agnostic behavior, avoiding pitfalls of overfitting or sensitivity to partial data. Mode-based voting and loss minimization techniques are resilient across object categories and sensor conditions.
Scalability: Tree-based representations (e.g., Barnes–Hut octree (Ali et al., 2021)), optimal transport, and efficient point selection (compact keypoint sets) enable scaling to large datasets encountered in robotics, SLAM, and autonomous mapping.

Collectively, the cited works report speedups (often around 20×), broader generalization, and lower error than classic ICP or learning-based correspondence/prediction paradigms.
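The statistical outlier rejection mentioned above can be illustrated with the X84 rule, which discards values lying more than 5.2 median absolute deviations (MAD) from the median. The sketch below applies it to synthetic matching residuals; the data and the function name are made up for the example.

```python
import numpy as np

def x84_inlier_mask(residuals, k=5.2):
    """Keep residuals within k median absolute deviations of the median (X84 rule)."""
    med = np.median(residuals)
    mad = np.median(np.abs(residuals - med))
    return np.abs(residuals - med) <= k * mad

rng = np.random.default_rng(0)
residuals = np.concatenate([
    rng.normal(0.0, 0.01, size=95),   # well-matched pairs: small residuals
    np.full(5, 1.0),                  # gross mismatches
])
mask = x84_inlier_mask(residuals)
print(mask[:95].sum(), mask[95:].sum())
```

Because the median and the MAD are barely affected by a small fraction of gross errors, the threshold adapts to the noise level of the inliers without a hand-tuned distance cutoff.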

5. Quantitative Evaluation and Performance

Experimental assessments on benchmarks such as ModelNet40, KITTI, and 3DMatch, as well as in challenging real-world robotics scenarios, support the empirical validity of direct registration techniques.

Error Metrics: Mean Isotropic Error (MIE) and Mean Absolute Error (MAE) for rotation (in degrees) and translation (meters) (Cheng et al., 31 Jan 2025, Zhang et al., 2024); Chamfer, geodesic, and Euclidean distance measures for both rigid and nonrigid registration (Trappolini et al., 2021, Huang et al., 2021).
Recall/Success Ratio: Percentage of registration tasks with errors below specified thresholds (e.g., $<1^\circ$ rotation, $<0.1$ m translation) (Ali et al., 2021, Dang et al., 2022).
Inlier Ratios, Robustness to Partial Data: Metrics such as registration recall and inlier ratio (IR) indicate robustness under sparse, noisy, or occluded inputs (Wang et al., 2023).

Results consistently indicate that direct methods match or outperform state-of-the-art baselines (RPM-Net, ICP, FPFH-based, DCP-v2, Predator, etc.) with superior recall, lower error, and reduced computational load.
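The rotation and translation errors and the recall criterion can be computed as in the following sketch. It uses the geodesic angle between rotation matrices and the Euclidean norm of the translation difference, with the example thresholds of $1^\circ$ and $0.1$ m quoted above; exact metric definitions vary between papers, so the function names and defaults here are assumptions.

```python
import numpy as np

def rot_z(theta):
    """Rotation by angle theta (radians) about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rotation_error_deg(R_est, R_gt):
    """Geodesic angle between two rotations, in degrees."""
    cos_angle = (np.trace(R_gt.T @ R_est) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))

def translation_error(t_est, t_gt):
    """Euclidean distance between estimated and true translations."""
    return float(np.linalg.norm(t_est - t_gt))

def registration_recall(estimates, ground_truths, rot_thresh_deg=1.0, trans_thresh=0.1):
    """Fraction of (R, t) estimates whose rotation and translation errors are both below threshold."""
    successes = [
        rotation_error_deg(R_e, R_g) < rot_thresh_deg
        and translation_error(t_e, t_g) < trans_thresh
        for (R_e, t_e), (R_g, t_g) in zip(estimates, ground_truths)
    ]
    return sum(successes) / len(successes)

gt = (np.eye(3), np.zeros(3))
good = (rot_z(np.radians(0.5)), np.array([0.05, 0.0, 0.0]))   # within both thresholds
bad = (rot_z(np.radians(3.0)), np.array([0.05, 0.0, 0.0]))    # rotation error too large
recall = registration_recall([good, bad], [gt, gt])
print(recall)
```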

6. Application Domains and Practical Considerations

Direct registration frameworks have demonstrable impact in several domains:

Robotics and SLAM: Real-time pose correction (e.g., for robotic arms), LiDAR odometry, and mapping under low-cost inertial sensing, even during GNSS outages (Brun et al., 2022, Cheng et al., 31 Jan 2025).
Medical Imaging: Multi-view simultaneous registration of ultrasound (TEE) images where salient features are lacking (Mao et al., 2021).
3D Reconstruction and AR: Nonrigid registration leveraging transformer architectures, beneficial for texture transfer, model fusion, and scene interpolation (Trappolini et al., 2021).
Cross-Modal Registration: Direct image-to-point cloud registration via modality-bridging diffusion models obviates the need for costly metric learning (Wang et al., 2023).
Scale, Noise, and Density Adaptivity: Tree representations and density-adaptive matching enable processing of highly non-uniform, large-scale datasets typical in autonomous navigation.

These methods often provide publicly accessible codebases, facilitating reproducibility and extension in both academic and industrial research.

7. Comparative Analysis and Limitations

Direct registration is distinguished from both classic iterative and modern learning-based approaches in several respects:

Methodology	Training data requirement	Correspondence use	Speed/robustness
ICP	None	Explicit, iterative	Slow; limited robustness to outliers
Learning-based correspondence	Labeled (ground truth)	Implicit	Good within the training domain
Direct methods*	None or unsupervised	None or latent	High; robust to noise

*Direct Methods: (Cheng et al., 31 Jan 2025, Zhang et al., 2024, Deng et al., 2019, Dang et al., 2022, Mingyang et al., 2023).

Limitations include sensitivity to discretization in semi-exhaustive search, efficiency tradeoffs in global registration without initial pose bounds, and, as reported, potential difficulty with very large rotations when local descriptors are not sufficiently invariant (Brun et al., 2022).

Direct registration of raw points embodies a spectrum of mathematically grounded and computationally efficient paradigms for point cloud alignment, encompassing correspondence-free optimization, equivariant deep learning, graph-based representation, and generative modeling. The empirical success, theoretical robustness, and broad applicability of these strategies position them as fundamental tools in both foundational and applied 3D vision research.