GAFAR: Graph-Attention Feature-Augmentation for Registration, A Fast and Light-weight Point Set Registration Algorithm (2307.02339v1)

Published 5 Jul 2023 in cs.CV

Abstract: Rigid registration of point clouds is a fundamental problem in computer vision with many applications, from 3D scene reconstruction to geometry capture and robotics. If a suitable initial registration is available, conventional methods like ICP and its many variants can provide adequate solutions. In the absence of a suitable initialization, however, or in the presence of a high outlier rate or small overlap, rigid registration still presents great challenges. The advent of deep learning in computer vision has brought new drive to research on this topic, since it makes it possible to learn expressive feature representations and to provide one-shot estimates instead of depending on time-consuming iterations of conventional robust methods. Yet the rotation- and permutation-invariant nature of point clouds poses its own challenges to deep learning, resulting in loss of performance and low generalization capability due to sensitivity to outliers and to characteristics of 3D scans not present during network training. In this work, we present a novel fast and light-weight network architecture that uses the attention mechanism to augment point descriptors at inference time so that they optimally suit the registration task for the specific point clouds presented. Employing a fully-connected graph both within and between point clouds lets the network reason about the importance and reliability of points for registration, making our approach robust to outliers, low overlap, and unseen data. We evaluate our registration algorithm on different registration and generalization tasks and report runtime and resource consumption. The code and trained weights are available at https://github.com/mordecaimalignatius/GAFAR/.

Summary

  • The paper introduces GAFAR, a novel deep learning method for rigid point cloud registration that leverages graph-attention for dynamic feature augmentation.
  • GAFAR achieves state-of-the-art performance and superior generalization by improving robustness to outliers, low overlap, and poor initializations while remaining lightweight.
  • The method includes an online feature augmentation strategy and a mechanism to estimate registration success without ground truth, making it suitable for challenging scenarios and fail-safe applications.

This paper introduces GAFAR (Graph-Attention Feature-Augmentation for Registration), a novel deep learning approach for rigid point cloud registration. The method addresses the challenges of registering point clouds in the absence of good initializations, in the presence of high outlier rates, or with low overlap, while remaining computationally efficient for mobile applications.

Problem Addressed: The paper focuses on the fundamental computer vision problem of rigid point cloud registration, which involves finding the rotation and translation to align two point sets. Traditional methods like ICP often fail in challenging scenarios with poor initial alignments, significant outliers, or limited overlap. Deep learning offers potential for one-shot registration using learned features but faces difficulties due to the inherent invariance to rotation and permutation of point clouds, as well as generalization issues stemming from sensitivity to outliers and dataset-specific characteristics.

Proposed Solution (GAFAR): GAFAR leverages the attention mechanism within a lightweight network architecture to dynamically augment point descriptors during inference, tailoring them to the specific point clouds being registered. The architecture uses a fully-connected graph structure, both within and between point clouds, to enable reasoning about the importance and reliability of individual points. This results in robustness to outliers, low overlap, and improved generalization to unseen data.
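
The adaptive augmentation described above can be illustrated with a minimal PyTorch sketch of interleaved self- and cross-attention over the two clouds' feature sets. This is a hypothetical illustration, not the paper's implementation: the class name `CrossAugmentation`, the feature dimension, head count, and block count are all placeholder assumptions.

```python
import torch
import torch.nn as nn

class CrossAugmentation(nn.Module):
    """Sketch of interleaved self-/cross-attention feature augmentation
    between two point clouds (all sizes are illustrative placeholders)."""
    def __init__(self, dim: int = 128, heads: int = 4, blocks: int = 3):
        super().__init__()
        self.self_attn = nn.ModuleList(
            [nn.MultiheadAttention(dim, heads, batch_first=True) for _ in range(blocks)])
        self.cross_attn = nn.ModuleList(
            [nn.MultiheadAttention(dim, heads, batch_first=True) for _ in range(blocks)])

    def forward(self, feat_src: torch.Tensor, feat_ref: torch.Tensor):
        # feat_src: (B, N, dim), feat_ref: (B, M, dim) initial descriptors
        for sa, ca in zip(self.self_attn, self.cross_attn):
            # self-attention: each point attends to points of its own cloud
            feat_src = feat_src + sa(feat_src, feat_src, feat_src)[0]
            feat_ref = feat_ref + sa(feat_ref, feat_ref, feat_ref)[0]
            # cross-attention: each point attends to the other cloud
            upd_src = ca(feat_src, feat_ref, feat_ref)[0]
            upd_ref = ca(feat_ref, feat_src, feat_src)[0]
            feat_src, feat_ref = feat_src + upd_src, feat_ref + upd_ref
        return feat_src, feat_ref
```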

Key Components:

  • Feature Head: A feature extraction module that generates initial per-point feature descriptors for the source and reference point clouds independently. It uses a DGCNN-inspired architecture combining a local feature encoder, which captures geometric information in a local neighborhood, with a point-wise location encoder (an MLP) that embeds each point's 3D position (a generic sketch of the local encoding follows this list).
  • Graph-Attention Feature-Augmentation Network: The core of the method, this network refines the initial feature descriptors using interleaved self- and cross-attention layers. Self-attention allows the network to reason about relationships between points within the same point cloud, while cross-attention allows it to incorporate information from the other point cloud. This adaptive augmentation network transforms the local features for robust matching.
  • Feature Matching and Correspondence Estimation: The augmented feature descriptors are compared via dot-product similarity to form a similarity score matrix, which is interpreted as the cost of an optimal transport problem. The Sinkhorn algorithm is applied to find an approximate solution, producing a soft assignment matrix that encodes point correspondences. A threshold is applied to this matrix, and mutual row- and column-wise maxima are taken as the final correspondences (a minimal matching sketch follows this list).
  • Rigid Transformation Recovery: The point correspondences are used with Singular Value Decomposition (SVD) to recover the rigid transformation (rotation and translation) that aligns the point clouds (see the SVD sketch after this list).
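
For the Feature Head: the paper's local encoder is DGCNN-inspired, and a generic DGCNN-style edge-feature construction looks like the sketch below. The neighborhood size `k` and the exact feature layout are assumptions, not the paper's configuration.

```python
import torch

def knn_edge_features(points: torch.Tensor, k: int = 16) -> torch.Tensor:
    """Generic DGCNN-style input to a local feature encoder: for each point,
    concatenate its coordinates with offsets to its k nearest neighbors.
    points: (N, 3) -> returns (N, k, 6) edge features."""
    dist = torch.cdist(points, points)                    # (N, N) pairwise distances
    idx = dist.topk(k + 1, largest=False).indices[:, 1:]  # k neighbors, self dropped
    neighbors = points[idx]                               # (N, k, 3)
    center = points.unsqueeze(1).expand(-1, k, -1)        # (N, k, 3)
    return torch.cat([center, neighbors - center], dim=-1)
```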
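For the matching step: a minimal sketch, assuming a log-domain Sinkhorn normalization over a plain similarity matrix. The paper's exact formulation (e.g., slack handling for unmatched points, iteration count, threshold value) may differ.

```python
import torch

def sinkhorn_match(feat_src, feat_ref, iters: int = 20, thresh: float = 0.5):
    """Dot-product similarity + Sinkhorn normalization, then mutual maxima.
    feat_src: (N, D), feat_ref: (M, D). Returns index pairs and the matrix."""
    log_p = feat_src @ feat_ref.T          # similarity scores as log-potentials
    for _ in range(iters):                 # alternate row/column normalization
        log_p = log_p - torch.logsumexp(log_p, dim=1, keepdim=True)
        log_p = log_p - torch.logsumexp(log_p, dim=0, keepdim=True)
    p = log_p.exp()                        # approximately doubly-stochastic
    row_best = p.argmax(dim=1)             # best reference point per source point
    col_best = p.argmax(dim=0)             # best source point per reference point
    pairs = [(i, int(j)) for i, j in enumerate(row_best)
             if int(col_best[j]) == i and float(p[i, j]) > thresh]
    return pairs, p
```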
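For the transformation recovery: this is the classical SVD-based (Kabsch) least-squares solution. A standard unweighted version is sketched below; the paper's variant may additionally weight correspondences, e.g., by matching score.

```python
import numpy as np

def rigid_from_correspondences(src: np.ndarray, ref: np.ndarray):
    """Least-squares rigid transform (R, t) with ref ~ R @ src + t, via SVD.
    src, ref: (K, 3) arrays of matched points."""
    c_src, c_ref = src.mean(axis=0), ref.mean(axis=0)  # centroids
    H = (src - c_src).T @ (ref - c_ref)                # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                           # guard against reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = c_ref - R @ c_src
    return R, t
```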

Contributions:

  • Demonstrates the effective use of transformer networks and the attention mechanism for fast and accurate point cloud registration.
  • Presents an online feature augmentation strategy that significantly improves robustness to partial overlap and unseen geometries.
  • Introduces a mechanism for estimating registration success without ground-truth information, enabling use in fail-safe applications; it combines the per-point matching scores with the number of found matches (a possible realization is sketched after this list).
  • Achieves state-of-the-art performance and superior generalization ability while maintaining a lightweight implementation.
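
The success-estimation mechanism is only described at a high level here; the sketch below shows one plausible realization on top of the matching output from the earlier `sinkhorn_match` sketch. Both thresholds are purely illustrative assumptions, not the paper's values.

```python
def registration_ok(pairs, p, min_matches: int = 30, min_mean_score: float = 0.6):
    """Heuristic fail-safe check: enough mutual matches and a sufficiently
    confident mean matching score (both thresholds are illustrative)."""
    if len(pairs) < min_matches:
        return False
    mean_score = sum(float(p[i, j]) for i, j in pairs) / len(pairs)
    return mean_score >= min_mean_score
```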

Experiments and Results:

  • ModelNet40 Experiments: Evaluates registration performance on synthetic data from the ModelNet40 dataset, covering clean data, additive Gaussian noise, partial overlap, and unseen object categories. The method demonstrates strong performance, particularly in the challenging noisy, partial-overlap, and unseen-category settings (typical error metrics are sketched after this list).
  • Real-World 3D Scan Experiments: Tests generalization ability using LiDAR data from the KITTI dataset and a custom dataset of high-quality real-world object scans captured with a handheld 3D scanner. The results show good generalization performance, indicating the method's ability to handle different data modalities and geometries.
  • Ablation Study: The ablation study shows that each additional component of the feature head improves performance, with the best results obtained when the local point features and the positional encoding are fused by an MLP.
  • Resource Consumption Analysis: Analyzes the computational cost of GAFAR in terms of the number of trainable parameters, GPU memory usage, and registration speed. The method is shown to be lightweight and fast, making it suitable for resource-constrained applications.
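
For context, benchmarks of this kind typically report a rotation error in degrees and a Euclidean translation error against the ground-truth pose. A common way to compute them is sketched below; this is a standard convention, not necessarily the paper's exact evaluation protocol.

```python
import numpy as np

def registration_errors(R_est, t_est, R_gt, t_gt):
    """Rotation error (degrees) via the angle of the relative rotation
    R_est^T @ R_gt, plus Euclidean translation error."""
    cos_angle = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
    rot_err_deg = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    trans_err = float(np.linalg.norm(t_est - t_gt))
    return rot_err_deg, trans_err
```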

Conclusion:

GAFAR is presented as a promising approach for rigid point cloud registration, offering a balance between accuracy, robustness, and computational efficiency. The online feature augmentation strategy, the ability to estimate registration success, and the strong generalization performance make it a valuable tool for various computer vision and robotics applications. The authors point to future work on overcoming the current limitation on the size of the point clouds that can be registered.