GAFAR: Graph-Attention Feature-Augmentation for Registration, A Fast and Light-weight Point Set Registration Algorithm (2307.02339v1)

Published 5 Jul 2023 in cs.CV

Abstract: Rigid registration of point clouds is a fundamental problem in computer vision with many applications, from 3D scene reconstruction to geometry capture and robotics. If a suitable initial registration is available, conventional methods like ICP and its many variants can provide adequate solutions. In the absence of a suitable initialization, however, or in the presence of a high outlier rate or small overlap, rigid registration still presents great challenges. The advent of deep learning in computer vision has brought new drive to research on this topic, since it makes it possible to learn expressive feature representations and to provide one-shot estimates instead of depending on time-consuming iterations of conventional robust methods. Yet the rotation- and permutation-invariant nature of point clouds poses its own challenges to deep learning, resulting in loss of performance and low generalization capability due to sensitivity to outliers and to characteristics of 3D scans not present during network training. In this work, we present a novel fast and light-weight network architecture that uses the attention mechanism to augment point descriptors at inference time so that they optimally suit the registration task for the specific point clouds presented. Employing a fully-connected graph both within and between point clouds lets the network reason about the importance and reliability of points for registration, making our approach robust to outliers, low overlap, and unseen data. We evaluate our registration algorithm on different registration and generalization tasks and report runtime and resource consumption. The code and trained weights are available at https://github.com/mordecaimalignatius/GAFAR/.

Summary

  • The paper introduces GAFAR, a novel deep learning method for rigid point cloud registration that leverages graph-attention for dynamic feature augmentation.
  • GAFAR achieves state-of-the-art performance and superior generalization by improving robustness to outliers, low overlap, and poor initializations while remaining lightweight.
  • The method includes an online feature augmentation strategy and a mechanism to estimate registration success without ground truth, making it suitable for challenging scenarios and fail-safe applications.

This paper introduces GAFAR (Graph-Attention Feature-Augmentation for Registration), a novel deep learning approach for rigid point cloud registration. The method addresses the challenges of registering point clouds in the absence of good initializations, in the presence of high outlier rates, or with low overlap, while remaining computationally efficient for mobile applications.

Problem Addressed: The paper focuses on the fundamental computer vision problem of rigid point cloud registration, which involves finding the rotation and translation to align two point sets. Traditional methods like ICP often fail in challenging scenarios with poor initial alignments, significant outliers, or limited overlap. Deep learning offers potential for one-shot registration using learned features but faces difficulties due to the inherent invariance to rotation and permutation of point clouds, as well as generalization issues stemming from sensitivity to outliers and dataset-specific characteristics.

Proposed Solution (GAFAR): GAFAR leverages the attention mechanism within a lightweight network architecture to dynamically augment point descriptors during inference, tailoring them to the specific point clouds being registered. The architecture uses a fully-connected graph structure, both within and between point clouds, to enable reasoning about the importance and reliability of individual points. This results in robustness to outliers, low overlap, and improved generalization to unseen data.
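
The adaptive augmentation described above can be illustrated with a minimal PyTorch sketch of interleaved self- and cross-attention over the two clouds' feature sets. This is a hypothetical illustration, not the paper's implementation: the class name `CrossAugmentation`, the feature dimension, head count, and block count are all placeholder assumptions.

```python
import torch
import torch.nn as nn

class CrossAugmentation(nn.Module):
    """Sketch of interleaved self-/cross-attention feature augmentation
    between two point clouds (all sizes are illustrative placeholders)."""
    def __init__(self, dim: int = 128, heads: int = 4, blocks: int = 3):
        super().__init__()
        self.self_attn = nn.ModuleList(
            [nn.MultiheadAttention(dim, heads, batch_first=True) for _ in range(blocks)])
        self.cross_attn = nn.ModuleList(
            [nn.MultiheadAttention(dim, heads, batch_first=True) for _ in range(blocks)])

    def forward(self, feat_src: torch.Tensor, feat_ref: torch.Tensor):
        # feat_src: (B, N, dim), feat_ref: (B, M, dim) initial descriptors
        for sa, ca in zip(self.self_attn, self.cross_attn):
            # self-attention: each point attends to points of its own cloud
            feat_src = feat_src + sa(feat_src, feat_src, feat_src)[0]
            feat_ref = feat_ref + sa(feat_ref, feat_ref, feat_ref)[0]
            # cross-attention: each point attends to the other cloud
            upd_src = ca(feat_src, feat_ref, feat_ref)[0]
            upd_ref = ca(feat_ref, feat_src, feat_src)[0]
            feat_src, feat_ref = feat_src + upd_src, feat_ref + upd_ref
        return feat_src, feat_ref
```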

Key Components:

  • Feature Head: A feature extraction module that generates initial per-point feature descriptors for the source and reference point clouds independently. It uses a DGCNN-inspired architecture combining a local feature encoder, which captures geometric information in a local neighborhood, with a point-wise location encoder (an MLP) that embeds each point's 3D position (a generic sketch of the local encoding follows this list).
  • Graph-Attention Feature-Augmentation Network: The core of the method, this network refines the initial feature descriptors using interleaved self- and cross-attention layers. Self-attention allows the network to reason about relationships between points within the same point cloud, while cross-attention allows it to incorporate information from the other point cloud. This adaptive augmentation network transforms the local features for robust matching.
  • Feature Matching and Correspondence Estimation: The augmented feature descriptors are compared via dot-product similarity to form a similarity score matrix, which is interpreted as the cost of an optimal transport problem. The Sinkhorn algorithm is applied to find an approximate solution, producing a soft assignment matrix that encodes point correspondences. A threshold is applied to this matrix, and mutual row- and column-wise maxima are taken as the final correspondences (a minimal matching sketch follows this list).
  • Rigid Transformation Recovery: The point correspondences are used with Singular Value Decomposition (SVD) to recover the rigid transformation (rotation and translation) that aligns the point clouds (see the SVD sketch after this list).
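
For the Feature Head: the paper's local encoder is DGCNN-inspired, and a generic DGCNN-style edge-feature construction looks like the sketch below. The neighborhood size `k` and the exact feature layout are assumptions, not the paper's configuration.

```python
import torch

def knn_edge_features(points: torch.Tensor, k: int = 16) -> torch.Tensor:
    """Generic DGCNN-style input to a local feature encoder: for each point,
    concatenate its coordinates with offsets to its k nearest neighbors.
    points: (N, 3) -> returns (N, k, 6) edge features."""
    dist = torch.cdist(points, points)                    # (N, N) pairwise distances
    idx = dist.topk(k + 1, largest=False).indices[:, 1:]  # k neighbors, self dropped
    neighbors = points[idx]                               # (N, k, 3)
    center = points.unsqueeze(1).expand(-1, k, -1)        # (N, k, 3)
    return torch.cat([center, neighbors - center], dim=-1)
```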
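For the matching step: a minimal sketch, assuming a log-domain Sinkhorn normalization over a plain similarity matrix. The paper's exact formulation (e.g., slack handling for unmatched points, iteration count, threshold value) may differ.

```python
import torch

def sinkhorn_match(feat_src, feat_ref, iters: int = 20, thresh: float = 0.5):
    """Dot-product similarity + Sinkhorn normalization, then mutual maxima.
    feat_src: (N, D), feat_ref: (M, D). Returns index pairs and the matrix."""
    log_p = feat_src @ feat_ref.T          # similarity scores as log-potentials
    for _ in range(iters):                 # alternate row/column normalization
        log_p = log_p - torch.logsumexp(log_p, dim=1, keepdim=True)
        log_p = log_p - torch.logsumexp(log_p, dim=0, keepdim=True)
    p = log_p.exp()                        # approximately doubly-stochastic
    row_best = p.argmax(dim=1)             # best reference point per source point
    col_best = p.argmax(dim=0)             # best source point per reference point
    pairs = [(i, int(j)) for i, j in enumerate(row_best)
             if int(col_best[j]) == i and float(p[i, j]) > thresh]
    return pairs, p
```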
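For the transformation recovery: this is the classical SVD-based (Kabsch) least-squares solution. A standard unweighted version is sketched below; the paper's variant may additionally weight correspondences, e.g., by matching score.

```python
import numpy as np

def rigid_from_correspondences(src: np.ndarray, ref: np.ndarray):
    """Least-squares rigid transform (R, t) with ref ~ R @ src + t, via SVD.
    src, ref: (K, 3) arrays of matched points."""
    c_src, c_ref = src.mean(axis=0), ref.mean(axis=0)  # centroids
    H = (src - c_src).T @ (ref - c_ref)                # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                           # guard against reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = c_ref - R @ c_src
    return R, t
```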

Contributions:

  • Demonstrates the effective use of transformer networks and the attention mechanism for fast and accurate point cloud registration.
  • Presents an online feature augmentation strategy that significantly improves robustness to partial overlap and unseen geometries.
  • Introduces a mechanism for estimating registration success without ground-truth information, enabling use in fail-safe applications; it combines the per-point matching scores with the number of found matches (a possible realization is sketched after this list).
  • Achieves state-of-the-art performance and superior generalization ability while maintaining a lightweight implementation.
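
The success-estimation mechanism is only described at a high level here; the sketch below shows one plausible realization on top of the matching output from the earlier `sinkhorn_match` sketch. Both thresholds are purely illustrative assumptions, not the paper's values.

```python
def registration_ok(pairs, p, min_matches: int = 30, min_mean_score: float = 0.6):
    """Heuristic fail-safe check: enough mutual matches and a sufficiently
    confident mean matching score (both thresholds are illustrative)."""
    if len(pairs) < min_matches:
        return False
    mean_score = sum(float(p[i, j]) for i, j in pairs) / len(pairs)
    return mean_score >= min_mean_score
```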

Experiments and Results:

  • ModelNet40 Experiments: Evaluates registration performance on synthetic data from the ModelNet40 dataset, covering clean data, additive Gaussian noise, partial overlap, and unseen object categories. The method demonstrates strong performance, particularly in the challenging noisy, partial-overlap, and unseen-category settings (typical error metrics are sketched after this list).
  • Real-World 3D Scan Experiments: Tests generalization ability using LiDAR data from the KITTI dataset and a custom dataset of high-quality real-world object scans captured with a handheld 3D scanner. The results show good generalization performance, indicating the method's ability to handle different data modalities and geometries.
  • Ablation Study: The ablation study shows that each additional component of the feature head improves performance, with the best results obtained when the local point features and the positional encoding are fused by an MLP.
  • Resource Consumption Analysis: Analyzes the computational cost of GAFAR in terms of the number of trainable parameters, GPU memory usage, and registration speed. The method is shown to be lightweight and fast, making it suitable for resource-constrained applications.
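
For context, benchmarks of this kind typically report a rotation error in degrees and a Euclidean translation error against the ground-truth pose. A common way to compute them is sketched below; this is a standard convention, not necessarily the paper's exact evaluation protocol.

```python
import numpy as np

def registration_errors(R_est, t_est, R_gt, t_gt):
    """Rotation error (degrees) via the angle of the relative rotation
    R_est^T @ R_gt, plus Euclidean translation error."""
    cos_angle = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
    rot_err_deg = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    trans_err = float(np.linalg.norm(t_est - t_gt))
    return rot_err_deg, trans_err
```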

Conclusion:

GAFAR is presented as a promising approach for rigid point cloud registration, offering a balance between accuracy, robustness, and computational efficiency. The online feature augmentation strategy, the ability to estimate registration success, and the strong generalization performance make it a valuable tool for various computer vision and robotics applications. The authors point to future work on overcoming the current limitation on the size of the point clouds that can be registered.