SyMFM6D: Symmetry-aware Multi-directional Fusion for Multi-View 6D Object Pose Estimation (2307.00306v1)
Abstract: Detecting objects and estimating their 6D poses is essential for automated systems to interact safely with their environment. Most 6D pose estimators, however, rely on a single camera frame and therefore suffer from occlusions and from ambiguities caused by object symmetries. We overcome these issues with SyMFM6D, a novel symmetry-aware multi-view 6D pose estimator. Our approach efficiently fuses the RGB-D frames from multiple perspectives in a deep multi-directional fusion network and simultaneously predicts predefined keypoints for all objects in the scene. From the keypoints and an instance semantic segmentation, we efficiently compute the 6D poses by least-squares fitting. To resolve the ambiguities of symmetric objects, we propose a novel training procedure for symmetry-aware keypoint detection, including a new objective function. Our SyMFM6D network significantly outperforms the state of the art in both single-view and multi-view 6D pose estimation. We furthermore show the effectiveness of our symmetry-aware training procedure and demonstrate that our approach is robust to inaccurate camera calibration and dynamic camera setups.
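The least-squares fitting step recovers each object's pose by aligning its predefined model keypoints with the keypoints detected in the scene. Below is a minimal sketch of the standard SVD-based closed-form solution (Arun et al., 1987) that this refers to, assuming the per-object keypoint correspondences have already been extracted via the network's voting and instance segmentation; function and variable names are illustrative, not taken from the paper's code.

```python
import numpy as np

def fit_pose_least_squares(model_kps: np.ndarray, scene_kps: np.ndarray):
    """Rigid transform (R, t) minimizing sum_i ||R @ m_i + t - s_i||^2.

    model_kps: (K, 3) predefined keypoints in object coordinates
    scene_kps: (K, 3) corresponding detected keypoints in camera coordinates
    """
    mu_m = model_kps.mean(axis=0)
    mu_s = scene_kps.mean(axis=0)
    # Cross-covariance of the centered point sets
    H = (model_kps - mu_m).T @ (scene_kps - mu_s)
    U, _, Vt = np.linalg.svd(H)
    # Reflection guard: force det(R) = +1 so R is a proper rotation
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = mu_s - R @ mu_m
    return R, t
```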
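The abstract does not spell out the new symmetry-aware objective function. A common construction for such objectives, shown here purely as an illustration and not as the paper's definition, evaluates the keypoint loss against every symmetry-equivalent version of the ground truth and keeps the minimum, so the network is not penalized for predicting any pose the symmetry makes indistinguishable. The L1 distance and all names below are assumptions.

```python
import numpy as np

def symmetry_aware_keypoint_loss(pred_kps, gt_kps, sym_rotations):
    """Hypothetical symmetry-aware loss (illustration only).

    pred_kps:      (K, 3) predicted keypoint positions
    gt_kps:        (K, 3) ground-truth keypoints in object coordinates
    sym_rotations: list of (3, 3) rotations mapping the object onto itself;
                   for asymmetric objects this is just [identity]
    """
    losses = []
    for S in sym_rotations:
        # Compare against the ground truth transformed by one symmetry
        losses.append(np.abs(pred_kps - gt_kps @ S.T).mean())
    # Minimum over symmetries: any symmetry-equivalent prediction is correct
    return min(losses)
```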