OCC-VO: Dense Mapping via 3D Occupancy-Based Visual Odometry for Autonomous Driving (2309.11011v2)

Published 20 Sep 2023 in cs.RO

Abstract: Visual Odometry (VO) plays a pivotal role in autonomous systems, with a principal challenge being the lack of depth information in camera images. This paper introduces OCC-VO, a novel framework that capitalizes on recent advances in deep learning to transform 2D camera images into 3D semantic occupancy, thereby circumventing the traditional need to jointly estimate ego poses and landmark locations. Within this framework, we utilize TPV-Former to convert surround-view camera images into 3D semantic occupancy. To address the challenges this transformation presents, we tailor a pose estimation and mapping algorithm that incorporates a Semantic Label Filter and a Dynamic Object Filter, and finally employs a Voxel PFilter to maintain a consistent global semantic map. Evaluations on the Occ3D-nuScenes benchmark not only show a 20.6% improvement in Success Ratio and a 29.6% gain in trajectory accuracy over ORB-SLAM3, but also demonstrate the framework's ability to construct a comprehensive map. Our implementation is open-sourced and available at: https://github.com/USTCLH/OCC-VO.
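
The abstract compresses the whole pipeline into a few sentences, so a minimal sketch may help make the data flow concrete: each frame's semantic occupancy is filtered by class label, registered against the global map to recover the ego pose, and the map is pruned by a persistence filter. The sketch below is a hypothetical reconstruction, not the authors' code: the class IDs, voxel size, and thresholds are placeholders, the PFilter bookkeeping is assumed, and Open3D's GICP stands in for the paper's registration step (which builds on Generalized-ICP [7]).

```python
# Hypothetical sketch of an OCC-VO-style front end (not the authors' code).
# Assumed input: per-frame occupancy as voxel-center coordinates plus one
# semantic label per voxel (e.g. from TPV-Former). Requires Open3D >= 0.14.
import numpy as np
import open3d as o3d

VOXEL_SIZE = 0.4                 # metres; placeholder occupancy resolution
DYNAMIC_CLASSES = {2, 3, 4}      # placeholder IDs, e.g. car/pedestrian/cyclist
IGNORED_CLASSES = {0}            # placeholder ID for free/uninformative voxels

def occupancy_to_cloud(centers: np.ndarray,
                       labels: np.ndarray) -> o3d.geometry.PointCloud:
    """Semantic Label Filter + Dynamic Object Filter, both simplified:
    keep only occupied, static, semantically informative voxels."""
    keep = ~np.isin(labels, list(DYNAMIC_CLASSES | IGNORED_CLASSES))
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(centers[keep].astype(np.float64))
    return pcd

def estimate_pose(frame: o3d.geometry.PointCloud,
                  global_map: o3d.geometry.PointCloud,
                  init: np.ndarray) -> np.ndarray:
    """Register the filtered frame against the global map; Open3D's GICP
    is used here as a stand-in for the paper's registration step."""
    for pcd in (frame, global_map):
        if not pcd.has_covariances():     # GICP needs per-point covariances
            pcd.estimate_covariances(
                o3d.geometry.KDTreeSearchParamHybrid(
                    radius=4.0 * VOXEL_SIZE, max_nn=30))
    result = o3d.pipelines.registration.registration_generalized_icp(
        frame, global_map,
        max_correspondence_distance=2.0 * VOXEL_SIZE,
        init=init)
    return result.transformation          # 4x4 SE(3) ego-pose estimate

def voxel_pfilter(global_voxels: dict, frame_voxels: dict, frame_id: int,
                  min_hits: int = 3, max_age: int = 10) -> dict:
    """Persistence filter in the spirit of PFilter [9]; bookkeeping assumed.
    global_voxels maps voxel index -> (hit_count, label, last_seen_frame);
    frame_voxels maps voxel index -> label for the current frame."""
    for key, label in frame_voxels.items():
        hits, _, _ = global_voxels.get(key, (0, label, frame_id))
        global_voxels[key] = (hits + 1, label, frame_id)
    # Voxels re-observed often enough persist; transient ones age out,
    # which keeps the global semantic map consistent over time.
    return {k: v for k, v in global_voxels.items()
            if v[0] >= min_hits or frame_id - v[2] <= max_age}
```

A per-frame loop would chain these steps: filter the new occupancy frame, register it against the current map to estimate the pose, transform the frame into the global frame, quantize it back to voxel indices, and update the map through voxel_pfilter before the next frame arrives.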

References (31)
  1. W. Chen, G. Shang, A. Ji, C. Zhou, X. Wang, C. Xu, Z. Li, and K. Hu, “An overview on visual SLAM: From tradition to semantic,” Remote Sensing, vol. 14, no. 13, p. 3010, 2022.
  2. B. Triggs, P. F. McLauchlan, R. I. Hartley, and A. W. Fitzgibbon, “Bundle adjustment—a modern synthesis,” in Vision Algorithms: Theory and Practice (International Workshop on Vision Algorithms, Corfu, Greece, 1999). Springer, 2000, pp. 298–372.
  3. A. Macario Barros, M. Michel, Y. Moline, G. Corre, and F. Carrel, “A comprehensive survey of visual SLAM algorithms,” Robotics, vol. 11, no. 1, p. 24, 2022.
  4. R. Mur-Artal, J. M. M. Montiel, and J. D. Tardós, “ORB-SLAM: A versatile and accurate monocular SLAM system,” IEEE Transactions on Robotics (T-RO), vol. 31, no. 5, pp. 1147–1163, 2015.
  5. L. Roldão, R. de Charette, and A. Verroust-Blondet, “3D semantic scene completion: A survey,” International Journal of Computer Vision (IJCV), vol. 130, no. 8, pp. 1978–2005, 2022.
  6. Y. Huang, W. Zheng, Y. Zhang, J. Zhou, and J. Lu, “Tri-perspective view for vision-based 3D semantic occupancy prediction,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023, pp. 9223–9232.
  7. A. Segal, D. Haehnel, and S. Thrun, “Generalized-ICP,” in Robotics: Science and Systems (RSS), vol. 2, no. 4, Seattle, WA, 2009, p. 435.
  8. J. Park, Q.-Y. Zhou, and V. Koltun, “Colored point cloud registration revisited,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017, pp. 143–152.
  9. Y. Duan, J. Peng, Y. Zhang, J. Ji, and Y. Zhang, “PFilter: Building persistent maps through feature filtering for fast and accurate LiDAR-based SLAM,” in 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2022, pp. 11087–11093.
  10. X. Tian, T. Jiang, L. Yun, Y. Mao, H. Yang, Y. Wang, Y. Wang, and H. Zhao, “Occ3D: A large-scale 3D occupancy prediction benchmark for autonomous driving,” Advances in Neural Information Processing Systems (NeurIPS), vol. 36, 2024.
  11. H. Caesar, V. Bankiti, A. H. Lang, S. Vora, V. E. Liong, Q. Xu, A. Krishnan, Y. Pan, G. Baldan, and O. Beijbom, “nuScenes: A multimodal dataset for autonomous driving,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 11621–11631.
  12. C. Campos, R. Elvira, J. J. G. Rodríguez, J. M. M. Montiel, and J. D. Tardós, “ORB-SLAM3: An accurate open-source library for visual, visual–inertial, and multimap SLAM,” IEEE Transactions on Robotics (T-RO), vol. 37, no. 6, pp. 1874–1890, 2021.
  13. R. Qian, X. Lai, and X. Li, “3D object detection for autonomous driving: A survey,” Pattern Recognition, vol. 130, p. 108796, 2022.
  14. C. B. Rist, D. Emmerichs, M. Enzweiler, and D. M. Gavrila, “Semantic scene completion using local deep implicit functions on LiDAR data,” IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI), vol. 44, no. 10, pp. 7205–7218, 2021.
  15. M. Zhong and G. Zeng, “Semantic point completion network for 3D semantic scene completion,” in European Conference on Artificial Intelligence (ECAI). IOS Press, 2020, pp. 2824–2831.
  16. S. Song, F. Yu, A. Zeng, A. X. Chang, M. Savva, and T. Funkhouser, “Semantic scene completion from a single depth image,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 1746–1754.
  17. M. Garbade, Y.-T. Chen, J. Sawatzky, and J. Gall, “Two stream 3D semantic scene completion,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2019.
  18. S. Li, C. Zou, Y. Li, X. Zhao, and Y. Gao, “Attention-based multi-modal fusion network for semantic scene completion,” in Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), vol. 34, no. 07, 2020, pp. 11402–11409.
  19. S.-C. Wu, K. Tateno, N. Navab, and F. Tombari, “SCFusion: Real-time incremental scene reconstruction with semantic completion,” in 2020 International Conference on 3D Vision (3DV). IEEE, 2020, pp. 801–810.
  20. A.-Q. Cao and R. de Charette, “MonoScene: Monocular 3D semantic scene completion,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 3991–4001.
  21. R. Miao, W. Liu, M. Chen, Z. Gong, W. Xu, C. Hu, and S. Zhou, “OccDepth: A depth-aware method for 3D semantic scene completion,” arXiv preprint arXiv:2302.13540, 2023.
  22. J. Engel, T. Schöps, and D. Cremers, “LSD-SLAM: Large-scale direct monocular SLAM,” in European Conference on Computer Vision (ECCV). Springer, 2014, pp. 834–849.
  23. K. Tateno, F. Tombari, I. Laina, and N. Navab, “CNN-SLAM: Real-time dense monocular SLAM with learned depth prediction,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 6243–6252.
  24. Z. Teed and J. Deng, “DROID-SLAM: Deep visual SLAM for monocular, stereo, and RGB-D cameras,” Advances in Neural Information Processing Systems (NeurIPS), 2021.
  25. Z. Zhu, S. Peng, V. Larsson, W. Xu, H. Bao, Z. Cui, M. R. Oswald, and M. Pollefeys, “NICE-SLAM: Neural implicit scalable encoding for SLAM,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 12786–12796.
  26. E. Sucar, S. Liu, J. Ortiz, and A. J. Davison, “iMAP: Implicit mapping and positioning in real-time,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 16558–16569.
  27. J. Ross, O. Mendez, A. Saha, M. Johnson, and R. Bowden, “BEV-SLAM: Building a globally-consistent world map using monocular vision,” in 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2022, pp. 3830–3836.
  28. P. J. Besl and N. D. McKay, “Method for registration of 3-D shapes,” in Sensor Fusion IV: Control Paradigms and Data Structures, vol. 1611. SPIE, 1992, pp. 586–606.
  29. J. Zhang, M. Kaess, and S. Singh, “On degeneracy of optimization-based state estimation problems,” in IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2016, pp. 809–816.
  30. K. E. Iverson, “A programming language,” in Proceedings of the 1962 Spring Joint Computer Conference, 1962, pp. 345–351.
  31. M. Grupp, “evo: Python package for the evaluation of odometry and SLAM,” https://github.com/MichaelGrupp/evo, 2017.