Volume-DROID: A Real-Time Implementation of Volumetric Mapping with DROID-SLAM (2306.06850v1)

Published 12 Jun 2023 in cs.RO and cs.CV

Abstract: This paper presents Volume-DROID, a novel approach to Simultaneous Localization and Mapping (SLAM) that integrates volumetric mapping with Differentiable Recurrent Optimization-Inspired Design (DROID). Volume-DROID takes camera images (monocular or stereo) or video frames as input and combines DROID-SLAM, point cloud registration, an off-the-shelf semantic segmentation network, and Convolutional Bayesian Kernel Inference (ConvBKI) to generate a 3D semantic map of the environment and provide accurate localization for the robot. The key innovation of our method is the real-time fusion of DROID-SLAM and ConvBKI, achieved by generating point clouds from RGB-Depth frames and optimized camera poses. This integration is engineered for efficient, timely processing, minimizing lag and ensuring effective system performance. Our approach enables functional real-time online semantic mapping from camera images or stereo video alone. We provide an open-source Python implementation of the algorithm at https://github.com/peterstratton/Volume-DROID.
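To make the bridging step concrete, below is a minimal Python sketch of the kind of operation the abstract describes: back-projecting an RGB-Depth frame into a world-frame point cloud using an optimized camera pose, which can then be handed to a semantic mapping module. The pinhole-intrinsics convention, the 4x4 camera-to-world pose matrix, and the function name backproject_rgbd are illustrative assumptions, not the repository's actual API.

import numpy as np

def backproject_rgbd(depth: np.ndarray, rgb: np.ndarray,
                     K: np.ndarray, T_wc: np.ndarray):
    """Lift a depth map to a colored point cloud in the world frame.

    depth: (H, W) metric depth; rgb: (H, W, 3) colors; K: 3x3 pinhole
    intrinsics; T_wc: 4x4 camera-to-world pose (e.g., an optimized pose
    from a SLAM bundle adjustment). Returns (N, 3) points, (N, 3) colors.
    Conventions here are assumptions for illustration.
    """
    H, W = depth.shape
    # Pixel coordinate grids: u = column index, v = row index
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    z = depth.ravel()
    valid = z > 0  # discard pixels with no depth reading
    # Pinhole back-projection: x = (u - cx) * z / fx, y = (v - cy) * z / fy
    x = (u.ravel() - K[0, 2]) * z / K[0, 0]
    y = (v.ravel() - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z], axis=1)[valid]
    # Rigid transform into the world frame using the optimized pose
    pts_world = pts_cam @ T_wc[:3, :3].T + T_wc[:3, 3]
    colors = rgb.reshape(-1, 3)[valid]
    return pts_world, colors

In a pipeline like the one described, a per-pixel semantic label map from the segmentation network would be masked and flattened the same way as the colors, so each world-frame point carries a label for the downstream ConvBKI map update; see the linked repository for the paper's actual implementation.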
