
DDN-SLAM: Real-time Dense Dynamic Neural Implicit SLAM (2401.01545v2)

Published 3 Jan 2024 in cs.CV and cs.RO

Abstract: SLAM systems based on NeRF have demonstrated superior performance in rendering quality and scene reconstruction for static environments compared to traditional dense SLAM. However, they encounter tracking drift and mapping errors in real-world scenarios with dynamic interferences. To address these issues, we introduce DDN-SLAM, the first real-time dense dynamic neural implicit SLAM system integrating semantic features. To address dynamic tracking interferences, we propose a feature point segmentation method that combines semantic features with a mixed Gaussian distribution model. To avoid incorrect background removal, we propose a mapping strategy based on sparse point cloud sampling and background restoration. We propose a dynamic semantic loss to eliminate dynamic occlusions. Experimental results demonstrate that DDN-SLAM is capable of robustly tracking and producing high-quality reconstructions in dynamic environments, while appropriately preserving potential dynamic objects. Compared to existing neural implicit SLAM systems, the tracking results on dynamic datasets indicate an average 90% improvement in Average Trajectory Error (ATE) accuracy.


Summary

  • The paper introduces DDN-SLAM, a semantic SLAM system that integrates depth-guided static masks with optical flow and semantic information to improve tracking in dynamic environments.
  • It achieves real-time dense mapping and tracking at 20-30Hz using versatile input configurations like monocular, stereo, and RGB-D cameras.
  • Experimental results demonstrate superior accuracy, computational efficiency, and robustness compared to state-of-the-art methods, indicating strong potential for mobile platform deployment.

Overview of DDN-SLAM

DDN-SLAM is introduced as a semantic Simultaneous Localization and Mapping (SLAM) system that operates effectively in dynamic environments. Traditional neural implicit SLAM systems excel in static scenarios but struggle with dynamic interferences such as moving objects. DDN-SLAM incorporates semantic information to improve tracking and mapping quality in scenes affected by these dynamic elements.

Real-time Dense Mapping and Tracking

The system employs depth-guided static masks and multi-resolution hash encoding to swiftly fill in missing data and produce high-quality maps free of artifacts from dynamic objects. It performs robust tracking using feature points validated by optical flow and semantic information. The system also supports various input configurations, including monocular, stereo, and RGB-D cameras, while maintaining reliable operation at 20-30 Hz.
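As a rough illustration of the multi-resolution hash encoding mentioned above (popularized by Instant-NGP), the sketch below looks up features for 3D points at several grid resolutions and concatenates them. All names, table sizes, and the nearest-cell lookup (real systems trilinearly interpolate) are illustrative simplifications, not the paper's implementation.

```python
import numpy as np

# Large primes used for spatial hashing, as in Instant-NGP.
PRIMES = np.array([1, 2654435761, 805459861], dtype=np.uint64)

def hash_coords(coords, table_size):
    """Hash integer grid coordinates into a fixed-size feature table."""
    coords = coords.astype(np.uint64)
    h = np.zeros(coords.shape[:-1], dtype=np.uint64)
    for d in range(coords.shape[-1]):
        h ^= coords[..., d] * PRIMES[d]
    return (h % np.uint64(table_size)).astype(np.int64)

def encode(x, tables, base_res=16, growth=1.5):
    """Concatenate feature lookups over several grid resolutions.

    x: (N, 3) points in [0, 1]^3; tables: list of (T, F) feature tables,
    one per resolution level (sizes here are hypothetical).
    """
    feats = []
    for level, table in enumerate(tables):
        res = int(base_res * growth ** level)
        # Nearest-cell lookup for brevity; real encoders interpolate.
        cell = np.floor(x * res).astype(np.int64)
        idx = hash_coords(cell, table.shape[0])
        feats.append(table[idx])
    return np.concatenate(feats, axis=-1)
```

The concatenated multi-level features would then feed a small MLP that predicts density and color for volume rendering.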

Handling Dynamic Environments

To address the challenges posed by dynamic objects, DDN-SLAM filters and segments static and dynamic features within the scene. Rather than computing the fundamental matrix, a costly step in traditional Visual SLAM (VSLAM) methods, it identifies dynamic points through a combination of depth and optical flow anomalies. By differentiating foreground and background points with high accuracy, the system can restrict initial pose estimation and mapping to static elements.
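The static/dynamic split described above can be sketched as a simple outlier test: a feature is discarded as dynamic if its semantic class is a potential mover, or if its optical-flow or depth residual is anomalous relative to the rest of the scene. The MAD-based threshold and all parameter names here are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def static_mask(flow_residual, depth_residual, semantic_dynamic, k=3.0):
    """Return a boolean mask of features to keep for pose estimation.

    flow_residual, depth_residual: (N,) per-feature residuals.
    semantic_dynamic: (N,) bool, True for classes that may move (e.g. people).
    A feature is dynamic if flagged semantically OR if either residual lies
    more than k robust standard deviations from the scene median.
    """
    def robust_outlier(r):
        med = np.median(r)
        mad = np.median(np.abs(r - med)) + 1e-9  # median absolute deviation
        return np.abs(r - med) > k * 1.4826 * mad  # 1.4826 * MAD ~ std dev

    dynamic = (semantic_dynamic
               | robust_outlier(flow_residual)
               | robust_outlier(depth_residual))
    return ~dynamic
```

Only the surviving static features would then be used for pose estimation and bundle adjustment.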

Experimental Validation

Extensive testing was conducted across multiple datasets, comprising both virtual and real-world scenarios, demonstrating superior tracking and mapping capabilities compared to existing state-of-the-art approaches. The system showed resilience across various dynamic and static scenes, excelling in conditions where traditional neural implicit SLAM systems and even some traditional SLAM methods falter.
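The tracking comparisons above are typically reported as Absolute Trajectory Error (ATE). As a minimal sketch of the metric, the function below computes the translation RMSE after centroid alignment of time-synchronized trajectories; a full evaluation would also solve for rotation and scale (e.g. via Umeyama alignment), which is omitted here for brevity.

```python
import numpy as np

def ate_rmse(est, gt):
    """RMSE of translational error after centroid alignment.

    est, gt: (N, 3) translation components of the estimated and
    ground-truth trajectories, assumed time-synchronized.
    """
    est_c = est - est.mean(axis=0)
    gt_c = gt - gt.mean(axis=0)
    err = np.linalg.norm(est_c - gt_c, axis=1)  # per-pose error
    return float(np.sqrt(np.mean(err ** 2)))
```

A lower ATE indicates more accurate tracking; a constant offset between trajectories cancels out under centroid alignment.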

Implementation and Efficiency

DDN-SLAM is implemented on high-performance hardware, and the experiments highlight its computational efficiency relative to its contemporaries. It shows significantly lower memory usage and higher operational speed, suggesting suitability for deployment on lighter-weight platforms. Maintaining stable operation across a wide range of scenarios, DDN-SLAM is a strong candidate for real-time applications that require accurate and robust 3D scene reconstruction.

Limitations and Future Work

While the system's potential for mobile platforms such as robots is clear, the paper notes that better depth estimation and improved spatio-temporal consistency could yield further gains. Future research might focus on capturing finer detail and enforcing global consistency in the mapping results.
