DDN-SLAM: Real-time Dense Dynamic Neural Implicit SLAM (2401.01545v2)
Abstract: SLAM systems based on NeRF have demonstrated superior rendering quality and scene reconstruction in static environments compared to traditional dense SLAM. However, they suffer from tracking drift and mapping errors in real-world scenarios with dynamic interference. To address these issues, we introduce DDN-SLAM, the first real-time dense dynamic neural implicit SLAM system to integrate semantic features. To handle dynamic interference during tracking, we propose a feature point segmentation method that combines semantic features with a Gaussian mixture model. To avoid incorrect background removal, we propose a mapping strategy based on sparse point cloud sampling and background restoration, together with a dynamic semantic loss that eliminates dynamic occlusions. Experimental results demonstrate that DDN-SLAM tracks robustly and produces high-quality reconstructions in dynamic environments while appropriately preserving potentially dynamic objects. Compared to existing neural implicit SLAM systems, tracking results on dynamic datasets show an average 90% improvement in Absolute Trajectory Error (ATE) accuracy.
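The segmentation step summarized above lends itself to a compact illustration. The sketch below (Python, not the authors' code) shows one plausible way to fuse a semantic mask of movable objects with a two-component Gaussian mixture fitted to per-feature geometric residuals, discarding a feature only when both cues agree so that merely "potentially dynamic" objects are preserved. The function name `segment_dynamic_features`, the residual-based cue, the AND-fusion rule, and the 0.5 threshold are all illustrative assumptions, not details taken from the paper.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def segment_dynamic_features(residuals, dynamic_mask, pts_2d, threshold=0.5):
    """Flag feature points as dynamic by fusing two cues (illustrative sketch).

    residuals:    (N,) geometric residual per feature (e.g. reprojection error)
    dynamic_mask: (H, W) bool mask covering detected movable objects
    pts_2d:       (N, 2) integer pixel coordinates (col, row) of the features
    """
    # Cue 1: fit a two-component Gaussian mixture to the residuals.
    # The component with the larger mean is assumed to model dynamic points.
    gmm = GaussianMixture(n_components=2, random_state=0)
    gmm.fit(residuals.reshape(-1, 1))
    dynamic_comp = int(np.argmax(gmm.means_.ravel()))
    p_dynamic = gmm.predict_proba(residuals.reshape(-1, 1))[:, dynamic_comp]

    # Cue 2: semantic prior -- does the feature fall inside a movable-object mask?
    cols = pts_2d[:, 0].astype(int)
    rows = pts_2d[:, 1].astype(int)
    in_dynamic_region = dynamic_mask[rows, cols]

    # Discard a point from tracking only when both cues agree; points that are
    # merely inside a semantic mask stay "potentially dynamic" and are kept.
    return (p_dynamic > threshold) & in_dynamic_region


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic example: 180 static features with small residuals and
    # 20 dynamic ones with large residuals, all inside a detection mask.
    residuals = np.concatenate([rng.normal(0.5, 0.1, 180), rng.normal(3.0, 0.5, 20)])
    pts = rng.integers(0, 100, size=(200, 2))
    mask = np.ones((100, 100), dtype=bool)  # pretend a detection covers the frame
    print(segment_dynamic_features(residuals, mask, pts).sum(), "features flagged dynamic")
```

In this toy run only the high-residual features are flagged, mirroring the idea that semantic detections alone are not enough to remove a point: geometric evidence of motion is required as well.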