
NIS-SLAM: Neural Implicit Semantic RGB-D SLAM for 3D Consistent Scene Understanding (2407.20853v1)

Published 30 Jul 2024 in cs.CV

Abstract: In recent years, the paradigm of neural implicit representations has gained substantial attention in the field of Simultaneous Localization and Mapping (SLAM). However, a notable gap exists in the existing approaches when it comes to scene understanding. In this paper, we introduce NIS-SLAM, an efficient neural implicit semantic RGB-D SLAM system that leverages a pre-trained 2D segmentation network to learn consistent semantic representations. Specifically, for high-fidelity surface reconstruction and spatially consistent scene understanding, we combine high-frequency multi-resolution tetrahedron-based features and low-frequency positional encoding as the implicit scene representation. Besides, to address the inconsistency of 2D segmentation results from multiple views, we propose a fusion strategy that integrates the semantic probabilities from previous non-keyframes into keyframes to achieve consistent semantic learning. Furthermore, we implement confidence-based pixel sampling and a progressive optimization weight function for robust camera tracking. Extensive experimental results on various datasets show the better or more competitive performance of our system when compared to other existing neural dense implicit RGB-D SLAM approaches. Finally, we also show that our approach can be used in augmented reality applications. Project page: https://zju3dv.github.io/nis_slam.
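The fusion strategy described in the abstract accumulates per-pixel semantic probabilities from non-keyframe views into a keyframe so that the learned semantics stay multi-view consistent. The sketch below illustrates one plausible form of such a fusion, a confidence-weighted average of re-projected softmax maps; the function name, the weighting scheme, and the NumPy formulation are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def fuse_semantic_probs(prob_maps, weights=None):
    """Fuse per-pixel class-probability maps from several views of a keyframe.

    prob_maps: list of (H, W, C) arrays, each a softmax output of a 2D
               segmentation network re-projected into the keyframe.
    weights:   optional per-view confidences; uniform if None.
    Returns a single (H, W, C) map whose class probabilities sum to 1 per pixel.
    """
    stack = np.stack(prob_maps, axis=0)              # (V, H, W, C)
    if weights is None:
        weights = np.ones(stack.shape[0])
    w = np.asarray(weights, dtype=np.float64)
    w = w / w.sum()                                  # normalize view confidences
    fused = np.tensordot(w, stack, axes=(0, 0))      # weighted average over views
    fused /= fused.sum(axis=-1, keepdims=True)       # renormalize per pixel
    return fused

# Two conflicting single-pixel predictions; the higher-confidence view wins.
p1 = np.full((1, 1, 3), [0.7, 0.2, 0.1])
p2 = np.full((1, 1, 3), [0.2, 0.7, 0.1])
fused = fuse_semantic_probs([p1, p2], weights=[3.0, 1.0])
label = int(fused[0, 0].argmax())                    # class 0
```

The weighted average keeps the fused map a valid probability distribution, so the keyframe's semantic target can be supervised with a standard cross-entropy loss.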
