SiLVR: Scalable Lidar-Visual Reconstruction with Neural Radiance Fields for Robotic Inspection (2403.06877v1)

Published 11 Mar 2024 in cs.RO and cs.CV

Abstract: We present a neural-field-based large-scale reconstruction system that fuses lidar and vision data to generate high-quality reconstructions that are geometrically accurate and capture photo-realistic textures. This system adapts the state-of-the-art neural radiance field (NeRF) representation to also incorporate lidar data which adds strong geometric constraints on the depth and surface normals. We exploit the trajectory from a real-time lidar SLAM system to bootstrap a Structure-from-Motion (SfM) procedure to both significantly reduce the computation time and to provide metric scale which is crucial for lidar depth loss. We use submapping to scale the system to large-scale environments captured over long trajectories. We demonstrate the reconstruction system with data from a multi-camera, lidar sensor suite onboard a legged robot, hand-held while scanning building scenes for 600 metres, and onboard an aerial robot surveying a multi-storey mock disaster site-building. Website: https://ori-drs.github.io/projects/silvr/

Summary

  • The paper advances 3D reconstruction by integrating lidar with NeRF to enhance geometric fidelity and capture detailed textures.
  • It exploits the trajectory from a real-time lidar SLAM system to bootstrap Structure-from-Motion, significantly reducing computation time while recovering the metric scale needed for the lidar depth loss.
  • The submapping strategy enables scalable mapping over large areas, validated through diverse real-world robotic inspection tests.

Overview of SiLVR: Scalable Lidar-Visual Reconstruction with Neural Radiance Fields for Robotic Inspection

The paper presents SiLVR, a reconstruction system that fuses lidar and visual data using Neural Radiance Fields (NeRFs) to generate high-quality, scalable 3D reconstructions. The system addresses a long-standing challenge in robotics: dense, accurate 3D reconstructions are vital for tasks such as industrial inspection and autonomous navigation.

Key Contributions

SiLVR adapts a state-of-the-art NeRF representation to incorporate lidar data, which adds strong geometric constraints and improves depth and surface-normal estimation. These constraints keep the system robust in texture-less areas, where purely vision-based methods tend to falter.
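
As a rough illustration of how such lidar-derived constraints can be attached to the NeRF objective, the sketch below adds a depth term and a surface-normal term to the photometric loss. This is a minimal PyTorch sketch with hypothetical function names and placeholder loss weights; it is not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def lidar_depth_loss(rendered_depth, lidar_depth, valid_mask):
    """Penalise deviation of the NeRF-rendered ray depth from the lidar depth.
    rendered_depth, lidar_depth: (N,) per-ray depths in metres.
    valid_mask: (N,) bool, True where the ray has a lidar return."""
    err = (rendered_depth - lidar_depth)[valid_mask]
    return (err ** 2).mean()

def normal_consistency_loss(rendered_normals, lidar_normals, valid_mask):
    """Encourage rendered surface normals to align with normals estimated
    from the lidar point cloud (1 - cosine similarity)."""
    cos = F.cosine_similarity(rendered_normals[valid_mask],
                              lidar_normals[valid_mask], dim=-1)
    return (1.0 - cos).mean()

def total_loss(rgb_loss, rendered_depth, lidar_depth,
               rendered_normals, lidar_normals, valid_mask,
               w_depth=0.1, w_normal=0.05):
    # Photometric NeRF loss plus lidar-derived geometric regularisers;
    # the weights are placeholders, not values from the paper.
    return (rgb_loss
            + w_depth * lidar_depth_loss(rendered_depth, lidar_depth, valid_mask)
            + w_normal * normal_consistency_loss(rendered_normals, lidar_normals, valid_mask))
```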

Major Contributions:

  1. Integrated Lidar-Visual 3D Reconstruction: SiLVR combines multi-camera visual data with lidar measurements to construct photorealistic 3D models that match the geometric fidelity of lidar while capturing the texture detail provided by the cameras.
  2. Geometric Constraints from Lidar: The system adds depth and surface-normal regularization terms derived from lidar to strengthen the geometric reconstruction. These terms mitigate the difficulties NeRF faces in regions with limited texture or insufficient multi-view coverage.
  3. Efficient Trajectory Bootstrapping: The trajectory from a real-time lidar SLAM system drives the Structure-from-Motion (SfM) process, significantly reducing computation time while providing the metric scale required by the lidar depth loss and keeping the lidar and visual inputs aligned (a pose-prior sketch follows this list).
  4. Submapping for Scalability: SiLVR partitions the scene into local submaps so it can handle large-scale environments, maintaining reconstruction quality over a 600-metre trajectory (see the submapping sketch below).
  5. Real-world Tests on Diverse Platforms: The system is demonstrated with a hand-held sensor suite scanning building scenes, on a legged robot, and on an aerial robot surveying a multi-storey mock disaster site, showcasing SiLVR's versatility across robotic configurations and mission profiles.
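
To make contribution 3 concrete: the metric-scale SLAM trajectory can be interpolated at the camera timestamps and used as pose priors for SfM instead of estimating every pose from scratch. The following is a minimal sketch built on NumPy and SciPy; the function name and interface are assumptions for illustration, not taken from the paper.

```python
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def interpolate_pose_priors(slam_times, slam_rots, slam_trans, image_times):
    """Interpolate metric-scale SLAM poses at camera timestamps so they can
    seed SfM with pose priors rather than poses estimated from scratch.

    slam_times:  (N,) SLAM keyframe timestamps in seconds, strictly increasing.
    slam_rots:   scipy Rotation holding N keyframe orientations.
    slam_trans:  (N, 3) keyframe translations in metres.
    image_times: (M,) camera timestamps, assumed to lie within the SLAM range.
    """
    slerp = Slerp(slam_times, slam_rots)            # spherical interpolation of rotations
    rots = slerp(image_times)
    trans = np.stack([np.interp(image_times, slam_times, slam_trans[:, i])
                      for i in range(3)], axis=-1)  # linear interpolation of translations
    return rots, trans  # pose priors already carry metric scale from lidar SLAM
```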

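For contribution 4, one simple way to realise submapping is to split the trajectory by distance travelled into overlapping chunks, each small enough to be reconstructed well on its own. The sketch below illustrates that idea; the partitioning criterion and parameter values are assumptions for illustration, not the paper's exact method.

```python
import numpy as np

def split_into_submaps(positions, max_extent=50.0, overlap=5.0):
    """Partition trajectory-ordered sensor positions into overlapping submaps.

    positions:  (N, 3) positions in metres, in trajectory order.
    max_extent: approximate path length covered by one submap, in metres.
    overlap:    extra path length shared with the next submap, in metres.
    Returns a list of index arrays, one per submap.
    """
    step = np.linalg.norm(np.diff(positions, axis=0), axis=1)
    arclen = np.concatenate([[0.0], np.cumsum(step)])  # distance along the trajectory
    submaps, start = [], 0.0
    while start < arclen[-1]:
        idx = np.where((arclen >= start) & (arclen < start + max_extent + overlap))[0]
        if idx.size:
            submaps.append(idx)
        start += max_extent                             # advance by the non-overlapping part
    return submaps
```

Each submap can then be reconstructed independently and the results merged, which keeps memory use and training time bounded as the environment grows.
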
Implications and Future Work

The integration of NeRF with lidar data in SiLVR represents a significant advance in large-scale 3D reconstruction, particularly for robotics applications that require accurate and detailed environmental mapping. This methodology lays the groundwork for combining different sensor modalities to overcome the limitations of any single sensor.

Future Directions:

  • Further research could focus on refining the computational efficiency of integrating multiple sensor inputs to support real-time application scenarios.
  • Development of adaptive approaches for handling varying lighting and texture conditions, improving the performance of inspection robots in dynamic environments.
  • Exploration of new algorithms that can dynamically balance trade-offs between processing power, scalability, and reconstruction fidelity.

SiLVR is an insightful contribution to robotic inspection technology. It addresses many challenges that have hindered progress in the field and represents a significant step toward robust, scalable systems for automated inspection. Its neural-field-based approach to incorporating lidar data into high-quality visual reconstructions points toward efficient mapping of complex scenes in robotics.
