
LVI-GS: Tightly-coupled LiDAR-Visual-Inertial SLAM using 3D Gaussian Splatting (2411.02703v1)

Published 5 Nov 2024 in cs.RO

Abstract: 3D Gaussian Splatting (3DGS) has shown its ability in rapid rendering and high-fidelity mapping. In this paper, we introduce LVI-GS, a tightly-coupled LiDAR-Visual-Inertial mapping framework with 3DGS, which leverages the complementary characteristics of LiDAR and image sensors to capture both geometric structures and visual details of 3D scenes. To this end, the 3D Gaussians are initialized from colourized LiDAR points and optimized using differentiable rendering. In order to achieve high-fidelity mapping, we introduce a pyramid-based training approach to effectively learn multi-level features and incorporate depth loss derived from LiDAR measurements to improve geometric feature perception. Through well-designed strategies for Gaussian-Map expansion, keyframe selection, thread management, and custom CUDA acceleration, our framework achieves real-time photo-realistic mapping. Numerical experiments are performed to evaluate the superior performance of our method compared to state-of-the-art 3D reconstruction systems.

Summary

  • The paper introduces a real-time, tightly-coupled LiDAR-Visual-Inertial SLAM system using 3D Gaussian Splatting for high-quality 3D reconstruction.
  • It employs a coarse-to-fine mapping approach with advanced keyframe management and depth loss integration to enhance scalability and reconstruction precision.
  • Experimental evaluations on FAST-LIVO and R3LIVE datasets demonstrate superior PSNR and SSIM performance compared to state-of-the-art SLAM methods.

Tightly-coupled LiDAR-Visual-Inertial SLAM using 3D Gaussian Splatting

The paper presents LVI-GS, a tightly-coupled LiDAR-Visual-Inertial mapping framework built on 3D Gaussian Splatting (3DGS). The framework leverages the complementary attributes of LiDAR and image sensors to capture both the geometric structure and the visual detail of 3D scenes. Traditional SLAM approaches face a notable trade-off between representational fidelity and real-time performance, especially when handling extensive datasets and dynamic environments. LVI-GS addresses this trade-off with 3DGS, an explicit, rasterization-based scene representation that brings high-fidelity reconstruction within reach of real-time SLAM systems.

Methodological Contributions

The paper highlights several contributions:

  1. Real-time LVI-GS System: The paper introduces a sophisticated real-time system capable of producing accurate 3D representations using dynamic hyper primitives. The LVI-GS system utilizes 3DGS to achieve high-quality rendering, ensuring both efficiency and precision in representing complex environments.
  2. Coarse-to-Fine Mapping: The framework employs a coarse-to-fine approach for map construction, utilizing RGB and depth image pyramids. This enables progressive refinement of the map at various levels of detail, thereby enhancing scalability and computational efficiency.
  3. Advanced Keyframe Management: The authors implement a robust strategy for keyframe selection and processing. This includes incorporating depth loss into the system to improve 3D Gaussian map accuracy, resulting in more precise reconstructions.
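
The coarse-to-fine pyramid training and the depth loss mentioned above can be sketched roughly as follows. This is an illustrative reading, not the paper's implementation: the function names, the 2x2 average-pooling pyramid, and the loss weighting `lam` are all assumptions.

```python
import numpy as np

def build_pyramid(img, levels=3):
    """Build a coarse-to-fine pyramid via 2x2 average pooling.

    Returns a list ordered coarsest-first, matching the order in which
    a coarse-to-fine scheme would optimize the map. Works for both RGB
    images (H, W, 3) and depth maps (H, W)."""
    pyr = [np.asarray(img, dtype=np.float64)]
    for _ in range(levels - 1):
        prev = pyr[-1]
        h, w = prev.shape[0] // 2 * 2, prev.shape[1] // 2 * 2  # crop to even size
        blocks = prev[:h, :w].reshape(h // 2, 2, w // 2, 2, *prev.shape[2:])
        pyr.append(blocks.mean(axis=(1, 3)))  # average each 2x2 block
    return pyr[::-1]

def mapping_loss(rgb_render, rgb_gt, depth_render, depth_gt, lam=0.2):
    """Photometric L1 loss plus a LiDAR-derived depth L1 term.

    `lam` balances appearance against geometry; its value here is a
    placeholder, not a number from the paper."""
    l_rgb = np.abs(rgb_render - rgb_gt).mean()
    l_depth = np.abs(depth_render - depth_gt).mean()
    return l_rgb + lam * l_depth
```

In such a scheme, the Gaussians would first be optimized against the coarsest pyramid level, then progressively against finer levels, which is one way the claimed scalability benefit could arise.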

Methodology

The framework operates through two parallel and collaborative threads: odometry handling and real-time optimization of 3D Gaussians. Together, these threads maintain a shared hyper primitives module, exchanging data that includes 3D point clouds, camera poses, and both camera images and depth information. The 3D Gaussian representation (3DGS) forms the core of the framework, comprising anisotropic Gaussian primitives with attributes such as opacity, center position, and covariance matrix. A pyramid-based training approach is employed to support multi-scale feature learning by leveraging color and depth image hierarchies, progressively optimizing the 3D Gaussian fields.
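
A minimal sketch of one such anisotropic Gaussian primitive is given below, with the covariance factored into a scale and a rotation as is standard in 3DGS. The class and attribute names are hypothetical; the paper's actual data layout and CUDA kernels are not shown here.

```python
import numpy as np

def quat_to_rot(q):
    """Unit quaternion (w, x, y, z) -> 3x3 rotation matrix."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

class Gaussian3D:
    """One anisotropic Gaussian primitive: center position, covariance
    (built from per-axis scales and a rotation), opacity, and an RGB
    color that could be seeded from a colourized LiDAR point."""
    def __init__(self, center, scale, quat, color, opacity=0.5):
        self.center = np.asarray(center, dtype=np.float64)
        self.color = np.asarray(color, dtype=np.float64)
        self.opacity = float(opacity)
        R = quat_to_rot(np.asarray(quat, dtype=np.float64))
        S = np.diag(scale)
        # Sigma = R S S^T R^T keeps the covariance symmetric and PSD,
        # which is why 3DGS optimizes scale and rotation rather than
        # the covariance entries directly.
        self.cov = R @ S @ S.T @ R.T
```

Initializing such primitives directly from colourized LiDAR points, as the paper describes, would amount to setting `center` and `color` from each point and leaving `scale`, `quat`, and `opacity` to the differentiable-rendering optimizer.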

Experimental Evaluation

The framework's efficacy was validated through extensive experiments conducted on the FAST-LIVO and R3LIVE datasets, containing LiDAR-Visual-Inertial data. In comparison to state-of-the-art methods, such as NeRF-SLAM and Gaussian-LIC, LVI-GS demonstrated superior performance metrics in photorealistic mapping, including higher PSNR and SSIM values. The system maintained rendering quality even under dynamic conditions, thanks to the efficient integration of its mapping and optimization techniques.
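
For reference, PSNR, one of the reported metrics, is computed as below. This is the standard definition rather than code from the paper.

```python
import numpy as np

def psnr(rendered, reference, max_val=1.0):
    """Peak Signal-to-Noise Ratio in dB between a rendered image and a
    ground-truth reference; higher values mean a closer match."""
    mse = np.mean((np.asarray(rendered, dtype=np.float64)
                   - np.asarray(reference, dtype=np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```

SSIM, the other reported metric, additionally compares local luminance, contrast, and structure statistics, and so tends to track perceived quality better than PSNR alone.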

Implications and Future Directions

The LVI-GS framework significantly contributes to the SLAM domain by addressing the balance between computational efficiency and high-fidelity scene reconstruction in various complex scenes. It provides valuable insights into integrating LiDAR and visual data using 3DGS, which could facilitate advancements in real-time robotic applications and AR/VR environments. Future investigations could focus on including additional sensor modalities and further optimizing the framework for broader applicability, potentially extending its use across different domains.

The paper proposes a compelling methodology for SLAM by incorporating 3DGS, offering a promising avenue for achieving high-fidelity, real-time 3D mapping. The strong quantitative and qualitative results underscore the potential for continued refinement and adaptation of these techniques in evolving technological landscapes.