GS-LIVO: Real-Time LiDAR, Inertial, and Visual Multi-sensor Fused Odometry with Gaussian Mapping (2501.08672v1)

Published 15 Jan 2025 in cs.RO and cs.CV

Abstract: In recent years, 3D Gaussian splatting (3D-GS) has emerged as a novel scene representation approach. However, existing vision-only 3D-GS methods often rely on hand-crafted heuristics for point-cloud densification and face challenges in handling occlusions and high GPU memory and computation consumption. LiDAR-Inertial-Visual (LIV) sensor configuration has demonstrated superior performance in localization and dense mapping by leveraging complementary sensing characteristics: rich texture information from cameras, precise geometric measurements from LiDAR, and high-frequency motion data from IMU. Inspired by this, we propose a novel real-time Gaussian-based simultaneous localization and mapping (SLAM) system. Our map system comprises a global Gaussian map and a sliding window of Gaussians, along with an IESKF-based odometry. The global Gaussian map consists of hash-indexed voxels organized in a recursive octree, effectively covering sparse spatial volumes while adapting to different levels of detail and scales. The Gaussian map is initialized through multi-sensor fusion and optimized with photometric gradients. Our system incrementally maintains a sliding window of Gaussians, significantly reducing GPU computation and memory consumption by only optimizing the map within the sliding window. Moreover, we implement a tightly coupled multi-sensor fusion odometry with an iterative error state Kalman filter (IESKF), leveraging real-time updating and rendering of the Gaussian map. Our system represents the first real-time Gaussian-based SLAM framework deployable on resource-constrained embedded systems, demonstrated on the NVIDIA Jetson Orin NX platform. The framework achieves real-time performance while maintaining robust multi-sensor fusion capabilities. All implementation algorithms, hardware designs, and CAD models will be publicly available.

Summary

  • The paper introduces GS-LIVO, a real-time multi-sensor fusion system that combines LiDAR, inertial, and visual data for odometry and 3D mapping using an efficient Gaussian scene representation.
  • Key innovations include a global Gaussian map with a hash-indexed octree and a Gaussian sliding window mechanism, improving memory and computational efficiency.
  • Evaluations show GS-LIVO achieves competitive odometry accuracy and real-time performance on resource-constrained embedded platforms, enabling advanced robotics applications.

Real-Time LiDAR, Inertial, and Visual Multi-sensor Fused Odometry with 3D Gaussian Splatting

The paper "GS-LIVO: Real-Time LiDAR, Inertial, and Visual Multi-sensor Fused Odometry with Gaussian Mapping" introduces an advanced SLAM (Simultaneous Localization and Mapping) system that amalgamates LiDAR, inertial, and visual data streams to enhance both localization precision and mapping fidelity. This work leverages the synergy of complementary sensors: LiDAR for accurate geometric measurements, cameras for rich texture data, and IMUs for measuring high-frequency motion, offering a significant contribution to real-time 3D mapping and localization in embedded systems.

Key Contributions

  1. Global Gaussian Map with a Hash-Indexed Octree: The global map stores Gaussians in hash-indexed voxels, each organized as a recursive octree. This structure covers sparse spatial volumes efficiently and adapts its level of detail hierarchically, so memory and computation stay bounded as environmental complexity varies (see the data-structure sketch after this list).
  2. Data Fusion and Optimization: Gaussians are initialized rapidly from fused multi-sensor measurements and then refined incrementally with photometric gradients, avoiding the hand-crafted densification heuristics and much of the computational overhead of vision-only 3D-GS pipelines.
  3. Gaussian Sliding Window Mechanism: Optimization is restricted to a sliding window of Gaussians around the current pose, drastically reducing GPU memory usage and computation without sacrificing real-time map updates.
  4. Real-Time Multi-sensor Fusion Odometry: The odometry tightly couples LiDAR, inertial, and visual measurements through an iterative error state Kalman filter (IESKF), using real-time updating and rendering of the Gaussian map to refine pose estimates (a sketch of one update step also follows this list). The resulting localization accuracy is competitive, and the full system runs on resource-constrained embedded hardware such as the NVIDIA Jetson Orin NX.
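
To make the map structure concrete, below is a minimal Python sketch of a hash-indexed voxel map whose cells hold small recursive octrees of Gaussians, together with a sliding-window query (contributions 1 and 3). All names (GaussianMap, OctreeNode, and so on) are illustrative assumptions rather than identifiers from the GS-LIVO release, and a real implementation would keep the windowed Gaussians resident on the GPU.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class Gaussian:
    mean: np.ndarray   # 3D center
    cov: np.ndarray    # 3x3 covariance
    color: np.ndarray  # RGB, refined later by photometric gradients

@dataclass
class OctreeNode:
    gaussian: Gaussian | None = None
    children: list = field(default_factory=lambda: [None] * 8)

class GaussianMap:
    """Global map: sparse hash of root voxels, each a small recursive octree."""

    def __init__(self, voxel_size=1.0, max_depth=3):
        self.voxel_size = voxel_size
        self.max_depth = max_depth          # levels of detail per root voxel
        self.voxels: dict = {}              # (i, j, k) -> OctreeNode

    def _key(self, p):
        return tuple(np.floor(p / self.voxel_size).astype(int))

    def insert(self, point, depth):
        """Insert a fused LiDAR/visual point as a Gaussian at the given LoD."""
        key = self._key(point)
        node = self.voxels.setdefault(key, OctreeNode())
        origin = np.array(key, dtype=float) * self.voxel_size
        size = self.voxel_size
        for _ in range(min(depth, self.max_depth)):
            size *= 0.5                      # descend one level of detail
            octant = ((point - origin) >= size).astype(int)
            idx = int(octant[0] * 4 + octant[1] * 2 + octant[2])
            origin = origin + octant * size
            if node.children[idx] is None:
                node.children[idx] = OctreeNode()
            node = node.children[idx]
        if node.gaussian is None:            # initialize once; optimize later
            node.gaussian = Gaussian(mean=point.astype(float),
                                     cov=(size / 3.0) ** 2 * np.eye(3),
                                     color=np.zeros(3))

    def sliding_window(self, position, radius):
        """Keys of root voxels within `radius` of the current position; only
        these Gaussians are kept in GPU memory and optimized."""
        c = self._key(position)
        r = int(np.ceil(radius / self.voxel_size))
        return [k for k in self.voxels
                if max(abs(k[0] - c[0]), abs(k[1] - c[1]), abs(k[2] - c[2])) <= r]
```

The hash gives constant-time access to any root voxel, while the per-voxel octree adds levels of detail only where the geometry demands it; the sliding-window query is what keeps the optimized subset, and hence GPU load, bounded.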
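
The IESKF update (contribution 4) can likewise be sketched compactly. The version below is a generic iterated Kalman update over a plain vector state, with the measurement model h (for example, photometric residuals rendered from the Gaussian map) and its Jacobian passed in as callbacks. The paper's actual filter operates on an error state over a manifold (pose, velocity, IMU biases) with on-manifold composition, which this simplified sketch deliberately omits.

```python
import numpy as np

def ieskf_update(x_pred, P_pred, z, h, jacobian, R, iters=5, tol=1e-6):
    """One iterated measurement update (vector-space simplification).

    x_pred   : IMU-propagated state estimate (prior mean)
    P_pred   : prior covariance
    z        : stacked measurement vector
    h        : callable, h(x) -> predicted measurement
    jacobian : callable, jacobian(x) -> H = dh/dx evaluated at x
    R        : measurement noise covariance
    """
    x = x_pred.copy()
    I = np.eye(len(x_pred))
    K = H = None
    for _ in range(iters):
        H = jacobian(x)
        S = H @ P_pred @ H.T + R
        K = P_pred @ H.T @ np.linalg.inv(S)
        # Iterated-EKF (Gauss-Newton) step: relinearize about the current
        # iterate x while keeping the prior anchored at x_pred.
        x_new = x_pred + K @ (z - h(x) - H @ (x_pred - x))
        converged = np.linalg.norm(x_new - x) < tol
        x = x_new
        if converged:
            break
    P = (I - K @ H) @ P_pred   # posterior covariance at the final linearization
    return x, P
```

In GS-LIVO's setting, each iteration would re-render the Gaussian map at the current pose iterate to form the residual z - h(x), which is what makes real-time map rendering part of the odometry loop.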

Performance and Implications

Extensive benchmarking on both public and proprietary datasets shows that GS-LIVO improves real-time throughput, reduces memory consumption, and delivers competitive odometry accuracy. Its key advance over existing Gaussian-based SLAM frameworks is maintaining real-time operation on modest hardware, which is pivotal for deployment on real robots.

Numerical and Qualitative Results

  • Rendering Quality: The system reconstructs scenes with high fidelity, achieving PSNR values that indicate superior visual quality across a range of environments compared to similar SLAM systems.
  • Odometry Accuracy: GS-LIVO maintains competitive localization precision, outperforming several state-of-the-art methods in both indoor and outdoor scenarios as measured by trajectory RMSE (both metrics are defined after this list).
  • Real-Time Performance: The system runs efficiently on embedded platforms with limited computational resources, indicating robustness and suitability for practical robotic applications.
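
For reference, both headline metrics follow their standard definitions rather than anything specific to this paper: PSNR for rendering quality against a ground-truth image, and root-mean-square absolute trajectory error (ATE) for odometry.

```latex
% PSNR of a rendered 8-bit image I against ground truth I*:
\mathrm{PSNR} = 10 \, \log_{10} \frac{255^2}{\mathrm{MSE}(I, I^{*})}

% RMSE of the absolute trajectory error over N aligned positions:
\mathrm{RMSE}_{\mathrm{ATE}} =
  \sqrt{ \frac{1}{N} \sum_{i=1}^{N}
         \left\lVert \hat{\mathbf{p}}_i - \mathbf{p}_i \right\rVert^2 }
```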

Future Directions

This paper lays the groundwork for further work on scalable, real-time SLAM systems built on advanced scene representations such as Gaussian splatting. Future research could explore adaptive level-of-detail management that balances computational load dynamically against scene complexity and platform resources. Tighter integration with neural radiance fields could also improve photorealistic rendering in scenarios that demand high-quality scene synthesis and novel-view prediction.

By deploying on compact embedded systems, the GS-LIVO framework shows that Gaussian-based SLAM is practical for onboard robotic perception, with direct relevance to autonomous navigation, augmented reality, and related applications.