3DGS-ReLoc: 3D Gaussian Splatting for Map Representation and Visual ReLocalization (2403.11367v1)
Abstract: This paper presents a novel system for 3D mapping and visual relocalization using 3D Gaussian Splatting. Our method fuses LiDAR and camera data to create accurate and visually plausible representations of the environment. By leveraging LiDAR data to initialize the training of the 3D Gaussian Splatting map, our system constructs maps that are both detailed and geometrically accurate. To curb GPU memory usage and enable fast spatial queries, we organize the map with a 2D voxel grid combined with a KD-tree. This organization makes the map well suited to visual localization: correspondences between the query image and images rendered from the Gaussian Splatting map are identified efficiently via normalized cross-correlation (NCC), and the camera pose of the query image is then refined using feature-based matching and the Perspective-n-Point (PnP) algorithm. The effectiveness, adaptability, and precision of our system are demonstrated through extensive evaluation on the KITTI-360 dataset.
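To make the memory-management idea concrete, the following is a minimal sketch, not the authors' implementation, of a 2D voxel map combined with a KD-tree: Gaussian centers are bucketed into ground-plane voxels, the occupied voxel centers are indexed with a KD-tree, and only the Gaussians within a radius of a coarse pose estimate are kept resident for rendering. The array name `gaussian_xyz`, the voxel size, and the query radius are illustrative assumptions.

```python
# Minimal sketch (assumed interface, not the paper's code): bound GPU memory
# by loading only the Gaussians near a coarse pose estimate.
import numpy as np
from scipy.spatial import cKDTree

VOXEL_SIZE = 10.0    # metres per 2D voxel cell (illustrative value)
QUERY_RADIUS = 50.0  # metres around the coarse pose to keep resident (illustrative value)

def build_2d_index(gaussian_xyz: np.ndarray):
    """Bucket Gaussian centers into a 2D (x, y) voxel grid and index the
    occupied voxel centers with a KD-tree for fast radius queries."""
    keys = np.floor(gaussian_xyz[:, :2] / VOXEL_SIZE).astype(np.int64)
    voxels = {}
    for idx, key in enumerate(map(tuple, keys)):
        voxels.setdefault(key, []).append(idx)
    centers = (np.array(list(voxels.keys()), dtype=np.float64) + 0.5) * VOXEL_SIZE
    return cKDTree(centers), list(voxels.values())

def query_submap(tree, voxel_members, coarse_xy):
    """Return indices of all Gaussians whose voxel center lies within
    QUERY_RADIUS of the coarse (x, y) position; only these go to the GPU."""
    hits = tree.query_ball_point(np.asarray(coarse_xy, dtype=np.float64), QUERY_RADIUS)
    if not hits:
        return np.empty(0, dtype=np.int64)
    return np.concatenate([np.asarray(voxel_members[i]) for i in hits])
```

In use, `build_2d_index` is run once over the trained map, and `query_submap` is called whenever the coarse pose moves to a new region, so the renderer never has to hold the full city-scale set of Gaussians in GPU memory at once.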
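The coarse-to-fine relocalization step can be sketched in the same spirit: the query image is scored against candidate renderings with normalized cross-correlation, and the pose of the best candidate is refined by matching 2D features and solving PnP. ORB features stand in here for the learned feature matching used in the paper, and lifting rendered keypoints to 3D through a rendered depth map is an assumption of this sketch.

```python
# Minimal sketch of NCC scoring plus feature-based PnP refinement
# (assumptions: grayscale images of equal size, a rendered depth map, intrinsics K).
import cv2
import numpy as np

def ncc_score(query_gray, render_gray):
    """Normalized cross-correlation between the query image and a rendering
    from the Gaussian Splatting map (both grayscale, same size)."""
    return float(cv2.matchTemplate(render_gray, query_gray, cv2.TM_CCORR_NORMED)[0, 0])

def refine_pose(query_gray, render_gray, render_depth, K):
    """Match 2D features between the query and the best rendering, lift the
    rendered keypoints to 3D using the rendered depth, and solve PnP (RANSAC)."""
    orb = cv2.ORB_create(2000)          # stand-in for learned feature matching
    kq, dq = orb.detectAndCompute(query_gray, None)
    kr, dr = orb.detectAndCompute(render_gray, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(dq, dr)

    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    obj_pts, img_pts = [], []
    for m in matches:
        u, v = kr[m.trainIdx].pt
        z = render_depth[int(v), int(u)]
        if z <= 0:                      # skip pixels without valid rendered depth
            continue
        obj_pts.append([(u - cx) * z / fx, (v - cy) * z / fy, z])  # camera frame of the render
        img_pts.append(kq[m.queryIdx].pt)

    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.asarray(obj_pts, np.float32), np.asarray(img_pts, np.float32), K, None)
    return ok, rvec, tvec
```

The recovered `rvec`/`tvec` express the query camera relative to the rendering's camera frame; composing them with the render pose gives the refined global pose.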