Co-SLAM: Joint Coordinate and Sparse Parametric Encodings for Neural Real-Time SLAM (2304.14377v1)

Published 27 Apr 2023 in cs.CV

Abstract: We present Co-SLAM, a neural RGB-D SLAM system based on a hybrid representation, that performs robust camera tracking and high-fidelity surface reconstruction in real time. Co-SLAM represents the scene as a multi-resolution hash-grid to exploit its high convergence speed and ability to represent high-frequency local features. In addition, Co-SLAM incorporates one-blob encoding, to encourage surface coherence and completion in unobserved areas. This joint parametric-coordinate encoding enables real-time and robust performance by bringing the best of both worlds: fast convergence and surface hole filling. Moreover, our ray sampling strategy allows Co-SLAM to perform global bundle adjustment over all keyframes instead of requiring keyframe selection to maintain a small number of active keyframes as competing neural SLAM approaches do. Experimental results show that Co-SLAM runs at 10-17Hz and achieves state-of-the-art scene reconstruction results, and competitive tracking performance in various datasets and benchmarks (ScanNet, TUM, Replica, Synthetic RGBD). Project page: https://hengyiwang.github.io/projects/CoSLAM

Citations (138)

Summary

The paper introduces a novel SLAM method that combines coordinate-based and sparse parametric encodings to boost real-time tracking and 3D reconstruction.
It utilizes a multi-resolution hash-grid and one-blob encoding to ensure rapid convergence, surface coherence, and efficient hole-filling.
Experimental results show superior reconstruction metrics and robust camera tracking at 10-17Hz across various RGB-D datasets.

Overview of "Co-SLAM: Joint Coordinate and Sparse Parametric Encodings for Neural Real-Time SLAM"

The paper presents Co-SLAM, a neural approach to real-time simultaneous localization and mapping (SLAM) from RGB-D input. The method jointly uses a coordinate encoding and a sparse parametric encoding to improve both the accuracy and the efficiency of RGB-D SLAM. This hybrid representation pairs the smoothness and coherence of coordinate-based approaches with the fast convergence and detail retention of sparse parametric encodings.

Methodology

Co-SLAM represents the scene using a multi-resolution hash-grid, the sparse parametric part of the representation, which is known for its fast convergence and ability to capture high-frequency local details. The key innovation lies in combining it with one-blob encoding, the coordinate part, which promotes surface coherence and completion, particularly in unobserved areas. This joint encoding enables real-time performance, robust tracking, and surface hole-filling.
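The two encodings described above can be illustrated with a simplified sketch. This is not the authors' implementation: `one_blob`, `hash_index`, and `encode_point` are hypothetical names, the bin count of 16 is an assumption, the Gaussian is evaluated at bin centers rather than integrated over each bin, and the hash follows the XOR-of-primes spatial hash commonly used for multi-resolution hash-grids.

```python
import math

def one_blob(x, n_bins=16):
    """Encode a scalar x in [0, 1] as n_bins smooth activations.

    Simplification (assumption): a Gaussian with sigma = 1 / n_bins is
    evaluated at each bin center instead of being integrated over the bin.
    """
    sigma = 1.0 / n_bins
    centers = [(i + 0.5) / n_bins for i in range(n_bins)]
    return [math.exp(-0.5 * ((x - c) / sigma) ** 2) for c in centers]

# Large primes used by the XOR-based spatial hash (first factor is 1).
PRIMES = (1, 2654435761, 805459861)

def hash_index(ix, iy, iz, table_size):
    """Map non-negative integer grid coordinates to a slot in a feature table."""
    h = 0
    for coord, prime in zip((ix, iy, iz), PRIMES):
        h ^= coord * prime
    return h % table_size

def encode_point(point, n_bins=16):
    """Concatenate the one-blob encodings of each normalized coordinate."""
    features = []
    for value in point:
        features.extend(one_blob(value, n_bins))
    return features
```

In a joint encoding of this kind, the coordinate features from `encode_point` would be concatenated with the features looked up through `hash_index` at each grid resolution before being passed to a decoder network. The smooth, overlapping activations are what let the coordinate branch interpolate across regions that the grid has never observed.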

The system performs global bundle adjustment over all keyframes, whereas competing neural SLAM approaches rely on keyframe selection to keep the number of active keyframes small. According to the authors, it is the ray sampling strategy that makes optimizing over the full keyframe set possible.
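One way such a ray sampling strategy could work is to keep only a sparse subset of pixels from every keyframe and draw each optimization batch from the pooled set. The sketch below assumes exactly that; `KeyframeRayPool`, the 5% keep ratio, and the batch size are illustrative assumptions rather than details given in this summary.

```python
import random

class KeyframeRayPool:
    """Pool of sparsely stored pixels (rays) gathered from all keyframes."""

    def __init__(self, keep_ratio=0.05, seed=0):
        self.keep_ratio = keep_ratio      # fraction of pixels kept per keyframe (assumed)
        self.rng = random.Random(seed)
        self.rays = []                    # entries are (keyframe_id, u, v)

    def add_keyframe(self, kf_id, width, height):
        """Store a random subset of this keyframe's pixels."""
        n_keep = max(1, int(width * height * self.keep_ratio))
        pixels = self.rng.sample(range(width * height), n_keep)
        self.rays.extend((kf_id, p % width, p // width) for p in pixels)

    def sample_batch(self, batch_size):
        """Draw rays from the whole pool, so every keyframe can contribute."""
        k = min(batch_size, len(self.rays))
        return self.rng.sample(self.rays, k)
```

Because each batch is drawn from the pool rather than from a hand-picked set of active keyframes, the cost per iteration depends on the batch size and not on how many keyframes have been collected.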

Experimental Results

The empirical evaluation shows that Co-SLAM runs at 10-17Hz and delivers state-of-the-art scene reconstruction on ScanNet, TUM, Replica, and Synthetic RGBD. Its tracking performance is competitive with existing methods, and its memory use is efficient.

Quantitatively, Co-SLAM achieves superior results on several reconstruction metrics, notably depth accuracy and completion ratio. Camera tracking is evaluated on both synthetic and real-world datasets, where Co-SLAM exhibits lower absolute trajectory error than competing systems.
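For readers unfamiliar with the tracking metric, absolute trajectory error is commonly reported as the root-mean-square distance between estimated and ground-truth camera positions. The sketch below is a minimal version under a strong assumption: both trajectories are already time-synchronized and expressed in the same coordinate frame, so the rigid alignment step used in standard benchmarks is omitted. The function name `ate_rmse` is hypothetical.

```python
import math

def ate_rmse(estimated, ground_truth):
    """RMSE of position error between two aligned, equal-length trajectories.

    Each trajectory is a sequence of (x, y, z) camera positions.
    Assumption: alignment and time association were done beforehand.
    """
    if not estimated or len(estimated) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and of equal length")
    squared_errors = [
        sum((e - g) ** 2 for e, g in zip(p_est, p_gt))
        for p_est, p_gt in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

A lower value means the estimated camera path stays closer to the reference path over the whole sequence.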

Implications and Future Directions

The joint coordinate and parametric encoding is a notable advance for neural SLAM systems. It improves the fidelity of 3D reconstructions while running in real time, which is critical for practical deployment in robotics and augmented reality.

Future research may further explore optimizing this framework for monocular setups, enhancing resilience to dynamic environments, and integrating advanced loop closure techniques. The proposed depth-guided sampling strategy could evolve to incorporate more adaptive sampling based on scene complexity. Moreover, extending Co-SLAM's capabilities to operate efficiently on devices with limited computational resources remains an open avenue.
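To make the term concrete, depth-guided sampling can be sketched as placing extra samples along each ray near the depth measured by the sensor, in addition to samples spread over the full ray. The function below is an illustrative sketch only: the name `depth_guided_samples`, the sample counts, and the band width are assumptions, not values taken from this summary.

```python
import random

def depth_guided_samples(depth, near, far, n_uniform=32, n_surface=11,
                         band=0.25, rng=None):
    """Return sorted sample distances along one ray.

    n_uniform stratified samples cover [near, far); when the measured depth
    is valid, n_surface (>= 2) extra samples are placed within +/- band of it.
    All default values are illustrative assumptions.
    """
    rng = rng or random.Random(0)
    step = (far - near) / n_uniform
    # One jittered sample per equal-width interval (stratified sampling).
    coarse = [near + (i + rng.random()) * step for i in range(n_uniform)]
    if depth is None or not (near < depth < far):
        return sorted(coarse)  # no usable depth: fall back to uniform samples
    lo = max(near, depth - band)
    hi = min(far, depth + band)
    fine = [lo + (hi - lo) * i / (n_surface - 1) for i in range(n_surface)]
    return sorted(coarse + fine)
```

An adaptive variant of the kind suggested above might, for example, vary `n_surface` or `band` per ray according to local scene complexity instead of using fixed values.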

In summary, Co-SLAM's architectural innovations present a promising direction for realizing efficient and accurate neural SLAM systems, potentially catalyzing further research into hybrid encoding methodologies for real-time applications.
