- The paper introduces a low-rank tensor decomposition that cuts parameter count by up to 90.1% and accelerates processing by up to 73.2% relative to ESLAM.
- It employs a hybrid method combining Six-axis and CP decompositions to efficiently capture both scene geometry and detailed appearance features.
- The approach enhances dense visual SLAM performance, making it highly applicable to real-time robotics and mixed reality scenarios.
LRSLAM: Low-rank Representation of Signed Distance Fields in Dense Visual SLAM System
The paper "LRSLAM: Low-rank Representation of Signed Distance Fields in Dense Visual SLAM System" introduces a novel approach to visual simultaneous localization and mapping (SLAM), targeting efficient scene representation to improve computational cost, memory efficiency, convergence rate, and localization/reconstruction accuracy. Motivated by the sustained relevance of SLAM in fields such as autonomous driving and mixed reality, the authors address key challenges in dense visual SLAM, particularly real-time processing and scalability to large scenes.
Background and Challenges
Dense visual SLAM systems built on RGB-D cameras offer promising results with a simple sensor configuration. However, the computational burden of such systems, notably in high-dimensional neural implicit scene representations, hinders their practical applicability. Traditional representations such as neural radiance fields (NeRF) and voxel grid features, while accurate, suffer cubic growth in memory consumption as resolution increases to capture fine geometric detail. ESLAM's plane-based tensor decomposition marked progress, but its memory footprint still grows quadratically with resolution.
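As a back-of-the-envelope illustration of these scaling regimes (the feature channel count and resolutions below are chosen arbitrarily for illustration, not taken from the paper):

```python
# Illustrative parameter counts for an n^3 scene volume with C feature
# channels: dense voxel grids grow cubically, plane-based factorizations
# (as in ESLAM) quadratically, and axis-aligned vectors only linearly in n.
def params(n, channels=32):
    return {
        "voxel_grid": channels * n ** 3,      # O(n^3)
        "tri_plane": channels * 3 * n ** 2,   # O(n^2)
        "axis_vectors": channels * 6 * n,     # O(n), six axis-aligned vectors
    }

for n in (128, 256, 512):
    print(n, params(n))
```

Doubling the resolution multiplies the voxel-grid cost by 8 and the plane cost by 4, while the axis-aligned cost merely doubles, which is why the linear regime matters for large scenes.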
Proposed Method: LRSLAM
LRSLAM introduces a more efficient model by leveraging low-rank tensor decomposition to manage scene geometry and appearance representation. This method employs hybrid decomposition techniques—Six-axis and CP decompositions—to encode the scene with compact yet expressive representations:
- Six-axis Decomposition: This novel method factorizes tri-plane representations into six axis-aligned feature vectors. It maintains O(n) space complexity, a significant improvement over ESLAM's quadratic complexity, enabling more scalable scene encoding.
- Hybrid Composition with CP Decomposition: By combining the Six-axis decomposition for detailed appearance features with CP decomposition for geometric features, LRSLAM efficiently captures intricate scene details. CP decomposition, chosen for its fast convergence owing to its low parameter count, accelerates geometry optimization, which in turn benefits appearance optimization.
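The two factorizations above can be sketched as follows. This is a minimal NumPy illustration of how such low-rank representations answer per-point feature queries without ever materializing the full volume; the function names, rank, and resolution are illustrative choices, not the paper's implementation:

```python
import numpy as np

n, rank = 128, 8  # grid resolution and decomposition rank (illustrative values)
rng = np.random.default_rng(0)

# CP decomposition: the implicit n x n x n feature volume is a rank-R sum of
# outer products of three axis vectors; a query touches only 3*R entries.
vx, vy, vz = (rng.standard_normal((rank, n)) for _ in range(3))

def cp_query(i, j, k):
    """Value of the implicit n^3 tensor at integer voxel (i, j, k)."""
    return float(np.sum(vx[:, i] * vy[:, j] * vz[:, k]))

# Six-axis-style factorization: each of the three feature planes (xy, xz, yz)
# of a tri-plane representation is itself replaced by two rank-R axis-aligned
# vector sets -- six in total -- so storage grows as O(n) rather than O(n^2).
axes = {p: (rng.standard_normal((rank, n)), rng.standard_normal((rank, n)))
        for p in ("xy", "xz", "yz")}

def six_axis_query(i, j, k):
    """Sum the three factorized plane features at voxel (i, j, k)."""
    idx = {"xy": (i, j), "xz": (i, k), "yz": (j, k)}
    return float(sum(np.sum(a[:, u] * b[:, v])
                     for p, (a, b) in axes.items()
                     for (u, v) in [idx[p]]))

print(cp_query(5, 6, 7), six_axis_query(5, 6, 7))
```

In a real system the per-axis vectors would be learned features interpolated at continuous coordinates and fed to small decoder MLPs, but the memory argument is already visible here: both queries read O(rank) values per axis instead of indexing a dense grid.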
Results and Implications
Empirical evaluations on indoor RGB-D datasets such as ScanNet, TUM RGB-D, and Replica demonstrate LRSLAM's strong performance in localization and mapping. It achieves substantial reductions in parameter usage (up to 90.1% fewer parameters than ESLAM) and processing time (up to 73.2% faster) while matching or exceeding reconstruction and localization accuracy.
Theoretical Implications:
These advancements suggest potential shifts towards more memory-efficient SLAM systems, which can operate effectively in real-time scenarios without sacrificing accuracy. By minimizing storage demands and improving convergence rates, LRSLAM provides a pathway for deploying SLAM in computationally constrained environments.
Practical Implications:
The reduced parameter demands and enhanced processing efficiency may open doors to broader applications in mobile robotics and mixed reality, where sensor limitations and real-time demands are critical.
Future Prospects
LRSLAM's combination of low-rank representations invites further exploration of hybrid decomposition methods and adaptive scene modeling. Future work may focus on optimizing decomposition strategies for dynamic scenes and integrating additional sensor modalities to support complex environmental mapping in SLAM systems.
In conclusion, LRSLAM demonstrates notable advancements in the field of dense visual SLAM, proposing a feasible solution to high memory and computational demands while maintaining accuracy. The paper effectively presents a framework that balances the need for compactness and expressiveness, offering insights into future approaches in SLAM research.