- The paper presents AirSLAM, a novel system integrating CNN-based point-line feature extraction with robust geometric optimization.
- It achieves real-time performance, running at up to 73 Hz on a PC and 40 Hz on an embedded platform, through efficient hybrid processing and hardware acceleration.
- The system demonstrates superior mapping accuracy and resilience to dynamic and low-light illumination across diverse benchmark datasets.
AirSLAM: An Efficient and Illumination-Robust Point-Line Visual SLAM System
In this paper, the authors introduce AirSLAM, a visual simultaneous localization and mapping (vSLAM) system engineered to remain accurate under dynamic and low-light illumination while staying efficient enough for real-time use. AirSLAM pairs deep learning for feature detection and matching with traditional backend optimization, a hybrid approach that combines the robustness of neural networks with the precision of geometric models.
System Design and Contributions
AirSLAM's architecture is founded on several core innovations:
- Unified CNN (PLNet): A single convolutional neural network, PLNet, concurrently extracts keypoints and structural line features. Because the two tasks share one backbone, the network runs only once per frame, minimizing computational overhead while keeping feature detection robust across varied illumination (a minimal sketch of this shared-backbone design follows this list).
- Hybrid System Approach: AirSLAM balances efficiency and performance by using learning-based methods for front-end feature detection and matching, and traditional geometric methods for back-end optimization, including triangulation, reprojection, and local bundle adjustment. This split keeps the system real-time on embedded platforms (the reprojection residual that the back-end minimizes is sketched after this list).
- Multi-Stage Relocalization: A novel multi-stage relocalization pipeline leverages both appearance and geometric information from the built map. Cheap appearance-based retrieval first filters candidate keyframes, so the expensive geometric pose estimation runs on far fewer candidates, improving both speed and robustness (an outline of this two-stage idea appears after this list).
- Efficient Implementation: The system is written in C++ and uses NVIDIA TensorRT to accelerate neural-network inference and feature matching. As a result, AirSLAM runs at 73 Hz on a PC and 40 Hz on an embedded platform (see the TensorRT loading sketch after this list).
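To make the shared-backbone idea behind PLNet concrete, here is a minimal PyTorch sketch. It is not the actual PLNet architecture; the layer sizes and head outputs are illustrative assumptions. The key point it demonstrates is that one encoder pass feeds both a keypoint head and a line head:

```python
import torch
import torch.nn as nn

class SharedBackbonePointLineNet(nn.Module):
    """Illustrative shared-backbone point+line detector.

    Not the actual PLNet: layer sizes and head outputs are assumptions.
    """
    def __init__(self):
        super().__init__()
        # Shared encoder: run once per frame, reused by both heads.
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Keypoint head: per-pixel keypoint probability heatmap.
        self.point_head = nn.Conv2d(128, 1, 1)
        # Line head: e.g. junction/line maps (assumed output layout).
        self.line_head = nn.Conv2d(128, 2, 1)

    def forward(self, image):
        feat = self.backbone(image)              # computed once per frame
        keypoints = torch.sigmoid(self.point_head(feat))
        lines = torch.sigmoid(self.line_head(feat))
        return keypoints, lines

# Usage: one forward pass yields both feature types.
net = SharedBackbonePointLineNet()
kp_map, line_map = net(torch.rand(1, 1, 480, 640))
```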
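The geometric back-end minimizes reprojection error over a window of keyframes. As a minimal sketch of the standard pinhole residual (not AirSLAM's exact cost, which also includes line terms), local bundle adjustment sums and minimizes residuals like this one over all observations:

```python
import numpy as np

def reprojection_residual(R, t, X_w, K, uv_observed):
    """Reprojection error for one landmark observation (pinhole model).

    R, t: world-to-camera rotation (3x3) and translation (3,)
    X_w:  3D landmark in world coordinates (3,)
    K:    camera intrinsics (3x3)
    uv_observed: measured pixel coordinates (2,)
    """
    X_c = R @ X_w + t                  # transform into the camera frame
    uvw = K @ X_c                      # project with the intrinsics
    uv_predicted = uvw[:2] / uvw[2]    # perspective division
    return uv_predicted - uv_observed  # residual minimized by bundle adjustment
```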
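The two-stage relocalization idea can be outlined as: inexpensive appearance retrieval prunes the map's keyframes first, and geometric verification (here, PnP with RANSAC via OpenCV) runs only on the survivors. This is an illustrative sketch, not the paper's exact pipeline; the keyframe fields, thresholds, and matcher choice are assumptions:

```python
import numpy as np
import cv2

def relocalize(query_global_desc, query_kps, query_kp_desc, keyframes, K,
               top_k=5, min_inliers=30):
    """Illustrative two-stage relocalization (not the paper's exact pipeline)."""
    # Stage 1: appearance filtering -- keep only the top-k similar keyframes.
    scores = [float(query_global_desc @ kf["global_desc"]) for kf in keyframes]
    candidates = sorted(zip(scores, keyframes), key=lambda p: -p[0])[:top_k]

    matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
    for _, kf in candidates:
        # Stage 2: local feature matching against the candidate keyframe.
        matches = matcher.match(query_kp_desc, kf["kp_desc"])
        if len(matches) < min_inliers:
            continue
        pts3d = np.float32([kf["points3d"][m.trainIdx] for m in matches])
        pts2d = np.float32([query_kps[m.queryIdx] for m in matches])
        ok, rvec, tvec, inliers = cv2.solvePnPRansac(pts3d, pts2d, K, None)
        if ok and inliers is not None and len(inliers) >= min_inliers:
            return rvec, tvec  # geometrically verified pose
    return None  # relocalization failed
```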
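On the deployment side, a trained network is typically exported (e.g. to ONNX) and then run from a serialized TensorRT engine. The following is a minimal Python sketch of loading such an engine; the file name is a placeholder, and device-buffer allocation and execution are only indicated in comments:

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)

# Deserialize a prebuilt engine (produced offline from a model export).
# "plnet.engine" is a placeholder file name, not from the paper.
with open("plnet.engine", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())

# An execution context performs inference; input/output device buffers
# must be allocated separately (e.g. with cuda-python or PyCUDA):
context = engine.create_execution_context()
# context.execute_v2(bindings)  # bindings: list of device buffer pointers
```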
Experimental Evaluation
The AirSLAM system was rigorously tested on various datasets to evaluate its performance relative to state-of-the-art (SOTA) vSLAM systems. The extensive experimentation covered several scenarios:
- Line Detection Performance:
- PLNet's Evaluation: On the Wireframe dataset PLNet achieved the second-best line-detection accuracy, and on YorkUrban the best, indicating robust and efficient detection. Notably, PLNet detects both feature points and lines while running at 79.4 FPS, validating its suitability for real-time applications.
- Mapping Accuracy:
- EuRoC Dataset: Tested against leading vSLAM systems such as ORB-SLAM3 and Kimera, AirSLAM achieved the best accuracy on 7 of 11 sequences. Enabling loop-closure detection reduced its average trajectory error by 74%.
- Robustness to Challenging Illumination:
- Onboard Illumination (OIVIO Dataset): AirSLAM outperformed other SOTA systems, showcasing minimal impact from uneven illumination in mining scenarios.
- Dynamic Illumination (UMA-VI Dataset): Exhibiting superior robustness under severe lighting changes, AirSLAM achieved the lowest average error on dynamic lighting sequences.
- Low Illumination (Dark EuRoC): With EuRoC images synthetically darkened, AirSLAM maintained stability and accuracy, outperforming other systems even under low illumination (a sketch of such a darkening transform follows this list).
- Day/Night Localization:
- TartanAir Dataset: Benchmarked against leading visual place recognition (VPR) methods, AirSLAM exhibited a higher and more stable recall across day/night sequences (80.5% on average versus 76.3% for the next best) while running more than twice as fast (48.8 FPS); the recall metric is sketched after this list.
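The low-light stress test was produced by darkening EuRoC images. The paper's exact adjustment is not reproduced here; a simple gain-and-gamma darkening like the following conveys the idea, with the factor values being illustrative assumptions:

```python
import numpy as np

def darken(image_u8, gain=0.3, gamma=2.0):
    """Synthetically darken an 8-bit grayscale image.

    Illustrative only: the gain/gamma values and the exact procedure used
    for the dark EuRoC sequences are assumptions, not from the paper.
    """
    norm = image_u8.astype(np.float32) / 255.0
    dark = gain * np.power(norm, gamma)   # lower brightness, crush shadows
    return np.clip(dark * 255.0, 0, 255).astype(np.uint8)
```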
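For the day/night localization comparison, recall can be read as the share of query images localized within some pose-error bound. A minimal sketch of such a metric follows; the translation-error threshold and the failure convention (None for unlocalized queries) are assumptions, not the paper's exact criterion:

```python
import numpy as np

def localization_recall(estimated, ground_truth, max_trans_err=0.5):
    """Recall: fraction of queries localized within a translation-error bound.

    estimated: list of estimated positions (3-vectors) or None for failures.
    The 0.5 m threshold is an illustrative assumption.
    """
    hits = sum(
        est is not None and np.linalg.norm(est - gt) <= max_trans_err
        for est, gt in zip(estimated, ground_truth)
    )
    return hits / len(ground_truth)
```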
Implications and Future Developments
Practical Implications: AirSLAM's robustness to illumination changes extends its applicability to real-world scenarios such as autonomous mining, warehouse operations, and urban navigation. The hardware-accelerated implementation makes deployment feasible on resource-constrained devices.
Theoretical Implications: The presented multi-stage relocalization strategy and the unified CNN model for simultaneous keypoint and line detection offer a novel methodological pathway. This research highlights the potential of hybrid systems, pointing to new directions for combining deep learning with traditional SLAM to address environments where lighting can unpredictably change.
Future Developments: Future work might extend AirSLAM to perform robustly in unstructured environments where line features are sparse, and investigate adaptive models that tune real-time performance to different hardware configurations, further broadening its flexibility and applicability.
Conclusion
AirSLAM exemplifies a significant advancement in visual SLAM, emphasizing efficiency and robustness in challenging lighting conditions. Through its innovative use of hybrid architectures and hardware acceleration, it stands as a powerful tool for enhancing robotic perception and navigation in complex environments.