- The paper presents CNN-SVO which integrates single-image depth prediction to reduce depth uncertainty in semi-direct visual odometry.
- By supplying a depth prior, the method improves feature correspondence across views and accelerates map-point convergence; it is validated on the KITTI and Oxford RobotCar datasets.
- The approach improves tracking robustness in challenging lighting conditions, indicating strong potential for autonomous driving and robotics applications.
CNN-SVO: Enhancing Semi-Direct Visual Odometry with Depth Prediction
The paper "CNN-SVO: Improving the Mapping in Semi-Direct Visual Odometry Using Single-Image Depth Prediction" introduces a significant advancement in the field of visual odometry (VO) and visual simultaneous localization and mapping (V-SLAM), particularly focusing on improving the mapping capabilities of semi-direct visual odometry (SVO). The researchers propose an innovative methodology that integrates single-image depth prediction to enhance the initialization process of map points in SVO, achieving greater robustness and accuracy in visual odometry tasks.
Technical Insights and Methodology
The authors develop CNN-SVO, a modified version of the existing SVO framework, which combines the strengths of direct (photometric) and feature-based methods. SVO is known for its efficient probabilistic mapping and direct pixel correspondence, which enable high-frame-rate camera motion estimation. However, its mapping can suffer from large depth uncertainty when map points are initialized, leading to potentially erroneous feature correspondences, as sketched below.
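To make the mapping step concrete, here is a minimal sketch of the kind of recursive Bayesian depth filter SVO runs for each candidate map point (a "seed"), tracked in inverse depth. It keeps only the Gaussian part of the estimate and omits SVO's Beta-distributed inlier-ratio model; the class and function names and the search heuristic are illustrative assumptions, not the authors' implementation.

```python
import math

class Seed:
    """Candidate map point whose inverse depth is tracked by a recursive filter."""
    def __init__(self, mu, sigma2):
        self.mu = mu          # mean inverse depth
        self.sigma2 = sigma2  # variance of the inverse-depth estimate

def fuse_measurement(seed, x, tau2):
    """Fuse one triangulated inverse-depth measurement x (with variance tau2)
    into the seed's Gaussian estimate (standard product of Gaussians).
    SVO additionally weights this update by an inlier probability, omitted here."""
    s2 = 1.0 / (1.0 / seed.sigma2 + 1.0 / tau2)
    seed.mu = s2 * (seed.mu / seed.sigma2 + x / tau2)
    seed.sigma2 = s2
    return seed

def search_interval(seed, n_sigma=2.0):
    """Inverse-depth interval the matcher must cover; projecting its endpoints
    into the new image gives the epipolar segment to search. A large initial
    sigma therefore means a long search and more chances of a wrong match."""
    half = n_sigma * math.sqrt(seed.sigma2)
    return seed.mu - half, seed.mu + half
```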
CNN-SVO addresses this limitation by using a single-image depth prediction network to provide a depth prior for initializing map points. Because the prior has a small variance centered on the predicted depth, the system obtains more reliable feature correspondence across views and converges to the true depth faster, ultimately producing more accurate and robust camera motion tracking.
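The practical difference lies mainly in how each seed is initialized, as the following sketch illustrates. The relative-uncertainty band and the variance heuristics are assumptions for illustration, not the paper's exact parameters.

```python
def init_seed_svo(mean_scene_depth, min_depth):
    """Original SVO-style initialization: start at the mean scene depth with a
    variance wide enough to cover the whole admissible inverse-depth range."""
    z_range = 1.0 / min_depth          # maximum inverse depth
    mu = 1.0 / mean_scene_depth
    sigma2 = (z_range / 6.0) ** 2      # ~99% of the probability mass inside the range
    return mu, sigma2

def init_seed_cnn_svo(predicted_depth, rel_uncertainty=0.2):
    """CNN-SVO-style initialization (parameters illustrative): start at the depth
    predicted by the single-image network with a small variance centered on that
    prediction, so the epipolar search is short and the filter converges to the
    true depth in fewer observations."""
    mu = 1.0 / predicted_depth
    sigma = rel_uncertainty * mu       # assumed +/-20% prior band
    return mu, sigma ** 2
```

In both cases, subsequent observations are fused with a recursive update like the one sketched above; the prior only changes the starting mean and spread, but a tighter spread shortens the epipolar search and speeds convergence.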
Evaluation and Results
The authors validate CNN-SVO on two outdoor datasets: the KITTI dataset and the Oxford RobotCar dataset. Performance is benchmarked against state-of-the-art VO methods, including direct sparse odometry (DSO), the original SVO, and ORB-SLAM without loop closure. The proposed method shows clear improvements, most notably its ability to maintain tracking in high-dynamic-range (HDR) environments thanks to the illumination invariance of the depth prediction. The rapid convergence of map points, enabled by the depth prior, improves tracking robustness and the accuracy of the estimated camera trajectory on both datasets.
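For readers reproducing such comparisons, trajectory accuracy is commonly summarized with the root-mean-square absolute trajectory error (ATE) between estimated and ground-truth trajectories. The snippet below is a generic sketch of that metric, assuming the trajectories are already associated frame-by-frame and aligned (including any scale correction); it is not the paper's evaluation code.

```python
import numpy as np

def ate_rmse(est_xyz, gt_xyz):
    """Root-mean-square absolute trajectory error between an estimated and a
    ground-truth trajectory, each given as an (N, 3) array of camera positions
    expressed in the same reference frame."""
    est = np.asarray(est_xyz, dtype=float)
    gt = np.asarray(gt_xyz, dtype=float)
    errors = np.linalg.norm(est - gt, axis=1)   # per-frame position error
    return float(np.sqrt(np.mean(errors ** 2)))
```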
Implications and Future Directions
Practically, CNN-SVO offers substantial improvements in scenarios with challenging lighting conditions, making it a potent tool for applications requiring efficient and reliable camera motion estimation such as autonomous driving and robotics. Theoretically, the successful integration of depth prediction into a semi-direct VO framework opens avenues for further research into hybrid approaches that leverage learned priors to improve VO performance across varying conditions and sensor modalities.
Looking forward, potential developments include improving tolerance to environmental and operational conditions such as motion blur and varying camera frame rates. Further refinement of depth prediction and its coupling with other scene-understanding frameworks could improve performance even in heavily occluded or textureless environments. As machine learning models continue to evolve, integrating these advances into VO systems is expected to yield even more robust and versatile solutions for real-world applications.
In conclusion, CNN-SVO emerges as a notable contribution to the field of visual odometry by enhancing the reliability and accuracy of mapping processes through the innovative use of depth prediction, thus setting a precedent for future research and development in the domain.