Visual Odometry Revisited: What Should Be Learnt? (1909.09803v4)

Published 21 Sep 2019 in cs.CV

Abstract: In this work we present a monocular visual odometry (VO) algorithm which leverages geometry-based methods and deep learning. Most existing VO/SLAM systems with superior performance are based on geometry and have to be carefully designed for different application scenarios. Moreover, most monocular systems suffer from scale-drift issue. Some recent deep learning works learn VO in an end-to-end manner but the performance of these deep systems is still not comparable to geometry-based methods. In this work, we revisit the basics of VO and explore the right way for integrating deep learning with epipolar geometry and Perspective-n-Point (PnP) method. Specifically, we train two convolutional neural networks (CNNs) for estimating single-view depths and two-view optical flows as intermediate outputs. With the deep predictions, we design a simple but robust frame-to-frame VO algorithm (DF-VO) which outperforms pure deep learning-based and geometry-based methods. More importantly, our system does not suffer from the scale-drift issue being aided by a scale consistent single-view depth CNN. Extensive experiments on KITTI dataset shows the robustness of our system and a detailed ablation study shows the effect of different factors in our system.

Citations (144)


Summary

The paper introduces DF-VO, which integrates CNN-based depth and flow predictions with traditional geometric methods to overcome scale drift.
It employs robust RANSAC filtering, epipolar geometry, and PnP estimation, achieving superior performance on the KITTI dataset.
The study paves the way for hybrid VO systems that combine deep learning and geometry for improved accuracy in autonomous navigation and robotics.

Visual Odometry Revisited: Integration of Geometry and Deep Learning

The paper "Visual Odometry Revisited: What Should Be Learnt?" by Huangying Zhan et al. revisits monocular visual odometry (VO) by combining traditional geometry-based methods with deep learning. It targets two limitations of existing VO/SLAM systems: scale drift in monocular systems, and the need to carefully tune geometry-based approaches for each application scenario.

Methodology and Approach

The authors propose a VO algorithm, named DF-VO, built on two convolutional neural networks (CNNs): one predicts single-view depth and the other estimates two-view optical flow. These predictions are intermediate outputs that supply the correspondences used by traditional geometric methods such as epipolar geometry and the PnP method. The result is a simple frame-to-frame VO algorithm that the authors report outperforms both pure deep learning-based and pure geometry-based methods.
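To make the role of the two predictions concrete, the following is a minimal sketch of how a dense flow field can be turned into 2D-2D pixel matches and how a predicted depth can lift a pixel to a 3D point under a pinhole camera model. The function names (`flow_to_matches`, `backproject`), the nested-list flow layout, and the intrinsics parameters `fx, fy, cx, cy` are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Tuple

Pixel = Tuple[float, float]
Point3D = Tuple[float, float, float]


def flow_to_matches(flow: List[List[Pixel]]) -> List[Tuple[Pixel, Pixel]]:
    """Convert a dense flow field into 2D-2D matches.

    Assumed layout: flow[v][u] = (du, dv) is the displacement of pixel
    (u, v) in frame 1 to its location in frame 2.
    """
    matches = []
    for v, row in enumerate(flow):
        for u, (du, dv) in enumerate(row):
            matches.append(((float(u), float(v)), (u + du, v + dv)))
    return matches


def backproject(pixel: Pixel, depth: float,
                fx: float, fy: float, cx: float, cy: float) -> Point3D:
    """Lift a pixel with known depth to a 3D point in the camera frame
    using the pinhole model."""
    u, v = pixel
    return ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)
```

Under these assumptions, the matches feed an essential-matrix solver, while pairing each back-projected point from frame 1 with its matched pixel in frame 2 gives the 3D-2D correspondences that a PnP solver expects.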

Key highlights of the methodology include:

Depth and Flow CNNs: Two CNN models estimate depth from a single image and optical flow between two frames. The depth estimates are scale consistent, which is what allows the system to resolve the scale drift issue.
Epipolar Geometry and PnP Integration: The system solves the essential matrix from 2D-2D pixel correspondences and applies PnP pose estimation to 3D-2D correspondences. The deep depth predictions resolve the scale ambiguity of the essential-matrix solution, and PnP covers cases such as pure rotation, where the essential matrix is ill-conditioned.
Robust Pipeline with RANSAC: To improve robustness, the algorithm incorporates RANSAC loops for filtering outliers and handling noisy predictions, ensuring accurate and stable pose estimation.
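The scale step can be illustrated with a small sketch. A translation recovered from an essential matrix is known only up to scale, so one way to fix it is to compare the CNN depths against depths triangulated from the same matches. The 1-point RANSAC below is an assumed stand-in for whatever robust estimator the authors use; `recover_scale`, `scale_translation`, and the parameters `n_iters`, `tol`, and `seed` are hypothetical names chosen for illustration.

```python
import random
import statistics
from typing import Sequence, Tuple


def recover_scale(cnn_depths: Sequence[float], tri_depths: Sequence[float],
                  n_iters: int = 100, tol: float = 0.05, seed: int = 0) -> float:
    """Estimate s such that s * tri_depth is close to cnn_depth.

    Each iteration samples one depth ratio as a candidate scale and counts
    the ratios within a relative tolerance of it; the median of the largest
    inlier set is returned.
    """
    # Keep only pairs where both depths are positive (in front of the camera).
    ratios = [d / t for d, t in zip(cnn_depths, tri_depths) if d > 0 and t > 0]
    if not ratios:
        raise ValueError("no valid depth pairs")
    rng = random.Random(seed)
    best = []
    for _ in range(n_iters):
        candidate = rng.choice(ratios)
        inliers = [r for r in ratios if abs(r - candidate) <= tol * candidate]
        if len(inliers) > len(best):
            best = inliers
    return statistics.median(best)


def scale_translation(t_unit: Tuple[float, float, float],
                      scale: float) -> Tuple[float, float, float]:
    """Apply the recovered scale to a unit-norm translation direction."""
    return tuple(scale * c for c in t_unit)
```

For example, with `cnn_depths = [10, 20, 30, 40, 250]` and `tri_depths = [1, 2, 3, 4, 5]`, the last pair is an outlier and the consensus set yields a scale of 10. Because the scale comes from the depth network at every frame pair rather than being propagated from earlier frames, errors in it do not accumulate in the way that causes scale drift.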

Experimental Results

Using the KITTI dataset, the authors demonstrate the robustness of the proposed DF-VO system. Across the evaluated sequences, DF-VO is reported to outperform both pure deep learning-based and pure geometry-based methods. The ablation studies show that incorporating scale-consistent depth predictions mitigates a traditional limitation of monocular VO, scale drift, while maintaining accurate camera pose estimation.

Implications and Future Work

Integrating deep learning with geometric methods improves the robustness and accuracy of DF-VO over either approach alone. The paper shows that deep predictions can strengthen traditional techniques rather than replace them, pointing towards more reliable and adaptable VO systems. The ability to handle challenging conditions where purely geometric methods fail makes the approach relevant to autonomous navigation and mobile robotics.

Future work includes extending the system from frame-to-frame to map-to-frame tracking, and potentially integrating loop closure to further improve stability and accuracy in a full SLAM setting. More broadly, the paper points to hybrid methods that combine learned and geometric components without detailed environment-specific tuning.

Overall, the work supports combining learned predictions with classical geometry as a way to advance visual odometry.
