
Deep Auxiliary Learning for Visual Localization and Odometry (1803.03642v1)

Published 9 Mar 2018 in cs.RO and cs.LG

Abstract: Localization is an indispensable component of a robot's autonomy stack that enables it to determine where it is in the environment, essentially making it a precursor for any action execution or planning. Although convolutional neural networks have shown promising results for visual localization, they are still grossly outperformed by state-of-the-art local feature-based techniques. In this work, we propose VLocNet, a new convolutional neural network architecture for 6-DoF global pose regression and odometry estimation from consecutive monocular images. Our multitask model incorporates hard parameter sharing, thus being compact and enabling real-time inference, in addition to being end-to-end trainable. We propose a novel loss function that utilizes auxiliary learning to leverage relative pose information during training, thereby constraining the search space to obtain consistent pose estimates. We evaluate our proposed VLocNet on indoor as well as outdoor datasets and show that even our single task model exceeds the performance of state-of-the-art deep architectures for global localization, while achieving competitive performance for visual odometry estimation. Furthermore, we present extensive experimental evaluations utilizing our proposed Geometric Consistency Loss that show the effectiveness of multitask learning and demonstrate that our model is the first deep learning technique to be on par with, and in some cases outperforms state-of-the-art SIFT-based approaches.

Citations (239)

Summary

  • The paper introduces VLocNet, a multitask deep CNN architecture for simultaneous 6-DoF pose regression and odometry estimation in both indoor and outdoor settings.
  • It employs hard parameter sharing and a novel Geometric Consistency Loss to ensure real-time performance and robust pose consistency.
  • Empirical evaluations show significant improvements over state-of-the-art methods, with up to 77.14% gains in translation accuracy in indoor environments.

Analyzing "Deep Auxiliary Learning for Visual Localization and Odometry": A Robust Approach to Multitask Learning in Robotics

The paper "Deep Auxiliary Learning for Visual Localization and Odometry," presented at the IEEE International Conference on Robotics and Automation (ICRA) in 2018 by Valada, Radwan, and Burgard, introduces a novel deep learning architecture, VLocNet, aimed at enhancing visual localization and odometry estimation concurrently. As autonomous systems and robotics continue to advance, localization remains a critical task that influences navigation, mapping, and interaction with environments. This research addresses the performance gap in localization accuracy between convolutional neural network (CNN) methods and feature-based techniques by proposing a multitask learning framework.

VLocNet: Architectural Design and Objectives

VLocNet is a deep convolutional neural network that performs 6-DoF (degrees-of-freedom) pose regression and odometry estimation from monocular image sequences. By integrating global pose regression and visual odometry into a single architecture, VLocNet uses multitask learning to exploit both task-specific and shared features. Hard parameter sharing keeps the model compact and efficient, enabling real-time inference. This is complemented by the newly introduced Geometric Consistency Loss, which incorporates relative motion constraints during training to improve the consistency and robustness of the pose estimates.
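
To make these ideas concrete, below is a minimal PyTorch sketch of the general pattern the paper describes: a shared encoder realizes hard parameter sharing, a global pose head regresses translation and orientation, an odometry head regresses relative motion from consecutive frames, and a simplified geometric-consistency term ties consecutive global predictions to the ground-truth relative motion. The ResNet-18 backbone, head sizes, unit loss weights, and the translation-only consistency term are illustrative assumptions, not the paper's exact configuration.

```python
# A minimal sketch of multitask pose regression with hard parameter
# sharing and a geometric-consistency-style loss. Illustrative only;
# not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models


class MultitaskPoseNet(nn.Module):
    """Shared encoder (hard parameter sharing) feeding two task heads."""

    def __init__(self):
        super().__init__()
        backbone = models.resnet18(weights=None)
        # Shared trunk: all layers up to and including global average pooling.
        self.encoder = nn.Sequential(*list(backbone.children())[:-1])
        feat_dim = backbone.fc.in_features  # 512 for ResNet-18
        # Global pose head: 3-D translation + 4-D quaternion.
        self.pose_head = nn.Linear(feat_dim, 7)
        # Odometry head: relative motion from two consecutive frames.
        self.odom_head = nn.Linear(2 * feat_dim, 7)

    def forward(self, img_prev, img_curr):
        f_prev = self.encoder(img_prev).flatten(1)
        f_curr = self.encoder(img_curr).flatten(1)
        pose_prev = self.pose_head(f_prev)   # global pose at t-1
        pose_curr = self.pose_head(f_curr)   # global pose at t
        odom = self.odom_head(torch.cat([f_prev, f_curr], dim=1))
        return pose_prev, pose_curr, odom


def pose_loss(pred, target, beta=1.0):
    """PoseNet-style loss: translation error plus weighted orientation error,
    i.e. ||x_hat - x|| + beta * ||q_hat - q|| with the quaternion normalized."""
    t_err = (pred[:, :3] - target[:, :3]).norm(dim=1).mean()
    q_err = (F.normalize(pred[:, 3:], dim=1) - target[:, 3:]).norm(dim=1).mean()
    return t_err + beta * q_err


def consistency_loss(pose_prev, pose_curr, rel_target):
    """Simplified geometric-consistency term: the displacement between
    consecutive global pose predictions should match the ground-truth
    relative translation. The paper enforces consistency on the full
    relative 6-DoF pose; translation-only is a deliberate simplification."""
    rel_pred = pose_curr[:, :3] - pose_prev[:, :3]
    return (rel_pred - rel_target[:, :3]).norm(dim=1).mean()


def total_loss(pose_prev, pose_curr, odom, gt_curr, gt_rel):
    """Joint objective: absolute pose + odometry + consistency (unit weights assumed)."""
    return (pose_loss(pose_curr, gt_curr)
            + pose_loss(odom, gt_rel)
            + consistency_loss(pose_prev, pose_curr, gt_rel))
```

Note that the paper additionally feeds the previous pose estimate back into the localization stream; that feedback path is omitted from this sketch for brevity.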

Superior Performance and Evaluation

The network's performance is evaluated extensively on indoor and outdoor datasets, specifically the Microsoft 7-Scenes and Cambridge Landmarks benchmarks, showing that VLocNet surpasses existing CNN-based methods in both localization and odometry tasks. Quantitatively, VLocNet improves mean localization accuracy by 77.14% in translation and 59.14% in rotation over state-of-the-art approaches on the indoor scenes. Similarly, outdoor experiments on the Cambridge Landmarks dataset show a 51.6% improvement in translation accuracy. The results demonstrate VLocNet's robustness under challenging conditions such as illumination changes and dynamic elements in the scene.

Comparison with Traditional Methods

A benchmarking comparison reveals that VLocNet's accuracy in certain contexts rivals that of traditional SIFT-based approaches, notably the Active Search method. This is significant: it indicates that deep learning models can match or even outperform classical localization techniques that rely on feature correspondence, a major consideration for real-world applications where robustness to environmental conditions is paramount.

Implications and Future Directions

The findings have substantial implications for the development of autonomous systems. By achieving real-time localization and addressing perceptual aliasing through auxiliary learning, VLocNet offers a practical solution for real-world deployment in robotics. Furthermore, the paper suggests possible future avenues for research, such as integrating additional auxiliary tasks like semantic segmentation, which could further bolster performance and extend the model's applicability across domains.

In conclusion, "Deep Auxiliary Learning for Visual Localization and Odometry" not only demonstrates a practical implementation of multitask learning in CNNs but also highlights the advancing capabilities of deep learning systems in addressing complex robotics challenges. As such, this work stands as a notable contribution in the pursuit of autonomous and reliable robotic systems.