- The paper demonstrates that CNN-based methods can outperform traditional feature-point tracking during fast MAV maneuvers.
- The study uses cascaded CNN architectures with simulated and real flight data to estimate translational motion.
- Self-supervised learning improves accuracy, pyramidal warping helps handle large visual disparities, and networks trained purely in simulation stay robust against real-world motion blur.
CNN-based Ego-Motion Estimation for Fast MAV Maneuvers
In "CNN-based Ego-Motion Estimation for Fast MAV Maneuvers," Yingfu Xu and Guido C. H. E. de Croon explore the application of convolutional neural networks (CNNs) to visual ego-motion estimation for fast-moving Micro Air Vehicles (MAVs) with a monocular camera. The study addresses a critical challenge: accurate motion estimation during rapid maneuvers, which is often hampered by large visual disparities and motion blur.
Background and Motivation
Autonomous indoor flight for MAVs hinges on robust state estimation, and in particular on efficient ego-motion measurement. Traditional visual-inertial odometry (VIO) systems rely on detecting and tracking interest-point features, but their robustness degrades during fast maneuvers: motion blur weakens feature detection, and each feature stays in view for fewer frames, leaving less time for tracking. Large visual disparities further complicate feature correspondence, increasing state estimation errors and drift.
Approach and Methodology
The authors investigate CNN architectures that predict the relative pose between sequential images captured by a fast-moving, downward-facing camera. The primary focus is on estimating translational motion with CNNs that have small model sizes and high inference speeds, suited to mobile GPUs. The approach builds on warping the earlier image toward the later one using the predicted relative pose, with cascaded network blocks refining the estimate.
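To make the warping idea concrete, here is a minimal PyTorch-style sketch (not the authors' code). For a downward-facing camera over an approximately planar floor, a relative pose (R, t) induces the plane homography H = K (R - t n^T / d) K^(-1), which can resample the first image into the second view; the intrinsics K, plane normal n, plane distance d, and the function names are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def warp_by_homography(src, H):
    """Resample src (B,C,Hh,Ww) so that output pixel (x,y) takes the value of
    src at H @ (x,y,1); H maps output pixel coords back into src coords."""
    B, C, Hh, Ww = src.shape
    ys, xs = torch.meshgrid(
        torch.arange(Hh, dtype=torch.float32),
        torch.arange(Ww, dtype=torch.float32),
        indexing="ij",
    )
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=-1).reshape(-1, 3)
    mapped = (H @ pix.T).T                                   # (N, 3)
    mapped = mapped[:, :2] / mapped[:, 2:3].clamp(min=1e-6)  # dehomogenize
    gx = 2.0 * mapped[:, 0] / (Ww - 1) - 1.0                 # to [-1, 1]
    gy = 2.0 * mapped[:, 1] / (Hh - 1) - 1.0
    grid = torch.stack([gx, gy], -1).reshape(1, Hh, Ww, 2).expand(B, Hh, Ww, 2)
    return F.grid_sample(src, grid, align_corners=True)

def plane_homography(K, R, t, n, d):
    """Homography x2 ~ H x1 for points on the plane n.X = d (view-1 frame),
    with (R, t) the pose of view 2 relative to view 1: X2 = R X1 + t."""
    return K @ (R - torch.outer(t, n) / d) @ torch.inverse(K)
```

In a cascaded design, the warped image and the second image are fed into the next network block, which predicts a pose correction; each block therefore only needs to resolve the residual disparity left by the previous one.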
The study leverages both simulated datasets and a self-collected MAV flight dataset that contains the significant motion blur and large visual disparities inherent in fast maneuvers. The networks are trained with both supervised and self-supervised learning, and the self-supervised variant consistently yields higher accuracy.
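The self-supervised alternative needs no pose labels: the network's own prediction warps the first image, and the photometric error against the second image serves as the loss. Below is a hedged sketch of one training step reusing the helpers above; the network interface, the L1 loss, the batch of one, and taking rotation as known from the IMU are illustrative assumptions, not the paper's exact setup.

```python
import torch

def self_supervised_step(net, img1, img2, K, n, d, optimizer):
    """One training step driven purely by photometric error (batch of one)."""
    t = net(torch.cat([img1, img2], dim=1))[0]   # predicted translation (3,)
    R = torch.eye(3)                             # rotation assumed known (IMU)
    H = plane_homography(K, R, t, n, d)          # maps view 1 -> view 2
    img1_in_2 = warp_by_homography(img1, torch.inverse(H))
    loss = (img1_in_2 - img2).abs().mean()       # L1 photometric error
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the loss is computed from the images themselves, such a network can in principle keep training on unlabeled flight footage, which helps explain the accuracy gains reported below.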
Results and Key Findings
The analysis demonstrates that CNNs can outperform traditional feature-point-based methods during fast MAV maneuvers. Significant findings include:
- Networks trained with self-supervised learning based on photometric error outperform the same architectures trained with supervised learning on ground-truth poses.
- Pyramidal images and feature maps improve the networks' ability to handle larger visual disparities (a coarse-to-fine sketch follows this list).
- Networks trained solely on simulated data generalize well to real-world flight data, highlighting improved robustness against motion blur without any fine-tuning.
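A coarse-to-fine scheme is one common way to realize the pyramidal idea: the pose is first estimated on heavily downsampled images, where even a large disparity spans only a few pixels, and each finer level warps with the current estimate and predicts only the residual. The sketch below reuses the helpers above; the per-level network list `nets` and the residual composition are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def coarse_to_fine_translation(nets, img1, img2, K, n, d, levels=3):
    """Estimate translation over an image pyramid, coarsest level first."""
    pyr1, pyr2 = [img1], [img2]
    for _ in range(levels - 1):                  # build pyramids
        pyr1.insert(0, F.avg_pool2d(pyr1[0], 2))
        pyr2.insert(0, F.avg_pool2d(pyr2[0], 2))
    t = torch.zeros(3)
    for lvl, (p1, p2) in enumerate(zip(pyr1, pyr2)):
        scale = p1.shape[-1] / img1.shape[-1]
        K_lvl = K.clone()
        K_lvl[:2] *= scale                       # rescale intrinsics per level
        H = plane_homography(K_lvl, torch.eye(3), t, n, d)
        p1_in_2 = warp_by_homography(p1, torch.inverse(H))
        t = t + nets[lvl](torch.cat([p1_in_2, p2], dim=1))[0]  # residual update
    return t
```

The design choice here is that no single network ever has to match pixels across the full disparity; each level corrects what remains after warping with the coarser estimate.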
Implications and Future Directions
The implications of this research extend into practical applications for MAVs, where robust ego-motion estimation is critical for achieving higher autonomy and efficiency, especially in environments with rapid and unpredictable motion. The study suggests that adopting CNN-based approaches can substantially reduce the failure rates experienced with traditional methods.
Future developments could integrate these CNN-based ego-motion estimators into more comprehensive VIO systems that fuse additional data sources, such as higher-fidelity IMU data, for better real-time state estimation. Another promising direction is predicting the uncertainty of the CNN outputs, enabling dynamic adaptation to varying flight conditions.
Overall, the research offers valuable insights into applying advanced neural network techniques to MAV navigation and control, paving the way toward more reliable and efficient autonomous flight.