
Selective Sensor Fusion for Neural Visual-Inertial Odometry (1903.01534v1)

Published 4 Mar 2019 in cs.CV, cs.AI, cs.LG, and cs.RO

Abstract: Deep learning approaches for Visual-Inertial Odometry (VIO) have proven successful, but they rarely focus on incorporating robust fusion strategies for dealing with imperfect input sensory data. We propose a novel end-to-end selective sensor fusion framework for monocular VIO, which fuses monocular images and inertial measurements in order to estimate the trajectory whilst improving robustness to real-life issues, such as missing and corrupted data or bad sensor synchronization. In particular, we propose two fusion modalities based on different masking strategies: deterministic soft fusion and stochastic hard fusion, and we compare with previously proposed direct fusion baselines. During testing, the network is able to selectively process the features of the available sensor modalities and produce a trajectory at scale. We present a thorough investigation on the performances on three public autonomous driving, Micro Aerial Vehicle (MAV) and hand-held VIO datasets. The results demonstrate the effectiveness of the fusion strategies, which offer better performances compared to direct fusion, particularly in presence of corrupted data. In addition, we study the interpretability of the fusion networks by visualising the masking layers in different scenarios and with varying data corruption, revealing interesting correlations between the fusion networks and imperfect sensory input data.

Authors (7)
  1. Changhao Chen (64 papers)
  2. Stefano Rosa (17 papers)
  3. Yishu Miao (19 papers)
  4. Chris Xiaoxuan Lu (50 papers)
  5. Wei Wu (482 papers)
  6. Andrew Markham (94 papers)
  7. Niki Trigoni (86 papers)
Citations (128)

Summary

Selective Sensor Fusion for Neural Visual-Inertial Odometry: An Expert Review

The paper "Selective Sensor Fusion for Neural Visual-Inertial Odometry" by Changhao Chen et al. presents an innovative framework for enhancing Visual-Inertial Odometry (VIO) systems through selective sensor fusion strategies, specifically targeting the integration of monocular visual data and inertial measurements. The research addresses the challenge of unreliable sensor data in real-world scenarios, improving upon traditional methodologies which often rely on direct sensor data fusion without accounting for potential data corruption or loss.

Key Contributions and Methodologies

The paper introduces two novel fusion modalities, deterministic soft fusion and stochastic hard fusion, in contrast to conventional direct fusion baselines. The two strategies can be summarized as follows:

  1. Deterministic Soft Fusion: This method re-weights features extracted from the visual and inertial streams using a sigmoid function, applying continuous-valued masks that attenuate less reliable features during fusion. Soft fusion remains deterministic and is suited to scenarios where sensor error margins are narrower.
  2. Stochastic Hard Fusion: A more robust approach based on Gumbel-Softmax sampling, this method assigns binary masks to features, either preserving or discarding them according to their inferred reliability. The stochastic nature provides greater resilience against severe data degradation and an intuitive representation of sensor reliability (see the sketch after this list).
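
To make the two masking strategies concrete, the following is a minimal PyTorch sketch; it is not the authors' released code. The module names, feature dimensions, and single-layer mask predictors are illustrative assumptions. The paper-specific ingredients are the sigmoid re-weighting over concatenated visual and inertial features (soft fusion) and the sampling of binary keep/drop masks with the Gumbel-Softmax trick (hard fusion).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftFusion(nn.Module):
    """Deterministic soft fusion: continuous (0, 1) masks re-weight each fused feature channel."""
    def __init__(self, vis_dim=512, imu_dim=256):  # feature sizes are illustrative assumptions
        super().__init__()
        self.mask_net = nn.Linear(vis_dim + imu_dim, vis_dim + imu_dim)

    def forward(self, vis_feat, imu_feat):
        concat = torch.cat([vis_feat, imu_feat], dim=-1)
        mask = torch.sigmoid(self.mask_net(concat))   # continuous re-weighting mask in (0, 1)
        return concat * mask                          # attenuated (soft-selected) features

class HardFusion(nn.Module):
    """Stochastic hard fusion: binary keep/drop masks sampled via the Gumbel-Softmax trick."""
    def __init__(self, vis_dim=512, imu_dim=256, tau=1.0):
        super().__init__()
        fused_dim = vis_dim + imu_dim
        self.mask_net = nn.Linear(fused_dim, fused_dim * 2)  # per-channel logits over {drop, keep}
        self.tau = tau

    def forward(self, vis_feat, imu_feat):
        concat = torch.cat([vis_feat, imu_feat], dim=-1)
        logits = self.mask_net(concat).view(*concat.shape, 2)
        # hard=True returns discrete one-hot samples while keeping gradients
        # via the straight-through estimator.
        mask = F.gumbel_softmax(logits, tau=self.tau, hard=True)[..., 1]
        return concat * mask                          # each channel is either kept or zeroed out
```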

The architecture integrates these fusion techniques into a deep learning pipeline: feature encoders extract representations from the visual and inertial data, a recurrent neural network models temporal dependencies over the fused features, and a pose regression network produces the final estimates, with the whole system trained end-to-end.
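
Continuing the sketch above (same imports and fusion modules), a hedged outline of how this pipeline could be wired is shown below. The encoder choices, feature sizes, and layer counts are assumptions made for illustration; the structural point from the paper is that the selective fusion block sits between the modality-specific feature encoders and the recurrent temporal model, with a pose regressor on top.

```python
class SelectiveVIO(nn.Module):
    """Sketch of the pipeline: encode each modality, selectively fuse, model time, regress pose."""
    def __init__(self, fusion, vis_dim=512, imu_dim=256, hidden=512):
        super().__init__()
        # Visual encoder over a stacked pair of consecutive RGB frames (illustrative; the paper
        # uses an optical-flow-style CNN, replaced here by a tiny stand-in).
        self.visual_encoder = nn.Sequential(
            nn.Conv2d(6, 64, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(64, 128, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, vis_dim),
        )
        # Inertial encoder over a window of 6-DoF IMU samples between consecutive frames.
        self.inertial_encoder = nn.LSTM(6, imu_dim, batch_first=True)
        self.fusion = fusion                              # SoftFusion or HardFusion from above
        self.temporal = nn.LSTM(vis_dim + imu_dim, hidden, batch_first=True)
        self.pose_head = nn.Linear(hidden, 6)             # 3-DoF translation + 3-DoF rotation

    def forward(self, image_pairs, imu_windows):
        # image_pairs: (B, T, 6, H, W); imu_windows: (B, T, N, 6)
        B, T = image_pairs.shape[:2]
        vis = self.visual_encoder(image_pairs.flatten(0, 1)).view(B, T, -1)
        imu = self.inertial_encoder(imu_windows.flatten(0, 1))[0][:, -1].view(B, T, -1)
        fused = self.fusion(vis, imu)                     # selective re-weighting / masking
        hidden, _ = self.temporal(fused)
        return self.pose_head(hidden)                     # per-step relative pose, (B, T, 6)
```

Because the mask predictor sees both modalities' features, the network can learn, for instance, to suppress visual features under blur or occlusion and rely more heavily on the inertial stream, which is the behavior the masking visualizations in the paper are meant to expose.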

Numerical Results and Analysis

The empirical evaluation on three datasets (KITTI, EuRoC MAV, and PennCOSYVIO) demonstrates the advantage of these fusion techniques over traditional VIO models and direct fusion baselines. Notably, under conditions simulating real-world sensor degradation, such as occluded, blurred, and temporally misaligned data, the proposed methods maintained lower translational and rotational errors (a hedged sketch of such corruptions follows the list below):

  • KITTI Dataset: Compared to direct fusion, both soft and hard fusion approaches consistently reduced error rates in scenarios with up to 10% image blockage and noise.
  • EuRoC MAV: The techniques proved effective in aerial vehicle datasets, showcasing improvements in trajectory estimation when encountering vision and sensor misalignments.
  • PennCOSYVIO: Particularly effective in handheld device conditions, the selective fusion strategies demonstrated robustness to frequent transition-induced corruption, preserving more reliable localization outputs.
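
The degradation scenarios themselves are straightforward to reproduce in spirit. The sketch below shows one hedged way to synthesize the kinds of corruption the evaluation refers to (partial occlusion, blur, and visual/inertial desynchronization); the patch size, blur strength, and shift amount are assumptions, not the paper's exact protocol.

```python
import torch
import torchvision.transforms.functional as TF

def corrupt_batch(images, imu, occlude_frac=0.1, blur_sigma=2.0, imu_shift=5):
    """Illustrative input degradations: partial occlusion, blur, and IMU desynchronization."""
    B, C, H, W = images.shape
    out = images.clone()
    side = int((occlude_frac * H * W) ** 0.5)      # square patch covering ~occlude_frac of the image
    for b in range(B):
        y = int(torch.randint(0, H - side, (1,)))
        x = int(torch.randint(0, W - side, (1,)))
        out[b, :, y:y + side, x:x + side] = 0.0    # occlude a random patch
    out = TF.gaussian_blur(out, kernel_size=9, sigma=blur_sigma)  # blur every frame
    imu_shifted = torch.roll(imu, shifts=imu_shift, dims=1)       # misalign the IMU stream in time
    return out, imu_shifted
```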

These results underline the adaptive capacity of selective fusion strategies to maintain functionality despite substantial external perturbations.

Theoretical Implications and Future Directions

The implications of this work are manifold within both the theoretical and practical landscapes of autonomous navigation and robotics. The framework highlights the potential for learned, adaptive sensor fusion mechanisms to outperform static, handcrafted pipelines, especially in dynamically unpredictable environments.

The parametric modeling of feature weights provides an interpretable lens through which to understand sensor contribution and reliability, paving the way for dynamically adjustable VIO systems. Future research could explore extensions into multi-sensor VIO systems, encompassing other modalities like LIDAR, further refining fusion strategies through deeper integration with reinforcement learning for autonomous adaptability.

In summary, the proposed selective sensor fusion framework marks a substantial advance in deep learning-based VIO, offering a compelling alternative to existing methodologies. Its adoption could significantly improve the robustness and accuracy of ego-motion estimation across diverse application settings, fostering greater autonomy in navigation technologies.
