Overview of "Ultimate SLAM? Combining Events, Images, and IMU for Robust Visual SLAM in HDR and High Speed Scenarios"
In this paper, the authors introduce a state estimation pipeline that fuses event cameras, standard cameras, and an inertial measurement unit (IMU) to improve visual SLAM in high-speed and high dynamic range (HDR) settings. The integration exploits the complementary strengths of each sensor. Event cameras offer very high temporal resolution, high dynamic range, and immunity to motion blur, which makes them well suited to fast motion and difficult lighting. Standard cameras provide dense absolute intensity information, which is most informative at lower speeds and in well-lit conditions. The IMU supplies high-rate inertial measurements that make metric scale observable and bridge the gaps between visual measurements.
Key Contributions
- Hybrid Sensor Fusion: The paper presents a tightly coupled fusion of events, images, and IMU measurements that significantly outperforms pipelines relying on events alone or on standard frames plus IMU. The authors report an accuracy improvement of 130% over event-only pipelines and 85% over standard-frames-only visual-inertial systems. The general form of the jointly optimized objective is sketched after this list.
- Novel Applications: The pipeline is demonstrated for autonomous quadrotor flight, showing the practical potential of the fusion method. It enables flight in conditions that are challenging for conventional frame-based approaches, such as low-light and HDR scenes.
- Computational Tractability: Despite the added complexity of fusing multiple sensor modalities, the system remains computationally tractable, making it suitable for platforms with limited processing power.
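To make the meaning of "tightly coupled" concrete: the back end minimizes a single objective in which reprojection residuals from standard frames, reprojection residuals from motion-compensated event frames, and inertial residuals all appear together. The following is a hedged sketch of such a keyframe-based objective, written in notation of our own choosing (patterned on OKVIS-style visual-inertial optimization, not copied from the paper):

```latex
J(\mathbf{x}) =
  \underbrace{\sum_{i \in \{\text{frames},\,\text{events}\}} \sum_{k=1}^{K} \sum_{j \in \mathcal{J}(i,k)}
    \mathbf{e}_r^{i,j,k\,\top}\, \mathbf{W}_r^{i,j,k}\, \mathbf{e}_r^{i,j,k}}_{\text{visual reprojection terms}}
  \;+\;
  \underbrace{\sum_{k=1}^{K-1}
    \mathbf{e}_s^{k\,\top}\, \mathbf{W}_s^{k}\, \mathbf{e}_s^{k}}_{\text{inertial terms}}
```

Here k indexes keyframes, j the landmarks observed by camera i (standard frames or event frames) at keyframe k, e_r are reprojection residuals, e_s are inertial residuals between consecutive keyframes, and the W matrices are the associated information (inverse-covariance) weights. Because both visual modalities and the IMU contribute residuals to the same optimization, the estimator can lean on whichever sensor is most reliable at any instant.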
Methodology
The authors employ a multi-sensor fusion strategy in which event cameras and standard cameras contribute complementary information for state estimation. Events are aggregated over spatio-temporal windows and motion-compensated using the motion predicted from the IMU, yielding synthetic event frames; FAST corners are detected on both event frames and standard frames and tracked with the KLT (Lucas-Kanade) tracker. Pose and structure are then jointly optimized in a keyframe-based nonlinear optimization that merges visual and inertial cues. Synthesizing motion-compensated event frames is pivotal: it keeps the accumulated events sharp during fast motion and allows the same feature-tracking front end to operate on both sensor modalities, attenuating the weaknesses of each individual sensor. A minimal sketch of this front end follows.
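The sketch below (Python with NumPy and OpenCV) illustrates the two front-end steps under simplifying assumptions: accumulating a spatio-temporal window of events into a motion-compensated frame using a rotation-only warp predicted from the gyroscope, and detecting and tracking FAST corners with the pyramidal KLT tracker. Function names, parameter choices, and the per-event warping loop are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch, assuming a rotation-only motion model and idealized inputs;
# names and parameters here are illustrative, not the authors' code.
import numpy as np
import cv2


def motion_compensated_event_frame(events, t_ref, gyro_rate, K, shape):
    """Accumulate a spatio-temporal window of events into one frame,
    warping each event to the reference time t_ref with a constant
    angular-rate (gyroscope-predicted, rotation-only) motion model.

    events    : (N, 3) array of (x, y, t) per event
    t_ref     : reference timestamp of the synthesized frame
    gyro_rate : (3,) angular velocity in rad/s, assumed constant over the window
    K         : (3, 3) camera intrinsic matrix
    shape     : (height, width) of the output frame
    """
    K_inv = np.linalg.inv(K)
    frame = np.zeros(shape, dtype=np.float32)

    for x, y, t in events:
        # Rotation accumulated between the event time and the reference time.
        R, _ = cv2.Rodrigues(gyro_rate * (t_ref - t))

        # Warp the event pixel through the predicted rotation: p' ~ K R K^-1 p.
        p = K @ R @ K_inv @ np.array([x, y, 1.0])
        u, v = int(round(p[0] / p[2])), int(round(p[1] / p[2]))

        # Accumulate event counts at the warped pixel location.
        if 0 <= v < shape[0] and 0 <= u < shape[1]:
            frame[v, u] += 1.0

    # Normalize to 8 bit so the same detector/tracker can run on event
    # frames and standard frames alike.
    return cv2.normalize(frame, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)


def track_features(prev_frame, next_frame, prev_pts=None):
    """Detect FAST corners (when no features are carried over) and track
    them into the next frame with the pyramidal Lucas-Kanade (KLT) tracker."""
    if prev_pts is None or len(prev_pts) == 0:
        fast = cv2.FastFeatureDetector_create(threshold=20)
        keypoints = fast.detect(prev_frame, None)
        prev_pts = np.float32([kp.pt for kp in keypoints]).reshape(-1, 1, 2)
        if len(prev_pts) == 0:          # nothing to track in this frame
            return prev_pts, prev_pts

    next_pts, status, _ = cv2.calcOpticalFlowPyrLK(
        prev_frame, next_frame, prev_pts, None,
        winSize=(21, 21), maxLevel=3)

    # Keep only correspondences that were tracked successfully; these feed the
    # reprojection terms of the keyframe-based optimization back end.
    good = status.ravel() == 1
    return prev_pts[good], next_pts[good]
```

In the actual pipeline, the features tracked on both standard frames and event frames, together with the IMU measurements, feed the keyframe-based nonlinear optimization described above.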
Implications and Future Directions
The implications of this research are significant for SLAM in robotics and perception systems, particularly in domains that demand robustness to fast motion and to widely varying lighting conditions. By overcoming the limitations of individual sensors, the proposed framework enables more reliable environmental perception, a critical requirement for applications ranging from automotive systems to aerial robotics.
Future research could extend this hybrid SLAM framework by incorporating additional modalities such as depth cameras, or by exploring learning-based models that further improve the sensor fusion. Furthermore, given the growing availability of event cameras, it would be worth investigating how these principles scale to larger, distributed, or multi-agent SLAM settings.
In summary, the paper presents a carefully engineered approach to visual state estimation that blends conventional frames, event data, and inertial measurements. The reported results show clear gains in accuracy and robustness, underscoring the value of hybrid sensor fusion for future robotics and machine perception applications.