- The paper introduces the first monocular unsupervised framework to generate dense depth and optical flow from sparse event data.
- It employs a compact Evenly-Cascaded Network with just 150k parameters, achieving up to 250 FPS on standard GPUs.
- The framework relies solely on event-based input with novel normalization techniques to enhance robustness in low-light and challenging conditions.
Unsupervised Learning of Dense Optical Flow, Depth, and Egomotion from Sparse Event Data
The paper presents a lightweight, unsupervised learning pipeline for estimating dense depth, optical flow, and egomotion from the sparse event output of a Dynamic Vision Sensor (DVS). The authors introduce a novel encoder-decoder architecture, the Evenly-Cascaded Network (ECN), which pairs high inference speed with competitive accuracy, making it well suited to real-time robotics applications.
Key Contributions and Methodology
- Monocular Pipeline Innovation: The primary contribution is the first monocular unsupervised framework capable of generating both dense depth and optical flow from sparse event data alone. The pipeline is trained with a self-supervised objective, eliminating the need for conventional image frames (a minimal sketch of a warping-based loss of this kind appears after this list).
- Network Architecture: The ECN architecture is remarkably compact, with just 150k parameters, yet it effectively handles the sparsity and noise of event data. Its lightweight design enables inference at up to 250 FPS on standard GPU hardware, making it suitable for real-time use in autonomous systems.
- Event Data Processing: Unlike methods that rely on grayscale intensity for supervision, the proposed framework uses event-based input exclusively, which inherently improves robustness in adverse conditions such as low light. A distinctive feature is the averaging of event timestamps at each pixel, which reduces noise without discarding temporal information (see the event-frame sketch after this list).
- Novel Normalization Techniques: The authors introduce a feature decorrelation technique that improves training efficiency and prediction accuracy. It is presented as part of a systematic evaluation of normalization strategies for optimizing the ECN architecture on event data (a generic decorrelation sketch also follows this list).
- Quantitative Evaluation: The approach is validated through extensive testing on the Multi-Vehicle Stereo Event Camera (MVSEC) dataset. The results show accurate motion estimation and scene reconstruction even under challenging lighting conditions, surpassing previous deep learning methods applied to event data.
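To make the timestamp-averaging idea concrete, here is a minimal sketch of one plausible way to aggregate a slice of DVS events into a network input. The three-channel layout (average timestamp plus per-polarity counts) and the function name `events_to_frame` are illustrative assumptions, not necessarily the paper's exact representation:

```python
import numpy as np

def events_to_frame(x, y, t, p, height, width):
    """Aggregate a slice of DVS events into a 3-channel frame.

    x, y: integer pixel coordinates; t: timestamps; p: polarities (+1/-1).
    Channels: per-pixel average timestamp (normalized over the slice),
    positive-event count, negative-event count. Averaging timestamps is
    the noise-reducing step the paper describes; the exact channel
    layout here is an assumption.
    """
    t = (t - t.min()) / max(t.max() - t.min(), 1e-9)  # timestamps -> [0, 1]

    t_sum = np.zeros((height, width))
    count = np.zeros((height, width))
    pos = np.zeros((height, width))
    neg = np.zeros((height, width))

    np.add.at(t_sum, (y, x), t)                    # sum timestamps per pixel
    np.add.at(count, (y, x), 1.0)                  # events per pixel
    np.add.at(pos, (y, x), (p > 0).astype(float))  # positive-polarity count
    np.add.at(neg, (y, x), (p <= 0).astype(float)) # negative-polarity count

    t_avg = t_sum / np.maximum(count, 1.0)         # average timestamp image
    return np.stack([t_avg, pos, neg]).astype(np.float32)
```

Averaging (rather than keeping only the latest timestamp) means a single spurious event perturbs a pixel's value only slightly, which is the noise-robustness argument made above.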
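The paper names feature decorrelation as its normalization technique, but this summary does not pin down the exact formulation, so the sketch below shows one generic channel-decorrelation (ZCA-style whitening over channels) that such a layer could perform; treat the details as assumptions rather than the authors' definition:

```python
import torch

def decorrelate_features(feat, eps=1e-5):
    """Channel-decorrelate a feature map via ZCA-style whitening.

    feat: (N, C, H, W) tensor. Generic illustration of feature
    decorrelation; the paper's exact formulation may differ.
    """
    n, c, h, w = feat.shape
    x = feat.reshape(n, c, h * w)
    x = x - x.mean(dim=2, keepdim=True)        # zero-mean per channel
    cov = x @ x.transpose(1, 2) / (h * w - 1)  # (N, C, C) channel covariance
    eigvals, eigvecs = torch.linalg.eigh(
        cov + eps * torch.eye(c, device=feat.device)
    )
    # Inverse square root of the covariance decorrelates the channels.
    inv_sqrt = (
        eigvecs
        @ torch.diag_embed(eigvals.clamp_min(eps).rsqrt())
        @ eigvecs.transpose(1, 2)
    )
    return (inv_sqrt @ x).reshape(n, c, h, w)
```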
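For the self-supervised objective, a common event-based formulation warps one event frame toward the other using the predicted flow and penalizes the residual. The sketch below illustrates that general idea under assumed tensor shapes; a full pipeline would add smoothness and depth/egomotion consistency terms:

```python
import torch
import torch.nn.functional as F

def warp_loss(frame0, frame1, flow):
    """Self-supervised photoconsistency loss between event frames.

    Warps frame1 back toward frame0 with the predicted flow and
    penalizes the difference. A minimal sketch of a warping-based
    objective, not the paper's complete loss.

    frame0, frame1: (N, C, H, W); flow: (N, 2, H, W) in pixels.
    """
    n, _, h, w = flow.shape
    # Base sampling grid in pixel coordinates.
    ys, xs = torch.meshgrid(
        torch.arange(h, device=flow.device, dtype=flow.dtype),
        torch.arange(w, device=flow.device, dtype=flow.dtype),
        indexing="ij",
    )
    grid_x = xs.unsqueeze(0) + flow[:, 0]  # shift by predicted flow
    grid_y = ys.unsqueeze(0) + flow[:, 1]
    # Normalize coordinates to [-1, 1] as grid_sample expects.
    grid = torch.stack(
        [2.0 * grid_x / (w - 1) - 1.0, 2.0 * grid_y / (h - 1) - 1.0], dim=-1
    )
    warped = F.grid_sample(frame1, grid, align_corners=True)
    return (warped - frame0).abs().mean()  # L1 photoconsistency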
Results and Discussion
The ECN-based pipeline shows clear improvements over existing methods for handling sparse event data. The reported quantitative results show consistent reductions in average endpoint error (AEE) for optical flow and improved translational and rotational egomotion estimates across the test scenarios. The model also generalizes robustly, performing in both day and night conditions without retraining.
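For reference, the AEE metric reported on MVSEC is the mean Euclidean distance between predicted and ground-truth flow vectors, typically restricted to valid pixels; a small sketch (function name and mask convention assumed):

```python
import numpy as np

def average_endpoint_error(flow_pred, flow_gt, valid=None):
    """Average endpoint error (AEE) between predicted and GT flow.

    flow_pred, flow_gt: (H, W, 2) arrays of per-pixel flow vectors.
    valid: optional boolean mask (e.g., pixels with ground truth).
    """
    err = np.linalg.norm(flow_pred - flow_gt, axis=-1)  # per-pixel error
    if valid is not None:
        err = err[valid]
    return err.mean()
```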
Theoretical implications of this work suggest substantial progress in event-based vision systems, challenging traditional image processing paradigms. On a practical level, the method's low computational requirements and high inference speed present a compelling case for its integration into autonomous systems and robotics, where real-time performance is crucial.
Future Directions
The authors acknowledge several areas for future exploration. Extending the work to moving object detection and tracking is a natural continuation, given that the pipeline currently focuses on structure-from-motion (SfM) recovery. Furthermore, richer representations of event clouds and the exploitation of space-time frequency information hold potential for enhancing the system's capability and resolution.
In conclusion, this paper addresses a critical gap in event-based vision research with a novel, efficient pipeline capable of processing sparse data for reliable motion and depth estimation. This work lays a foundation for future research and practical applications in the field of autonomous systems and beyond.