- The paper introduces a method using Conditional Generative Adversarial Networks (cGANs) to generate High Dynamic Range (HDR) images and very high frame rate videos from event camera data.
- It proposes two event stacking methods, Stacking Based on Time (SBT) and Stacking Based on the Number of Events (SBE), to represent event data efficiently as input to the neural network.
- Evaluations show the framework successfully reconstructs HDR images and produces videos with better visual quality and higher quantitative scores than existing intensity estimation methods.
Overview of Event-Based HDR and High-FPS Video Generation Using cGANs
The paper under review introduces a method for generating high dynamic range (HDR) images and very-high-frame-rate videos from event camera data using Conditional Generative Adversarial Networks (cGANs). Event cameras offer low latency, high temporal resolution, and a wide dynamic range, advantages that traditional cameras often lack. However, they produce asynchronous event sequences rather than conventional intensity images, which poses challenges for standard image processing algorithms. This research leverages cGANs to reconstruct intensity images and videos from event data, thereby broadening the applications of event cameras in vision tasks such as object detection and tracking.
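To make the input format concrete, the sketch below shows one common way to represent such an event stream in NumPy. It is an illustration only, not code from the paper; the field names are this review's own, though DAVIS recordings expose essentially the same per-event tuple of timestamp, pixel location, and polarity.

```python
import numpy as np

# Illustrative representation of an asynchronous event stream (not the paper's code).
# Each event reports a per-pixel brightness change rather than a full image.
event_dtype = np.dtype([
    ("t", np.float64),  # timestamp in seconds (microsecond resolution in practice)
    ("x", np.uint16),   # pixel column
    ("y", np.uint16),   # pixel row
    ("p", np.int8),     # polarity: +1 for a brightness increase, -1 for a decrease
])

# Three example events; a real recording contains millions during fast motion.
events = np.array(
    [(0.000012, 120, 64, 1), (0.000015, 121, 64, -1), (0.000031, 40, 10, 1)],
    dtype=event_dtype,
)
```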
Methodology
The paper proposes using the spatio-temporal data from event cameras, namely the pixel coordinates and timestamps of intensity changes, as input to a cGAN framework. It introduces two event stacking methods, Stacking Based on Time (SBT) and Stacking Based on the Number of Events (SBE), which represent the event data compactly enough for high-quality intensity image reconstruction:
- Stacking Based on Time (SBT): Events are grouped into frames over fixed time intervals, and the resulting frames are stacked to form the network input. This captures temporal dynamics to some extent but can yield sparse, nearly empty frames when few events occur within an interval.
- Stacking Based on the Number of Events (SBE): This method addresses the sparsity issue by stacking a fixed number of events per frame rather than a fixed time duration. It guarantees a dense stack of events and gives better results in scenes with sporadic motion or low activity. Both schemes are illustrated in the code sketch following this list.
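A minimal sketch of the two stacking schemes is given below, assuming the structured `events` array from the earlier snippet. The paper does not prescribe this exact code: whether each pixel accumulates signed polarity, raw counts, or latest timestamps is an implementation detail, and signed-polarity accumulation is used here only as one plausible choice; the function names and parameters are this review's own.

```python
import numpy as np

def stack_by_time(events, sensor_hw, t_start, interval, n_channels):
    """SBT sketch: n_channels consecutive windows of fixed duration, one per channel."""
    h, w = sensor_hw
    stack = np.zeros((n_channels, h, w), dtype=np.float32)
    for c in range(n_channels):
        lo = t_start + c * interval
        sel = events[(events["t"] >= lo) & (events["t"] < lo + interval)]
        # Accumulate signed polarity per pixel; quiet intervals leave a channel nearly empty.
        np.add.at(stack[c], (sel["y"], sel["x"]), sel["p"].astype(np.float32))
    return stack

def stack_by_count(events, sensor_hw, start_idx, events_per_channel, n_channels):
    """SBE sketch: each channel receives a fixed number of events, regardless of elapsed time."""
    h, w = sensor_hw
    stack = np.zeros((n_channels, h, w), dtype=np.float32)
    for c in range(n_channels):
        lo = start_idx + c * events_per_channel
        sel = events[lo:lo + events_per_channel]
        np.add.at(stack[c], (sel["y"], sel["x"]), sel["p"].astype(np.float32))
    return stack
```

The contrast between the two schemes is visible directly in the code: SBT slices by wall-clock time and can return sparse channels, whereas SBE slices by event index and always fills each channel with the same number of events.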
Results
The research demonstrates the effectiveness of the proposed methods through qualitative and quantitative evaluations. Key findings include:
- HDR Image and Non-Blurred Video Generation: The proposed framework successfully reconstructs HDR images by capitalizing on the event camera's ability to capture high-contrast scenes, even under challenging illumination. By further exploiting the sensor's high temporal resolution, the method can generate videos at rates of theoretically up to one million frames per second, greatly reducing motion blur in high-speed scenes (see the sliding-window sketch after this list).
- Performance on Public Datasets: Evaluation against existing intensity estimation methods shows higher-quality reconstructed images and videos, demonstrating the value of the event stacking strategies and the cGAN framework.
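The very high effective frame rates follow from sliding an SBE-style window across the event stream with a small stride, as sketched below. This is a reading of the approach rather than the authors' code: `stack_by_count` is the SBE sketch from the previous section, and `generator` stands in for the trained cGAN generator, which is not reproduced here.

```python
def sbe_video(events, sensor_hw, events_per_channel, stride, n_channels, generator=None):
    """Sliding-window SBE sketch: heavily overlapping stacks yield a dense frame sequence.

    The effective frame rate is set by the stride and the event timestamps rather than by
    a fixed exposure time, which is why motion blur is largely avoided in high-speed scenes.
    """
    frames = []
    last_start = len(events) - n_channels * events_per_channel
    for start in range(0, max(last_start, 0) + 1, stride):
        stack = stack_by_count(events, sensor_hw, start, events_per_channel, n_channels)
        frames.append(generator(stack) if generator is not None else stack)
    return frames
```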
Training and testing used both real-world datasets captured with DAVIS cameras and synthetic datasets generated with the ESIM simulator. The paper reports better visual quality and higher quantitative scores (SSIM, FSIM, and PSNR) than traditional APS frames and prior intensity estimation methods.
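For reference, two of the three reported metrics can be computed with scikit-image as shown below. The helper function is this review's own, and FSIM is omitted because scikit-image does not implement it.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def score_reconstruction(pred, target):
    """Compare a reconstructed intensity image against a reference frame (e.g., an APS frame)."""
    pred = pred.astype(np.float64)
    target = target.astype(np.float64)
    data_range = target.max() - target.min()
    return {
        "SSIM": structural_similarity(target, pred, data_range=data_range),
        "PSNR": peak_signal_noise_ratio(target, pred, data_range=data_range),
    }
```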
Implications and Future Directions
This work expands the utility of event cameras by providing a robust framework to convert event data into practical intensity images and videos, addressing both the HDR and high-frame-rate domains. This has significant implications for applications requiring rapid response times or broad dynamic range, such as autonomous driving, robotics, and high-speed motion analysis.
The proposed stacking methods and cGAN network designs open avenues for further research into neural architectures tailored to event data. Broader deployment and testing in varied real-world scenarios could further validate and extend their applicability to event-based vision systems.
Overall, the paper provides a solid foundation for future work on event-based image generation and demonstrates the value of combining advanced deep learning techniques with bio-inspired vision sensors.