Video to Events: Recycling Video Datasets for Event Cameras

Published 6 Dec 2019 in cs.CV (arXiv:1912.03095v2)

Abstract: Event cameras are novel sensors that output brightness changes in the form of a stream of asynchronous "events" instead of intensity frames. They offer significant advantages with respect to conventional cameras: high dynamic range (HDR), high temporal resolution, and no motion blur. Recently, novel learning approaches operating on event data have achieved impressive results. Yet, these methods require a large amount of event data for training, which is hardly available due to the novelty of event sensors in computer vision research. In this paper, we present a method that addresses these needs by converting any existing video dataset recorded with conventional cameras to synthetic event data. This unlocks the use of a virtually unlimited number of existing video datasets for training networks designed for real event data. We evaluate our method on two relevant vision tasks, i.e., object recognition and semantic segmentation, and show that models trained on synthetic events have several benefits: (i) they generalize well to real event data, even in scenarios where standard-camera images are blurry or overexposed, by inheriting the outstanding properties of event cameras; (ii) they can be used for fine-tuning on real data to improve over state-of-the-art for both classification and semantic segmentation.

Citations (188)

Summary

  • The paper introduces a neural network-based method that recycles conventional video datasets to generate synthetic event data for event camera training.
  • It couples neural frame interpolation with an established event-generation model to mimic real event data, yielding a 4.3% gain in object recognition and a 0.8% gain in semantic segmentation after fine-tuning on real events.
  • The approach offers a cost-effective solution to overcome data scarcity, enhancing domain adaptation and performance in event-based vision tasks.

An Expert Review of "Video to Events: Recycling Video Datasets for Event Cameras"

The paper entitled "Video to Events: Recycling Video Datasets for Event Cameras," presented at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2020, addresses an important issue in the field of event cameras: the paucity of large-scale event data needed for training modern machine learning models. Event cameras, characterized by their high dynamic range (HDR), high temporal resolution, and absence of motion blur, provide several advantages over conventional video cameras. However, the novelty of these sensors poses a challenge in the form of limited availability of training data.

Overview of Contributions

This research proposes a methodology for converting existing video datasets into synthetic event datasets, unlocking extensive collections of conventional video data for training networks that process real event data. The method combines neural network-based frame interpolation with an established generative model for event data, transforming conventional video into events that closely mimic those produced by event cameras.
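To make the conversion pipeline concrete, the sketch below applies the standard contrast-threshold event model to an already interpolated (temporally upsampled) frame sequence. This is a minimal illustration under simplifying assumptions, not the authors' implementation: the function name frames_to_events and the default contrast_threshold are hypothetical, events receive the timestamp of the frame at which the threshold is crossed, and at most one event fires per pixel per frame transition.

```python
import numpy as np

def frames_to_events(frames, timestamps, contrast_threshold=0.15, eps=1e-3):
    """Generate synthetic events from a temporally upsampled frame sequence.

    Simplified contrast-threshold event model: an event (t, x, y, polarity) is
    emitted whenever the log intensity at a pixel has changed by more than
    `contrast_threshold` since that pixel's last event. `frames` has shape
    (N, H, W) with grayscale values in [0, 1]; `timestamps` holds the N frame
    times in seconds.
    """
    log_frames = np.log(frames.astype(np.float64) + eps)
    ref = log_frames[0].copy()   # log intensity at each pixel's last event
    events = []                  # list of (t, x, y, polarity)

    for k in range(1, len(log_frames)):
        diff = log_frames[k] - ref
        # Pixels whose log-intensity change crossed the threshold fire an event.
        fired = np.abs(diff) >= contrast_threshold
        ys, xs = np.nonzero(fired)
        for x, y in zip(xs, ys):
            polarity = 1 if diff[y, x] > 0 else -1
            events.append((timestamps[k], int(x), int(y), polarity))
            ref[y, x] = log_frames[k][y, x]   # reset reference after the event

    return events
```

The temporal upsampling provided by learned frame interpolation is what allows the event timestamps produced this way to approach the fine temporal resolution of a real sensor, rather than being quantized to the original video frame rate.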

Evaluation and Results

The paper evaluates the proposed method across two principal computer vision tasks: object recognition and semantic segmentation. The findings show that models trained on synthetic events generated from conventional video datasets generalize well to real event data. Notably, models trained on synthetic events alone can then be fine-tuned on real event datasets to surpass state-of-the-art performance on both tasks.

Specifically, the research shows that models trained on synthetic events can improve over existing methods by 4.3% for object recognition and by 0.8% for semantic segmentation when fine-tuned on real event data. This indicates promising potential for harnessing large-scale conventional video data to support demanding event-based vision tasks.
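The two-stage protocol implied here, pretraining on synthetic events and then fine-tuning on a smaller real event dataset at a lower learning rate, can be sketched as follows. The model architecture, data shapes, and hyperparameters are placeholders for illustration rather than those used in the paper; the random tensors stand in for voxel-grid event representations and labels.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

def train(model, loader, epochs, lr):
    """Supervised training loop over event representations (e.g. voxel grids)."""
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()

# Placeholder data standing in for (i) synthetic events converted from video and
# (ii) a smaller real event-camera dataset; 5-channel voxel grids, 10 classes.
synthetic = TensorDataset(torch.randn(512, 5, 64, 64), torch.randint(0, 10, (512,)))
real = TensorDataset(torch.randn(128, 5, 64, 64), torch.randint(0, 10, (128,)))

# Any event-based classifier would do; a tiny CNN keeps the sketch self-contained.
model = nn.Sequential(
    nn.Conv2d(5, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10),
)

# Stage 1: pretrain on synthetic events; Stage 2: fine-tune on real events at a lower rate.
train(model, DataLoader(synthetic, batch_size=64, shuffle=True), epochs=5, lr=1e-3)
train(model, DataLoader(real, batch_size=64, shuffle=True), epochs=3, lr=1e-4)
```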

Practical and Theoretical Implications

The implications of this work are substantial, both practically and theoretically. By leveraging large-scale video datasets, the method significantly expands the resources available for training event-based models, which could pave the way for new applications of event cameras in domains that challenge conventional sensors, such as autonomous driving or surveillance under varying lighting conditions.

Theoretically, the research suggests a novel approach to bridging the domain gap between synthetic and real event data, offering an intriguing direction for future work on domain adaptation and the generalization of event-based vision models.

Potential Future Developments

Future developments may involve refining the proposed methodology by integrating noise modeling of event data, which could further improve the realism and effectiveness of synthetic event generation. Additionally, as frame interpolation methods continue to advance, the approach stands to benefit from improved interpolation algorithms that raise the quality of the synthetic event data.
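As a purely hypothetical illustration of what such noise modeling might look like, the sketch below injects uniformly distributed "background activity" events at a fixed per-pixel rate, a simple first-order noise model; the paper does not prescribe this (or any specific) noise model, and the function name and default rate are assumptions.

```python
import numpy as np

def add_event_noise(events, height, width, duration, noise_rate_hz=0.1, rng=None):
    """Illustrative noise injection for synthetic events (not the paper's model).

    Adds background-activity noise events, uniformly distributed in space and
    time, at an average rate of `noise_rate_hz` per pixel over `duration`
    seconds. `events` is a list of (t, x, y, polarity) tuples.
    """
    rng = rng or np.random.default_rng()
    n_noise = rng.poisson(noise_rate_hz * height * width * duration)
    noise = [(rng.uniform(0, duration),         # random timestamp
              int(rng.integers(0, width)),      # random pixel location
              int(rng.integers(0, height)),
              int(rng.choice([-1, 1])))         # random polarity
             for _ in range(n_noise)]
    # Merge noise with the clean synthetic stream, keeping temporal order.
    return sorted(events + noise, key=lambda e: e[0])
```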

Overall, this paper makes a significant contribution toward addressing the limitations faced by researchers working with event cameras by proposing a method to utilize vast existing video datasets, thus fostering innovation and new explorations in event-based vision.
