- The paper presents the DART descriptor, which leverages biologically-inspired log-polar sampling to robustly encode event-based camera data.
- By integrating DART into a bag-of-words framework, the authors achieve high classification accuracy (97.95% on the N-MNIST dataset) alongside effective tracking performance.
- The study demonstrates DART's potential for real-time vision in autonomous robotics, offering low-latency detection and reliable feature matching under dynamic conditions.
Analysis of "DART: Distribution Aware Retinal Transform for Event-based Cameras"
The paper "DART: Distribution Aware Retinal Transform for Event-based Cameras" introduces a novel visual descriptor tailored for event-based vision systems, which represent a promising alternative to traditional frame-based cameras. The authors propose the Distribution Aware Retinal Transform (DART), a descriptor designed to handle tasks such as object classification, tracking, detection, and feature matching. The work leverages the inherent advantages of event cameras, including high temporal resolution and low latency.
Key Contributions
The DART descriptor uses log-polar grids to encode the structural context around each event generated by the camera, mirroring the distribution of cones in the primate fovea. Log-polar sampling is naturally suited to handling scale and rotation variations, since both appear as simple shifts in log-polar coordinates, making the descriptor robust in dynamic environments. Notably, the paper demonstrates that these descriptors achieve competitive results across several event-based vision benchmarks.
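To make the log-polar idea concrete, the following is a minimal sketch of binning event coordinates into a log-polar histogram around a center point. The function name, grid parameters, and normalization are illustrative assumptions, not the paper's exact construction:

```python
import numpy as np

def log_polar_descriptor(events, center, n_rings=8, n_wedges=16, r_max=31.0):
    """Histogram event coordinates into a log-polar grid around `center`.

    `events` is an (N, 2) array of (x, y) pixel coordinates. The grid
    parameters here are placeholders, not DART's published configuration.
    """
    dx = events[:, 0] - center[0]
    dy = events[:, 1] - center[1]
    r = np.hypot(dx, dy)
    theta = np.arctan2(dy, dx)  # angle in [-pi, pi]

    keep = (r > 0) & (r <= r_max)
    # Logarithmic radial bins: ring spacing grows with eccentricity,
    # mirroring the foveal sampling the descriptor is modeled on.
    ring = np.floor(n_rings * np.log(r[keep]) / np.log(r_max)).astype(int)
    ring = np.clip(ring, 0, n_rings - 1)
    wedge = np.floor((theta[keep] + np.pi) / (2 * np.pi) * n_wedges).astype(int)
    wedge = np.clip(wedge, 0, n_wedges - 1)

    hist = np.zeros((n_rings, n_wedges))
    np.add.at(hist, (ring, wedge), 1.0)  # accumulate one count per event
    total = hist.sum()
    return hist / total if total > 0 else hist  # normalized descriptor
```

Because scaling multiplies `r` and rotation adds to `theta`, both act as shifts along the ring and wedge axes of this grid, which is what makes log-polar sampling attractive for the invariances the paper targets.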
The DART descriptor was applied to several tasks, with strong results in each:
- Object Classification: By integrating DART into a bag-of-words model, the authors achieved robust classification across several datasets. For example, they reported a classification accuracy of 97.95% on the N-MNIST dataset.
- Tracking: The research extends the classification system to perform tracking using statistical bootstrapping for one-shot learning and demonstrates the scale and rotation equivariance of DART. The proposed approach yielded an average overlap success (OS) of 0.6242 in scenarios involving translational motion.
- Detection: A long-term tracking framework was introduced, designed to reinitialize the tracker upon loss of the object. This framework, comprising a local search tracker and a global search detector, addresses the challenge of re-detection using cluster majority voting.
- Feature Matching: DART eases the feature correspondence problem, which is particularly beneficial when matching features across temporally distant observations of a scene.
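The bag-of-words classification pipeline mentioned above follows a standard pattern: cluster training descriptors into a codebook of "visual words", then represent each object as a normalized histogram of word assignments. A minimal sketch, assuming flattened DART descriptors as input (the codebook size, k-means details, and function names are illustrative, not the paper's):

```python
import numpy as np

def build_codebook(descriptors, k=16, iters=20, seed=0):
    """Plain k-means over an (N, D) array of descriptors; returns (k, D) centers."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)]
    for _ in range(iters):
        # Assign each descriptor to its nearest center (squared Euclidean).
        d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):
                centers[j] = members.mean(0)  # recenter on cluster mean
    return centers

def bow_histogram(descriptors, centers):
    """Normalized histogram of visual-word assignments for one object."""
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    hist = np.bincount(d2.argmin(1), minlength=len(centers)).astype(float)
    return hist / hist.sum()
```

The resulting histograms can then be fed to any off-the-shelf classifier; the specific classifier and codebook size used for the 97.95% N-MNIST result are detailed in the paper itself.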
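The overlap success (OS) score used to evaluate tracking is conventionally the intersection-over-union between predicted and ground-truth bounding boxes. A sketch under that assumption, for axis-aligned boxes given as (x, y, w, h):

```python
def overlap_success(pred, gt):
    """Intersection-over-union of two axis-aligned (x, y, w, h) boxes."""
    x1 = max(pred[0], gt[0])
    y1 = max(pred[1], gt[1])
    x2 = min(pred[0] + pred[2], gt[0] + gt[2])
    y2 = min(pred[1] + pred[3], gt[1] + gt[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)  # overlap area (0 if disjoint)
    union = pred[2] * pred[3] + gt[2] * gt[3] - inter
    return inter / union if union > 0 else 0.0
```

Averaging this score over a sequence yields figures like the 0.6242 reported for translational motion: a perfect track scores 1.0, and a lost target scores 0.0.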
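The cluster majority voting used for re-detection can be sketched as nearest-neighbor label voting: each descriptor from a candidate region votes for the label of its nearest template descriptor, and the region is accepted when the winning label's vote share is high. This is a generic illustration of the voting idea, not the paper's exact detector:

```python
import numpy as np

def majority_vote_label(query_descs, template_descs, template_labels):
    """Assign each query descriptor the label of its nearest template
    descriptor, then return the majority label and its vote share."""
    d2 = ((query_descs[:, None, :] - template_descs[None, :, :]) ** 2).sum(-1)
    votes = template_labels[d2.argmin(1)]  # one vote per query descriptor
    labels, counts = np.unique(votes, return_counts=True)
    best = counts.argmax()
    return labels[best], counts[best] / len(votes)
```

Thresholding the returned vote share gives a simple acceptance test for re-detection, and the same nearest-neighbor machinery underlies descriptor-based feature matching more generally.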
Theoretical and Practical Implications
From a theoretical perspective, the DART descriptor extends the utility of log-polar transformations to the domain of asynchronous event-driven data. The paper convincingly argues that leveraging biologically-inspired sampling for event-based sensors offers a promising direction for robust vision systems. Practically, the demonstrated real-time performance on commercial hardware highlights DART's potential in applications requiring low-latency processing.
The implications for autonomous robotics are noteworthy. The demonstrated integration on UAVs showcases DART's potential role in complex navigation systems where real-time processing and adaptive tracking are critical.
Future Directions
The paper opens several avenues for future work. One potential direction involves improving the resilience of DART in environments with significant background clutter or noise—a common challenge in real-world deployments. Furthermore, the authors acknowledge the opportunity to refine online training mechanisms for the detector to mitigate drift and enhance re-detection robustness. Given the emerging interest in neuromorphic computing, incorporating DART within such architectures could lead to more energy-efficient implementations, a key consideration for mobile applications.
In sum, the DART descriptor is a notable contribution to event-based vision, providing a robust tool for dynamic perception tasks in a compact, computationally efficient form. As the field advances, integrating DART into broader vision frameworks is likely to stimulate further research and application development.