
LEOD: Label-Efficient Object Detection for Event Cameras (2311.17286v2)

Published 29 Nov 2023 in cs.CV

Abstract: Object detection with event cameras benefits from the sensor's low latency and high dynamic range. However, it is costly to fully label event streams for supervised training due to their high temporal resolution. To reduce this cost, we present LEOD, the first method for label-efficient event-based detection. Our approach unifies weakly- and semi-supervised object detection with a self-training mechanism. We first utilize a detector pre-trained on limited labels to produce pseudo ground truth on unlabeled events. Then, the detector is re-trained with both real and generated labels. Leveraging the temporal consistency of events, we run bi-directional inference and apply tracking-based post-processing to enhance the quality of pseudo labels. To stabilize training against label noise, we further design a soft anchor assignment strategy. We introduce new experimental protocols to evaluate the task of label-efficient event-based detection on Gen1 and 1Mpx datasets. LEOD consistently outperforms supervised baselines across various labeling ratios. For example, on Gen1, it improves mAP by 8.6% and 7.8% for RVT-S trained with 1% and 2% labels. On 1Mpx, RVT-S with 10% labels even surpasses its fully-supervised counterpart using 100% labels. LEOD maintains its effectiveness even when all labeled data are available, reaching new state-of-the-art results. Finally, we show that our method readily scales to improve larger detectors as well. Code is released at https://github.com/Wuziyi616/LEOD

An Expert Evaluation of "LEOD: Label-Efficient Object Detection for Event Cameras"

The method introduced in "LEOD: Label-Efficient Object Detection for Event Cameras" marks a significant development in computer vision, particularly in object detection with event cameras. Event cameras hold intrinsic advantages due to their low latency and high dynamic range, making them suitable for safety-critical applications such as autonomous driving. However, these benefits are offset by the high cost of labeling event streams for supervised learning, a consequence of their high temporal resolution. The paper addresses this challenge with LEOD, a novel approach to label-efficient event-based detection.

LEOD is the first method to unify weakly- and semi-supervised object detection for event cameras via a self-training mechanism. The framework integrates both labeled and unlabeled data into the training pipeline, reducing the labeling cost associated with event streams. It starts with a detector pre-trained on the limited labeled data, which generates pseudo ground truth on the unlabeled events; the detector is then re-trained on both the real and pseudo labels, as sketched below.
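
To make the loop concrete, here is a minimal sketch of this self-training round. The `Detector` interface, `self_training_round`, and the confidence threshold are hypothetical illustrations, not LEOD's actual API; see the released code at https://github.com/Wuziyi616/LEOD for the real implementation.

```python
from typing import List, Protocol, Sequence


class Detection(Protocol):
    score: float  # confidence of the predicted box


class Detector(Protocol):
    def fit(self, data: Sequence) -> None: ...
    def predict(self, events) -> List[Detection]: ...


def self_training_round(detector: Detector, labeled: list, unlabeled: list,
                        score_thresh: float = 0.5) -> Detector:
    """One simplified round of self-training with pseudo labels."""
    # 1. Pre-train on the small labeled subset.
    detector.fit(labeled)

    # 2. Generate pseudo ground truth on unlabeled event streams,
    #    keeping only confident detections.
    pseudo = []
    for events in unlabeled:
        boxes = [b for b in detector.predict(events) if b.score >= score_thresh]
        pseudo.append((events, boxes))

    # 3. Re-train on the union of real and pseudo labels.
    detector.fit(labeled + pseudo)
    return detector
```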

Several methodological novelties underpin LEOD's performance. Exploiting the temporal consistency of event streams, LEOD runs bi-directional inference and tracking-based post-processing, which improves pseudo-label quality by filtering out the temporally inconsistent detections that naive pseudo-label generation tends to produce. To further guard against label noise, a soft anchor assignment strategy is introduced: rather than committing to hard foreground/background assignments, it handles uncertain pseudo labels gracefully so that training relies on dependable supervision (see the sketches after this paragraph).
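
The following is a hedged sketch of the tracking-based post-processing idea: per-frame detections (assumed here to be already merged from the forward and time-reversed passes) are greedily linked into tracklets by IoU, and short-lived tracklets are discarded as likely false positives. The greedy linker and thresholds are illustrative, not the paper's exact procedure.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_by_tracklet(frames: List[List[Box]], iou_thresh: float = 0.5,
                       min_len: int = 3) -> List[List[Box]]:
    """Keep only boxes whose tracklet spans at least `min_len` frames."""
    tracks: List[List[Tuple[int, Box]]] = []  # each track: [(frame_idx, box), ...]
    for t, boxes in enumerate(frames):
        for box in boxes:
            # Greedily extend a track that ended in the previous frame.
            for track in tracks:
                last_t, last_box = track[-1]
                if last_t == t - 1 and iou(last_box, box) >= iou_thresh:
                    track.append((t, box))
                    break
            else:
                tracks.append([(t, box)])
    kept = [set() for _ in frames]
    for track in tracks:
        if len(track) >= min_len:  # short tracklets are likely noise
            for t, box in track:
                kept[t].add(box)
    return [[b for b in boxes if b in kept[t]] for t, boxes in enumerate(frames)]
```

And here is a minimal sketch of the soft anchor assignment intuition: anchors matched to pseudo boxes contribute to the classification loss with a soft weight derived from the pseudo label's confidence instead of a hard 0/1 assignment, so noisy pseudo labels are down-weighted. The exact weighting rule is an assumption for illustration, not the paper's formula.

```python
import torch


def soft_anchor_loss(cls_logits: torch.Tensor, anchor_labels: torch.Tensor,
                     pseudo_conf: torch.Tensor, is_pseudo: torch.Tensor) -> torch.Tensor:
    """Binary classification loss with down-weighted pseudo-label anchors.

    cls_logits:    (N,) raw scores for N anchors
    anchor_labels: (N,) 1 for foreground, 0 for background
    pseudo_conf:   (N,) confidence of the matched pseudo box
    is_pseudo:     (N,) bool, True if matched to a pseudo box
    """
    per_anchor = torch.nn.functional.binary_cross_entropy_with_logits(
        cls_logits, anchor_labels.float(), reduction="none")
    # Real labels keep full weight; pseudo labels are weighted by confidence.
    weights = torch.where(is_pseudo, pseudo_conf, torch.ones_like(pseudo_conf))
    return (weights * per_anchor).mean()
```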

Experimental validation demonstrates LEOD's robustness across settings. On the Gen1 and 1Mpx datasets, LEOD consistently outperforms supervised baselines across labeling ratios. For instance, on Gen1 it improves the mAP of RVT-S by 8.6% and 7.8% when trained with 1% and 2% of the labels, and on 1Mpx, RVT-S trained with only 10% of the labels even surpasses its fully-supervised counterpart. These results show how effectively LEOD exploits limited labeled data.

Moreover, LEOD remains effective and even sets new state-of-the-art results when all labeled data is utilized, and the paper shows it readily scales to larger detectors. The approach is thus not only beneficial in low-label regimes but also improves fully supervised training, broadening its applicability across real-world scenarios with varying amounts of annotated data.

The theoretical implications of this research are profound. It suggests a realignment of traditional approaches to supervised learning in the context of high-dimensional event camera data. The fusion of weakly- and semi-supervised methodologies into a cohesive framework reflects a broader trend towards more flexible, data-efficient learning paradigms in computer vision. Practically, LEOD promises cost-efficient deployment in fields where rapid, robust object detection is necessary, enhancing both the feasibility and performance of systems equipped with event-based cameras.

Looking forward, the extension of LEOD to more generalized detection tasks and the exploration of its adaptability across different event camera hardware or varying environmental conditions could be promising research avenues. Additionally, the framework lays essential groundwork for exploring label efficiency in other real-world sensor modalities where full annotation is challenging. As a whole, LEOD contributes a significant piece to the ongoing progression towards more versatile and resource-efficient AI systems.

Authors (5)
  1. Ziyi Wu (21 papers)
  2. Mathias Gehrig (23 papers)
  3. Qing Lyu (35 papers)
  4. Xudong Liu (41 papers)
  5. Igor Gilitschenski (72 papers)
Citations (4)