
Consistency-based Active Learning for Object Detection (2103.10374v3)

Published 18 Mar 2021 in cs.CV, cs.AI, and cs.LG

Abstract: Active learning aims to improve the performance of task model by selecting the most informative samples with a limited budget. Unlike most recent works that focused on applying active learning for image classification, we propose an effective Consistency-based Active Learning method for object Detection (CALD), which fully explores the consistency between original and augmented data. CALD has three appealing benefits. (i) CALD is systematically designed by investigating the weaknesses of existing active learning methods, which do not take the unique challenges of object detection into account. (ii) CALD unifies box regression and classification with a single metric, which is not concerned by active learning methods for classification. CALD also focuses on the most informative local region rather than the whole image, which is beneficial for object detection. (iii) CALD not only gauges individual information for sample selection, but also leverages mutual information to encourage a balanced data distribution. Extensive experiments show that CALD significantly outperforms existing state-of-the-art task-agnostic and detection-specific active learning methods on general object detection datasets. Based on the Faster R-CNN detector, CALD consistently surpasses the baseline method (random selection) by 2.9/2.8/0.8 mAP on average on PASCAL VOC 2007, PASCAL VOC 2012, and MS COCO. Code is available at \url{https://github.com/we1pingyu/CALD}

Citations (44)

Summary

  • The paper introduces a novel consistency-based active learning framework that integrates IoU and JS divergence metrics to assess predictions on original and augmented data.
  • It demonstrates significant mAP gains (up to 2.9 points) over random selection on benchmarks such as PASCAL VOC and MS COCO.
  • The method balances sample selection via mutual information, effectively mitigating class imbalance and enhancing annotation efficiency.

Consistency-based Active Learning for Object Detection: A Comprehensive Analysis

Object detection, a fundamental task within computer vision, presents unique challenges due to the necessity for annotating both bounding boxes and class labels. Existing active learning methods primarily focus on image classification, failing to address these challenges effectively. The paper "Consistency-based Active Learning for Object Detection (CALD)" proposes an innovative approach to overcome these limitations, aiming to improve annotation efficiency for object detection without compromising detection accuracy.

Consistency-based Active Learning Methodology

CALD differentiates itself by incorporating a consistency-based active learning framework specifically designed for object detection. Unlike classification-focused strategies, CALD assesses the internal consistency between predictions on original images and their augmented versions, providing a robust metric that encompasses both classification and regression information. This method is pivotal in addressing the distinctive demands of object detection tasks.

CALD's framework is organized into two primary stages:

  1. Consistency-based Individual Information Stage: This stage focuses on gauging the consistency of predictions from both original and augmented data. It emphasizes local regions rather than global image metrics, making it uniquely suited for object detection where informative patches are often localized. Here, the method calculates consistency using Intersection over Union (IoU) for bounding box predictions and Jensen-Shannon Divergence (JS) for class scores.
  2. Mutual Information Stage: This subsequent stage utilizes mutual information to ensure balanced class distribution across selected samples. By considering the JS divergence between the class distributions of potential selected samples and the existing labeled pool, CALD strategically selects samples to avoid class imbalance, thus optimizing data utility for model training.
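The two stages above can be sketched in code. Note that the function names, the additive scoring form, and the histogram-based pool comparison below are illustrative assumptions for exposition; the paper's exact formulation (available in the linked repository) may weight and combine these terms differently:

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0
        return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def individual_consistency(box_orig, probs_orig, box_aug, probs_aug):
    """Stage 1: consistency of one matched prediction pair.

    High box overlap (IoU) plus similar class scores (low JS) means the
    detector already handles this region well; a LOW score flags an
    informative sample worth annotating. The additive combination is an
    illustrative choice, not the paper's exact metric.
    """
    return iou(box_orig, box_aug) + (1.0 - js_divergence(probs_orig, probs_aug))

def mutual_information_pick(candidate_hists, pool_hist):
    """Stage 2: pick the candidate whose predicted-class histogram diverges
    most (by JS) from the labeled pool's, steering selection toward
    under-represented classes."""
    pool = np.asarray(pool_hist, dtype=float)
    pool = pool / pool.sum()
    scores = []
    for hist in candidate_hists:
        h = np.asarray(hist, dtype=float)
        scores.append(js_divergence(h / h.sum(), pool))
    return int(np.argmax(scores))
```

For example, with a labeled pool dominated by class 0, a candidate image whose detections are all class 1 maximizes the stage-2 divergence and is selected, which is how the mutual-information step counteracts class imbalance.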

Empirical Evaluation and Results

The efficacy of CALD is substantiated through comprehensive experiments on established benchmarks such as PASCAL VOC 2007, PASCAL VOC 2012, and MS COCO, employing both Faster R-CNN and RetinaNet detectors. Notably, CALD outperforms existing state-of-the-art active learning methods, achieving noteworthy improvements in mean Average Precision (mAP) across all datasets. Specifically, with Faster R-CNN, CALD surpasses random selection by 2.9, 2.8, and 0.8 mAP on average on the aforementioned benchmarks, respectively.

A significant observation is CALD's strength in fundamental categories with initially lower Average Precision (AP), demonstrating its aptitude for refining challenging detection scenarios. Additionally, CALD's performance is consistent across varied data augmentations and base point settings for consistency measurements, further validating its underlying methodological robustness.

Implications and Future Directions

By bridging the gap between classification and detection tasks, CALD offers a promising direction for active learning in object detection, addressing limitations that historically hindered classification-based methodologies from excelling when transferred to detection tasks. The implications of CALD are manifold:

  • Practical Benefits: CALD offers an efficient strategy for budget-constrained annotation in real-world applications, leveraging unlabeled data more effectively while ensuring balanced class representation in training datasets.
  • Theoretical Advancements: The unification of classification and regression metrics within a single framework showcases the potential to refine active learning paradigms, possibly extending similar methodologies to other tasks requiring joint metric evaluations.

Looking ahead, the exploration of CALD's adaptation to alternative object detection architectures and its integration with semi-supervised learning methods could further enhance its applicability and effectiveness. Additionally, investigating the scalability of CALD in large-scale deployments may offer insights into optimizing computational resources and annotation efforts.

In summary, CALD represents a significant step forward in active learning for object detection, offering a refined approach that holistically evaluates and selects the most informative samples for annotation, thereby enhancing model performance in data-scarce environments.