Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection (1912.02424v4)

Published 5 Dec 2019 in cs.CV

Abstract: Object detection has been dominated by anchor-based detectors for several years. Recently, anchor-free detectors have become popular due to the proposal of FPN and Focal Loss. In this paper, we first point out that the essential difference between anchor-based and anchor-free detection is actually how to define positive and negative training samples, which leads to the performance gap between them. If they adopt the same definition of positive and negative samples during training, there is no obvious difference in the final performance, no matter regressing from a box or a point. This shows that how to select positive and negative training samples is important for current object detectors. Then, we propose an Adaptive Training Sample Selection (ATSS) to automatically select positive and negative samples according to statistical characteristics of object. It significantly improves the performance of anchor-based and anchor-free detectors and bridges the gap between them. Finally, we discuss the necessity of tiling multiple anchors per location on the image to detect objects. Extensive experiments conducted on MS COCO support our aforementioned analysis and conclusions. With the newly introduced ATSS, we improve state-of-the-art detectors by a large margin to $50.7\%$ AP without introducing any overhead. The code is available at https://github.com/sfzhang15/ATSS

Authors (5)

Shifeng Zhang (46 papers)
Cheng Chi (41 papers)
Yongqiang Yao (21 papers)
Zhen Lei (205 papers)
Stan Z. Li (222 papers)

Summary

The paper's main contribution is ATSS, which dynamically selects positive and negative training samples based on object statistics.
It introduces a novel dynamic IoU threshold that adapts to candidate sample characteristics, reducing the need for extensive hyperparameter tuning.
Experiments on MS COCO show consistent AP gains: ATSS lifts an enhanced single-anchor RetinaNet from 37.0% to 39.3% AP, and the strongest configuration reaches 50.7% AP.

Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection

The paper "Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection" addresses a fundamental question in object detection: how should positive and negative training samples be defined? Anchor-based detectors dominated the field for years, while anchor-free detectors have gained ground more recently. The paper argues that the essential difference between the two families is not whether they regress from a box or from a point, but how they define positive and negative samples.

The paper introduces the Adaptive Training Sample Selection (ATSS) method, which dynamically and automatically determines positive and negative samples during training based on the statistical characteristics of objects, thereby bridging the performance gap between anchor-based and anchor-free detectors. Experimental evidence on the MS COCO benchmark demonstrates the efficacy of ATSS in producing state-of-the-art performance without additional computational overhead.
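To make the two sample definitions concrete, the following minimal sketch labels samples for a single ground-truth box in both styles. It is illustrative only: the function names are hypothetical, the IoU thresholds (0.5 and 0.4) follow the commonly cited RetinaNet defaults, and the per-level scale range follows the FCOS convention; all three are assumptions not stated in this summary.

```python
import numpy as np

def iou_one_to_many(box, boxes):
    """IoU between one (x1, y1, x2, y2) box and an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def label_by_iou(gt, anchors, pos_thr=0.5, neg_thr=0.4):
    """Anchor-based rule: 1 = positive, 0 = negative, -1 = ignored."""
    ious = iou_one_to_many(gt, anchors)
    labels = np.full(len(anchors), -1)   # in-between IoUs stay ignored
    labels[ious < neg_thr] = 0
    labels[ious >= pos_thr] = 1
    return labels

def label_by_point(gt, points, scale_range):
    """Anchor-free rule: a point is positive if it lies inside the box
    (spatial constraint) and its largest regression distance falls in
    this pyramid level's range (scale constraint)."""
    left = points[:, 0] - gt[0]
    top = points[:, 1] - gt[1]
    right = gt[2] - points[:, 0]
    bottom = gt[3] - points[:, 1]
    dists = np.stack([left, top, right, bottom], axis=1)
    inside = dists.min(axis=1) > 0
    max_dist = dists.max(axis=1)
    in_range = (max_dist >= scale_range[0]) & (max_dist <= scale_range[1])
    return (inside & in_range).astype(int)

gt = np.array([0.0, 0.0, 100.0, 100.0])
anchors = np.array([[0, 0, 100, 100], [50, 50, 150, 150], [0, 0, 100, 45]], dtype=float)
points = np.array([[50, 50], [150, 50], [10, 50]], dtype=float)
iou_labels = label_by_iou(gt, anchors)
point_labels = label_by_point(gt, points, scale_range=(64, 128))
```

Both functions answer the same question (which samples count as positives for this object) with different rules, which is the difference the paper isolates.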

Key Insights

Core Difference Analysis: The paper examines the differences between an anchor-based detector (RetinaNet) and an anchor-free detector (FCOS). It concludes that the gap between them comes from how positive and negative samples are defined rather than from other details of the detection architecture. Anchor-based methods use Intersection over Union (IoU) thresholds to define these samples, whereas anchor-free methods use spatial and scale constraints, which performed better in the paper's comparison.
Adaptive Training Sample Selection (ATSS): ATSS selects training samples adaptively in three steps:

Candidate Selection Based on Center Distance:

For each ground-truth object, the $k$ anchors whose centers are closest to the object's center are picked on every pyramid level as candidate positive samples.

Dynamic IoU Threshold:

The IoU threshold for selecting positives is computed per object as the sum of the mean and the standard deviation of the IoUs between the object and its candidates, so it adapts to the object's specific characteristics.

Positives Limited to Ground-Truth Box:

Candidates whose IoU reaches the threshold are kept as positives only if their centers lie inside the ground-truth box, which makes the selection more reliable. All remaining samples are treated as negatives.

Robustness and Hyperparameter Sensitivity: ATSS is designed to be nearly hyperparameter-free, which largely removes the need for fine-tuning. Its single hyperparameter, $k$, sets the number of candidates taken from each pyramid level, and the paper reports that results are robust across a broad range of values.
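The three ATSS steps described above can be sketched for a single ground-truth box as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names are hypothetical, the default `k=9` is an assumption, and the population standard deviation (NumPy's default) is used, whereas other implementations may use the sample standard deviation. Resolving anchors claimed by several objects is left out.

```python
import numpy as np

def box_iou(box, boxes):
    """IoU between one (x1, y1, x2, y2) box and an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def atss_assign(gt, anchors_per_level, k=9):
    """Return (masks, threshold) for one ground-truth box.

    gt: (4,) array, (x1, y1, x2, y2).
    anchors_per_level: list of (N_l, 4) arrays, one per pyramid level.
    masks: list of boolean arrays marking the positive anchors per level.
    """
    gt_center = np.array([(gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2])

    # Step 1: on each level, take the k anchors closest to the object center.
    cand_ids, cand_boxes = [], []
    for anchors in anchors_per_level:
        centers = (anchors[:, :2] + anchors[:, 2:]) / 2
        dist = np.linalg.norm(centers - gt_center, axis=1)
        nearest = np.argsort(dist)[:k]
        cand_ids.append(nearest)
        cand_boxes.append(anchors[nearest])
    cands = np.concatenate(cand_boxes)

    # Step 2: per-object threshold = mean + standard deviation of candidate IoUs.
    ious = box_iou(gt, cands)
    thr = ious.mean() + ious.std()

    # Step 3: keep candidates above the threshold whose center is inside the box.
    cc = (cands[:, :2] + cands[:, 2:]) / 2
    inside = ((cc[:, 0] > gt[0]) & (cc[:, 0] < gt[2]) &
              (cc[:, 1] > gt[1]) & (cc[:, 1] < gt[3]))
    keep = (ious >= thr) & inside

    # Scatter the flat candidate decisions back to one mask per level.
    masks, start = [], 0
    for anchors, ids in zip(anchors_per_level, cand_ids):
        mask = np.zeros(len(anchors), dtype=bool)
        mask[ids] = keep[start:start + len(ids)]
        masks.append(mask)
        start += len(ids)
    return masks, thr

gt = np.array([0.0, 0.0, 100.0, 100.0])
level0 = np.array([[0, 0, 100, 100], [10, 10, 110, 110], [200, 200, 300, 300]], dtype=float)
level1 = np.array([[-50, -50, 150, 150]], dtype=float)
masks, thr = atss_assign(gt, [level0, level1], k=2)
```

Because the threshold is recomputed from each object's own candidates, a well-matched object gets a high threshold and a poorly matched one gets a low threshold, without any fixed IoU cutoff.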

Experimental Results

The implementation of ATSS yielded substantial improvements in object detection performance:

Performance Metrics:

For instance, integrating ATSS into an enhanced RetinaNet with a single anchor per location (#A=1) raised the AP from 37.0% to 39.3%, a gain of 2.3 points.

Comparative Advantage:

The ATSS-augmented RetinaNet (#A=1) was shown to match or exceed the performance of traditional multi-anchor setups, which suggests that tiling multiple anchors per location is unnecessary once samples are selected well, and that the simpler single-anchor design can be used instead.

Anchor-Free Detector Integration:

When applied to FCOS, the ATSS-based approach not only retained the benefits of anchor-free detection but also surpassed the baseline performance, achieving 39.2% AP with an enhanced version.
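For readers unfamiliar with the #A=1 setting mentioned above, the sketch below tiles exactly one square anchor at every feature-map location. It is an illustration under stated assumptions: `tile_square_anchors` is a hypothetical helper, the anchor side of 8 times the stride is an assumed default, and the half-stride center offset is one common convention among several.

```python
import numpy as np

def tile_square_anchors(feat_h, feat_w, stride, scale=8):
    """Tile one square anchor per location of a feat_h x feat_w feature map.

    Returns an (feat_h * feat_w, 4) array of (x1, y1, x2, y2) boxes in
    image coordinates, each with side length scale * stride.
    """
    half = scale * stride / 2
    # Anchor centers sit in the middle of each stride-sized cell (assumption).
    xs = (np.arange(feat_w) + 0.5) * stride
    ys = (np.arange(feat_h) + 0.5) * stride
    cx, cy = np.meshgrid(xs, ys)
    cx, cy = cx.ravel(), cy.ravel()
    return np.stack([cx - half, cy - half, cx + half, cy + half], axis=1)

anchors = tile_square_anchors(feat_h=2, feat_w=3, stride=8)
```

Calling the helper once per pyramid level, with that level's stride, yields the per-level anchor lists that a sample-selection routine would consume.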

Implications

The proposed ATSS method has broad implications for both practical and theoretical advancements in object detection:

Theoretical Clarity:

This research illuminates the critical impact of sample selection on detection performance, suggesting a reevaluation of long-standing practices such as tiling multiple anchors.

Practical Utility:

Because ATSS only changes how training samples are labeled, the trained detector runs at the same inference cost as before. Deployed systems can therefore gain accuracy without extra computation, which matters for real-time applications such as surveillance and automated driving.

Future Developments

Potential avenues for future research include:

Extensions to Other Architectures: While this paper focuses on RetinaNet and FCOS, further testing across diverse detection paradigms could cement ATSS as a universal mechanism for various neural architectures.
Enhancements with Semi-Supervised Learning: Exploring ATSS's compatibility with semi-supervised or unsupervised learning approaches might yield benefits in scenarios with limited annotated data.
Integration with Transformer-Based Detectors: The new wave of transformer-based detectors presents an intriguing domain for the application of ATSS, potentially enabling more refined attention mechanisms in sample selection.

In conclusion, "Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection" proposes a well-substantiated and innovative method for improving object detection. By focusing on the core issue of training sample selection, the paper not only enhances detection performance but also paves the way for more nuanced and adaptable methodologies in the field of computer vision.

GitHub

GitHub - sfzhang15/ATSS: Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection, CVPR, Oral, 2020 (1,075 stars)