Feature Selective Anchor-Free Module for Single-Shot Object Detection (1903.00621v1)

Published 2 Mar 2019 in cs.CV

Abstract: We motivate and present feature selective anchor-free (FSAF) module, a simple and effective building block for single-shot object detectors. It can be plugged into single-shot detectors with feature pyramid structure. The FSAF module addresses two limitations brought up by the conventional anchor-based detection: 1) heuristic-guided feature selection; 2) overlap-based anchor sampling. The general concept of the FSAF module is online feature selection applied to the training of multi-level anchor-free branches. Specifically, an anchor-free branch is attached to each level of the feature pyramid, allowing box encoding and decoding in the anchor-free manner at an arbitrary level. During training, we dynamically assign each instance to the most suitable feature level. At the time of inference, the FSAF module can work jointly with anchor-based branches by outputting predictions in parallel. We instantiate this concept with simple implementations of anchor-free branches and online feature selection strategy. Experimental results on the COCO detection track show that our FSAF module performs better than anchor-based counterparts while being faster. When working jointly with anchor-based branches, the FSAF module robustly improves the baseline RetinaNet by a large margin under various settings, while introducing nearly free inference overhead. And the resulting best model can achieve a state-of-the-art 44.6% mAP, outperforming all existing single-shot detectors on COCO.

Authors (3)
  1. Chenchen Zhu (26 papers)
  2. Yihui He (25 papers)
  3. Marios Savvides (61 papers)
Citations (772)

Summary

  • The paper presents a novel online feature selection mechanism that assigns object instances to optimal pyramid levels for anchor-free detection.
  • It augments single-shot detectors with additional convolution layers for parallel anchor-free classification and regression, boosting overall accuracy.
  • Experiments on COCO report a state-of-the-art 44.6% mAP for the best model, a latency increase of only 6ms from the added module, and consistent gains across several backbones.

Feature Selective Anchor-Free Module for Single-Shot Object Detection

The paper "Feature Selective Anchor-Free Module for Single-Shot Object Detection," by Chenchen Zhu, Yihui He, and Marios Savvides, introduces the Feature Selective Anchor-Free (FSAF) module, a building block for single-shot object detectors with a feature pyramid. It targets two limitations of anchor-based methods: heuristic-guided feature selection and overlap-based anchor sampling. The authors report that it is both more accurate and faster than its anchor-based counterparts.

Overview

The paper sets out to remove two constraints of traditional anchor-based detection frameworks: heuristic-guided feature selection and overlap-based anchor sampling. The FSAF module is designed to plug into single-shot detectors that use a feature pyramid. During training, online feature selection dynamically assigns each object instance to the most suitable pyramid level. Because an anchor-free branch is attached to every level, boxes can be encoded and decoded at an arbitrary level, so the detector can handle objects of varying scales without being tied to predefined anchor boxes.

Methodology

Network Architecture

The FSAF module adds two convolution layers to each feature pyramid level of the base single-shot detector. One layer performs anchor-free classification and the other performs anchor-free box regression, and both run in parallel with the existing anchor-based branches. At inference, the detector can therefore draw on both anchor-based and anchor-free predictions.
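To make the per-level layout concrete, the following minimal sketch lists the output shapes the two anchor-free layers would produce on each pyramid level. The level names P3 to P7, the strides 8 to 128, and the helper name `fsaf_head_shapes` are assumptions borrowed from the usual RetinaNet setup; the summary above does not state them.

```python
import math

def fsaf_head_shapes(image_hw, num_classes, strides=(8, 16, 32, 64, 128)):
    """Return the output shapes of the two anchor-free layers on each level.

    Assumption: levels are named P3..P7 and use the given strides, as in a
    typical RetinaNet feature pyramid. This is illustrative only.
    """
    height, width = image_hw
    shapes = {}
    for level, stride in enumerate(strides, start=3):
        feat_h = math.ceil(height / stride)
        feat_w = math.ceil(width / stride)
        shapes[f"P{level}"] = {
            # One score per class at every spatial location.
            "cls": (feat_h, feat_w, num_classes),
            # Four box offsets per location, shared across classes.
            "reg": (feat_h, feat_w, 4),
        }
    return shapes

if __name__ == "__main__":
    for name, out in fsaf_head_shapes((800, 1344), num_classes=80).items():
        print(name, out)
```

Note that the regression output has four channels per location regardless of the number of anchors, which is one reason the added branch is cheap.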

Ground-Truth Generation and Loss Calculation

For the anchor-free branches, the authors define ground-truth supervision through two regions derived from each instance's box: an effective box, a shrunken central region whose pixels are treated as positives, and a larger ignoring box, whose remaining pixels are excluded from the loss. The classification output is trained with focal loss and the box regression output with IoU loss.
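The ground-truth regions and the two losses can be sketched in plain Python as follows. The scale factors 0.2 (effective box) and 0.5 (ignoring box), the focal-loss parameters alpha = 0.25 and gamma = 2, and the `-log(IoU)` form of the IoU loss are assumptions taken from common practice rather than from the summary above, and all function names are hypothetical.

```python
import math

def project_box(box, stride):
    """Map an image-space box (x1, y1, x2, y2) onto a pyramid level."""
    return tuple(v / stride for v in box)

def scale_box(box, factor):
    """Shrink a box about its centre by `factor` (0 < factor <= 1)."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    half_w = (x2 - x1) * factor / 2
    half_h = (y2 - y1) * factor / 2
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)

def focal_loss(p, target, alpha=0.25, gamma=2.0):
    """Binary focal loss for one predicted probability `p` (assumed defaults)."""
    p = min(max(p, 1e-7), 1 - 1e-7)  # clamp to avoid log(0)
    p_t = p if target == 1 else 1 - p
    alpha_t = alpha if target == 1 else 1 - alpha
    return -alpha_t * (1 - p_t) ** gamma * math.log(p_t)

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def iou_loss(pred, gt):
    """IoU loss in the -log(IoU) form (an assumed variant)."""
    return -math.log(max(iou(pred, gt), 1e-7))

if __name__ == "__main__":
    gt_box = (64, 64, 192, 320)                # image-space ground truth
    projected = project_box(gt_box, stride=16)
    effective = scale_box(projected, 0.2)      # positives for classification
    ignoring = scale_box(projected, 0.5)       # ring around it is ignored
    print(projected, effective, ignoring)
```

Pixels inside `effective` would contribute positive classification targets and regression targets, while pixels inside `ignoring` but outside `effective` would contribute nothing to the gradient.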

Online Feature Selection

The central idea of the FSAF module is online feature selection. Instead of assigning an instance to a pyramid level by a heuristic rule based on its size, the module computes the combined loss (classification plus regression) that the instance incurs on every level and assigns the instance to the level where that loss is smallest. Each instance is thus trained on the level that currently represents it best, which improves detection across object scales.
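The selection rule described above can be sketched as an argmin over levels. This is a minimal illustration, assuming the per-pixel focal and IoU losses inside each level's effective box have already been computed and that each is averaged over the effective box before summing; the function name `select_level` and the input layout are hypothetical.

```python
def select_level(per_level_losses):
    """Pick the pyramid level with the smallest combined loss for one instance.

    per_level_losses maps a level id to a pair of lists:
    (focal losses, IoU losses) over the pixels of that level's effective box.
    Assumption: each loss is averaged over the effective box, then summed.
    """
    best_level, best_loss = None, float("inf")
    for level, (cls_losses, reg_losses) in per_level_losses.items():
        combined = (sum(cls_losses) / len(cls_losses)
                    + sum(reg_losses) / len(reg_losses))
        if combined < best_loss:
            best_level, best_loss = level, combined
    return best_level

if __name__ == "__main__":
    losses = {
        3: ([0.9, 1.1], [0.7, 0.5]),  # combined about 1.6
        4: ([0.3, 0.5], [0.2, 0.4]),  # combined about 0.7
        5: ([0.6], [0.9]),            # combined about 1.5
    }
    print(select_level(losses))
```

During training, only the selected level would receive supervision for that instance in the current iteration, so the assignment can change as the network learns.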

Joint Inference and Training

During inference, the FSAF module runs alongside the anchor-based branches of RetinaNet, and the two sets of predictions are output in parallel and combined. The extra computation introduced by the FSAF module is small, so the accuracy gain comes at little cost in speed.
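One plausible way to combine the two sets of predictions is to pool them and apply non-maximum suppression once. The sketch below assumes a score threshold of 0.05, an IoU threshold of 0.5, and a per-class greedy NMS; these values, the detection dictionary layout, and the function names are illustrative assumptions, not details given in the summary above.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(detections, iou_thr=0.5):
    """Greedy per-class NMS: keep the highest-scoring box, drop its overlaps."""
    kept = []
    for det in sorted(detections, key=lambda d: d["score"], reverse=True):
        suppressed = any(
            det["label"] == k["label"] and iou(det["box"], k["box"]) > iou_thr
            for k in kept
        )
        if not suppressed:
            kept.append(det)
    return kept

def joint_inference(anchor_based, anchor_free, score_thr=0.05, iou_thr=0.5):
    """Pool predictions from both branches, filter by score, then run NMS."""
    pooled = [d for d in anchor_based + anchor_free if d["score"] >= score_thr]
    return nms(pooled, iou_thr)

if __name__ == "__main__":
    ab = [{"box": (0, 0, 10, 10), "score": 0.9, "label": 1}]
    af = [
        {"box": (1, 1, 10, 10), "score": 0.8, "label": 1},    # overlaps ab[0]
        {"box": (50, 50, 60, 60), "score": 0.7, "label": 1},  # separate object
    ]
    for det in joint_inference(ab, af):
        print(det)
```

Because both branches share the backbone and pyramid features, the only added inference cost in this scheme is the extra head layers and a slightly larger NMS input.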

Experimental Results

Extensive experiments on the COCO detection benchmark demonstrate the efficacy of the FSAF module. Notable findings include:

  • Performance: Combined with the anchor-based RetinaNet, the best FSAF model achieves 44.6% mAP on COCO test-dev, outperforming all single-shot detectors existing at the time of publication.
  • Speed: The FSAF module adds only 6ms of inference latency in exchange for its accuracy gains.
  • Robustness: The method is tested with several backbone networks (ResNet-50, ResNet-101, ResNeXt-101) and improves on the baseline consistently across them.

Implications and Future Directions

The introduction of the FSAF module has both practical and theoretical implications:

  1. Practical Applications: By improving the detection accuracy and efficiency of single-shot detectors, the FSAF module can benefit a wide range of real-world applications, including autonomous driving, video surveillance, and augmented reality.
  2. Theoretical Advancements: This work opens up new avenues for research in object detection by challenging the reliance on anchors and encouraging the development of more adaptive and flexible feature selection mechanisms.

Future developments might include exploring more sophisticated online feature selection strategies, integrating the FSAF module with other detection frameworks, and evaluating its performance on more diverse datasets.

Conclusion

"Feature Selective Anchor-Free Module for Single-Shot Object Detection" addresses two limitations of anchor-based methods by attaching anchor-free branches to every pyramid level and selecting the training level for each instance online. The experimental results show that this improves the accuracy of single-shot detectors at a small cost in inference time, and they leave room for further work on feature selection strategies.