
FCOS: A simple and strong anchor-free object detector (2006.09214v3)

Published 14 Jun 2020 in cs.CV

Abstract: In computer vision, object detection is one of most important tasks, which underpins a few instance-level recognition tasks and many downstream applications. Recently one-stage methods have gained much attention over two-stage approaches due to their simpler design and competitive performance. Here we propose a fully convolutional one-stage object detector (FCOS) to solve object detection in a per-pixel prediction fashion, analogue to other dense prediction problems such as semantic segmentation. Almost all state-of-the-art object detectors such as RetinaNet, SSD, YOLOv3, and Faster R-CNN rely on pre-defined anchor boxes. In contrast, our proposed detector FCOS is anchor box free, as well as proposal free. By eliminating the pre-defined set of anchor boxes, FCOS completely avoids the complicated computation related to anchor boxes such as calculating the intersection over union (IoU) scores during training. More importantly, we also avoid all hyper-parameters related to anchor boxes, which are often sensitive to the final detection performance. With the only post-processing non-maximum suppression (NMS), we demonstrate a much simpler and flexible detection framework achieving improved detection accuracy. We hope that the proposed FCOS framework can serve as a simple and strong alternative for many other instance-level tasks. Code and pre-trained models are available at: https://git.io/AdelaiDet

Authors (4)
  1. Zhi Tian (68 papers)
  2. Chunhua Shen (404 papers)
  3. Hao Chen (1006 papers)
  4. Tong He (124 papers)
Citations (480)

Summary

Overview of FCOS: A Simple and Strong Anchor-free Object Detector

The field of object detection in computer vision has seen a spectrum of methodologies, from the earlier two-stage detectors like Faster R-CNN to more recent one-stage frameworks such as YOLOv3 and RetinaNet. The paper "FCOS: A Simple and Strong Anchor-free Object Detector" by Zhi Tian et al. introduces a fully convolutional one-stage object detector designed to address specific challenges associated with these approaches.

Key Contributions

FCOS (Fully Convolutional One-Stage Object Detector) diverges from conventional anchor-based models by eliminating pre-defined anchor boxes. Anchor boxes have traditionally been a cornerstone of object detection, but they demand significant computation (e.g., IoU matching during training) and introduce numerous hyper-parameters that can substantially affect detection performance. Instead, FCOS predicts a bounding box directly at each location of the feature map, casting detection as a dense, per-pixel prediction problem akin to semantic segmentation.
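A minimal sketch of this per-pixel formulation (plain Python; the function name and box layout are illustrative, not taken from the released code): for a location (x, y) inside a ground-truth box, the regression target is the 4-vector of distances to the box's four sides.

```python
def regression_target(point, box):
    """Per-pixel regression target in the FCOS-style formulation.

    point: location (x, y) on the input image.
    box:   ground-truth box (x0, y0, x1, y1).
    Returns (l, t, r, b), the distances from the location to the
    left, top, right, and bottom sides of the box. The location is
    a positive sample only if all four distances are positive,
    i.e. it actually falls inside the box.
    """
    x, y = point
    x0, y0, x1, y1 = box
    return (x - x0, y - y0, x1 - x, y1 - y)
```

Because every in-box location produces a training sample, this yields far more foreground samples per object than anchor matching, which the paper credits as one reason the anchor-free design remains competitive.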

Major contributions include:

  1. Anchor-free Framework: FCOS removes the need for anchor boxes, thereby simplifying the training process and reducing the number of hyper-parameters. This eliminates the computational overhead associated with calculating intersection-over-union (IoU) scores with anchor boxes.
  2. Center-ness Branch: The detector introduces a novel "center-ness" branch that predicts, for each location, how close it is to the center of the object it falls in. At test time this score down-weights low-quality bounding boxes predicted far from object centers so that non-maximum suppression (NMS) removes them, refining the final detections.
  3. Multi-level Prediction: Leveraging the feature pyramid networks (FPN), FCOS assigns objects of different sizes to appropriate feature levels, which mitigates the ambiguity resulting from overlapping bounding boxes.
  4. High Accuracy: With the proposed architecture, FCOS demonstrates superior detection performance, achieving an Average Precision (AP) of 44.8% with a ResNeXt-64x4d-101-FPN backbone, surpassing several anchor-based models.
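The center-ness target and the multi-level assignment rule described above can be sketched as follows (plain Python; the size bounds follow the paper's defaults of 64/128/256/512 for P3-P7, and the function names are illustrative):

```python
import math

def centerness(l, t, r, b):
    """Center-ness target for a positive location with regression
    distances (l, t, r, b):
        sqrt(min(l, r) / max(l, r) * min(t, b) / max(t, b))
    Equals 1.0 at the box center and decays toward 0 near the edges;
    at test time it scales the classification score so that NMS
    suppresses boxes predicted far from any object center."""
    return math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))

def fpn_level(l, t, r, b, bounds=(64, 128, 256, 512)):
    """Multi-level assignment: route a location to pyramid level P3-P7
    by its largest regression distance, so overlapping objects of
    different sizes are handled on different feature levels."""
    m = max(l, t, r, b)
    for i, bound in enumerate(bounds):
        if m <= bound:
            return i + 3  # P3 is the finest level
    return 7              # anything larger goes to P7
```

For example, a location at the exact center of a box (equal distances on all sides) gets center-ness 1.0, while a location near a box edge scores close to 0; a small object with all distances under 64 pixels is assigned to P3.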

Implications and Future Directions

This paper challenges the necessity of anchor boxes, which are deeply entrenched in object detection frameworks, demonstrating that it is possible to match and even exceed anchor-based benchmarks without them. This approach aligns object detection more closely with other dense prediction tasks in computer vision, opening the door to easier integration and cross-pollination of ideas across these tasks.

For future applications, the adaptability of FCOS suggests potential extensions to instance segmentation, keypoint detection, and other tasks that deal with localized information in images. As such, FCOS could encourage the development of more unified systems that handle various instance-level recognition tasks under a single framework.

The simplicity and effectiveness of the FCOS architecture make it compelling for further exploration in academia and industry, especially in applications that demand quick deployment and adaptation with minimal parameter tuning.

Conclusion

In conclusion, FCOS advances object detection by removing the long-standing dependency on anchor boxes, simplifying both the design and the implementation of detectors. Its strong results, coupled with its simplicity, carry significant implications for both theoretical research and practical deployment in object detection and related applications. As the community continues to explore anchor-free methodologies, FCOS stands out as a promising direction for future research and development in computer vision.