FoveaBox: Beyond Anchor-based Object Detector (1904.03797v2)

Published 8 Apr 2019 in cs.CV

Abstract: We present FoveaBox, an accurate, flexible, and completely anchor-free framework for object detection. While almost all state-of-the-art object detectors utilize predefined anchors to enumerate possible locations, scales and aspect ratios for the search of the objects, their performance and generalization ability are also limited to the design of anchors. Instead, FoveaBox directly learns the object existing possibility and the bounding box coordinates without anchor reference. This is achieved by: (a) predicting category-sensitive semantic maps for the object existing possibility, and (b) producing category-agnostic bounding box for each position that potentially contains an object. The scales of target boxes are naturally associated with feature pyramid representations. In FoveaBox, an instance is assigned to adjacent feature levels to make the model more accurate. We demonstrate its effectiveness on standard benchmarks and report extensive experimental analysis. Without bells and whistles, FoveaBox achieves state-of-the-art single model performance on the standard COCO and Pascal VOC object detection benchmark. More importantly, FoveaBox avoids all computation and hyper-parameters related to anchor boxes, which are often sensitive to the final detection performance. We believe the simple and effective approach will serve as a solid baseline and help ease future research for object detection. The code has been made publicly available at https://github.com/taokong/FoveaBox .

Authors (6)

Tao Kong (49 papers)
Fuchun Sun (127 papers)
Huaping Liu (97 papers)
Yuning Jiang (106 papers)
Lei Li (1293 papers)
Jianbo Shi (57 papers)

Citations (203)

Summary

An Analysis of FoveaBox: Advancements in Anchor-Free Object Detection

The paper "FoveaBox: Beyond Anchor-based Object Detector" presents a novel approach to object detection that drops pre-defined anchors in favor of a fully anchor-free framework. It positions itself against the dominant paradigm in object detection, anchor-based methods, and offers a potentially simpler and more flexible solution. This essay analyzes the FoveaBox framework, its underlying mechanisms, and its implications for future research in object detection.

The anchor-based object detection framework relies heavily on the design and placement of anchors to predict bounding boxes for objects. These anchors act as references for algorithms to predict object location and are integral to popular models like Faster R-CNN and RetinaNet. However, anchors come with a host of limitations, including the need for careful design specific to dataset distributions, poor generalization across varied applications, and increased computational costs. The FoveaBox framework addresses these issues by removing reliance on anchors entirely.

FoveaBox's methodology involves predicting category-sensitive semantic maps that indicate whether an object is present and generating a category-agnostic bounding box for each position that potentially contains an object. The framework uses feature pyramid representations, akin to those in FPN architectures, to handle different object scales robustly. In training, FoveaBox derives its targets directly from the ground-truth bounding boxes rather than from matches against predefined anchors, which removes the computation that anchor-based techniques spend on anchor generation and matching.
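To make the training-target idea concrete, the following is a minimal Python sketch of how positive positions might be selected on one pyramid level without anchors. It assumes a ground-truth box is projected onto the feature map by dividing by the level's stride, and that only cells whose centers fall inside a centrally shrunk copy of the box count as positives. The function name `positive_cells` and the shrink factor `sigma=0.4` are illustrative assumptions and are not taken from the summary above.

```python
import math


def positive_cells(box, stride, sigma=0.4):
    """Return feature-map cells (x, y) treated as positives for one box.

    box    -- ground-truth box (x1, y1, x2, y2) in image pixels
    stride -- downsampling factor of the pyramid level (assumed known)
    sigma  -- hypothetical shrink factor for the central positive region
    """
    # Project the box onto the feature map of this level.
    fx1, fy1, fx2, fy2 = (v / stride for v in box)
    cx, cy = 0.5 * (fx1 + fx2), 0.5 * (fy1 + fy2)
    half_w = 0.5 * sigma * (fx2 - fx1)
    half_h = 0.5 * sigma * (fy2 - fy1)

    # Shrunk region around the box center.
    sx1, sx2 = cx - half_w, cx + half_w
    sy1, sy2 = cy - half_h, cy + half_h

    cells = []
    for y in range(math.floor(sy1), math.ceil(sy2) + 1):
        for x in range(math.floor(sx1), math.ceil(sx2) + 1):
            # A cell is positive when its center lies inside the shrunk region.
            if sx1 <= x + 0.5 <= sx2 and sy1 <= y + 0.5 <= sy2:
                cells.append((x, y))
    return cells


cells = positive_cells((64, 64, 192, 192), stride=8)
print(len(cells))
```

No anchor enumeration or IoU matching appears anywhere in this sketch: the set of positives follows from the box geometry alone, which is the property the anchor-free formulation relies on.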

The experimental results reported in the paper are noteworthy. FoveaBox outperforms anchor-based systems on standard benchmarks such as COCO and Pascal VOC, including a 2.2 AP gain over RetinaNet, without depending on any anchor-related hyper-parameters. The paper also reports evidence of improved speed and computational efficiency.

A significant practical implication of the FoveaBox framework is its reduction of hyper-parameter tuning associated with anchors, which simplifies model design and training. Theoretically, it signifies a shift towards more biologically inspired models that mimic human visual systems' object detection capabilities without pre-defined templates.

Several properties contribute to FoveaBox's robust performance. By assigning each object to adjacent feature pyramid levels instead of a single one, it demonstrates increased robustness and improved metrics across various aspect ratios of detected objects. The experimental evaluation further underlines its capability to produce high-quality region proposals, surpassing those generated by traditional RPN methods by approximately 8.4 points in AR.
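The level-assignment idea can be sketched as follows. This is a minimal illustration, assuming each pyramid level P3 to P7 has a base scale (32 to 512 pixels, doubling per level) and that an object is assigned to every level whose range [base / eta, base * eta] contains the square root of the box area. The base scales, the factor `eta`, and the function name `assign_levels` are assumptions for illustration, not values stated in the summary above.

```python
import math

# Assumed base scale (in pixels) for each pyramid level P3..P7.
BASE_SCALES = {3: 32, 4: 64, 5: 128, 6: 256, 7: 512}


def assign_levels(box, eta=2.0):
    """Return the pyramid levels whose scale range covers the box.

    box -- ground-truth box (x1, y1, x2, y2) in image pixels
    eta -- hypothetical tolerance factor widening each level's range
    """
    x1, y1, x2, y2 = box
    scale = math.sqrt((x2 - x1) * (y2 - y1))
    return [
        level
        for level, base in BASE_SCALES.items()
        if base / eta <= scale <= base * eta
    ]


print(assign_levels((0, 0, 100, 100)))  # a 100x100 box
print(assign_levels((0, 0, 20, 20)))    # a 20x20 box
```

With `eta` large enough, the ranges of neighboring levels overlap, so a single object can become a training target on adjacent levels rather than on exactly one.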

A critical assessment of the fundamental advantages of FoveaBox includes the elimination of the ambiguity inherent in defining positive and negative anchor samples, fostering straightforward optimization objectives. Moreover, the reduced output space resulting from its anchor-free prediction model contributes to its computational expediency.
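A back-of-the-envelope count illustrates the output-space argument. The sketch assumes a dense head that emits, per spatial position, K class scores and 4 box values for each of A anchors, compared with a single set of K + 4 values in an anchor-free head. The values K = 80 (the number of COCO categories) and A = 9 (a common RetinaNet setting) are assumptions chosen for illustration.

```python
def outputs_per_position(num_classes, num_anchors=1):
    """Number of values predicted at one feature-map position.

    Each anchor (or the single anchor-free slot) carries num_classes
    class scores plus 4 box coordinates.
    """
    return num_anchors * (num_classes + 4)


anchor_based = outputs_per_position(80, num_anchors=9)  # assumed A = 9
anchor_free = outputs_per_position(80)                  # one slot per position
print(anchor_based, anchor_free, anchor_based // anchor_free)
```

Under these assumptions the anchor-free head predicts 1/A as many values per position, which is one way to read the claim about a reduced output space.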

Looking forward, the conceptual clarity and empirical strengths of FoveaBox suggest its potential to serve as a solid baseline for developing advanced object detection frameworks. Its flexible yet potent approach aligns well with research striving for efficiency and simplicity, and it may inspire further exploration into biologically motivated computer vision models. As object detection continues to evolve, frameworks such as FoveaBox offer promising pathways toward achieving real-time and adaptive visual systems.

GitHub

GitHub - taokong/FoveaBox: FoveaBox: Beyond Anchor-based Object Detector (367 stars)