VarifocalNet: An IoU-aware Dense Object Detector (2008.13367v2)

Published 31 Aug 2020 in cs.CV

Abstract: Accurately ranking the vast number of candidate detections is crucial for dense object detectors to achieve high performance. Prior work uses the classification score or a combination of classification and predicted localization scores to rank candidates. However, neither option results in a reliable ranking, thus degrading detection performance. In this paper, we propose to learn an IoU-aware Classification Score (IACS) as a joint representation of object presence confidence and localization accuracy. We show that dense object detectors can achieve a more accurate ranking of candidate detections based on the IACS. We design a new loss function, named Varifocal Loss, to train a dense object detector to predict the IACS, and propose a new star-shaped bounding box feature representation for IACS prediction and bounding box refinement. Combining these two new components and a bounding box refinement branch, we build an IoU-aware dense object detector based on the FCOS+ATSS architecture, that we call VarifocalNet or VFNet for short. Extensive experiments on MS COCO show that our VFNet consistently surpasses the strong baseline by $\sim$2.0 AP with different backbones. Our best model VFNet-X-1200 with Res2Net-101-DCN achieves a single-model single-scale AP of 55.1 on COCO test-dev, which is state-of-the-art among various object detectors. Code is available at https://github.com/hyz-xmaster/VarifocalNet.

Authors (4)
  1. Haoyang Zhang (28 papers)
  2. Ying Wang (366 papers)
  3. Feras Dayoub (58 papers)
  4. Niko Sünderhauf (55 papers)
Citations (572)

Summary

VarifocalNet: An IoU-aware Dense Object Detector

The paper presents VarifocalNet (VFNet), a novel approach for dense object detection that introduces the concept of IoU-aware Classification Score (IACS) to improve the ranking of detection candidates. This approach addresses the inherent misalignment between classification confidence and localization accuracy found in previous object detection methods.

Key Components

The VFNet architecture is built on the FCOS+ATSS framework, with the introduction of several innovative components:

  1. IoU-aware Classification Score (IACS): Instead of the traditional classification score, VFNet uses IACS to consolidate both the object presence confidence and bounding box localization accuracy. This score more reliably ranks detection candidates, which enhances the overall detection performance.
  2. Varifocal Loss: Inspired by the focal loss, this new loss function trains the detector to predict the IACS. It treats positive and negative examples asymmetrically: positives are weighted by their target IoU, so accurately localized detections contribute more to training, while negatives are down-weighted by a factor of the predicted score (as in focal loss) to handle the extreme foreground-background imbalance. A minimal sketch of the loss is given after this list.
  3. Star-shaped Bounding Box Representation: VFNet represents a bounding box with features sampled, via deformable convolution, at nine points laid out by an initially regressed box, capturing the geometric and contextual information needed for accurate IACS prediction and bounding box refinement.
  4. Bounding Box Refinement: A refinement step that leverages the star-shaped feature representation predicts per-side scaling factors for the initial box, further improving localization precision (see the second sketch after this list).
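
As a minimal sketch of the Varifocal Loss described in item 2, assuming sigmoid outputs and an IoU-valued target q (the tensor names `pred_logits` and `target_score` are illustrative; the reference implementation in the linked repository may differ in details such as weighting and reduction):

```python
import torch
import torch.nn.functional as F


def varifocal_loss(pred_logits, target_score, alpha=0.75, gamma=2.0):
    """Sketch of the Varifocal Loss.

    pred_logits:  raw (pre-sigmoid) predictions of the IoU-aware
                  classification score, shape (N, num_classes).
    target_score: training target q with the same shape; for a positive
                  sample, q is the IoU between the predicted box and its
                  ground-truth box at the object's class, and 0 elsewhere.
    """
    pred = pred_logits.sigmoid()
    pos = (target_score > 0).float()

    # Positives: binary cross-entropy weighted by the target IoU q, so
    # accurately localized examples contribute more to training.
    # Negatives: BCE down-weighted by alpha * p^gamma (as in focal loss)
    # to suppress the overwhelming number of easy backgrounds.
    weight = pos * target_score + (1.0 - pos) * alpha * pred.pow(gamma)

    bce = F.binary_cross_entropy_with_logits(
        pred_logits, target_score, reduction='none')
    return (weight * bce).sum()
```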

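Items 3 and 4 can also be made concrete. The nine star-shaped points are the sampling location itself, its projections onto the four sides of the initially regressed box, and the box's four corners; in VFNet they supply the offsets of a deformable convolution that extracts the star-shaped feature. The helper names `star_points` and `refine_box` and their tensor arguments below are hypothetical, not taken from the released code:

```python
import torch


def star_points(x, y, l, t, r, b):
    """Nine star-shaped sampling points for locations (x, y) whose initially
    regressed box is given by distances (l, t, r, b) to the left, top, right,
    and bottom sides. All inputs are tensors of the same shape."""
    return torch.stack([
        torch.stack([x,     y    ], dim=-1),  # the location itself
        torch.stack([x - l, y    ], dim=-1),  # left side
        torch.stack([x,     y - t], dim=-1),  # top side
        torch.stack([x + r, y    ], dim=-1),  # right side
        torch.stack([x,     y + b], dim=-1),  # bottom side
        torch.stack([x - l, y - t], dim=-1),  # top-left corner
        torch.stack([x + r, y - t], dim=-1),  # top-right corner
        torch.stack([x - l, y + b], dim=-1),  # bottom-left corner
        torch.stack([x + r, y + b], dim=-1),  # bottom-right corner
    ], dim=-2)  # shape (..., 9, 2)


def refine_box(l, t, r, b, dl, dt, dr, db):
    """Bounding box refinement: per-side scaling factors (dl, dt, dr, db)
    predicted by the refinement branch are applied to the initial distances
    to obtain the refined box."""
    return l * dl, t * dt, r * dr, b * db
```
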
Experimental Evaluation

VFNet is evaluated extensively on the MS COCO dataset, where it consistently improves on the strong FCOS+ATSS baseline by roughly 2.0 AP across different backbones. The best configuration, VFNet-X-1200 with a Res2Net-101-DCN backbone, reaches a single-model, single-scale AP of 55.1 on the COCO test-dev set, which was state-of-the-art among object detectors at the time of publication.

Implications and Future Directions

The strong performance of VFNet suggests significant practical implications for object detection applications where accurate localization is critical. The integration of IACS into the detection pipeline highlights the utility of ranking candidates by a single score that unifies confidence and localization quality. The architectural modifications introduced by VFNet can serve as a foundation for further advancements in dense object detection frameworks. Future research could explore similar IoU-aware mechanisms in other computer vision tasks and improve scalability by reducing the computational cost of the proposed feature representation.

In conclusion, VarifocalNet represents a substantial step forward in object detection research by addressing the challenges of detection ranking and bounding box accuracy. Its contributions not only advance the theoretical understanding of IoU-aware models but also provide a robust framework for practical applications in real-world scenarios.