AugFPN: Improving Multi-scale Feature Learning for Object Detection (1912.05384v1)

Published 11 Dec 2019 in cs.CV

Abstract: Current state-of-the-art detectors typically exploit feature pyramid to detect objects at different scales. Among them, FPN is one of the representative works that build a feature pyramid by multi-scale features summation. However, the design defects behind prevent the multi-scale features from being fully exploited. In this paper, we begin by first analyzing the design defects of feature pyramid in FPN, and then introduce a new feature pyramid architecture named AugFPN to address these problems. Specifically, AugFPN consists of three components: Consistent Supervision, Residual Feature Augmentation, and Soft RoI Selection. AugFPN narrows the semantic gaps between features of different scales before feature fusion through Consistent Supervision. In feature fusion, ratio-invariant context information is extracted by Residual Feature Augmentation to reduce the information loss of feature map at the highest pyramid level. Finally, Soft RoI Selection is employed to learn a better RoI feature adaptively after feature fusion. By replacing FPN with AugFPN in Faster R-CNN, our models achieve 2.3 and 1.6 points higher Average Precision (AP) when using ResNet50 and MobileNet-v2 as backbone respectively. Furthermore, AugFPN improves RetinaNet by 1.6 points AP and FCOS by 0.9 points AP when using ResNet50 as backbone. Codes will be made available.


Analyzing AugFPN: Enhancements in Multi-scale Feature Learning for Object Detection

The paper "AugFPN: Improving Multi-scale Feature Learning for Object Detection" introduces AugFPN, a feature pyramid architecture designed to address design defects the authors identify in the feature pyramid network (FPN) commonly used in object detection. Its goal is to make fuller use of multi-scale features, which is critical for detecting objects of varied sizes in images.

Key Contributions of AugFPN

The AugFPN framework is built on three components: Consistent Supervision, Residual Feature Augmentation, and Soft Region of Interest (RoI) Selection. Each one targets a different stage of the pipeline: before feature fusion, during feature fusion, and after feature fusion, respectively.

Consistent Supervision: AugFPN applies Consistent Supervision to narrow the semantic gaps between features of different scales before they are fused. The paper argues that applying the same supervision signals to the features at every level makes them semantically more consistent, which in turn makes the fused multi-scale representation more robust.
Residual Feature Augmentation: This component enriches the feature map at the highest pyramid level, which loses information when its channels are reduced. A residual branch extracts ratio-invariant context information and adds it to that feature map, reducing the information lost during feature fusion.
Soft RoI Selection: In FPN, each RoI takes its features from a single pyramid level chosen by a heuristic based on the RoI's scale, which can overlook useful information at the other levels. AugFPN introduces Soft RoI Selection, which uses adaptive spatial fusion: learned weights combine the RoI features from all levels, so that every level can contribute to the final RoI feature.
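The contrast between hard and soft RoI assignment can be illustrated with a minimal sketch. The first function is the scale-based level assignment from the original FPN paper. The second is only an illustration of the idea behind Soft RoI Selection: the function names, the flat-list feature layout, and the softmax normalization across levels are assumptions made here for clarity, and the per-position scores are passed in by hand, whereas in AugFPN the weights are learned from the RoI features.

```python
import math


def fpn_level(w, h, k0=4, k_min=2, k_max=5, canonical=224):
    """FPN heuristic: map a w x h RoI to ONE pyramid level.

    k = floor(k0 + log2(sqrt(w * h) / canonical)), clamped to [k_min, k_max].
    """
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / canonical))
    return max(k_min, min(k_max, k))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_roi_fuse(level_feats, level_scores):
    """Illustrative soft selection: weighted sum of RoI features over levels.

    level_feats:  one flat list of feature values per pyramid level
                  (all lists have the same length).
    level_scores: same shape; hypothetical unnormalized weights per position.
                  Normalizing them with a softmax is an assumption of this sketch.
    """
    fused = []
    for i in range(len(level_feats[0])):
        weights = softmax([scores[i] for scores in level_scores])
        fused.append(sum(wt * feats[i] for wt, feats in zip(weights, level_feats)))
    return fused


# Hard assignment: each RoI sees exactly one level.
small_roi_level = fpn_level(112, 112)   # level 3
large_roi_level = fpn_level(448, 448)   # level 5

# Soft selection: equal scores reduce to a plain average over levels.
fused = soft_roi_fuse([[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]])
```

With the hard rule, a 112 x 112 RoI and a 448 x 448 RoI read from different single levels and never see the others. With the soft rule, every level contributes at every position, and the weights decide by how much.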

Performance and Results

Empirical evaluations on the MS COCO dataset show AugFPN consistently outperforming the baseline FPN. Replacing FPN with AugFPN in Faster R-CNN, a two-stage detector, improves Average Precision (AP) by 2.3 points with ResNet50 and 1.6 points with MobileNet-v2 as the backbone. The gains carry over to the one-stage detectors RetinaNet and FCOS, which improve by 1.6 and 0.9 points AP respectively with a ResNet50 backbone.

Implications and Future Prospects

AugFPN offers tangible benefits for practical applications that require accurate multi-scale object detection, in fields such as autonomous driving and medical imaging. On the theoretical side, its feature aggregation and adaptive supervision strategies could inform the architecture of future deep-learning-based detectors.

AugFPN's results on long-standing challenges in multi-scale feature learning also point to directions for future research. Follow-up work might improve its computational efficiency, test how well it scales to other tasks, and integrate the approach into other detection frameworks.

In conclusion, AugFPN is a noteworthy incremental contribution to object detection: it makes better use of the feature pyramid and delivers consistent AP gains across the detectors and backbones evaluated.


Authors (5)

Chaoxu Guo
Bin Fan
Qian Zhang
Shiming Xiang
Chunhong Pan

Citations (317)

