AFPN: Asymptotic Feature Pyramid Network for Object Detection (2306.15988v2)

Published 28 Jun 2023 in cs.CV

Abstract: Multi-scale features are of great importance in encoding objects with scale variance in object detection tasks. A common strategy for multi-scale feature extraction is adopting the classic top-down and bottom-up feature pyramid networks. However, these approaches suffer from the loss or degradation of feature information, impairing the fusion effect of non-adjacent levels. This paper proposes an asymptotic feature pyramid network (AFPN) to support direct interaction at non-adjacent levels. AFPN is initiated by fusing two adjacent low-level features and asymptotically incorporates higher-level features into the fusion process. In this way, the larger semantic gap between non-adjacent levels can be avoided. Given the potential for multi-object information conflicts to arise during feature fusion at each spatial location, adaptive spatial fusion operation is further utilized to mitigate these inconsistencies. We incorporate the proposed AFPN into both two-stage and one-stage object detection frameworks and evaluate with the MS-COCO 2017 validation and test datasets. Experimental evaluation shows that our method achieves more competitive results than other state-of-the-art feature pyramid networks. The code is available at https://github.com/gyyang23/AFPN.

AFPN: Asymptotic Feature Pyramid Network for Object Detection

The paper by Guoyu Yang et al., "AFPN: Asymptotic Feature Pyramid Network for Object Detection," introduces the Asymptotic Feature Pyramid Network (AFPN), an enhancement to existing feature pyramid networks aimed at improving object detection performance across multiple scales. Its key advancement is enabling direct interaction between non-adjacent feature levels without the semantic gaps that typically impair current methods.

Summary of Methodology

The AFPN builds upon multi-scale feature extraction strategies traditionally used in object detection, such as Feature Pyramid Network (FPN) and its derivatives like Path Aggregation Network (PANet). While these prior architectures aim to integrate low-level and high-level feature details to tackle scale variance, they often suffer from feature information loss due to propagation across multiple intermediate layers.
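
To make the information-loss argument concrete, here is a minimal sketch of a classic top-down FPN pathway in the style of Lin et al. (reference 8). Channel sizes and names are illustrative assumptions, not taken from the paper. Note how the lowest level receives top-level semantics only after passing through every intermediate fusion, which is the degradation AFPN targets.

```python
# Minimal sketch of a classic top-down FPN pathway (after Lin et al., ref. 8).
# Channel sizes are illustrative assumptions for a ResNet-style backbone.
import torch.nn as nn
import torch.nn.functional as F

class TopDownFPN(nn.Module):
    def __init__(self, in_channels=(256, 512, 1024, 2048), out_channels=256):
        super().__init__()
        self.lateral = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels
        )
        self.smooth = nn.ModuleList(
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
            for _ in in_channels
        )

    def forward(self, feats):  # feats: [C2, C3, C4, C5], low -> high level
        laterals = [l(f) for l, f in zip(self.lateral, feats)]
        # Top-down pass: each level interacts only with its direct neighbor,
        # so C5 semantics reach C2 only after two intermediate fusions.
        for i in range(len(laterals) - 2, -1, -1):
            laterals[i] = laterals[i] + F.interpolate(
                laterals[i + 1], size=laterals[i].shape[-2:], mode="nearest"
            )
        return [s(l) for s, l in zip(self.smooth, laterals)]
```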

AFPN addresses this issue with an asymptotic fusion strategy that gradually integrates hierarchical feature layers. The process begins by fusing two adjacent low-level features and then progressively incorporates higher-level features in ascending order until the top level is reached. This mitigates the semantic-gap problem inherent in non-adjacent level fusion, preserving essential feature details and semantics more effectively. Additionally, AFPN employs an adaptive spatial fusion operation that filters spatial features to resolve conflicts arising when multiple objects map to the same spatial location across levels.
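
Below is a hedged sketch of the adaptive spatial fusion step in the spirit of ASFF (Liu et al., reference 20), the mechanism AFPN builds on: each same-resolution input contributes a learned per-pixel weight, and the weighted maps are summed. The module name, shapes, and the two-input usage are illustrative assumptions, not the authors' released code.

```python
# Hedged sketch of adaptive spatial fusion (ASFF-style, after ref. 20).
# Names and shapes are assumptions; see the official repo for the exact form.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveSpatialFusion(nn.Module):
    """Blend N same-resolution feature maps with learned per-pixel weights."""
    def __init__(self, channels, num_inputs):
        super().__init__()
        # One 1x1 conv per input produces a scalar weight map for that input.
        self.weight_convs = nn.ModuleList(
            nn.Conv2d(channels, 1, kernel_size=1) for _ in range(num_inputs)
        )

    def forward(self, feats):  # feats: list of (B, C, H, W), same shape
        logits = torch.cat(
            [conv(f) for conv, f in zip(self.weight_convs, feats)], dim=1
        )
        weights = F.softmax(logits, dim=1)  # (B, N, H, W), sums to 1 per pixel
        return sum(weights[:, i : i + 1] * f for i, f in enumerate(feats))

# Usage: fuse two adjacent low-level maps first, per AFPN's asymptotic schedule.
low, mid = torch.randn(1, 256, 80, 80), torch.randn(1, 256, 80, 80)
fused = AdaptiveSpatialFusion(channels=256, num_inputs=2)([low, mid])
```

The per-pixel softmax lets the network suppress whichever level carries conflicting object evidence at a given location, rather than averaging levels uniformly.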

The AFPN framework is compatible with both two-stage and one-stage object detection models. Specifically, it has been evaluated using standard architectures like Faster R-CNN and YOLOv5, demonstrating notable performance improvements over traditional FPN architectures.
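
Since the paper's experiments are built on MMDetection (reference 26), a natural integration point is the detector's neck. The fragment below is a hypothetical MMDetection-style config showing where AFPN would replace the standard FPN neck; the `AFPN` type string and its arguments are assumptions based on the paper's description, not verified against the released repository.

```python
# Hypothetical MMDetection-style config fragment (field names follow ref. 26).
# The 'AFPN' type string and its arguments are assumptions, not verified code.
model = dict(
    type="FasterRCNN",
    backbone=dict(type="ResNet", depth=50),
    neck=dict(
        type="AFPN",  # replaces the usual type='FPN'
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
    ),
    # rpn_head, roi_head, etc. unchanged from the standard Faster R-CNN config.
)
```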

Experimental Evaluation

Experiments conducted on the MS COCO 2017 dataset demonstrate AFPN's competitiveness against other state-of-the-art feature pyramid networks. Noteworthy findings include:

  • On a Faster R-CNN framework, AFPN achieves 39.0% AP at 640×640 input resolution, outperforming the traditional FPN by 1.6% AP.
  • Further analysis on the two-stage detector with ResNet-101 on the MS COCO test-dev dataset reveals a 2.6% AP improvement over FPN, clearly demonstrating AFPN's enhanced detection capabilities, particularly for large objects.
  • In one-stage detection, the AFPN-integrated YOLOv5 framework presents superior performance using fewer parameters, underscoring its efficiency.

Implications and Future Directions

From a practical perspective, AFPN's architectural novelties, like the hierarchical asymptotic integration and adaptive spatial fusion, can be pivotal for real-world applications where detecting objects across varying scales with robustness and accuracy is essential. The reduced computational cost and parameter efficiency further advocate for its deployment in environments with constrained resources.

On the theoretical side, AFPN stimulates further discussion on managing semantic gaps in feature pyramid networks. It opens avenues for exploring alternative fusion strategies and adaptive mechanisms that could further optimize multi-scale object detection frameworks. Future work could refine AFPN for even lighter architectures or integrate it into visual tasks beyond object detection, broadening its applicability in computer vision.

In conclusion, AFPN presents a refined approach to handling multi-scale features in object detection, with significant evidence backing its efficacy and computational efficiency. This paper contributes meaningfully to the ongoing exploration of robust object detection methodologies, offering advancements that enrich both practical deployments and theoretical frameworks in the field.

References (27)
  1. T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, “Focal loss for dense object detection,” in Proceedings of the IEEE international conference on computer vision, 2017, pp. 2980–2988.
  2. G. Jocher, A. Chaurasia, A. Stoken, J. Borovec, NanoCode012, Y. Kwon et al., “ultralytics/yolov5: v6.2 - YOLOv5 Classification Models, Apple M1, Reproducibility, ClearML and Deci.ai integrations,” Aug. 2022. [Online]. Available: https://doi.org/10.5281/zenodo.7002879
  3. C.-Y. Wang, A. Bochkovskiy, and H.-Y. M. Liao, “YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors,” arXiv preprint arXiv:2207.02696, 2022.
  4. S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” Advances in neural information processing systems, vol. 28, 2015.
  5. K. He, G. Gkioxari, P. Dollár, and R. Girshick, “Mask R-CNN,” in Proceedings of the IEEE international conference on computer vision, 2017, pp. 2961–2969.
  6. Y. Wu, Y. Chen, L. Yuan, Z. Liu, L. Wang, H. Li, and Y. Fu, “Rethinking classification and localization for object detection,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 10186–10195.
  7. H. Zhang, H. Chang, B. Ma, N. Wang, and X. Chen, “Dynamic R-CNN: Towards high quality object detection via dynamic training,” in Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XV. Springer, 2020, pp. 260–275.
  8. T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature pyramid networks for object detection,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 2117–2125.
  9. S. Liu, L. Qi, H. Qin, J. Shi, and J. Jia, “Path aggregation network for instance segmentation,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 8759–8768.
  10. G. Ghiasi, T.-Y. Lin, and Q. V. Le, “NAS-FPN: Learning scalable feature pyramid architecture for object detection,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 7036–7045.
  11. C. Guo, B. Fan, Q. Zhang, S. Xiang, and C. Pan, “AugFPN: Improving multi-scale feature learning for object detection,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 12595–12604.
  12. D. Zhang, H. Zhang, J. Tang, M. Wang, X. Hua, and Q. Sun, “Feature pyramid transformer,” in Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXVIII. Springer, 2020, pp. 323–339.
  13. G. Zhao, W. Ge, and Y. Yu, “GraphFPN: Graph feature pyramid network for object detection,” in Proceedings of the IEEE/CVF international conference on computer vision, 2021, pp. 2763–2772.
  14. Y. Quan, D. Zhang, L. Zhang, and J. Tang, “Centralized feature pyramid for object detection,” arXiv preprint arXiv:2210.02093, 2022.
  15. Q. Yang, T. Zhang, T. Qiu, Y. Xiao, and X. Jiang, “Double feature pyramid networks for classification and localization on object detection,” in 2022 IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE, 2022, pp. 1395–1400.
  16. A. Kirillov, R. Girshick, K. He, and P. Dollár, “Panoptic feature pyramid networks,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 6399–6408.
  17. S. Qiao, L.-C. Chen, and A. Yuille, “DetectoRS: Detecting objects with recursive feature pyramid and switchable atrous convolution,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 10213–10224.
  18. K. Sun, B. Xiao, D. Liu, and J. Wang, “Deep high-resolution representation learning for human pose estimation,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 5693–5703.
  19. J. Wang, K. Chen, R. Xu, Z. Liu, C. C. Loy, and D. Lin, “CARAFE: Content-aware reassembly of features,” in Proceedings of the IEEE/CVF international conference on computer vision, 2019, pp. 3007–3016.
  20. S. Liu, D. Huang, and Y. Wang, “Learning spatial fusion for single-shot object detection,” arXiv preprint arXiv:1911.09516, 2019.
  21. J. Ma and B. Chen, “Dual refinement feature pyramid networks for object detection,” arXiv preprint arXiv:2012.01733, 2020.
  22. J. Xie, Y. Pang, J. Nie, J. Cao, and J. Han, “Latent feature pyramid network for object detection,” IEEE Transactions on Multimedia, 2022.
  23. L. Zhu, F. Lee, J. Cai, H. Yu, and Q. Chen, “An improved feature pyramid network for object detection,” Neurocomputing, vol. 483, pp. 127–139, 2022.
  24. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.
  25. T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft COCO: Common objects in context,” in Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V. Springer, 2014, pp. 740–755.
  26. K. Chen, J. Wang, J. Pang, Y. Cao, Y. Xiong, X. Li et al., “MMDetection: Open MMLab detection toolbox and benchmark,” arXiv preprint arXiv:1906.07155, 2019.
  27. I. Loshchilov and F. Hutter, “SGDR: Stochastic gradient descent with warm restarts,” arXiv preprint arXiv:1608.03983, 2016.
Authors (6)
  1. Guoyu Yang
  2. Jie Lei
  3. Zhikuan Zhu
  4. Siyu Cheng
  5. Zunlei Feng
  6. Ronghua Liang
Citations (95)