AFPN: Asymptotic Feature Pyramid Network for Object Detection (2306.15988v2)

Published 28 Jun 2023 in cs.CV

Abstract: Multi-scale features are of great importance in encoding objects with scale variance in object detection tasks. A common strategy for multi-scale feature extraction is adopting the classic top-down and bottom-up feature pyramid networks. However, these approaches suffer from the loss or degradation of feature information, impairing the fusion effect of non-adjacent levels. This paper proposes an asymptotic feature pyramid network (AFPN) to support direct interaction at non-adjacent levels. AFPN is initiated by fusing two adjacent low-level features and asymptotically incorporates higher-level features into the fusion process. In this way, the larger semantic gap between non-adjacent levels can be avoided. Given the potential for multi-object information conflicts to arise during feature fusion at each spatial location, adaptive spatial fusion operation is further utilized to mitigate these inconsistencies. We incorporate the proposed AFPN into both two-stage and one-stage object detection frameworks and evaluate with the MS-COCO 2017 validation and test datasets. Experimental evaluation shows that our method achieves more competitive results than other state-of-the-art feature pyramid networks. The code is available at https://github.com/gyyang23/AFPN.

AFPN: Asymptotic Feature Pyramid Network for Object Detection

The paper by Guoyu Yang et al., "AFPN: Asymptotic Feature Pyramid Network for Object Detection," introduces the Asymptotic Feature Pyramid Network (AFPN), an enhancement to existing feature pyramid networks aimed at improving object detection performance across multiple scales. Its key advancement is enabling direct interaction between non-adjacent feature levels without the semantic gaps that typically impair current methods.

Summary of Methodology

The AFPN builds upon multi-scale feature extraction strategies traditionally used in object detection, such as Feature Pyramid Network (FPN) and its derivatives like Path Aggregation Network (PANet). While these prior architectures aim to integrate low-level and high-level feature details to tackle scale variance, they often suffer from feature information loss due to propagation across multiple intermediate layers.
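
To make the information-loss argument concrete, here is a minimal sketch of a classic top-down FPN pathway in the style of Lin et al. (reference 8). Channel sizes and names are illustrative assumptions, not taken from the paper. Note how the lowest level receives top-level semantics only after passing through every intermediate fusion, which is the degradation AFPN targets.

```python
# Minimal sketch of a classic top-down FPN pathway (after Lin et al., ref. 8).
# Channel sizes are illustrative assumptions for a ResNet-style backbone.
import torch.nn as nn
import torch.nn.functional as F

class TopDownFPN(nn.Module):
    def __init__(self, in_channels=(256, 512, 1024, 2048), out_channels=256):
        super().__init__()
        self.lateral = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels
        )
        self.smooth = nn.ModuleList(
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
            for _ in in_channels
        )

    def forward(self, feats):  # feats: [C2, C3, C4, C5], low -> high level
        laterals = [l(f) for l, f in zip(self.lateral, feats)]
        # Top-down pass: each level interacts only with its direct neighbor,
        # so C5 semantics reach C2 only after two intermediate fusions.
        for i in range(len(laterals) - 2, -1, -1):
            laterals[i] = laterals[i] + F.interpolate(
                laterals[i + 1], size=laterals[i].shape[-2:], mode="nearest"
            )
        return [s(l) for s, l in zip(self.smooth, laterals)]
```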

AFPN addresses this issue with an asymptotic fusion strategy that gradually integrates hierarchical feature layers. The process begins by fusing two adjacent low-level features and then progressively incorporates higher-level features in ascending order until the top level is reached. This mitigates the semantic-gap problem inherent in non-adjacent level fusion, preserving essential feature details and semantics more effectively. Additionally, AFPN employs an adaptive spatial fusion operation that filters spatial features to resolve conflicts arising when multiple objects map to the same spatial location across levels.
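
Below is a hedged sketch of the adaptive spatial fusion step in the spirit of ASFF (Liu et al., reference 20), the mechanism AFPN builds on: each same-resolution input contributes a learned per-pixel weight, and the weighted maps are summed. The module name, shapes, and the two-input usage are illustrative assumptions, not the authors' released code.

```python
# Hedged sketch of adaptive spatial fusion (ASFF-style, after ref. 20).
# Names and shapes are assumptions; see the official repo for the exact form.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveSpatialFusion(nn.Module):
    """Blend N same-resolution feature maps with learned per-pixel weights."""
    def __init__(self, channels, num_inputs):
        super().__init__()
        # One 1x1 conv per input produces a scalar weight map for that input.
        self.weight_convs = nn.ModuleList(
            nn.Conv2d(channels, 1, kernel_size=1) for _ in range(num_inputs)
        )

    def forward(self, feats):  # feats: list of (B, C, H, W), same shape
        logits = torch.cat(
            [conv(f) for conv, f in zip(self.weight_convs, feats)], dim=1
        )
        weights = F.softmax(logits, dim=1)  # (B, N, H, W), sums to 1 per pixel
        return sum(weights[:, i : i + 1] * f for i, f in enumerate(feats))

# Usage: fuse two adjacent low-level maps first, per AFPN's asymptotic schedule.
low, mid = torch.randn(1, 256, 80, 80), torch.randn(1, 256, 80, 80)
fused = AdaptiveSpatialFusion(channels=256, num_inputs=2)([low, mid])
```

The per-pixel softmax lets the network suppress whichever level carries conflicting object evidence at a given location, rather than averaging levels uniformly.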

The AFPN framework is compatible with both two-stage and one-stage object detection models. Specifically, it has been evaluated using standard architectures like Faster R-CNN and YOLOv5, demonstrating notable performance improvements over traditional FPN architectures.
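
Since the paper's experiments are built on MMDetection (reference 26), a natural integration point is the detector's neck. The fragment below is a hypothetical MMDetection-style config showing where AFPN would replace the standard FPN neck; the `AFPN` type string and its arguments are assumptions based on the paper's description, not verified against the released repository.

```python
# Hypothetical MMDetection-style config fragment (field names follow ref. 26).
# The 'AFPN' type string and its arguments are assumptions, not verified code.
model = dict(
    type="FasterRCNN",
    backbone=dict(type="ResNet", depth=50),
    neck=dict(
        type="AFPN",  # replaces the usual type='FPN'
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
    ),
    # rpn_head, roi_head, etc. unchanged from the standard Faster R-CNN config.
)
```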

Experimental Evaluation

Experiments conducted on the MS COCO 2017 dataset demonstrate AFPN's competitiveness against other state-of-the-art feature pyramid networks. Noteworthy findings include:

  • On a Faster R-CNN framework, AFPN achieves 39.0% AP at 640×640 input resolution, outperforming the traditional FPN by 1.6% AP.
  • Further analysis on the two-stage detector with ResNet-101 on the MS COCO test-dev dataset reveals a 2.6% AP improvement over FPN, clearly demonstrating AFPN's enhanced detection capabilities, particularly for large objects.
  • In one-stage detection, the AFPN-integrated YOLOv5 framework presents superior performance using fewer parameters, underscoring its efficiency.

Implications and Future Directions

From a practical perspective, AFPN's architectural novelties, like the hierarchical asymptotic integration and adaptive spatial fusion, can be pivotal for real-world applications where detecting objects across varying scales with robustness and accuracy is essential. The reduced computational cost and parameter efficiency further advocate for its deployment in environments with constrained resources.

On the theoretical side, AFPN stimulates further discussion on managing semantic gaps in feature pyramid networks. It opens avenues for exploring alternative fusion strategies and adaptive mechanisms that could further optimize multi-scale object detection frameworks. Future work could refine AFPN for even lighter architectures or integrate it into visual tasks beyond object detection, broadening its applicability in computer vision.

In conclusion, AFPN presents a refined approach to handling multi-scale features in object detection, with significant evidence backing its efficacy and computational efficiency. This paper contributes meaningfully to the ongoing exploration of robust object detection methodologies, offering advancements that enrich both practical deployments and theoretical frameworks in the field.

References (27)
  1. T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, “Focal loss for dense object detection,” in Proceedings of the IEEE international conference on computer vision, 2017, pp. 2980–2988.
  2. G. Jocher, A. Chaurasia, A. Stoken, J. Borovec, NanoCode012, Y. Kwon et al., “ultralytics/yolov5: v6.2 - YOLOv5 Classification Models, Apple M1, Reproducibility, ClearML and Deci.ai integrations,” Aug. 2022. [Online]. Available: https://doi.org/10.5281/zenodo.7002879
  3. C.-Y. Wang, A. Bochkovskiy, and H.-Y. M. Liao, “YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors,” arXiv preprint arXiv:2207.02696, 2022.
  4. S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” Advances in neural information processing systems, vol. 28, 2015.
  5. K. He, G. Gkioxari, P. Dollár, and R. Girshick, “Mask R-CNN,” in Proceedings of the IEEE international conference on computer vision, 2017, pp. 2961–2969.
  6. Y. Wu, Y. Chen, L. Yuan, Z. Liu, L. Wang, H. Li, and Y. Fu, “Rethinking classification and localization for object detection,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 10186–10195.
  7. H. Zhang, H. Chang, B. Ma, N. Wang, and X. Chen, “Dynamic R-CNN: Towards high quality object detection via dynamic training,” in Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XV. Springer, 2020, pp. 260–275.
  8. T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature pyramid networks for object detection,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 2117–2125.
  9. S. Liu, L. Qi, H. Qin, J. Shi, and J. Jia, “Path aggregation network for instance segmentation,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 8759–8768.
  10. G. Ghiasi, T.-Y. Lin, and Q. V. Le, “NAS-FPN: Learning scalable feature pyramid architecture for object detection,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 7036–7045.
  11. C. Guo, B. Fan, Q. Zhang, S. Xiang, and C. Pan, “AugFPN: Improving multi-scale feature learning for object detection,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 12595–12604.
  12. D. Zhang, H. Zhang, J. Tang, M. Wang, X. Hua, and Q. Sun, “Feature pyramid transformer,” in Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXVIII. Springer, 2020, pp. 323–339.
  13. G. Zhao, W. Ge, and Y. Yu, “GraphFPN: Graph feature pyramid network for object detection,” in Proceedings of the IEEE/CVF international conference on computer vision, 2021, pp. 2763–2772.
  14. Y. Quan, D. Zhang, L. Zhang, and J. Tang, “Centralized feature pyramid for object detection,” arXiv preprint arXiv:2210.02093, 2022.
  15. Q. Yang, T. Zhang, T. Qiu, Y. Xiao, and X. Jiang, “Double feature pyramid networks for classification and localization on object detection,” in 2022 IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE, 2022, pp. 1395–1400.
  16. A. Kirillov, R. Girshick, K. He, and P. Dollár, “Panoptic feature pyramid networks,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 6399–6408.
  17. S. Qiao, L.-C. Chen, and A. Yuille, “DetectoRS: Detecting objects with recursive feature pyramid and switchable atrous convolution,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 10213–10224.
  18. K. Sun, B. Xiao, D. Liu, and J. Wang, “Deep high-resolution representation learning for human pose estimation,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 5693–5703.
  19. J. Wang, K. Chen, R. Xu, Z. Liu, C. C. Loy, and D. Lin, “CARAFE: Content-aware reassembly of features,” in Proceedings of the IEEE/CVF international conference on computer vision, 2019, pp. 3007–3016.
  20. S. Liu, D. Huang, and Y. Wang, “Learning spatial fusion for single-shot object detection,” arXiv preprint arXiv:1911.09516, 2019.
  21. J. Ma and B. Chen, “Dual refinement feature pyramid networks for object detection,” arXiv preprint arXiv:2012.01733, 2020.
  22. J. Xie, Y. Pang, J. Nie, J. Cao, and J. Han, “Latent feature pyramid network for object detection,” IEEE Transactions on Multimedia, 2022.
  23. L. Zhu, F. Lee, J. Cai, H. Yu, and Q. Chen, “An improved feature pyramid network for object detection,” Neurocomputing, vol. 483, pp. 127–139, 2022.
  24. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.
  25. T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft COCO: Common objects in context,” in Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V. Springer, 2014, pp. 740–755.
  26. K. Chen, J. Wang, J. Pang, Y. Cao, Y. Xiong, X. Li et al., “MMDetection: Open MMLab detection toolbox and benchmark,” arXiv preprint arXiv:1906.07155, 2019.
  27. I. Loshchilov and F. Hutter, “SGDR: Stochastic gradient descent with warm restarts,” arXiv preprint arXiv:1608.03983, 2016.
Authors (6)
  1. Guoyu Yang
  2. Jie Lei
  3. Zhikuan Zhu
  4. Siyu Cheng
  5. Zunlei Feng
  6. Ronghua Liang
Citations (95)