LR-FPN: Enhancing Remote Sensing Object Detection with Location Refined Feature Pyramid Network (2404.01614v1)
Abstract: Remote sensing target detection aims to identify and locate critical targets in remote sensing images and has extensive applications in agriculture and urban planning. Feature pyramid networks (FPNs) are commonly used to extract multi-scale features. However, existing FPNs often overlook the extraction of low-level positional information and fine-grained context interaction. To address this, we propose a novel location refined feature pyramid network (LR-FPN) that enhances the extraction of shallow positional information and facilitates fine-grained context interaction. The LR-FPN consists of two primary modules: the shallow position information extraction module (SPIEM) and the contextual interaction module (CIM). Specifically, SPIEM first retains as much robust location information about the target as possible by simultaneously extracting positional and saliency information from the low-level feature map. Subsequently, CIM injects this robust location information into different layers of the original FPN through spatial and channel interaction, explicitly enhancing the object area. Moreover, in the spatial interaction, we introduce a simple local and non-local interaction strategy to learn and retain the saliency information of the object. Lastly, the LR-FPN can be readily integrated into common object detection frameworks to significantly improve performance. Extensive experiments on two large-scale remote sensing datasets (i.e., DOTA-v1.0 and HRSC2016) demonstrate that the proposed LR-FPN is superior to state-of-the-art object detection approaches. Our code and models will be publicly available.
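The data flow described in the abstract (a location map taken from the low-level feature, then injected into each pyramid level through spatial and channel interaction) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the operations inside SPIEM or CIM, so the function names `extract_location` and `inject`, the mean/max pooling used as positional and saliency cues, the sigmoid gates, and the residual sum are all assumptions chosen only to show the shape of the idea. The local and non-local interaction strategy is not modeled here.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def avg_pool_to(x, out_hw):
    """Downsample a (C, H, W) array to (C, h, w) by block averaging.

    Assumes H and W are integer multiples of h and w.
    """
    c, big_h, big_w = x.shape
    h, w = out_hw
    return x.reshape(c, h, big_h // h, w, big_w // w).mean(axis=(2, 4))


def extract_location(low_level):
    """Hypothetical stand-in for SPIEM.

    Combines a positional cue (channel-wise mean) and a saliency cue
    (channel-wise max) of the low-level feature map into one (1, H, W) map.
    """
    position = low_level.mean(axis=0, keepdims=True)
    saliency = low_level.max(axis=0, keepdims=True)
    return position + saliency


def inject(level, location):
    """Hypothetical stand-in for CIM.

    Spatial interaction: gate the pyramid level with the resized location map.
    Channel interaction: reweight channels using their global average.
    The result is added back to the input as a residual.
    """
    spatial_gate = sigmoid(avg_pool_to(location, level.shape[1:]))  # (1, h, w)
    spatially_refined = level * spatial_gate                        # (C, h, w)
    channel_gate = sigmoid(spatially_refined.mean(axis=(1, 2), keepdims=True))  # (C, 1, 1)
    return level + spatially_refined * channel_gate


# Toy example: one low-level map and a three-level pyramid with 8 channels.
rng = np.random.default_rng(0)
low_level = rng.standard_normal((8, 16, 16))
pyramid = [rng.standard_normal((8, s, s)) for s in (16, 8, 4)]

location = extract_location(low_level)
refined = [inject(level, location) for level in pyramid]
```

Each refined level keeps the shape of its input, which is the property that would let such a module be dropped into an existing FPN-based detector without changing the downstream heads.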