
FPDIoU Loss: A Loss Function for Efficient Bounding Box Regression of Rotated Object Detection (2405.09942v2)

Published 16 May 2024 in cs.CV

Abstract: Bounding box regression is one of the important steps of object detection. However, rotation detectors often involve a more complicated loss based on SkewIoU, which is unfriendly to gradient-based training. Most existing loss functions for rotated object detection measure the difference between two bounding boxes using only the deviation of area or the distance between individual points (e.g., $\mathcal{L}_{Smooth-\ell_1}$, $\mathcal{L}_{RotatedIoU}$ and $\mathcal{L}_{PIoU}$), and the calculation process of some loss functions is extremely complex (e.g., $\mathcal{L}_{KFIoU}$). To improve the efficiency and accuracy of bounding box regression for rotated object detection, we propose a novel metric for comparing arbitrary shapes based on minimum points distance, which takes most of the factors from existing loss functions for rotated object detection into account, i.e., the overlapping or non-overlapping area, the distance between central points, and the rotation angle. We also propose a loss function called $\mathcal{L}_{FPDIoU}$, based on four points distance, for accurate bounding box regression focusing on faster and high-quality anchor boxes. In the experiments, the $FPDIoU$ loss is applied to training state-of-the-art rotated object detection models (e.g., RTMDET, H2RBox) on three popular benchmarks for rotated object detection (DOTA, DIOR, HRSC2016) and two benchmarks for arbitrary-orientation scene text detection (ICDAR 2017 RRC-MLT and ICDAR 2019 RRC-MLT), where it achieves better performance than existing loss functions.
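
The abstract does not give the formula for $\mathcal{L}_{FPDIoU}$, so the following is only a rough sketch of what a "four points distance" term between two rotated boxes could look like. It assumes boxes are given as (cx, cy, w, h, theta) with theta in radians, and it normalizes by the squared image diagonal; the box format, the normalization, and the function names `rotated_box_corners` and `four_point_distance_penalty` are all assumptions for illustration, not the authors' definition.

```python
import math


def rotated_box_corners(cx, cy, w, h, theta):
    """Return the four corners of a box centered at (cx, cy), rotated by theta radians.

    Hypothetical helper: the (cx, cy, w, h, theta) format is an assumption.
    """
    c, s = math.cos(theta), math.sin(theta)
    # Corner offsets of the unrotated box, in a fixed order.
    half = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # Rotate each offset by theta, then translate to the box center.
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in half]


def four_point_distance_penalty(pred, target, img_w, img_h):
    """Sum of squared distances between matching corners, scaled by the
    squared image diagonal.

    Illustrative only: this is NOT taken from the paper. The abstract only
    states that the loss is based on four points distance.
    """
    p = rotated_box_corners(*pred)
    t = rotated_box_corners(*target)
    norm = img_w ** 2 + img_h ** 2
    total = sum((px - tx) ** 2 + (py - ty) ** 2 for (px, py), (tx, ty) in zip(p, t))
    return total / norm


if __name__ == "__main__":
    gt = (50.0, 50.0, 20.0, 10.0, 0.3)
    shifted = (53.0, 54.0, 20.0, 10.0, 0.3)   # same shape and angle, moved by (3, 4)
    rotated = (50.0, 50.0, 20.0, 10.0, 0.8)   # same center and size, different angle
    print(four_point_distance_penalty(gt, gt, 100, 100))
    print(four_point_distance_penalty(shifted, gt, 100, 100))
    print(four_point_distance_penalty(rotated, gt, 100, 100))
```

A term of this kind reacts to center offset, size mismatch, and angle mismatch at once, which matches the factors the abstract lists. Note that this sketch matches corners by index, so it treats a box and the same box rotated by 180 degrees as different; a practical loss would need a consistent corner ordering or angle convention.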

Authors (2)
  1. Siliang Ma (2 papers)
  2. Yong Xu (432 papers)
