
Hausdorff Distance Matching with Adaptive Query Denoising for Rotated Detection Transformer (2305.07598v5)

Published 12 May 2023 in cs.CV and cs.LG

Abstract: Detection Transformers (DETR) have recently set new benchmarks in object detection. However, their performance in detecting rotated objects lags behind established oriented object detectors. Our analysis identifies a key issue: the boundary discontinuity and square-like problems in bipartite matching make it difficult to assign appropriate ground truths to predictions, leading to duplicate low-confidence predictions. To address this, we introduce a Hausdorff distance-based cost for bipartite matching, which more accurately quantifies the discrepancy between predictions and ground truths. Additionally, we find that a static denoising approach impedes the training of rotated DETR, especially as the quality of the detector's predictions begins to exceed that of the noised ground truths. To overcome this, we propose an adaptive query denoising method that employs bipartite matching to selectively eliminate noised queries that detract from model improvement. When compared to models adopting a ResNet-50 backbone, our proposed model yields remarkable improvements, achieving $\textbf{+4.18}$ AP$_{50}$, $\textbf{+4.59}$ AP$_{50}$, and $\textbf{+4.99}$ AP$_{50}$ on DOTA-v2.0, DOTA-v1.5, and DIOR-R, respectively.
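The idea of a Hausdorff distance-based matching cost can be illustrated with a minimal sketch. This is not the paper's implementation: the function names (`box_corners`, `hausdorff`, `match`), the `(cx, cy, w, h, angle)` box parameterization, the use of only the four corner points as the point set, and the brute-force assignment in place of a Hungarian solver are all assumptions made for illustration. The sketch shows why a point-set distance sidesteps the square-like problem: a square rotated by 90 degrees covers the same region, so its Hausdorff distance to the original is essentially zero even though the angle parameters differ greatly.

```python
import math
from itertools import permutations


def box_corners(cx, cy, w, h, angle):
    """Return the four corner points of a rotated box (angle in radians)."""
    c, s = math.cos(angle), math.sin(angle)
    offsets = [(w / 2, h / 2), (-w / 2, h / 2), (-w / 2, -h / 2), (w / 2, -h / 2)]
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in offsets]


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two finite point sets."""
    def directed(p, q):
        # Farthest that any point of p lies from its nearest point in q.
        return max(min(math.dist(x, y) for y in q) for x in p)
    return max(directed(a, b), directed(b, a))


def match(preds, gts):
    """One-to-one assignment minimizing the total Hausdorff cost.

    Brute force over permutations, so only suitable for tiny examples;
    assumes len(preds) >= len(gts). Returns a list where entry j is the
    index of the prediction assigned to ground truth j.
    """
    cost = [[hausdorff(box_corners(*p), box_corners(*g)) for g in gts] for p in preds]
    best = min(
        permutations(range(len(preds)), len(gts)),
        key=lambda perm: sum(cost[i][j] for j, i in enumerate(perm)),
    )
    return list(best)


# Hypothetical boxes: a square, and the same square rotated by 90 degrees.
square = (0.0, 0.0, 2.0, 2.0, 0.0)
square_rot = (0.0, 0.0, 2.0, 2.0, math.pi / 2)
square_distance = hausdorff(box_corners(*square), box_corners(*square_rot))

# Hypothetical ground truths and predictions for the matching step.
gts = [square, (10.0, 10.0, 4.0, 2.0, 0.0)]
preds = [(10.0, 10.0, 4.0, 2.0, 0.1), square_rot, (5.0, 5.0, 1.0, 1.0, 0.0)]
assignment = match(preds, gts)
```

Here `square_distance` is zero up to floating-point error, whereas a cost computed directly on the box parameters would penalize the 90-degree angle difference. In a real detector the assignment would typically be solved with a Hungarian algorithm and combined with a classification cost; both are omitted from this sketch.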

Authors (4)
  1. Hakjin Lee (4 papers)
  2. Minki Song (3 papers)
  3. Jamyoung Koo (4 papers)
  4. Junghoon Seo (22 papers)