
DQ-DETR: DETR with Dynamic Query for Tiny Object Detection (2404.03507v6)

Published 4 Apr 2024 in cs.CV

Abstract: Although previous DETR-like methods perform well in generic object detection, tiny object detection remains challenging for them because the positional information of object queries is not customized for detecting tiny objects, which are far smaller than general objects. In addition, DETR-like methods use a fixed number of queries, which makes them unsuitable for aerial datasets that contain only tiny objects and whose numbers of instances are imbalanced across images. We therefore present a simple yet effective model, DQ-DETR, which consists of three components: a categorical counting module, counting-guided feature enhancement, and dynamic query selection. DQ-DETR uses the prediction and density maps from the categorical counting module to dynamically adjust the number of object queries and to improve the positional information of the queries. DQ-DETR outperforms previous CNN-based and DETR-like methods, achieving a state-of-the-art mAP of 30.2% on the AI-TOD-V2 dataset, which consists mostly of tiny objects. Our code will be available at https://github.com/hoiliu-0801/DQ-DETR.
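
The idea of adjusting the number of object queries from a counting prediction can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the four count classes, the query budgets (300, 500, 900, 1500), and the use of raw density-map peaks as initial query positions are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import List, Sequence, Tuple

# Hypothetical query budgets, one per predicted count class
# (e.g. "few", "some", "many", "very many" instances). Illustrative values only.
QUERY_BUDGETS: Tuple[int, ...] = (300, 500, 900, 1500)


def select_num_queries(
    count_logits: Sequence[float],
    budgets: Sequence[int] = QUERY_BUDGETS,
) -> int:
    """Pick a query budget from the count class with the highest logit."""
    if len(count_logits) != len(budgets):
        raise ValueError("need exactly one logit per query budget")
    best_class = max(range(len(count_logits)), key=lambda i: count_logits[i])
    return budgets[best_class]


def top_k_positions(
    density_map: List[List[float]], k: int
) -> List[Tuple[int, int]]:
    """Return (row, col) of the k highest-density cells.

    A simplified stand-in for using density information to place queries
    where objects are likely to be.
    """
    cells = [
        (value, row, col)
        for row, values in enumerate(density_map)
        for col, value in enumerate(values)
    ]
    # Highest density first; ties keep their row-major order.
    cells.sort(key=lambda cell: cell[0], reverse=True)
    return [(row, col) for _, row, col in cells[:k]]


if __name__ == "__main__":
    logits = [0.1, 2.0, 0.3, -1.0]           # toy counting-head output
    density = [[0.1, 0.9], [0.5, 0.2]]       # toy 2x2 density map
    num_queries = select_num_queries(logits)
    positions = top_k_positions(density, k=min(num_queries, 2))
    print(num_queries, positions)
```

The point of the sketch is the control flow: a sparse image is assigned a small query budget and a crowded one a large budget, instead of using one fixed number of queries for every image.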

Authors (4)
  1. Yi-Xin Huang
  2. Hou-I Liu
  3. Hong-Han Shuai
  4. Wen-Huang Cheng