
DEYOv3: DETR with YOLO for Real-time Object Detection (2309.11851v2)

Published 21 Sep 2023 in cs.CV

Abstract: End-to-end object detectors have recently gained significant attention from the research community due to their outstanding performance. However, DETR typically relies on supervised pretraining of the backbone on ImageNet, which limits the practical application of DETR, constrains the design of the backbone, and affects the model's potential generalization ability. In this paper, we propose a new training method called step-by-step training. In the first stage, the one-to-many pre-trained YOLO detector is used to initialize the end-to-end detector. In the second stage, the backbone and encoder are consistent with the DETR-like model, but only the detector needs to be trained from scratch. With this training method, the object detector does not need an additional dataset (ImageNet) to train the backbone. This makes the design of the backbone more flexible and dramatically reduces the training cost of the detector, both of which help practical application. The step-by-step training method also achieves higher accuracy than the traditional training method of the DETR-like model. Building on this training method, we propose a new end-to-end real-time object detection model called DEYOv3. DEYOv3-N achieves 41.1% AP on COCO val2017 and 270 FPS on a T4 GPU, while DEYOv3-L achieves 51.3% AP and 102 FPS. Without the use of additional training data, DEYOv3 surpasses all existing real-time object detectors in terms of both speed and accuracy. For models of N, S, and M scales, training on the COCO dataset can be completed using a single 24GB RTX3090 GPU. Code will be released at https://github.com/ouyanghaodong/DEYOv3.
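
The hand-off between the two stages described in the abstract can be pictured as a weight transfer: parameters learned by the pre-trained YOLO detector initialize the matching parts of the end-to-end detector, and everything else starts from its fresh initialization. The sketch below is illustrative only and is not taken from the DEYOv3 code. It uses plain dictionaries in place of real model state, and the function name `transfer_pretrained`, the parameter names, and the `backbone.` / `encoder.` prefixes are all hypothetical. It also assumes, as one reading of the abstract, that the backbone and encoder are the parts carried over.

```python
def transfer_pretrained(target_state, pretrained_state,
                        prefixes=("backbone.", "encoder.")):
    """Copy matching pretrained parameters into a copy of target_state.

    Hypothetical helper: a parameter is transferred only if its name starts
    with one of `prefixes` and also exists in `pretrained_state`. All other
    parameters keep their initial values and are reported as "from scratch".
    """
    merged = dict(target_state)  # shallow copy; the input is not modified
    transferred, from_scratch = [], []
    for name in target_state:
        if name.startswith(prefixes) and name in pretrained_state:
            merged[name] = pretrained_state[name]
            transferred.append(name)
        else:
            from_scratch.append(name)
    return merged, transferred, from_scratch


# Stage 1 result (assumed names): weights of a pre-trained one-to-many detector.
yolo_state = {
    "backbone.conv1": [0.5, -0.2],
    "encoder.fuse": [0.1],
    "head.cls": [0.9],  # one-to-many head, not part of the end-to-end model
}

# Stage 2 model (assumed names): freshly initialized end-to-end detector.
detector_state = {
    "backbone.conv1": [0.0, 0.0],
    "encoder.fuse": [0.0],
    "decoder.query": [0.0],
}

merged, transferred, from_scratch = transfer_pretrained(detector_state, yolo_state)
print("initialized from stage 1:", transferred)
print("trained from scratch:", from_scratch)
```

With a real framework the same idea would typically be expressed by filtering a checkpoint's state dictionary before loading it; the dictionary version here only shows which parameters are inherited and which are not.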

Authors (1)
  1. Haodong Ouyang (4 papers)
Citations (2)