DEYOv2: Rank Feature with Greedy Matching for End-to-End Object Detection (2306.09165v2)
Abstract: This paper presents a novel object detector called DEYOv2, an improved version of the first-generation DEYO (DETR with YOLO) model. Like its predecessor, DEYOv2 employs a progressive reasoning approach to accelerate model training and enhance performance. The study examines the limitations of one-to-one matching in optimization and proposes two solutions, Rank Feature and Greedy Matching. This approach enables the third stage of DEYOv2 to maximize information acquisition from the first and second stages without needing NMS, achieving end-to-end optimization. By combining dense queries, sparse queries, one-to-many matching, and one-to-one matching, DEYOv2 leverages the advantages of each method. It outperforms all existing query-based end-to-end detectors under the same settings. When using ResNet-50 as the backbone and multi-scale features on the COCO dataset, DEYOv2 achieves 51.1 AP and 51.8 AP in 12 and 24 epochs, respectively. Compared to the end-to-end model DINO, DEYOv2 provides performance gains of 2.1 AP and 1.4 AP in the 12-epoch and 24-epoch settings, respectively. To the best of our knowledge, DEYOv2 is the first fully end-to-end object detector that combines the respective strengths of classical detectors and query-based detectors.
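As a generic illustration of the two matching families the abstract contrasts, the sketch below compares a greedy assignment with a minimum-cost one-to-one assignment on a toy cost matrix. It is not the paper's Greedy Matching procedure, which the abstract does not specify; the function names `greedy_match`, `optimal_match`, and `total_cost`, and the cost values, are hypothetical and chosen only for this example.

```python
from itertools import permutations


def greedy_match(cost):
    """Pair rows (predictions) with columns (targets) in ascending cost order.

    Illustrative only: each row and each column is used at most once.
    """
    pairs = sorted(
        (cost[i][j], i, j)
        for i in range(len(cost))
        for j in range(len(cost[0]))
    )
    used_rows, used_cols, match = set(), set(), {}
    for _, i, j in pairs:
        if i not in used_rows and j not in used_cols:
            match[i] = j
            used_rows.add(i)
            used_cols.add(j)
    return match


def optimal_match(cost):
    """Brute-force minimum-cost one-to-one assignment (small square matrices)."""
    n = len(cost)
    best = min(
        permutations(range(n)),
        key=lambda p: sum(cost[i][p[i]] for i in range(n)),
    )
    return dict(enumerate(best))


def total_cost(cost, match):
    """Sum the cost of every matched (row, column) pair."""
    return sum(cost[i][j] for i, j in match.items())


# Hypothetical matching costs: rows are predictions, columns are targets.
cost = [
    [0.10, 0.20],
    [0.15, 0.90],
]

greedy = greedy_match(cost)
optimal = optimal_match(cost)
print("greedy:", greedy, "total cost:", total_cost(cost, greedy))
print("optimal:", optimal, "total cost:", total_cost(cost, optimal))
```

On this toy matrix the greedy pass commits to the single cheapest pair first and is then forced into an expensive second pair, while the exhaustive search finds a cheaper overall assignment. The two strategies can therefore disagree even on tiny inputs.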
Author: Haodong Ouyang