Sparse Semi-DETR: Sparse Learnable Queries for Semi-Supervised Object Detection (2404.01819v1)
Abstract: In this paper, we address the limitations of the DETR-based semi-supervised object detection (SSOD) framework, focusing on the challenges posed by the quality of object queries. In DETR-based SSOD, the one-to-one assignment strategy provides inaccurate pseudo-labels, while the one-to-many assignment strategy leads to overlapping predictions. These issues compromise training efficiency and degrade model performance, especially when detecting small or occluded objects. To overcome these challenges, we introduce Sparse Semi-DETR, a novel transformer-based, end-to-end semi-supervised object detection solution. Sparse Semi-DETR incorporates a Query Refinement Module to enhance the quality of object queries, significantly improving detection of small and partially obscured objects. Additionally, we integrate a Reliable Pseudo-Label Filtering Module that selectively retains high-quality pseudo-labels, thereby enhancing detection accuracy and consistency. On the MS-COCO and Pascal VOC object detection benchmarks, Sparse Semi-DETR achieves a significant improvement over current state-of-the-art methods, highlighting its effectiveness in semi-supervised object detection, particularly in challenging scenarios involving small or partially obscured objects.
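The contrast between the two assignment strategies named in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the cost matrix is invented for illustration, the helper names `one_to_one` and `one_to_many` are hypothetical, and the brute-force search stands in for the Hungarian matching that DETR-style detectors typically use. It assumes there are at least as many queries as pseudo-labels.

```python
from itertools import permutations

def one_to_one(cost):
    """Brute-force minimum-cost one-to-one matching (tiny inputs only).

    cost[q][g] is the (made-up) cost of assigning query q to pseudo-label g.
    Returns {pseudo_label_index: query_index}.
    """
    n_queries, n_labels = len(cost), len(cost[0])
    best, best_total = None, float("inf")
    # Try every way of picking a distinct query for each pseudo-label.
    for queries in permutations(range(n_queries), n_labels):
        total = sum(cost[q][g] for g, q in enumerate(queries))
        if total < best_total:
            best, best_total = queries, total
    return {g: q for g, q in enumerate(best)}

def one_to_many(cost, k):
    """Assign each pseudo-label its k lowest-cost queries.

    Returns {pseudo_label_index: [query_index, ...]}.
    """
    n_queries, n_labels = len(cost), len(cost[0])
    return {
        g: sorted(range(n_queries), key=lambda q: cost[q][g])[:k]
        for g in range(n_labels)
    }

# Hypothetical costs for 4 queries (rows) and 2 pseudo-labels (columns).
cost = [
    [0.1, 0.9],
    [0.2, 0.8],
    [0.7, 0.3],
    [0.9, 0.2],
]

print(one_to_one(cost))      # each pseudo-label supervises exactly one query
print(one_to_many(cost, 2))  # each pseudo-label supervises two queries
```

With one-to-one matching, a single wrong pseudo-label misleads the only query assigned to it. With one-to-many matching, several queries are trained toward the same box, which is one way overlapping predictions can arise.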