DATR: Unsupervised Domain Adaptive Detection Transformer with Dataset-Level Adaptation and Prototypical Alignment (2405.11765v1)
Abstract: Object detectors frequently suffer significant performance degradation when confronted with domain gaps between collected data (source domain) and data from real-world applications (target domain). To address this problem, numerous unsupervised domain adaptive detectors have been proposed, leveraging carefully designed feature alignment techniques. However, these techniques primarily align instance-level features in a class-agnostic manner, overlooking differences between features extracted from different categories, which yields only limited improvement. Furthermore, the scope of current alignment modules is often restricted to a single batch of images, failing to capture dataset-level cues and thereby severely constraining the detector's generalization to the target domain. To this end, we introduce a strong DETR-based detector named Domain Adaptive detection TRansformer (DATR) for unsupervised domain adaptation of object detection. First, we propose the Class-wise Prototypes Alignment (CPA) module, which aligns cross-domain features in a class-aware manner by bridging the gap between the object detection task and the domain adaptation task. Second, the Dataset-level Alignment Scheme (DAS) explicitly guides the detector, via contrastive learning, to learn global representations and enhance the inter-class distinguishability of instance-level features across the entire dataset spanning both domains. Moreover, DATR incorporates a mean-teacher based self-training framework, using pseudo-labels generated by the teacher model to further mitigate domain bias. Extensive experimental results demonstrate the superior performance and generalization capability of the proposed DATR in multiple domain adaptation scenarios. Code is released at https://github.com/h751410234/DATR.
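The abstract names three ingredients: class-wise prototypes, a contrastive dataset-level alignment, and a mean-teacher EMA update. A minimal sketch of these building blocks is given below, using NumPy for clarity; all function names and hyperparameters (e.g. the temperature `tau` and momentum `0.999`) are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def class_prototypes(features, labels, num_classes):
    """Average instance-level features per class to form class-wise prototypes.
    features: (N, D) array of instance features; labels: (N,) ints in [0, num_classes)."""
    protos = np.zeros((num_classes, features.shape[1]))
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            protos[c] = features[mask].mean(axis=0)
    return protos

def prototype_alignment_loss(src_protos, tgt_protos, tau=0.1):
    """InfoNCE-style contrastive loss: pull same-class prototypes across domains
    together and push different-class prototypes apart."""
    s = src_protos / (np.linalg.norm(src_protos, axis=1, keepdims=True) + 1e-8)
    t = tgt_protos / (np.linalg.norm(tgt_protos, axis=1, keepdims=True) + 1e-8)
    sim = s @ t.T / tau                     # (C, C) cosine-similarity logits
    # positives are the matching classes on the diagonal
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

def ema_update(teacher, student, momentum=0.999):
    """Mean-teacher update: teacher weights are an exponential moving
    average of the student's weights."""
    return {k: momentum * teacher[k] + (1 - momentum) * student[k]
            for k in teacher}
```

For example, when source and target prototypes already coincide, the alignment loss is near zero, while mismatched class assignments drive it up; the EMA update keeps the teacher a slowly moving average of the student.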
- Jianhong Han
- Liang Chen
- Yupei Wang