
Learning Domain-Aware Detection Head with Prompt Tuning (2306.05718v3)

Published 9 Jun 2023 in cs.CV

Abstract: Domain adaptive object detection (DAOD) aims to generalize detectors trained on an annotated source domain to an unlabelled target domain. However, existing methods focus on reducing the domain bias of the detection backbone by learning a discriminative visual encoder, while ignoring the domain bias in the detection head. Inspired by the strong generalization of vision-language models (VLMs), applying a VLM as a robust detection backbone followed by a domain-aware detection head is a reasonable way to learn a discriminative detector for each domain, rather than reducing the domain bias as in traditional methods. To address this issue, we propose a novel DAOD framework named Domain-Aware detection head with Prompt tuning (DA-Pro), which applies a learnable domain-adaptive prompt to generate a dynamic detection head for each domain. Formally, the domain-adaptive prompt consists of domain-invariant tokens, domain-specific tokens, and a domain-related textual description along with the class label. Furthermore, two constraints between the source and target domains are applied to ensure that the domain-adaptive prompt captures both domain-shared and domain-specific knowledge. A prompt ensemble strategy is also proposed to reduce the effect of prompt disturbance. Comprehensive experiments over multiple cross-domain adaptation tasks demonstrate that the domain-adaptive prompt produces an effective domain-related detection head that boosts domain-adaptive object detection. Our code is available at https://github.com/Therock90421/DA-Pro.
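The prompt composition described above can be sketched as a simple concatenation of token groups. This is a hedged illustration, not the paper's implementation: the token counts, embedding dimension, concatenation order, and the `build_prompt` helper are all assumptions, and the random arrays stand in for learnable parameters and text embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 512  # assumed embedding dimension (e.g. CLIP text width)

# Learnable parameters in the real model; random placeholders here.
invariant = rng.standard_normal((4, DIM)) * 0.02    # tokens shared by both domains
specific = rng.standard_normal((2, 4, DIM)) * 0.02  # one token set per domain (0=source, 1=target)

def build_prompt(domain_idx, desc_emb, class_emb):
    """Concatenate domain-invariant tokens, the chosen domain's specific
    tokens, the embedded domain description, and the class-label embedding."""
    return np.concatenate([invariant, specific[domain_idx], desc_emb, class_emb], axis=0)

desc_emb = rng.standard_normal((3, DIM))   # stand-in for an embedded description, e.g. "a foggy photo"
class_emb = rng.standard_normal((1, DIM))  # stand-in for an embedded class label, e.g. "car"

tokens = build_prompt(1, desc_emb, class_emb)
print(tokens.shape)  # (12, 512): 4 invariant + 4 specific + 3 description + 1 class
```

Because the invariant tokens are shared while the specific tokens are indexed per domain, tuning this prompt on both domains lets the shared part absorb domain-shared knowledge and the indexed part absorb domain-specific knowledge, which the two cross-domain constraints in the paper enforce.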

Authors (8)
  1. Haochen Li
  2. Rui Zhang
  3. Hantao Yao
  4. Xinkai Song
  5. Yifan Hao
  6. Yongwei Zhao
  7. Ling Li
  8. Yunji Chen