UniHDA: A Unified and Versatile Framework for Multi-Modal Hybrid Domain Adaptation (2401.12596v2)
Abstract: Recently, generative domain adaptation has made remarkable progress, enabling a pre-trained generator to be adapted to a new target domain. However, existing methods adapt the generator to only a single target domain and are limited to a single modality, either text-driven or image-driven. Moreover, they cannot maintain consistency with the source domain well, which impedes inheritance of its diversity. In this paper, we propose UniHDA, a \textbf{unified} and \textbf{versatile} framework for generative hybrid domain adaptation with multi-modal references from multiple domains. We use a CLIP encoder to project multi-modal references into a unified embedding space and then linearly interpolate the direction vectors from multiple target domains to achieve hybrid domain adaptation. To ensure \textbf{consistency} with the source domain, we propose a novel cross-domain spatial structure (CSS) loss that maintains detailed spatial structure information between the source and target generators. Experiments show that the adapted generator can synthesize realistic images with various attribute compositions. Additionally, our framework is generator-agnostic and versatile across multiple generators, e.g., StyleGAN, EG3D, and diffusion models.
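The core idea above, projecting references into a shared CLIP embedding space and linearly interpolating per-domain direction vectors, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `source_emb`/`target_embs` inputs stand in for real CLIP text or image encodings (here replaced by random vectors so the snippet is self-contained), and the function names are hypothetical.

```python
import numpy as np

def domain_direction(target_emb, source_emb):
    """Unit-normalized shift from source to target in the embedding space."""
    d = np.asarray(target_emb, dtype=float) - np.asarray(source_emb, dtype=float)
    return d / np.linalg.norm(d)

def hybrid_direction(directions, weights):
    """Linearly interpolate several per-domain directions into one hybrid direction."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize interpolation weights
    mixed = sum(wi * di for wi, di in zip(w, directions))
    return mixed / np.linalg.norm(mixed)

# Stand-in 512-d vectors in place of actual CLIP encodings of the
# source domain and two multi-modal target references.
rng = np.random.default_rng(0)
source_emb = rng.normal(size=512)
target_embs = [rng.normal(size=512), rng.normal(size=512)]

dirs = [domain_direction(t, source_emb) for t in target_embs]
d_hybrid = hybrid_direction(dirs, weights=[0.5, 0.5])  # equal mix of two domains
```

In practice the hybrid direction would guide the generator's fine-tuning loss (e.g., a CLIP-space directional objective), and adjusting the weights trades off how strongly each target domain's attributes appear in the adapted generator.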
Authors: Hengjia Li, Yang Liu, Yuqi Lin, Zhanwei Zhang, Yibo Zhao, Weihang Pan, Tu Zheng, Zheng Yang, Yuchun Jiang, Boxi Wu, Deng Cai