Domain-Scalable Unpaired Image Translation via Latent Space Anchoring (2306.14879v1)
Abstract: Unpaired image-to-image translation (UNIT) aims to map images between two visual domains without paired training data. However, given a UNIT model trained on certain domains, it is difficult for current methods to incorporate new domains, because they typically require retraining the full model on both the existing and the new domains. To address this problem, we propose a new domain-scalable UNIT method, termed latent space anchoring, which can be efficiently extended to new visual domains without fine-tuning the encoders and decoders of existing domains. Our method anchors images of different domains to the same latent space of a frozen GAN by learning lightweight encoder and regressor models that reconstruct single-domain images. In the inference phase, the learned encoders and decoders of different domains can be arbitrarily combined to translate images between any two domains without fine-tuning. Experiments on various datasets show that the proposed method achieves superior performance on both standard and domain-scalable UNIT tasks compared with state-of-the-art methods.
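Below is a minimal PyTorch sketch of the idea described in the abstract, under simplifying assumptions: a frozen generator `frozen_gan` that maps a latent vector directly to an image tensor, and per-domain `DomainEncoder`/`DomainRegressor` modules trained only on single-domain reconstruction. All names here (`DomainEncoder`, `DomainRegressor`, `train_reconstruction_step`, `translate`) are illustrative, not the authors' actual implementation.

```python
import torch
import torch.nn as nn


class DomainEncoder(nn.Module):
    """Lightweight encoder: anchors a domain image in the frozen GAN's latent space."""
    def __init__(self, latent_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, latent_dim),
        )

    def forward(self, x):
        return self.net(x)


class DomainRegressor(nn.Module):
    """Lightweight regressor: maps the frozen generator's output back to one domain."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1),
        )

    def forward(self, g_out):
        return self.net(g_out)


def train_reconstruction_step(encoder, regressor, frozen_gan, x, optimizer,
                              recon_loss=nn.L1Loss()):
    """Single-domain reconstruction: only the lightweight encoder and regressor
    are updated; the shared generator stays frozen (requires_grad=False)."""
    w = encoder(x)            # anchor the image in the shared latent space
    g_out = frozen_gan(w)     # frozen generator; gradients still flow back to w
    x_hat = regressor(g_out)  # reconstruct the single-domain image
    loss = recon_loss(x_hat, x)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


@torch.no_grad()
def translate(x_a, encoder_a, regressor_b, frozen_gan):
    """Inference: pair domain A's encoder with domain B's regressor, no fine-tuning."""
    return regressor_b(frozen_gan(encoder_a(x_a)))
```

Under these assumptions, adding a new domain amounts to training one new encoder-regressor pair against the same frozen generator; at inference it can be freely combined with any previously trained domain's modules.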