Masked Discriminators for Content-Consistent Unpaired Image-to-Image Translation (2309.13188v1)

Published 22 Sep 2023 in cs.CV, cs.AI, cs.GR, and cs.LG

Abstract: A common goal of unpaired image-to-image translation is to preserve content consistency between source images and translated images while mimicking the style of the target domain. Due to biases between the datasets of both domains, many methods suffer from inconsistencies caused by the translation process. Most approaches introduced to mitigate these inconsistencies do not constrain the discriminator, leading to an even more ill-posed training setup. Moreover, none of these approaches is designed for larger crop sizes. In this work, we show that masking the inputs of a global discriminator for both domains with a content-based mask is sufficient to reduce content inconsistencies significantly. However, this strategy leads to artifacts that can be traced back to the masking process. To reduce these artifacts, we introduce a local discriminator that operates on pairs of small crops selected with a similarity sampling strategy. Furthermore, we apply this sampling strategy to sample global input crops from the source and target dataset. In addition, we propose feature-attentive denormalization to selectively incorporate content-based statistics into the generator stream. In our experiments, we show that our method achieves state-of-the-art performance in photorealistic sim-to-real translation and weather translation and also performs well in day-to-night translation. Additionally, we propose the cKVD metric, which builds on the sKVD metric and enables the examination of translation quality at the class or category level.
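The central idea in the abstract, masking the inputs of a global discriminator for both domains with a shared content-based mask, can be illustrated with a short sketch. The following is a minimal PyTorch-style illustration, not the authors' implementation: the discriminator architecture, the `content_mask` source, and the hinge-loss formulation are all assumptions for demonstration purposes.

```python
# Minimal sketch of a masked global discriminator: images from both domains
# are multiplied by a content-based mask before discrimination, so the
# adversarial signal only comes from content-consistent regions.
# All names and shapes here are illustrative placeholders.
import torch
import torch.nn as nn

class GlobalDiscriminator(nn.Module):
    """PatchGAN-style discriminator; the architecture is a placeholder."""
    def __init__(self, in_ch: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 1, 4, stride=1, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def masked_logits(disc: nn.Module, image: torch.Tensor,
                  content_mask: torch.Tensor) -> torch.Tensor:
    """Zero out regions outside the content mask before discrimination.

    content_mask: float tensor in [0, 1] of shape (B, 1, H, W), assumed to be
    derived from content cues shared by source and target domains.
    """
    return disc(image * content_mask)

# Usage sketch with a hinge adversarial loss (a common choice in
# image-translation GANs; the paper's exact loss may differ).
disc = GlobalDiscriminator()
real_tgt = torch.randn(2, 3, 256, 256)   # real target-domain crop
fake_tgt = torch.randn(2, 3, 256, 256)   # translated source-domain crop
mask = torch.ones(2, 1, 256, 256)        # content-based mask (placeholder)

d_real = masked_logits(disc, real_tgt, mask)
d_fake = masked_logits(disc, fake_tgt.detach(), mask)
d_loss = torch.relu(1.0 - d_real).mean() + torch.relu(1.0 + d_fake).mean()
```

Because both the real and translated inputs are masked the same way, the discriminator cannot exploit regions where the two datasets differ in content, which is the mechanism the abstract credits with reducing content inconsistencies.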

