DRAN: Detailed Region-Adaptive Normalization for Conditional Image Synthesis (2109.14525v4)
Abstract: In recent years, conditional image synthesis has attracted growing attention due to its controllability in the image generation process. Although recent works have achieved realistic results, most of them struggle to handle fine-grained styles with subtle details. To address this problem, we propose a novel normalization module, named Detailed Region-Adaptive Normalization (DRAN), which adaptively learns both fine-grained and coarse-grained style representations. Specifically, we first introduce a multi-level structure, Spatiality-aware Pyramid Pooling, that guides the model to learn coarse-to-fine features. Then we propose Dynamic Gating, which adaptively fuses the different levels of style according to different spatial regions. Finally, we collect a new makeup dataset (Makeup-Complex dataset) that contains a wide range of complex makeup styles with diverse poses and expressions. To evaluate the effectiveness and demonstrate the generality of our method, we conduct experiments on makeup transfer and semantic image synthesis. Quantitative and qualitative results show that, equipped with DRAN, simple baseline models achieve promising improvements in complex style transfer and detailed texture synthesis. Both the code and the proposed dataset will be available at https://github.com/Yueming6568/DRAN-makeup.git.
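To make the two ingredients named in the abstract concrete, below is a minimal PyTorch sketch of a DRAN-like layer. It is an illustrative reconstruction, not the authors' implementation: the class name `DRANSketch`, the pyramid sizes, and the convolutional heads are all assumptions. It pools the style feature map at several resolutions (a stand-in for Spatiality-aware Pyramid Pooling), predicts per-pixel softmax weights to fuse the levels (a stand-in for Dynamic Gating), and uses the fused style map to modulate an instance-normalized content feature in the spirit of spatially-adaptive normalization.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DRANSketch(nn.Module):
    """Hypothetical Detailed Region-Adaptive Normalization layer.

    Coarse-to-fine style statistics come from adaptive average pooling
    at several grid sizes; a per-pixel gate fuses the levels; the fused
    style map yields per-pixel scale/shift for the normalized content.
    """

    def __init__(self, channels: int, pool_sizes=(1, 4, 8)):
        super().__init__()
        self.pool_sizes = pool_sizes
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        # One 1x1 conv per pyramid level to embed the pooled style.
        self.embeds = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=1) for _ in pool_sizes
        )
        # Predicts a soft gate over the pyramid levels at every pixel.
        self.gate = nn.Conv2d(channels * len(pool_sizes), len(pool_sizes), kernel_size=1)
        # Maps the fused style map to per-pixel modulation parameters.
        self.to_gamma = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.to_beta = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, content: torch.Tensor, style: torch.Tensor) -> torch.Tensor:
        h, w = content.shape[-2:]
        # Coarse-to-fine style representations: pool, embed, upsample.
        levels = []
        for size, embed in zip(self.pool_sizes, self.embeds):
            pooled = F.adaptive_avg_pool2d(style, size)
            levels.append(embed(F.interpolate(pooled, size=(h, w), mode="nearest")))
        # Dynamic gating: per-pixel softmax weights over the levels,
        # so each spatial region picks its own mix of coarse and fine style.
        weights = torch.softmax(self.gate(torch.cat(levels, dim=1)), dim=1)
        fused = sum(weights[:, i : i + 1] * lvl for i, lvl in enumerate(levels))
        # Region-adaptive modulation of the normalized content feature.
        return self.norm(content) * (1 + self.to_gamma(fused)) + self.to_beta(fused)


if __name__ == "__main__":
    dran = DRANSketch(channels=64)
    content = torch.randn(1, 64, 32, 32)  # decoder feature of the source image
    style = torch.randn(1, 64, 32, 32)    # feature of the style reference
    print(dran(content, style).shape)     # torch.Size([1, 64, 32, 32])
```

Under this reading, the 1x1 pooled level carries a global style code while the finer grids preserve local statistics; the gate lets detail-heavy regions (e.g., intricate makeup patterns) draw on the fine levels while smooth regions rely on the coarse ones. The actual layer placement and training losses are those of the paper's baselines and are not reproduced here.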