Spatial Steerability of GANs via Self-Supervision from Discriminator (2301.08455v2)

Published 20 Jan 2023 in cs.CV

Abstract: Generative models have made remarkable progress toward photorealistic image synthesis in recent years. To enable humans to steer the image generation process and customize the output, many works explore the interpretable dimensions of the latent space in GANs. Existing methods edit attributes of the output image, such as orientation or color scheme, by varying the latent code along certain directions. However, these methods usually require additional human annotations for each pretrained model, and they mostly focus on editing global attributes. In this work, we propose a self-supervised approach to improve the spatial steerability of GANs without searching for steerable directions in the latent space or requiring extra annotations. Specifically, we design randomly sampled Gaussian heatmaps to be encoded into the intermediate layers of generative models as spatial inductive bias. Along with training the GAN model from scratch, these heatmaps are aligned with the emerging attention of the GAN's discriminator in a self-supervised learning manner. During inference, users can interact with the spatial heatmaps intuitively, editing the output image by adjusting the scene layout and moving or removing objects. Moreover, we incorporate DragGAN into our framework, which facilitates fine-grained manipulation within a reasonable time and supports a coarse-to-fine editing process. Extensive experiments show that the proposed method not only enables spatial editing over human faces, animal faces, outdoor scenes, and complicated multi-object indoor scenes, but also improves synthesis quality. Code, models, and a demo video are available at https://genforce.github.io/SpatialGAN/.
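The abstract describes two core ingredients: randomly sampled Gaussian heatmaps injected into the generator as a spatial inductive bias, and a self-supervised loss that aligns those heatmaps with the discriminator's attention. The following is a minimal PyTorch sketch of that idea, not the authors' implementation: the function names, hyper-parameters (number of points, sigma), and the use of a simple L2 alignment loss are illustrative assumptions based only on the abstract.

```python
import torch
import torch.nn.functional as F


def sample_gaussian_heatmaps(batch, num_points, size, sigma=0.1, device="cpu"):
    """Sample random Gaussian heatmaps to serve as a spatial inductive bias.

    Centers are drawn uniformly in normalized [0, 1] coordinates.
    `num_points` and `sigma` are hypothetical hyper-parameters.
    """
    ys, xs = torch.meshgrid(
        torch.linspace(0, 1, size, device=device),
        torch.linspace(0, 1, size, device=device),
        indexing="ij",
    )
    grid = torch.stack([xs, ys], dim=-1)                    # (H, W, 2)
    centers = torch.rand(batch, num_points, 1, 1, 2, device=device)
    d2 = ((grid - centers) ** 2).sum(-1)                    # (B, K, H, W)
    heatmaps = torch.exp(-d2 / (2 * sigma ** 2))
    # Merge the per-point maps into one spatial map per sample.
    return heatmaps.amax(dim=1, keepdim=True)               # (B, 1, H, W)


def alignment_loss(heatmap, disc_attention):
    """Self-supervised alignment between the injected heatmap and the
    discriminator's attention map (e.g. a Grad-CAM-style saliency).

    Here this is simply an L2 distance after resizing and min-max
    normalization; the paper's exact objective may differ.
    """
    att = F.interpolate(
        disc_attention, size=heatmap.shape[-2:], mode="bilinear", align_corners=False
    )
    att = (att - att.amin()) / (att.amax() - att.amin() + 1e-8)
    return F.mse_loss(heatmap, att)
```

In such a setup, the sampled heatmaps would be fed into intermediate generator layers during training, and `alignment_loss` would be added to the usual adversarial objectives so that the heatmaps come to control where content appears; at inference time a user could then edit the heatmaps to move or remove objects.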

Authors (7)
  1. Jianyuan Wang (24 papers)
  2. Lalit Bhagat (1 paper)
  3. Ceyuan Yang (51 papers)
  4. Yinghao Xu (57 papers)
  5. Yujun Shen (111 papers)
  6. Hongdong Li (172 papers)
  7. Bolei Zhou (134 papers)

