A Compact and Semantic Latent Space for Disentangled and Controllable Image Editing (2312.08256v1)

Published 13 Dec 2023 in cs.CV

Abstract: Recent advances in generative models, and in particular generative adversarial networks (GANs), have led to substantial progress in controlled image editing, especially compared with the pre-deep-learning era. Despite their powerful ability to apply realistic modifications to an image, these methods often lack properties such as disentanglement (the capacity to edit attributes independently). In this paper, we propose an autoencoder that re-organizes the latent space of StyleGAN so that each attribute we wish to edit corresponds to an axis of the new latent space, and so that the latent axes are decorrelated, encouraging disentanglement. We work in a version of the latent space compressed with Principal Component Analysis, which reduces the parameter complexity of our autoencoder and leads to short training times ($\sim$ 45 mins). Qualitative and quantitative results demonstrate the editing capabilities of our approach, with greater disentanglement than competing methods, while maintaining fidelity to the original image with respect to identity. Our autoencoder architecture is simple and straightforward, facilitating implementation.
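
The abstract outlines the core mechanism: StyleGAN latents are compressed with PCA, and a small autoencoder re-maps them so that each editable attribute occupies one decorrelated latent axis. The sketch below is a minimal illustration of that idea, not the authors' implementation: the layer sizes, loss weighting, and the `LatentAutoencoder` / `training_losses` names are assumptions made here for clarity.

```python
# Minimal sketch (assumptions, not the paper's code): an autoencoder over
# PCA-compressed StyleGAN latents whose first K latent axes are trained to
# match K attribute labels, with a decorrelation penalty across axes.
import torch
import torch.nn as nn

class LatentAutoencoder(nn.Module):
    def __init__(self, pca_dim=512, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(pca_dim, 256), nn.ReLU(), nn.Linear(256, latent_dim)
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, pca_dim)
        )

    def forward(self, w_pca):
        z = self.encoder(w_pca)          # re-organized latent code
        return z, self.decoder(z)        # reconstruction of the PCA latent

def training_losses(z, w_rec, w_pca, attrs):
    """attrs: (batch, K) attribute labels; the first K axes of z are attribute axes."""
    k = attrs.shape[1]
    rec = torch.mean((w_rec - w_pca) ** 2)        # reconstruct the PCA-compressed latent
    attr = torch.mean((z[:, :k] - attrs) ** 2)    # align one axis per attribute
    zc = z - z.mean(dim=0, keepdim=True)
    cov = (zc.T @ zc) / (z.shape[0] - 1)
    off_diag = cov - torch.diag(torch.diag(cov))
    decorr = (off_diag ** 2).mean()               # encourage decorrelated axes
    return rec + attr + decorr
```

Under this reading, editing amounts to projecting a StyleGAN latent with PCA, encoding it, shifting the axis tied to the target attribute, decoding, and inverting the PCA projection before passing the result back to the StyleGAN generator.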

