
Controlling the Output of a Generative Model by Latent Feature Vector Shifting (2311.08850v2)

Published 15 Nov 2023 in cs.CV

Abstract: State-of-the-art generative models (e.g., StyleGAN3 [1]) generate photorealistic images from vectors sampled from their latent space, but offer only limited control over the output. Here we present a novel latent-vector-shifting method for controlled modification of the output image, utilizing semantic features of the generated images. In our approach we use a pre-trained StyleGAN3 model that generates images of realistic human faces at relatively high resolution. We complement the generative model with a convolutional neural network classifier, namely ResNet34, trained to classify the generated images by binary facial features from the CelebA dataset. Our latent feature shifter is a neural network trained to shift the latent vectors of the generative model in the direction of a specified feature. To train it, we designed a dataset of pairs of latent vectors with and without a given feature. We trained the latent feature shifter for multiple facial features and outperformed our baseline method in the number of generated images exhibiting the desired feature. Based on our evaluation, we conclude that the latent feature shifter approach succeeds in controlling the output of the StyleGAN3 generator.
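The core idea in the abstract — learn a mapping from latent vectors without a feature to latent vectors with it, from a dataset of paired vectors — can be sketched in a few lines. The sketch below is a simplified stand-in, not the paper's implementation: the real shifter is a small neural network operating on StyleGAN3 latents, while here a single linear layer is fit in closed form (least squares) on synthetic pairs, and the "feature direction" is invented for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim = 16  # toy dimensionality; StyleGAN3 latents are 512-d

# Synthetic stand-in for the paper's dataset of latent-vector pairs:
# z_neg lacks the target feature; z_pos carries it, modeled here as a
# fixed offset plus noise (an assumption made for this sketch only).
true_direction = rng.normal(size=latent_dim)
z_neg = rng.normal(size=(500, latent_dim))
z_pos = z_neg + true_direction + 0.05 * rng.normal(size=(500, latent_dim))

# "Latent feature shifter" reduced to one affine layer, fit in closed
# form; the paper trains a neural network for this mapping instead.
X = np.hstack([z_neg, np.ones((len(z_neg), 1))])  # append bias column
W, *_ = np.linalg.lstsq(X, z_pos, rcond=None)

def shift(z):
    """Map a latent vector toward the target-feature region."""
    return np.hstack([z, [1.0]]) @ W

z = rng.normal(size=latent_dim)
z_shifted = shift(z)
# The learned shift should approximately recover the feature direction.
print(np.allclose(z_shifted - z, true_direction, atol=0.1))
```

In the paper's pipeline, the shifted vector would then be fed back through the StyleGAN3 generator, and the ResNet34 classifier would verify that the generated face now exhibits the target attribute.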

References (26)
  1. Alias-free generative adversarial networks. Advances in Neural Information Processing Systems, 34:852–863, 2021.
  2. Oldest cave art found in Sulawesi. Science Advances, 7(3):eabd4648, 2021.
  3. David S Whitley. Introduction to rock art research. Routledge, 2016.
  4. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684–10695, 2022.
  5. Generative adversarial nets. Advances in Neural Information Processing Systems, 27, 2014.
  6. StyleSwin: Transformer-based GAN for high-resolution image generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11304–11314, 2022.
  7. Projected GANs converge faster. Advances in Neural Information Processing Systems, 34:17480–17492, 2021.
  8. Large scale GAN training for high fidelity natural image synthesis. In International Conference on Learning Representations, 2019.
  9. StyleGAN-XL: Scaling StyleGAN to large diverse datasets. In ACM SIGGRAPH 2022 Conference Proceedings, SIGGRAPH ’22, New York, NY, USA, 2022. Association for Computing Machinery.
  10. Analyzing and improving the image quality of StyleGAN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8110–8119, 2020.
  11. In-domain GAN inversion for real image editing. In European Conference on Computer Vision, pages 592–608. Springer, 2020.
  12. Sequential attention GAN for interactive image editing. In Proceedings of the 28th ACM International Conference on Multimedia, pages 4383–4391, 2020.
  13. CoCa: Contrastive captioners are image-text foundation models. Transactions on Machine Learning Research, 2022.
  14. Inception-v4, Inception-ResNet and the impact of residual connections on learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 31, 2017.
  15. ImageNet: A large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2009.
  16. Swin Transformer V2: Scaling up capacity and resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12009–12019, 2022.
  17. Symbolic discovery of optimization algorithms. arXiv preprint arXiv:2302.06675, 2023.
  18. Interpreting the latent space of GANs for semantic face editing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9243–9252, 2020.
  19. A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pages 144–152, 1992.
  20. Latent space oddity: On the curvature of deep generative models. In International Conference on Learning Representations, 2018.
  21. The Riemannian geometry of deep generative models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 315–323, 2018.
  22. Unsupervised discovery of interpretable directions in the GAN latent space. In International Conference on Machine Learning, pages 9786–9796. PMLR, 2020.
  23. ClusterGAN: Latent space clustering in generative adversarial networks. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 4610–4617, 2019.
  24. NeuralFusion: Online depth fusion in latent space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3162–3172, 2021.
  25. GAN dissection: Visualizing and understanding generative adversarial networks. In Proceedings of the International Conference on Learning Representations (ICLR), 2019.
  26. Network bending: Expressive manipulation of deep generative models. In International Conference on Computational Intelligence in Music, Sound, Art and Design (Part of EvoStar), pages 20–36. Springer, 2021.