
Unsupervised Discovery of Interpretable Directions in h-space of Pre-trained Diffusion Models (2310.09912v3)

Published 15 Oct 2023 in cs.CV and cs.LG

Abstract: We propose the first unsupervised, learning-based method for identifying interpretable directions in the h-space of pre-trained diffusion models. Our method is derived from an existing technique that operates on the GAN latent space. Specifically, we employ a shift control module that acts on the h-space of a pre-trained diffusion model to manipulate a sample into a shifted version of itself, followed by a reconstructor that recovers both the type and the strength of the manipulation. By jointly optimizing the two, the model spontaneously discovers disentangled and interpretable directions. To prevent the discovery of meaningless or destructive directions, we employ a discriminator that maintains the fidelity of the shifted samples. Because of the iterative generative process of diffusion models, training requires a substantial amount of GPU VRAM to store the many intermediate tensors needed for back-propagating gradients. To address this, we propose a general VRAM-efficient training algorithm based on gradient checkpointing that back-propagates any gradient through the whole generative process, with acceptable VRAM occupancy and an acceptable loss of training efficiency. Compared with existing related work on diffusion models, our method inherently identifies global and scalable directions without requiring any further complicated procedures. Extensive experiments on various datasets demonstrate the effectiveness of our method.
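
To make the joint objective concrete, here is a minimal PyTorch sketch of the discovery loss described in the abstract. It is an illustration under assumptions, not the authors' implementation: the number of directions `K`, the h-space width `H`, the loss weights (all 1 here), and the `generate(x_T, h_shift=...)` interface for injecting an h-space shift into the pre-trained sampler are placeholders, and the toy reconstructor and discriminator stand in for the CNNs a real setup would use.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

K, H = 16, 512  # number of candidate directions / h-space channels (illustrative)

# Learned shift directions; normalized below so the strength lives in eps.
directions = nn.Parameter(torch.randn(K, H))

# Toy stand-ins for the paper's reconstructor and discriminator.
reconstructor = nn.Sequential(nn.Flatten(), nn.LazyLinear(K + 1))
discriminator = nn.Sequential(nn.Flatten(), nn.LazyLinear(1))

def direction_loss(x_T, generate):
    """One training step's loss. `generate` is any differentiable sampler
    mapping noise (plus an optional h-space shift, broadcast over spatial
    positions inside the U-Net bottleneck) to an image, e.g. a partial
    application of the checkpointed sampler sketched below."""
    B = x_T.size(0)
    k = torch.randint(0, K, (B,))             # which direction is applied
    eps = torch.empty(B).uniform_(-3.0, 3.0)  # and how strongly

    d = F.normalize(directions, dim=1)
    x_orig = generate(x_T, h_shift=None)
    x_shift = generate(x_T, h_shift=eps[:, None] * d[k])

    # The reconstructor must recover both the type (k) and the strength
    # (eps) of the manipulation from the (original, shifted) image pair.
    out = reconstructor(torch.cat([x_orig, x_shift], dim=1))
    loss_cls = F.cross_entropy(out[:, :K], k)
    loss_reg = F.l1_loss(out[:, K], eps)

    # Non-saturating GAN loss keeps shifted samples on the data manifold
    # (the discriminator's own update step is omitted here).
    loss_gan = F.softplus(-discriminator(x_shift)).mean()

    return loss_cls + loss_reg + loss_gan
```

Jointly minimizing the classification and regression terms pushes the K directions apart, since otherwise the reconstructor could not tell them apart; the discriminator term vetoes directions that achieve separability by destroying the image.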
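The VRAM issue arises because naively back-propagating through, say, 50 DDIM steps stores every U-Net activation of every step. Below is a minimal sketch of the checkpointing idea, assuming a standard deterministic DDIM update and an (assumed) `h_shift` keyword on the U-Net; it is not the paper's exact algorithm. Only step boundaries are kept alive, and each step's inner activations are recomputed on the backward pass.

```python
import torch
from torch.utils.checkpoint import checkpoint

def generate(x_T, unet, alphas_cumprod, timesteps, h_shift=None):
    """Full deterministic DDIM reverse pass with per-step gradient
    checkpointing, so gradients can flow from the final image back to
    `h_shift` at roughly 2x compute instead of activation memory that
    grows linearly with the number of sampling steps.

    `timesteps` is a descending schedule ending at 0, e.g. [999, 949, ..., 0].
    """
    def ddim_step(x, t, t_prev, shift):
        a_t = alphas_cumprod[t].view(-1, 1, 1, 1)
        a_prev = alphas_cumprod[t_prev].view(-1, 1, 1, 1)
        eps = unet(x, t, h_shift=shift)  # h-space shift hook (assumed interface)
        x0_pred = (x - (1.0 - a_t).sqrt() * eps) / a_t.sqrt()
        return a_prev.sqrt() * x0_pred + (1.0 - a_prev).sqrt() * eps

    x = x_T
    for t, t_prev in zip(timesteps[:-1], timesteps[1:]):
        tb = torch.full((x.size(0),), t, device=x.device, dtype=torch.long)
        tpb = torch.full((x.size(0),), t_prev, device=x.device, dtype=torch.long)
        # Store only this step's inputs/outputs; recompute the U-Net's
        # internals during backward instead of caching them.
        x = checkpoint(ddim_step, x, tb, tpb, h_shift, use_reentrant=False)
    return x
```

This is the generic trade described in the abstract: activation memory drops from linear in the number of sampling steps to roughly one step's worth, at the cost of a second forward computation per step during backward.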
