Theoretical Insights for Diffusion Guidance: A Case Study for Gaussian Mixture Models (2403.01639v1)
Abstract: Diffusion models benefit from the instillation of task-specific information into the score function to steer the sample generation towards desired properties. Such information is referred to as guidance. For example, in text-to-image synthesis, text input is encoded as guidance to generate semantically aligned images. Proper guidance inputs are closely tied to the performance of diffusion models. A common observation is that strong guidance promotes tight alignment with the task-specific information while reducing the diversity of the generated samples. In this paper, we provide the first theoretical study of the influence of guidance on diffusion models in the context of Gaussian mixture models. Under mild conditions, we prove that incorporating diffusion guidance not only boosts classification confidence but also diminishes distribution diversity, leading to a reduction in the differential entropy of the output distribution. Our analysis covers the widely adopted DDPM and DDIM sampling schemes, and leverages comparison inequalities for differential equations as well as the Fokker-Planck equation, which characterizes the evolution of the probability density function; these tools may be of independent theoretical interest.
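To make the abstract's claim concrete, the following is a minimal numerical sketch, not the paper's code: it samples from a one-dimensional two-component Gaussian mixture via DDPM ancestral sampling with classifier-free guidance, combining scores as (1+w)·conditional − w·unconditional. All parameter names (`mu`, `sigma0`, `betas`, `n_steps`, `w`) are illustrative assumptions.

```python
# Sketch: guidance concentrates samples and shrinks diversity in a 1-D GMM.
# Data: 0.5*N(+mu, sigma0^2) + 0.5*N(-mu, sigma0^2); all settings illustrative.
import numpy as np

mu, sigma0 = 2.0, 0.5                       # component means +-mu, component std
n_steps, n_samples = 500, 20000
betas = np.linspace(1e-4, 0.02, n_steps)    # a standard DDPM noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def moments(t):
    """Mean magnitude and variance of each diffused mixture component at step t."""
    abar = alpha_bars[t]
    return np.sqrt(abar) * mu, abar * sigma0**2 + (1.0 - abar)

def cond_score(x, t, label):
    """Score of the single diffused component selected by label (+1 or -1)."""
    m, v = moments(t)
    return (label * m - x) / v

def uncond_score(x, t):
    """Score of the diffused 50/50 mixture, in closed form via responsibilities."""
    m, v = moments(t)
    logp = np.stack([-(x - m) ** 2, -(x + m) ** 2]) / (2 * v)
    r = np.exp(logp - logp.max(axis=0))
    r /= r.sum(axis=0)                      # posterior weights of the two modes
    return (r[0] * (m - x) + r[1] * (-m - x)) / v

def sample(w, rng):
    """DDPM ancestral sampling with classifier-free guidance of strength w,
    targeting the +mu component: score = (1+w)*conditional - w*unconditional."""
    x = rng.standard_normal(n_samples)
    for t in range(n_steps - 1, -1, -1):
        s = (1 + w) * cond_score(x, t, +1) - w * uncond_score(x, t)
        x = (x + (1 - alphas[t]) * s) / np.sqrt(alphas[t])
        if t > 0:                           # no noise is added on the final step
            x += np.sqrt(betas[t]) * rng.standard_normal(n_samples)
    return x

rng = np.random.default_rng(0)
for w in [0.0, 1.0, 4.0]:
    x = sample(w, rng)
    print(f"w={w}: mean {x.mean():+.2f}, std {x.std():.3f}, "
          f"mass on target mode {(x > 0).mean():.3f}")
```

Increasing the guidance scale `w` should push the printed statistics toward the conditioned mode with a shrinking standard deviation, mirroring the confidence-boosting and diversity-reduction (entropy-reduction) phenomenon the paper analyzes.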