Diffusion Model Conditioning on Gaussian Mixture Model and Negative Gaussian Mixture Gradient (2401.11261v2)
Abstract: Diffusion models (DMs) are a class of generative models that have had a major impact on image synthesis and beyond, achieving state-of-the-art results across a wide variety of generative tasks. A broad range of conditioning inputs, such as text or bounding boxes, can be used to control generation. In this work, we propose a conditioning mechanism that uses Gaussian mixture models (GMMs) as feature conditioning to guide the denoising process. Based on set theory, we provide a theoretical analysis showing that the conditional latent distributions induced by features and by classes differ significantly, and that conditioning on features consequently yields fewer defective generations than conditioning on classes. Two diffusion models conditioned on GMMs are trained separately for comparison, and the experiments support our findings. We further propose a novel gradient function, the negative Gaussian mixture gradient (NGMG), and apply it to diffusion model training with an additional classifier, which improves training stability. We also prove that NGMG enjoys the same benefit as the Earth Mover (Wasserstein) distance: it is a more sensible cost function when learning distributions supported on low-dimensional manifolds.
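As an illustration of the conditioning mechanism described above, here is a minimal PyTorch sketch (not the authors' implementation) of a denoiser that receives a flattened GMM summary as its conditioning signal. All class and parameter names are hypothetical, and the injection point, summing a projected GMM embedding with the timestep embedding, simply mirrors how class embeddings are commonly injected in DDPM-style models.

```python
# Hypothetical sketch of GMM-feature conditioning; not the paper's code.
import torch
import torch.nn as nn

class GMMConditionedDenoiser(nn.Module):
    """Toy epsilon-predictor conditioned on a flattened GMM summary
    (mixture weights, means, and diagonal variances per component)."""
    def __init__(self, data_dim=64, n_components=8, hidden=256):
        super().__init__()
        cond_dim = n_components * (1 + 2 * data_dim)  # pi_k, mu_k, sigma_k^2
        self.cond_proj = nn.Sequential(nn.Linear(cond_dim, hidden), nn.SiLU())
        self.time_proj = nn.Sequential(nn.Linear(1, hidden), nn.SiLU())
        self.net = nn.Sequential(
            nn.Linear(data_dim + hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, data_dim),
        )

    def forward(self, x_t, t, gmm_cond):
        # Inject the GMM feature embedding the way a class embedding is
        # usually injected: summed with the timestep embedding.
        h = self.time_proj(t[:, None].float()) + self.cond_proj(gmm_cond)
        return self.net(torch.cat([x_t, h], dim=-1))

# Usage with toy shapes:
model = GMMConditionedDenoiser()
x_t = torch.randn(4, 64)                        # noisy samples at step t
t = torch.randint(0, 1000, (4,))                # diffusion timesteps
gmm_cond = torch.randn(4, 8 * (1 + 2 * 64))     # flattened GMM parameters
eps_hat = model(x_t, t, gmm_cond)               # predicted noise, (4, 64)
```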
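The abstract does not spell out the exact form of NGMG. As a point of reference only, the sketch below computes the score (the gradient of the log-density) of a diagonal-covariance GMM; negating it is one plausible reading of a "negative Gaussian mixture gradient"-style signal. The function name and the diagonal-covariance assumption are ours, not the paper's.

```python
# Hypothetical sketch; the paper's NGMG definition may differ.
import torch

def gmm_score(x, pi, mu, var):
    """grad_x log p(x) for p = sum_k pi_k N(mu_k, diag(var_k)).
    x: (B, D); pi: (K,); mu: (K, D); var: (K, D)."""
    diff = x[:, None, :] - mu[None, :, :]                   # (B, K, D)
    log_comp = -0.5 * ((diff ** 2) / var
                       + torch.log(2 * torch.pi * var)).sum(-1)
    resp = torch.softmax(torch.log(pi)[None, :] + log_comp, dim=1)
    return (resp[..., None] * (-diff / var)).sum(dim=1)     # (B, D)

# A negative-gradient variant would flip the sign, pushing samples away
# from the mixture modes rather than toward them:
# ngmg = -gmm_score(x, pi, mu, var)
```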
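For reference, the Earth Mover (Wasserstein-1) distance invoked in the final sentence is the standard optimal-transport cost

$$ W_1(P, Q) \;=\; \inf_{\gamma \in \Pi(P, Q)} \mathbb{E}_{(x, y) \sim \gamma}\big[\, \lVert x - y \rVert \,\big], $$

where $\Pi(P, Q)$ is the set of couplings with marginals $P$ and $Q$. The benefit alluded to (following Arjovsky & Bottou) is that $W_1$ remains continuous and yields informative gradients even when the model and data distributions are supported on disjoint low-dimensional manifolds, where KL-based losses saturate or diverge.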
Authors: Weiguo Lu, Xuan Wu, Deng Ding, Jinqiao Duan, Jirong Zhuang, Gangnan Yuan