
Preconditioned Score-based Generative Models (2302.06504v2)

Published 13 Feb 2023 in cs.CV

Abstract: Score-based generative models (SGMs) have recently emerged as a promising class of generative models. However, a fundamental limitation is that their sampling process is slow, requiring many (e.g., 2000) iterations of sequential computation. An intuitive acceleration method is to reduce the number of sampling iterations, but this causes severe performance degradation. We attribute this problem to the ill-conditioned issues of the Langevin dynamics and reverse diffusion in the sampling process. Under this insight, we propose a model-agnostic preconditioned diffusion sampling (PDS) method that leverages matrix preconditioning to alleviate the aforementioned problem. PDS alters the sampling process of a vanilla SGM at marginal extra computation cost and without model retraining. Theoretically, we prove that PDS preserves the output distribution of the SGM, with no risk of inducing systematic bias in the original sampling process. We further theoretically reveal a relation between the parameter of PDS and the number of sampling iterations, easing parameter estimation under varying sampling iterations. Extensive experiments on various image datasets with a variety of resolutions and diversity validate that PDS consistently accelerates off-the-shelf SGMs while maintaining synthesis quality. In particular, PDS achieves up to a 29× acceleration on the more challenging high-resolution (1024×1024) image generation task. Compared with the latest generative models (e.g., CLD-SGM, DDIM, and Analytic-DDIM), PDS achieves the best sampling quality on CIFAR-10 with an FID score of 1.99. Our code is made publicly available to foster further research: https://github.com/fudan-zvg/PDS.
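The preconditioning idea behind PDS can be illustrated compactly. The sketch below is a minimal toy example rather than the authors' implementation: it performs a single preconditioned Langevin update with a diagonal preconditioner, where scaling the score-driven drift by M and the injected noise by sqrt(M) leaves the target distribution invariant, which is the distribution-preservation property the abstract refers to. The names here (`preconditioned_langevin_step`, `m_half`) are hypothetical; the actual PDS code lives in the linked repository.

```python
import torch

def preconditioned_langevin_step(x, score_fn, step_size, m_half):
    """One preconditioned Langevin update (illustrative sketch).

    For a symmetric positive-definite preconditioner M (diagonal here,
    acting elementwise as m_half**2), the update

        x <- x + (step_size / 2) * M * score(x) + sqrt(step_size) * sqrt(M) * z

    has the same stationary distribution as vanilla Langevin dynamics
    (the special case M = I), so the preconditioner can reshape the
    conditioning of the dynamics without biasing the samples.
    """
    z = torch.randn_like(x)                            # fresh Gaussian noise
    drift = 0.5 * step_size * m_half**2 * score_fn(x)  # M applied to the score
    noise = (step_size ** 0.5) * m_half * z            # sqrt(M) applied to the noise
    return x + drift + noise

# Toy usage: sample from a standard Gaussian, whose score is -x.
# Setting m_half to all ones recovers vanilla (unpreconditioned) Langevin.
x = torch.randn(4, 2)
m_half = torch.ones_like(x)
for _ in range(1000):
    x = preconditioned_langevin_step(x, lambda y: -y, step_size=0.1, m_half=m_half)
```

Because the preconditioner in this sketch is a fixed diagonal, the extra work per step is a single elementwise multiply, consistent with the abstract's claim that PDS adds only marginal computation and requires no retraining.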

References (64)
  1. I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Advances in Neural Information Processing Systems, 2014.
  2. Y. Song and S. Ermon, “Generative modeling by estimating gradients of the data distribution,” in Advances in Neural Information Processing Systems, 2019.
  3. Y. Song and S. Ermon, "Improved techniques for training score-based generative models," in Advances in Neural Information Processing Systems, 2020.
  4. Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole, “Score-based generative modeling through stochastic differential equations,” in International Conference on Learning Representations, 2021.
  5. Y. Song, C. Durkan, I. Murray, and S. Ermon, “Maximum likelihood training of score-based diffusion models,” in Advances in Neural Information Processing Systems, 2021.
  6. Z. Xiao, K. Kreis, and A. Vahdat, “Tackling the generative learning trilemma with denoising diffusion gans,” in International Conference on Learning Representations, 2022.
  7. V. De Bortoli, J. Thornton, J. Heng, and A. Doucet, “Diffusion schrödinger bridge with applications to score-based generative modeling,” in Advances in Neural Information Processing Systems, 2021.
  8. L. Yang, Z. Huang, Y. Song, S. Hong, G. Li, W. Zhang, B. Cui, B. Ghanem, and M.-H. Yang, “Diffusion-based scene graph to image generation with masked contrastive pre-training,” arXiv preprint arXiv:2211.11138, 2022.
  9. Z. Kong, W. Ping, J. Huang, K. Zhao, and B. Catanzaro, “Diffwave: A versatile diffusion model for audio synthesis,” in International Conference on Learning Representations, 2021.
  10. N. Chen, Y. Zhang, H. Zen, R. J. Weiss, M. Norouzi, and W. Chan, “Wavegrad: Estimating gradients for waveform generation,” in International Conference on Learning Representations, 2021.
  11. A. Nichol, P. Dhariwal, A. Ramesh, P. Shyam, P. Mishkin, B. McGrew, I. Sutskever, and M. Chen, “Glide: Towards photorealistic image generation and editing with text-guided diffusion models,” arXiv preprint, 2021.
  12. C. Saharia, W. Chan, S. Saxena, L. Li, J. Whang, E. Denton, S. K. S. Ghasemipour, B. K. Ayan, S. S. Mahdavi, R. G. Lopes et al., “Photorealistic text-to-image diffusion models with deep language understanding,” arXiv preprint, 2022.
  13. R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High-resolution image synthesis with latent diffusion models,” in IEEE Conference on Computer Vision and Pattern Recognition, 2022.
  14. C. Meng, Y. He, Y. Song, J. Song, J. Wu, J.-Y. Zhu, and S. Ermon, "Sdedit: Guided image synthesis and editing with stochastic differential equations," in International Conference on Learning Representations, 2022.
  15. C. Saharia, W. Chan, H. Chang, C. A. Lee, J. Ho, T. Salimans, D. J. Fleet, and M. Norouzi, “Palette: Image-to-image diffusion models,” arXiv preprint, 2021.
  16. C. Saharia, J. Ho, W. Chan, T. Salimans, D. J. Fleet, and M. Norouzi, “Image super-resolution via iterative refinement,” arXiv preprint, 2021.
  17. B. Kawar, M. Elad, S. Ermon, and J. Song, “Denoising diffusion restoration models,” arXiv preprint, 2022.
  18. J. Ho, T. Salimans, A. Gritsenko, W. Chan, M. Norouzi, and D. J. Fleet, “Video diffusion models,” arXiv preprint arXiv:2204.03458, 2022.
  19. L. Zhou, Y. Du, and J. Wu, “3d shape generation and completion through point-voxel diffusion,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 5826–5835.
  20. A. Brock, J. Donahue, and K. Simonyan, “Large scale GAN training for high fidelity natural image synthesis,” in International Conference on Learning Representations, 2019.
  21. T. Karras, S. Laine, and T. Aila, “A style-based generator architecture for generative adversarial networks,” in IEEE Conference on Computer Vision and Pattern Recognition, 2019.
  22. T. Karras, T. Aila, S. Laine, and J. Lehtinen, “Progressive growing of gans for improved quality, stability, and variation,” in International Conference on Learning Representations, 2018.
  23. G. O. Roberts and O. Stramer, “Langevin diffusions and metropolis-hastings algorithms,” Methodology and computing in applied probability, 2002.
  24. M. Welling and Y. W. Teh, “Bayesian learning via stochastic gradient langevin dynamics,” in International Conference on Machine Learning, 2011.
  25. M. Girolami and B. Calderhead, “Riemann manifold langevin and hamiltonian monte carlo methods,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2011.
  26. C. Li, C. Chen, D. Carlson, and L. Carin, “Preconditioned stochastic gradient langevin dynamics for deep neural networks,” in AAAI Conference on Artificial Intelligence, 2016.
  27. T. Dockhorn, A. Vahdat, and K. Kreis, “Score-based generative modeling with critically-damped langevin diffusion,” in International Conference on Learning Representations, 2022.
  28. J. Song, C. Meng, and S. Ermon, "Denoising diffusion implicit models," in International Conference on Learning Representations, 2021.
  29. F. Bao, C. Li, J. Zhu, and B. Zhang, "Analytic-dpm: an analytic estimate of the optimal reverse variance in diffusion probabilistic models," in International Conference on Learning Representations, 2022.
  30. H. Ma, L. Zhang, X. Zhu, and J. Feng, “Accelerating score-based generative models with preconditioned diffusion sampling,” in European Conference on Computer Vision, 2022.
  31. J. Sohl-Dickstein, E. Weiss, N. Maheswaranathan, and S. Ganguli, “Deep unsupervised learning using nonequilibrium thermodynamics,” in International Conference on Machine Learning, 2015.
  32. A. Vahdat, K. Kreis, and J. Kautz, “Score-based generative modeling in latent space,” in Advances in Neural Information Processing Systems, 2021.
  33. J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” in Advances in Neural Information Processing Systems, 2020.
  34. P. Dhariwal and A. Nichol, “Diffusion models beat gans on image synthesis,” in Advances in Neural Information Processing Systems, 2021.
  35. J. Ho, C. Saharia, W. Chan, D. J. Fleet, M. Norouzi, and T. Salimans, “Cascaded diffusion models for high fidelity image generation,” arXiv preprint, 2021.
  36. L. Liu, Y. Ren, Z. Lin, and Z. Zhao, "Pseudo numerical methods for diffusion models on manifolds," in International Conference on Learning Representations, 2022.
  37. D. Kingma, T. Salimans, B. Poole, and J. Ho, “Variational diffusion models,” in Advances in Neural Information Processing Systems, 2021.
  38. R. M. Neal, "MCMC using Hamiltonian dynamics," Handbook of Markov Chain Monte Carlo, 2011.
  39. A. Jolicoeur-Martineau, K. Li, R. Piché-Taillefer, T. Kachman, and I. Mitliagkas, "Gotta go fast when generating data with score-based models," arXiv preprint, 2021.
  40. T. Salimans and J. Ho, “Progressive distillation for fast sampling of diffusion models,” arXiv preprint arXiv:2202.00512, 2022.
  41. C. Meng, R. Rombach, R. Gao, D. Kingma, S. Ermon, J. Ho, and T. Salimans, "On distillation of guided diffusion models," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 14297–14306.
  42. Y. Song, P. Dhariwal, M. Chen, and I. Sutskever, "Consistency models," in International Conference on Machine Learning, 2023.
  43. B. Jing, G. Corso, R. Berlinghieri, and T. Jaakkola, “Subspace diffusion generative models,” arXiv preprint, 2022.
  44. A. Q. Nichol and P. Dhariwal, “Improved denoising diffusion probabilistic models,” in International Conference on Machine Learning, 2021.
  45. F. Bao, C. Li, J. Sun, J. Zhu, and B. Zhang, "Estimating the optimal covariance with imperfect mean in diffusion probabilistic models," in International Conference on Machine Learning, 2022.
  46. A. Hyvärinen and P. Dayan, “Estimation of non-normalized statistical models by score matching.” Journal of Machine Learning Research, 2005.
  47. P. Vincent, “A connection between score matching and denoising autoencoders,” Neural computation, vol. 23, no. 7, pp. 1661–1674, 2011.
  48. U. G. Haussmann and E. Pardoux, “Time reversal of diffusions,” The Annals of Probability, pp. 1188–1205, 1986.
  49. J. Nocedal and S. J. Wright, "Numerical optimization," Springer, 1999.
  50. A. van der Schaaf and J. H. van Hateren, “Modelling the power spectra of natural images: Statistics and information,” Vision Research, 1996.
  51. K. Friston, “The free-energy principle: a unified brain theory?” Nature reviews neuroscience, vol. 11, no. 2, pp. 127–138, 2010.
  52. C.-R. Hwang, S.-Y. Hwang-Ma, and S.-J. Sheu, “Accelerating diffusions,” The Annals of Applied Probability, 2005.
  53. M. Ottobre, “Markov chain monte carlo and irreversibility,” Reports on Mathematical Physics, 2016.
  54. L. Rey-Bellet and K. Spiliopoulos, “Irreversible langevin samplers and variance reduction: a large deviations approach,” Nonlinearity, 2015.
  55. T. Lelievre, F. Nier, and G. A. Pavliotis, “Optimal non-reversible linear drift for the convergence to equilibrium of a diffusion,” Journal of Statistical Physics, 2013.
  56. A. Krizhevsky, G. Hinton et al., “Learning multiple layers of features from tiny images,” 2009.
  57. Z. Liu, P. Luo, X. Wang, and X. Tang, “Deep learning face attributes in the wild,” in IEEE International Conference on Computer Vision, 2015.
  58. F. Yu, A. Seff, Y. Zhang, S. Song, T. Funkhouser, and J. Xiao, “Lsun: Construction of a large-scale image dataset using deep learning with humans in the loop,” arXiv preprint, 2015.
  59. M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter, “Gans trained by a two time-scale update rule converge to a local nash equilibrium,” in Advances in Neural Information Processing Systems, 2017.
  60. A. Brock, J. Donahue, and K. Simonyan, "Large scale gan training for high fidelity natural image synthesis," in International Conference on Learning Representations, 2019.
  61. M. Kang, W. Shim, M. Cho, and J. Park, "Rebooting acgan: Auxiliary classifier gans with stable training," in Advances in Neural Information Processing Systems, vol. 34, 2021, pp. 23505–23518.
  62. D. Kim, B. Na, S. J. Kwon, D. Lee, W. Kang, and I.-C. Moon, “Maximum likelihood training of implicit nonlinear diffusion models,” arXiv preprint, 2022.
  63. J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "Imagenet: A large-scale hierarchical image database," in IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 248–255.
  64. T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, "Microsoft coco: Common objects in context," in European Conference on Computer Vision. Springer, 2014, pp. 740–755.