Improved Techniques for Maximum Likelihood Estimation for Diffusion ODEs (2305.03935v4)
Abstract: Diffusion models have exhibited excellent performance in various domains. The probability flow ordinary differential equation (ODE) of a diffusion model (i.e., the diffusion ODE) is a special case of a continuous normalizing flow (CNF), which enables deterministic inference and exact likelihood evaluation. However, the likelihood estimates of diffusion ODEs still fall short of those of state-of-the-art likelihood-based generative models. In this work, we propose several improved techniques for maximum likelihood estimation of diffusion ODEs, covering both training and evaluation. For training, we propose velocity parameterization and explore variance reduction techniques for faster convergence. We also derive an error-bounded high-order flow matching objective for finetuning, which improves the ODE likelihood and smooths its trajectory. For evaluation, we propose a novel training-free truncated-normal dequantization that closes the training-evaluation gap common to diffusion ODEs. Building on these techniques, we achieve state-of-the-art likelihood estimation results on image datasets (2.56 on CIFAR-10, 3.43/3.69 on ImageNet-32) without variational dequantization or data augmentation, and 2.42 on CIFAR-10 with data augmentation. Code is available at \url{https://github.com/thu-ml/i-DODE}.
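To make the "velocity parameterization" idea concrete, here is a minimal sketch of one Monte Carlo sample of a generic velocity-based flow matching loss. It assumes the simple linear interpolant x_t = (1 - t) x_0 + t eps, whose exact velocity is eps - x_0; the paper's actual parameterization, noise schedule, and variance reduction are not reproduced here, and `model` is a hypothetical velocity-prediction network.

```python
import numpy as np

def flow_matching_loss(model, x0, rng):
    """One Monte Carlo sample of a velocity (flow matching) objective.

    Assumes the linear interpolant x_t = (1 - t) * x0 + t * eps, whose
    exact velocity is dx_t/dt = eps - x0. This is an illustrative sketch
    of the general technique, not the authors' exact objective.
    """
    eps = rng.standard_normal(x0.shape)   # Gaussian noise sample
    t = rng.uniform()                     # time drawn uniformly from [0, 1)
    xt = (1.0 - t) * x0 + t * eps         # point on the interpolant path
    target = eps - x0                     # true velocity along the path
    pred = model(xt, t)                   # network's velocity prediction
    return np.mean((pred - target) ** 2)  # squared-error matching loss
```

For data x0 = 0 the interpolant reduces to xt = t * eps, so the oracle predictor xt / t recovers the exact velocity and drives the loss to zero, which is a quick sanity check on the objective.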