Directly Denoising Diffusion Models (2405.13540v2)
Abstract: In this paper, we present the Directly Denoising Diffusion Model (DDDM): a simple and generic approach for generating realistic images with few-step sampling, while multistep sampling is still available for better performance. DDDMs require neither delicately designed samplers nor distillation from pre-trained diffusion models. DDDMs train the diffusion model conditioned on an estimated target generated by the model itself in previous training iterations. During generation, samples from the previous time step are likewise fed back into the model, guiding the generation process iteratively. We further propose Pseudo-LPIPS, a novel metric loss that is more robust across hyperparameter values. Despite its simplicity, the proposed approach achieves strong performance on benchmark datasets. Our model achieves FID scores of 2.57 and 2.33 on CIFAR-10 with one-step and two-step sampling respectively, surpassing those obtained from GANs and distillation-based models. By extending sampling to 1000 steps, we further reduce the FID to 1.79, matching state-of-the-art methods in the literature. On ImageNet 64x64, our approach is competitive with leading models.
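The abstract describes the core mechanism: the network is conditioned on its own clean-image estimate from earlier training iterations, and sampling iteratively feeds the previous step's output back in. The following is a minimal, hypothetical sketch of that idea, not the authors' implementation: all names (DenoiserNet, the linear noising schedule) are assumptions, and plain MSE stands in for the paper's Pseudo-LPIPS loss.

```python
# Hypothetical sketch of a DDDM-style self-conditioned training loop.
# Assumptions: a toy MLP backbone, a linear noising schedule, and MSE in
# place of the Pseudo-LPIPS loss described in the paper.
import torch
import torch.nn as nn

class DenoiserNet(nn.Module):
    """Toy stand-in for the backbone f(x_t, t, x0_est) -> predicted clean sample."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * dim + 1, 256), nn.SiLU(), nn.Linear(256, dim))

    def forward(self, x_t, t, x0_est):
        return self.net(torch.cat([x_t, x0_est, t[:, None]], dim=-1))

dim, n_data = 32, 1024
data = torch.randn(n_data, dim)      # placeholder dataset (flattened "images")
x0_est = torch.zeros_like(data)      # per-sample estimates carried across iterations
model = DenoiserNet(dim)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

for step in range(100):
    idx = torch.randint(0, n_data, (64,))
    x0 = data[idx]
    t = torch.rand(64)                                   # continuous time in [0, 1]
    noise = torch.randn_like(x0)
    x_t = (1 - t[:, None]) * x0 + t[:, None] * noise     # simple noising schedule (assumption)

    pred = model(x_t, t, x0_est[idx])                    # condition on the previous estimate
    loss = torch.mean((pred - x0) ** 2)                  # MSE surrogate for Pseudo-LPIPS
    opt.zero_grad(); loss.backward(); opt.step()

    x0_est[idx] = pred.detach()                          # refresh stored estimates

# Few-step sampling: start from noise and feed each estimate back into the model.
x_t = torch.randn(16, dim)
est = torch.zeros(16, dim)
for t_val in (1.0, 0.5):                                 # two-step sampling
    est = model(x_t, torch.full((16,), t_val), est)
```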