S$^{2}$-DMs: Skip-Step Diffusion Models (2401.01520v2)
Abstract: Diffusion models have emerged as powerful generative tools, rivaling GANs in sample quality and mirroring the likelihood scores of autoregressive models. A subset of these models, exemplified by DDIMs, exhibits an inherent asymmetry: they are trained over $T$ steps but sample from only a subset of those $T$ steps during generation. This selective sampling approach, though optimized for speed, inadvertently misses vital information from the unsampled steps, leading to potential compromises in sample quality. To address this issue, we present S$^{2}$-DMs, a new training method built around an innovative loss term, $L_{skip}$, meticulously designed to reintegrate the information omitted during the selective sampling phase. The benefits of this approach are manifold: it notably enhances sample quality, is exceptionally simple to implement, requires minimal code modifications, and is flexible enough to be compatible with various sampling algorithms. On the CIFAR10 dataset, models trained with our algorithm improved over conventionally trained models by 3.27% to 14.06% across various sampling algorithms (DDIMs, PNDMs, DEIS) and different numbers of sampling steps (10, 20, ..., 1000). On the CELEBA dataset, the improvement ranged from 8.97% to 27.08%. The code and additional resources are available on GitHub.
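To make the idea concrete, below is a minimal PyTorch sketch of what a skip-step training objective could look like. It is an illustration under stated assumptions, not the paper's exact formulation: the abstract does not specify $L_{skip}$, so the `skip` interval, the shared noise target, the `lambda_skip` weighting, and the `model(x_t, t)` signature are all hypothetical.

```python
# Minimal sketch: standard DDPM epsilon-prediction loss plus a hypothetical
# skip-step term. The skip interval, the shared noise target, and the
# lambda_skip weighting are assumptions for illustration, not the paper's
# exact L_skip.
import torch
import torch.nn.functional as F

def s2dm_loss(model, x0, alphas_cumprod, skip=10, lambda_skip=0.5):
    """model: network with (assumed) signature model(x_t, t) -> predicted noise.
    x0: clean images, shape (B, C, H, W).
    alphas_cumprod: cumulative products of the noise schedule, shape (T,)."""
    B = x0.shape[0]
    T = alphas_cumprod.shape[0]
    # Restrict t so the skipped-to step t - skip stays valid (assumption).
    t = torch.randint(skip, T, (B,), device=x0.device)
    noise = torch.randn_like(x0)

    def q_sample(step):
        # Forward diffusion: x_step = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * eps.
        a_bar = alphas_cumprod[step].view(B, 1, 1, 1)
        return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise

    # Standard denoising loss at step t.
    loss_simple = F.mse_loss(model(q_sample(t), t), noise)

    # Hypothetical skip-step term: the model is also supervised at t - skip,
    # so steps a skip-step sampler would jump over still shape training.
    loss_skip = F.mse_loss(model(q_sample(t - skip), t - skip), noise)

    return loss_simple + lambda_skip * loss_skip
```

Relative to standard DDPM training, only the timestep sampling and one extra forward pass change here, which is in the spirit of the abstract's claim that the method requires minimal code modifications.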
- Banach Wasserstein GAN. Advances in Neural Information Processing Systems, 31, 2018.
- Deep generative stochastic networks trainable by backprop. In International Conference on Machine Learning, pp. 226–234. PMLR, 2014.
- Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877–1901, 2020.
- WaveGrad: Estimating gradients for waveform generation. arXiv preprint arXiv:2009.00713, 2020.
- MaskGAN: Better text generation via filling in the ______. arXiv preprint arXiv:1801.07736, 2018.
- Generative adversarial nets. Advances in Neural Information Processing Systems, 27, 2014.
- Improved training of Wasserstein GANs. Advances in Neural Information Processing Systems, 30, 2017.
- Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.
- GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Advances in Neural Information Processing Systems, 30, 2017.
- Hinton, G. E. A practical guide to training restricted Boltzmann machines. In Neural Networks: Tricks of the Trade: Second Edition, pp. 599–619. Springer, 2012.
- Boundary-seeking generative adversarial networks. arXiv preprint arXiv:1702.08431, 2017.
- Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851, 2020.
- Adversarial score matching and improved sampling for image generation. arXiv preprint arXiv:2009.05475, 2020.
- Gotta go fast when generating data with score-based models. arXiv preprint arXiv:2105.14080, 2021.
- A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4401–4410, 2019.
- Analyzing and improving the image quality of StyleGAN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8110–8119, 2020.
- Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
- Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114, 2013.
- Learning multiple layers of features from tiny images. 2009.
- Pseudo numerical methods for diffusion models on manifolds. arXiv preprint arXiv:2202.09778, 2022a.
- Flow straight and fast: Learning to generate and transfer data with rectified flow. arXiv preprint arXiv:2209.03003, 2022b.
- Learning in implicit generative models. arXiv preprint arXiv:1610.03483, 2016.
- Improved denoising diffusion probabilistic models. In International Conference on Machine Learning, pp. 8162–8171. PMLR, 2021.
- WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499, 2016.
- U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015: 18th International Conference, Munich, Germany, October 5–9, 2015, Proceedings, Part III, pp. 234–241. Springer, 2015.
- PixelCNN++: Improving the PixelCNN with discretized logistic mixture likelihood and other modifications. arXiv preprint arXiv:1701.05517, 2017.
- Deep unsupervised learning using nonequilibrium thermodynamics. In International Conference on Machine Learning, pp. 2256–2265. PMLR, 2015.
- Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502, 2020a.
- Generative modeling by estimating gradients of the data distribution. Advances in Neural Information Processing Systems, 32, 2019.
- Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456, 2020b.
- Rethinking the Inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818–2826, 2016.
- Conditional image generation with PixelCNN decoders. Advances in Neural Information Processing Systems, 29, 2016.
- Vincent, P. A connection between score matching and denoising autoencoders. Neural Computation, 23(7):1661–1674, 2011.
- Learning fast samplers for diffusion models by differentiating through sample quality. In International Conference on Learning Representations, 2021.
- SeqGAN: Sequence generative adversarial nets with policy gradient. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 31, 2017.
- Fast sampling of diffusion models with exponential integrator. arXiv preprint arXiv:2204.13902, 2022.
- Yixuan Wang
- Shuangyin Li