DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models (2211.01095v2)

Published 2 Nov 2022 in cs.LG and cs.CV

Abstract: Diffusion probabilistic models (DPMs) have achieved impressive success in high-resolution image synthesis, especially in recent large-scale text-to-image generation applications. An essential technique for improving the sample quality of DPMs is guided sampling, which usually needs a large guidance scale to obtain the best sample quality. The commonly-used fast sampler for guided sampling is DDIM, a first-order diffusion ODE solver that generally needs 100 to 250 steps for high-quality samples. Although recent works propose dedicated high-order solvers and achieve a further speedup for sampling without guidance, their effectiveness for guided sampling has not been well-tested before. In this work, we demonstrate that previous high-order fast samplers suffer from instability issues, and they even become slower than DDIM when the guidance scale grows large. To further speed up guided sampling, we propose DPM-Solver++, a high-order solver for the guided sampling of DPMs. DPM-Solver++ solves the diffusion ODE with the data prediction model and adopts thresholding methods to keep the solution matched to the training data distribution. We further propose a multistep variant of DPM-Solver++ to address the instability issue by reducing the effective step size. Experiments show that DPM-Solver++ can generate high-quality samples within only 15 to 20 steps for guided sampling by pixel-space and latent-space DPMs.

Authors (6)

Cheng Lu (70 papers)
Yuhao Zhou (78 papers)
Fan Bao (30 papers)
Jianfei Chen (63 papers)
Chongxuan Li (75 papers)
Jun Zhu (424 papers)

Citations (426)

Summary

DPM-Solver++: Accelerating Guided Sampling for Diffusion Probabilistic Models

The paper "DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models" addresses a significant challenge in the efficiency of guided sampling within the framework of diffusion probabilistic models (DPMs). DPMs have demonstrated substantial success in generating high-resolution images, particularly in the domain of text-to-image synthesis. However, the inherent inefficiency of existing guided sampling methods, which often require extensive computational resources, presents a notable bottleneck. This research proposes DPM-Solver++, a novel high-order solver aiming to improve the speed and quality of guided sampling processes in diffusion models.

Problem Statement

Guided sampling is a pivotal technique for improving the sample quality of DPMs. It steers generation with external guidance, typically through classifier guidance or classifier-free guidance, and usually needs a large guidance scale to obtain the best results. However, the commonly used fast sampler, DDIM, is a first-order diffusion ODE solver that generally needs 100 to 250 steps to yield high-quality samples. This limitation hinders the practical application of DPMs, particularly in scenarios demanding fast and efficient generation.
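To make the role of the guidance scale concrete, here is a minimal sketch of how classifier-free guidance is commonly written: the guided noise prediction extrapolates from the unconditional prediction toward the conditional one. The function name `guided_noise` and the list-based inputs are illustrative assumptions, and other sources parameterize the scale differently (for example with a weight w where the scale is 1 + w).

```python
def guided_noise(eps_uncond, eps_cond, scale):
    """Combine two noise predictions with classifier-free guidance.

    Convention assumed here: scale = 0 gives the unconditional prediction,
    scale = 1 gives the plain conditional prediction, and scale > 1
    extrapolates past it (the "large guidance scale" regime).
    """
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Hypothetical two-element predictions, for illustration only.
eps_u = [0.0, 0.0]
eps_c = [1.0, 2.0]
print(guided_noise(eps_u, eps_c, 1.0))  # conditional prediction
print(guided_noise(eps_u, eps_c, 2.0))  # amplified difference
```

A large scale amplifies the difference between the two predictions, which is one way to see why the guided model's output can grow in magnitude and make the ODE harder for a solver to integrate with few steps.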

Methodological Contributions

Instability in High-Order Solvers: Initial analysis in the paper reveals instability issues in existing high-order solvers when applied to guided sampling with large guidance scales. These solvers often perform worse than first-order methods, such as DDIM, due to their reduced convergence radius in high-guidance settings and mismatch between training and test distributions.
DPM-Solver++ Development: The core innovation is the introduction of DPM-Solver++, an advanced high-order solver constructed to address the documented shortcomings. By leveraging a data prediction model parameterization, DPM-Solver++ effectively solves diffusion ODEs and improves stability. This approach allows for the straightforward integration of dynamic thresholding methods to counter the "train-test mismatch" issue.
Multistep Variant: To address numerical stability concerns associated with large guidance scales, a multistep variant of DPM-Solver++ is devised. This version reduces the effective step size, further enhancing the robustness and efficiency of the sampling process.
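The three contributions above can be sketched together in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the cosine noise schedule in `alpha_sigma`, the nearest-rank percentile in `dynamic_threshold`, and all function names are hypothetical choices made for the example. The update rule follows the second-order multistep form commonly given for DPM-Solver++(2M), with lambda_t = log(alpha_t / sigma_t) and step size h measured in lambda.

```python
import math


def alpha_sigma(t):
    """Hypothetical cosine schedule with alpha_t^2 + sigma_t^2 = 1."""
    return math.cos(0.5 * math.pi * t), math.sin(0.5 * math.pi * t)


def lam(t):
    """Half log-SNR: lambda_t = log(alpha_t / sigma_t)."""
    a, s = alpha_sigma(t)
    return math.log(a / s)


def eps_to_x0(x_t, eps, t):
    """Data prediction from a noise prediction: x0 = (x_t - sigma_t * eps) / alpha_t."""
    a, s = alpha_sigma(t)
    return [(xi - s * ei) / a for xi, ei in zip(x_t, eps)]


def dynamic_threshold(x0, percentile=0.995):
    """Clip x0 to [-s, s] and rescale by s, where s is a high percentile of
    |x0| but never below 1 (simplified nearest-rank percentile)."""
    abs_sorted = sorted(abs(v) for v in x0)
    idx = min(len(abs_sorted) - 1, int(percentile * len(abs_sorted)))
    s = max(1.0, abs_sorted[idx])
    return [max(-s, min(s, v)) / s for v in x0]


def dpm_solver_pp_2m(x, timesteps, data_model):
    """Multistep sketch: one model call per step, reusing the previous
    data prediction to form a second-order estimate."""
    prev_x0, prev_h = None, None
    for t_prev, t in zip(timesteps[:-1], timesteps[1:]):
        a_t, s_t = alpha_sigma(t)
        _, s_prev = alpha_sigma(t_prev)
        h = lam(t) - lam(t_prev)          # positive when t decreases
        x0 = data_model(x, t_prev)        # thresholding can be applied inside
        if prev_x0 is None:
            d = x0                        # first step falls back to first order
        else:
            r = prev_h / h
            d = [(1 + 1 / (2 * r)) * c - (1 / (2 * r)) * p
                 for c, p in zip(x0, prev_x0)]
        coef = -a_t * math.expm1(-h)      # -alpha_t * (exp(-h) - 1)
        x = [(s_t / s_prev) * xi + coef * di for xi, di in zip(x, d)]
        prev_x0, prev_h = x0, h
    return x
```

As a sanity check on the sketch, if the data prediction is a constant x0 (the ideal model for a single-point data distribution), each update maps alpha_s * x0 + sigma_s * noise exactly to alpha_t * x0 + sigma_t * noise, so the solver introduces no discretization error in that toy case. Because the solver works on x0 directly, thresholding slots in as a wrapper around `data_model`, for example `lambda x, t: dynamic_threshold(eps_to_x0(x, noise_model(x, t), t))` for some hypothetical `noise_model`.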

Experimental Evaluation

Experiments show that DPM-Solver++ can generate high-quality samples in only 15 to 20 steps, a significant reduction from the 100 to 250 steps that DDIM generally needs. It consistently outperforms existing approaches for both pixel-space and latent-space DPMs across a range of guidance scales. Additionally, ablation studies validate the necessity and effectiveness of each component within DPM-Solver++.

Implications and Future Direction

This research provides a practical solution to the computation-intensive nature of guided sampling in DPMs, greatly enhancing their utility in real-time applications. The introduction of a multistep solver variant suggests new avenues for further reduction in computational overheads. Future work could explore integrating these solvers with an expanded range of guidance models or extending their application into domains beyond image synthesis, such as natural language processing or voice synthesis.

In conclusion, DPM-Solver++ represents an important advancement in optimizing the guided sampling process for diffusion models. It balances computational efficiency and sample fidelity, which could ease the scaling and adoption of diffusion models in commercial and research applications.
