T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching (2402.14167v1)
Abstract: Sampling from diffusion probabilistic models (DPMs) is often expensive for high-quality image generation and typically requires many steps with a large model. In this paper, we introduce Trajectory Stitching (T-Stitch), a simple yet efficient technique that improves sampling efficiency with little or no degradation in generation quality. Instead of using a large DPM for the entire sampling trajectory, T-Stitch first leverages a smaller DPM in the initial steps as a cheap drop-in replacement for the larger DPM, then switches to the larger DPM at a later stage. Our key insight is that different diffusion models learn similar encodings under the same training data distribution, and smaller models are capable of generating good global structures in the early steps. Extensive experiments demonstrate that T-Stitch is training-free, generally applicable to different architectures, and complements most existing fast sampling techniques with flexible speed/quality trade-offs. On DiT-XL, for example, 40% of the early timesteps can be safely replaced with a 10x faster DiT-S without a performance drop on class-conditional ImageNet generation. We further show that our method can also be used as a drop-in technique not only to accelerate the popular pretrained stable diffusion (SD) models but also to improve the prompt alignment of stylized SD models from the public model zoo. A code sketch of the stitching schedule follows. Code is released at https://github.com/NVlabs/T-Stitch
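Since the abstract describes the stitching schedule concretely (a small DPM for roughly the first 40% of timesteps, then the large DPM for the rest), a short sketch may help make the mechanism precise. The following is a minimal, hypothetical Python implementation assuming a diffusers-style scheduler interface (`scheduler.timesteps`, `scheduler.step(...).prev_sample`) and an interchangeable `model(latents, t, class_labels)` call; the function name, the `switch_frac` parameter, and the omission of classifier-free guidance are illustrative assumptions, not the authors' released code.

```python
import torch

@torch.no_grad()
def t_stitch_sample(small_model, large_model, scheduler, latents,
                    switch_frac=0.4, class_labels=None):
    """Trajectory stitching sketch: denoise with the small DPM for the
    first `switch_frac` fraction of timesteps, then switch to the large
    DPM. Assumes both models were trained on the same data distribution
    and share the latent space and noise schedule, as T-Stitch requires."""
    timesteps = scheduler.timesteps  # ordered high noise -> low noise
    switch_step = int(switch_frac * len(timesteps))
    for i, t in enumerate(timesteps):
        # Early, high-noise steps: the cheap small model lays out the
        # global structure. Later steps: the large model refines details.
        model = small_model if i < switch_step else large_model
        noise_pred = model(latents, t, class_labels)  # hypothetical call
        latents = scheduler.step(noise_pred, t, latents).prev_sample
    return latents
```

Setting `switch_frac=0.4` mirrors the DiT-S/DiT-XL configuration quoted above; `0.0` recovers pure large-model sampling and `1.0` pure small-model sampling, which is what gives T-Stitch its continuous speed/quality trade-off.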
- eDiff-I: Text-to-image diffusion models with an ensemble of expert denoisers. CoRR, abs/2211.01324, 2022.
- Revisiting model stitching to compare neural representations. In NeurIPS, pp. 225–236, 2021.
- All are worth words: A ViT backbone for diffusion models. In CVPR, 2023.
- Token merging: Your ViT but faster. In ICLR, 2023.
- Similarity and matching of neural network representations. In NeurIPS, pp. 5656–5668, 2021.
- Diffusion models beat GANs on image synthesis. NeurIPS, 34:8780–8794, 2021.
- Structural pruning for diffusion models. arXiv preprint arXiv:2305.10924, 2023.
- CLIPScore: A reference-free evaluation metric for image captioning. In EMNLP, pp. 7514–7528, 2021.
- GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In NeurIPS, pp. 6626–6637, 2017.
- Classifier-free diffusion guidance. CoRR, abs/2207.12598, 2022.
- Denoising diffusion probabilistic models. NeurIPS, 33:6840–6851, 2020.
- Gotta go fast when generating data with score-based models. CoRR, abs/2105.14080, 2021.
- Elucidating the design space of diffusion-based generative models. In NeurIPS, 2022.
- BK-SDM: Architecturally compressed stable diffusion for efficient text-to-image generation. ICML Workshop on Efficient Systems for Foundation Models (ES-FoMo), 2023.
- StreamDiffusion: A pipeline-level solution for real-time interactive generation. arXiv, 2023.
- DiffWave: A versatile diffusion model for audio synthesis. In ICLR, 2021.
- Multi-architecture multi-expert diffusion models. CoRR, abs/2306.04990, 2023.
- Understanding image representations by measuring their equivariance and equivalence. In CVPR, pp. 991–999, 2015.
- Fast inference from transformers via speculative decoding. In ICML, volume 202, pp. 19274–19286, 2023.
- Faster diffusion: Rethinking the role of UNet encoder in diffusion models. arXiv, 2023a.
- Q-Diffusion: Quantizing diffusion models. In ICCV, 2023b.
- Pseudo numerical methods for diffusion models on manifolds. In ICLR, 2022.
- DPM-Solver: A fast ODE solver for diffusion probabilistic model sampling in around 10 steps. In NeurIPS, 2022.
- Latent consistency models: Synthesizing high-resolution images with few-step inference. arXiv, 2023.
- DeepCache: Accelerating diffusion models for free. arXiv, 2023.
- Stitchable neural networks. In CVPR, 2023a.
- Stitched ViTs are flexible vision backbones. arXiv, 2023b.
- Scalable diffusion models with transformers. CoRR, abs/2212.09748, 2022.
- SDXL: Improving latent diffusion models for high-resolution image synthesis. CoRR, 2023.
- DreamFusion: Text-to-3D using 2D diffusion. In ICLR, 2023.
- Kandinsky: An improved text-to-image synthesis with image prior and latent diffusion. In EMNLP Demos, pp. 286–295, 2023.
- On linear identifiability of learned representations. In ICML, volume 139, pp. 9030–9039, 2021.
- High-resolution image synthesis with latent diffusion models. In CVPR, pp. 10684–10695, 2022.
- Progressive distillation for fast sampling of diffusion models. In ICLR, 2022.
- Adversarial diffusion distillation. arXiv, 2023.
- Segmind. Segmind Stable Diffusion Model (SSD-1B). https://huggingface.co/segmind/SSD-1B, 2023.
- Post-training quantization on diffusion models. In CVPR, 2023.
- Parallel sampling of diffusion models. CoRR, abs/2305.16317, 2023.
- Denoising diffusion implicit models. In ICLR, 2021a.
- Score-based generative modeling through stochastic differential equations. In ICLR, 2021b.
- Consistency models. In ICML, volume 202, 2023.
- Rethinking the inception architecture for computer vision. In CVPR, pp. 2818–2826, 2016.
- Diffusers: State-of-the-art diffusion models. https://github.com/huggingface/diffusers, 2022.
- Cache me if you can: Accelerating diffusion models through block caching. arXiv, 2023.
- Deep model reassembly. In NeurIPS, 2022.
- Diffusion probabilistic model made slim. In CVPR, pp. 22552–22562, 2023.
- Adding conditional control to text-to-image diffusion models. In ICCV, pp. 3836–3847, 2023.
- MobileDiffusion: Subsecond text-to-image generation on mobile devices. arXiv, 2023.
- Fast sampling of diffusion models via operator learning. In ICML, pp. 42390–42402, 2023.