T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching (2402.14167v1)

Published 21 Feb 2024 in cs.CV and cs.LG

Abstract: Sampling from diffusion probabilistic models (DPMs) is often expensive for high-quality image generation and typically requires many steps with a large model. In this paper, we introduce sampling Trajectory Stitching (T-Stitch), a simple yet efficient technique to improve the sampling efficiency with little or no generation degradation. Instead of solely using a large DPM for the entire sampling trajectory, T-Stitch first leverages a smaller DPM in the initial steps as a cheap drop-in replacement of the larger DPM and switches to the larger DPM at a later stage. Our key insight is that different diffusion models learn similar encodings under the same training data distribution and smaller models are capable of generating good global structures in the early steps. Extensive experiments demonstrate that T-Stitch is training-free, generally applicable for different architectures, and complements most existing fast sampling techniques with flexible speed and quality trade-offs. On DiT-XL, for example, 40% of the early timesteps can be safely replaced with a 10x faster DiT-S without performance drop on class-conditional ImageNet generation. We further show that our method can also be used as a drop-in technique to not only accelerate the popular pretrained stable diffusion (SD) models but also improve the prompt alignment of stylized SD models from the public model zoo. Code is released at https://github.com/NVlabs/T-Stitch

Summary

  • The paper introduces T-Stitch, a trajectory stitching technique that combines small and large diffusion models within a single sampling trajectory to accelerate sampling.
  • The method rests on two observations: different DPMs trained on the same data produce similar latent trajectories, and early denoising steps mainly recover low-frequency structure, which a small model handles well.
  • Experiments show that running a smaller model for up to 40% of the early steps speeds up sampling with negligible impact on image quality, and can even improve prompt alignment for stylized text-to-image models.

Accelerating Diffusion Model Sampling with Trajectory Stitching

Introduction to T-Stitch

Diffusion probabilistic models (DPMs) have gained significant attention for their ability to generate high-quality images across a wide range of applications. However, sampling from these models, especially the large ones, is computationally expensive. This paper introduces Trajectory Stitching (T-Stitch), a technique designed to improve the sampling efficiency of pre-trained DPMs with little or no loss in generation quality. T-Stitch combines a smaller and a larger diffusion model in a single sampling trajectory, using the smaller model for the initial steps and switching to the larger model for the remaining steps. The approach is training-free and is validated across multiple architectures and datasets, enabling more efficient use of compute in generative models.

Methodology and Key Insights

The T-Stitch technique is grounded in two key insights:

  1. Similar Latent Spaces: Different DPMs trained on the same dataset produce similar latent encodings at corresponding timesteps, allowing seamless stitching between models of different sizes (an illustrative check is sketched below this list).
  2. Frequency Components: Denoising in DPMs focuses on low-frequency image structure in the early steps, where smaller models suffice, and shifts to high-frequency details in later steps, which demand the capacity of larger models.
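
To make the first insight concrete, the sketch below compares the noise predictions of two pre-trained DPMs on the same noisy input and timestep. This is a minimal illustration, not the released code: the model(x_t, t, cond) -> predicted-noise interface is a hypothetical simplification of DiT-style denoisers.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def latent_similarity(model_small, model_large, x_t, t, cond):
    # Hypothetical shared interface: model(x_t, t, cond) -> predicted noise.
    eps_small = model_small(x_t, t, cond)
    eps_large = model_large(x_t, t, cond)
    # Cosine similarity of the flattened predictions, averaged over the batch.
    # High similarity at early (noisy) steps supports swapping in the small model.
    return F.cosine_similarity(
        eps_small.flatten(1), eps_large.flatten(1), dim=1
    ).mean()
```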

T-Stitch thus allocates computation dynamically across the denoising steps. The initial phase employs a smaller, computationally inexpensive model, exploiting the fact that early denoising steps establish broad image structure rather than fine detail. Switching to a larger model in the later steps ensures that the high-frequency details required for high-quality generation are captured.
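
In practice, the sampling loop changes by a single line: choose which denoiser to call at each step. Below is a minimal sketch under stated assumptions, not the paper's released implementation; the model interface and the step_fn solver callback are hypothetical placeholders, and the 0.4 default mirrors the DiT-S/DiT-XL setting reported in the paper.

```python
import torch

@torch.no_grad()
def t_stitch_sample(model_small, model_large, x_T, timesteps, cond,
                    small_fraction=0.4, step_fn=None):
    # Hypothetical interfaces (assumptions, not the released API):
    #   model(x_t, t, cond)          -> predicted noise
    #   step_fn(x_t, eps, t, t_next) -> denoised latent at t_next
    #     (any solver update, e.g. a DDIM or DPM-Solver step)
    n_small = int(len(timesteps) * small_fraction)
    x = x_T
    for i, t in enumerate(timesteps):
        # The only change vs. standard sampling: pick the denoiser per step.
        # Early, noisy steps go to the cheap small model.
        model = model_small if i < n_small else model_large
        eps = model(x, t, cond)
        t_next = timesteps[i + 1] if i + 1 < len(timesteps) else None
        x = step_fn(x, eps, t, t_next)
    return x
```

Because the solver update itself is untouched, this per-step model selection composes naturally with DDIM, DPM-Solver, and other fast samplers, which is why T-Stitch is complementary to existing acceleration techniques.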

Experimental Validation and Results

The T-Stitch technique was tested across various model architectures and samplers. Results confirmed that deploying a smaller model in up to 40% of the early steps significantly speeds up generation with negligible impact on image quality as measured by FID. These outcomes held regardless of the underlying architecture or application domain, from class-conditional synthesis on ImageNet to stylized text-to-image generation. Notably, T-Stitch not only improved efficiency but also enhanced prompt alignment in text-to-image tasks, yielding outputs that follow the prompt more faithfully.
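
The reported gains follow from simple per-step cost accounting. The helper below is an illustrative back-of-envelope estimate (it ignores memory and scheduling overheads), using the roughly 10x per-step cost gap between DiT-S and DiT-XL cited in the paper.

```python
def stitch_speedup(small_fraction, small_cost_ratio):
    """Estimated end-to-end speedup when `small_fraction` of the steps use a
    small model whose per-step cost is `small_cost_ratio` times the large
    model's. Illustrative only; ignores overheads."""
    return 1.0 / (small_fraction * small_cost_ratio + (1.0 - small_fraction))

# DiT-S is roughly 10x faster per step than DiT-XL, so replacing 40% of the
# steps gives about a 1.56x wall-clock speedup at matched quality.
print(stitch_speedup(0.4, 0.1))  # 1.5625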

Practical Implications and Future Outlook

T-Stitch's ability to accelerate sampling while maintaining generation quality has clear implications for deploying diffusion models in resource-constrained settings. Its compatibility with existing fast sampling methods, and the fact that it applies to a wide range of pre-trained models without any additional training, broadens its applicability considerably. T-Stitch also points to a promising research direction: optimizing the interplay between small and large models along the sampling trajectory, which may yield further gains in efficiency and quality.

Conclusions and Considerations

In summary, T-Stitch is a practical advance in the efficiency of DPM sampling. By combining smaller and larger diffusion models in one trajectory, it offers a robust way to manage the trade-off between computational cost and generation quality. Limitations remain, including the need for a pre-trained smaller model and a modest increase in memory use, but the benefits in reduced computation and flexible quality control are compelling. The approach both supports more sustainable AI practice and makes high-quality generative models easier to deploy in real-world scenarios.
