T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching (2402.14167v1)

Published 21 Feb 2024 in cs.CV and cs.LG

Abstract: Sampling from diffusion probabilistic models (DPMs) is often expensive for high-quality image generation and typically requires many steps with a large model. In this paper, we introduce sampling Trajectory Stitching (T-Stitch), a simple yet efficient technique to improve the sampling efficiency with little or no generation degradation. Instead of solely using a large DPM for the entire sampling trajectory, T-Stitch first leverages a smaller DPM in the initial steps as a cheap drop-in replacement of the larger DPM and switches to the larger DPM at a later stage. Our key insight is that different diffusion models learn similar encodings under the same training data distribution and smaller models are capable of generating good global structures in the early steps. Extensive experiments demonstrate that T-Stitch is training-free, generally applicable for different architectures, and complements most existing fast sampling techniques with flexible speed and quality trade-offs. On DiT-XL, for example, 40% of the early timesteps can be safely replaced with a 10x faster DiT-S without performance drop on class-conditional ImageNet generation. We further show that our method can also be used as a drop-in technique to not only accelerate the popular pretrained stable diffusion (SD) models but also improve the prompt alignment of stylized SD models from the public model zoo. Code is released at https://github.com/NVlabs/T-Stitch

Summary

  • The paper introduces T-Stitch, a trajectory stitching technique that combines small and large diffusion models within a single sampling trajectory to accelerate sampling.
  • The method rests on two observations: different DPMs trained on the same data produce similar latent trajectories, and early denoising steps mainly recover low-frequency structure, which a small model handles well.
  • Experiments show that running a smaller model for up to 40% of the early steps speeds up sampling with negligible impact on image quality, and can even improve prompt alignment for stylized text-to-image models.

Accelerating Diffusion Model Sampling with Trajectory Stitching

Introduction to T-Stitch

Diffusion probabilistic models (DPMs) have gained significant attention for their ability to generate high-quality images across a wide range of applications. However, sampling from these models, especially the large ones, is computationally expensive. This paper introduces Trajectory Stitching (T-Stitch), a technique designed to improve the sampling efficiency of pre-trained DPMs with little or no loss in generation quality. T-Stitch combines a smaller and a larger diffusion model in a single sampling trajectory, using the smaller model for the initial steps and switching to the larger model for the remaining steps. The approach is training-free and is validated across multiple architectures and datasets, enabling more efficient use of compute in generative models.

Methodology and Key Insights

The T-Stitch technique is grounded in two key insights:

  1. Similar Latent Spaces: Different DPMs trained on the same dataset produce similar latent encodings at corresponding timesteps, allowing seamless stitching between models of different sizes (an illustrative check is sketched below this list).
  2. Frequency Components: Denoising in DPMs focuses on low-frequency image structure in the early steps, where smaller models suffice, and shifts to high-frequency details in later steps, which demand the capacity of larger models.
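
To make the first insight concrete, the sketch below compares the noise predictions of two pre-trained DPMs on the same noisy input and timestep. This is a minimal illustration, not the released code: the model(x_t, t, cond) -> predicted-noise interface is a hypothetical simplification of DiT-style denoisers.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def latent_similarity(model_small, model_large, x_t, t, cond):
    # Hypothetical shared interface: model(x_t, t, cond) -> predicted noise.
    eps_small = model_small(x_t, t, cond)
    eps_large = model_large(x_t, t, cond)
    # Cosine similarity of the flattened predictions, averaged over the batch.
    # High similarity at early (noisy) steps supports swapping in the small model.
    return F.cosine_similarity(
        eps_small.flatten(1), eps_large.flatten(1), dim=1
    ).mean()
```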

T-Stitch thus allocates computation dynamically across the denoising steps. The initial phase employs a smaller, computationally inexpensive model, exploiting the fact that early denoising steps establish broad image structure rather than fine detail. Switching to a larger model in the later steps ensures that the high-frequency details required for high-quality generation are captured.
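
In practice, the sampling loop changes by a single line: choose which denoiser to call at each step. Below is a minimal sketch under stated assumptions, not the paper's released implementation; the model interface and the step_fn solver callback are hypothetical placeholders, and the 0.4 default mirrors the DiT-S/DiT-XL setting reported in the paper.

```python
import torch

@torch.no_grad()
def t_stitch_sample(model_small, model_large, x_T, timesteps, cond,
                    small_fraction=0.4, step_fn=None):
    # Hypothetical interfaces (assumptions, not the released API):
    #   model(x_t, t, cond)          -> predicted noise
    #   step_fn(x_t, eps, t, t_next) -> denoised latent at t_next
    #     (any solver update, e.g. a DDIM or DPM-Solver step)
    n_small = int(len(timesteps) * small_fraction)
    x = x_T
    for i, t in enumerate(timesteps):
        # The only change vs. standard sampling: pick the denoiser per step.
        # Early, noisy steps go to the cheap small model.
        model = model_small if i < n_small else model_large
        eps = model(x, t, cond)
        t_next = timesteps[i + 1] if i + 1 < len(timesteps) else None
        x = step_fn(x, eps, t, t_next)
    return x
```

Because the solver update itself is untouched, this per-step model selection composes naturally with DDIM, DPM-Solver, and other fast samplers, which is why T-Stitch is complementary to existing acceleration techniques.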

Experimental Validation and Results

The T-Stitch technique was tested across various model architectures and samplers. Results confirmed that deploying a smaller model in up to 40% of the early steps significantly speeds up generation with negligible impact on image quality as measured by FID. These outcomes held regardless of the underlying architecture or application domain, from class-conditional synthesis on ImageNet to stylized text-to-image generation. Notably, T-Stitch not only improved efficiency but also enhanced prompt alignment in text-to-image tasks, yielding outputs that follow the prompt more faithfully.
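
The reported gains follow from simple per-step cost accounting. The helper below is an illustrative back-of-envelope estimate (it ignores memory and scheduling overheads), using the roughly 10x per-step cost gap between DiT-S and DiT-XL cited in the paper.

```python
def stitch_speedup(small_fraction, small_cost_ratio):
    """Estimated end-to-end speedup when `small_fraction` of the steps use a
    small model whose per-step cost is `small_cost_ratio` times the large
    model's. Illustrative only; ignores overheads."""
    return 1.0 / (small_fraction * small_cost_ratio + (1.0 - small_fraction))

# DiT-S is roughly 10x faster per step than DiT-XL, so replacing 40% of the
# steps gives about a 1.56x wall-clock speedup at matched quality.
print(stitch_speedup(0.4, 0.1))  # 1.5625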

Practical Implications and Future Outlook

T-Stitch's ability to accelerate sampling while maintaining generation quality has clear implications for deploying diffusion models in resource-constrained settings. Its compatibility with existing fast sampling methods, and the fact that it applies to a wide range of pre-trained models without any additional training, broadens its applicability considerably. T-Stitch also points to a promising research direction: optimizing the interplay between small and large models along the sampling trajectory, which may yield further gains in efficiency and quality.

Conclusions and Considerations

In summary, T-Stitch is a practical advance in the efficiency of DPM sampling. By combining smaller and larger diffusion models in one trajectory, it offers a robust way to manage the trade-off between computational cost and generation quality. Limitations remain, including the need for a pre-trained smaller model and a modest increase in memory use, but the benefits in reduced computation and flexible quality control are compelling. The approach both supports more sustainable AI practice and makes high-quality generative models easier to deploy in real-world scenarios.
