DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models (2402.19481v4)

Published 29 Feb 2024 in cs.CV

Abstract: Diffusion models have achieved great success in synthesizing high-quality images. However, generating high-resolution images with diffusion models is still challenging due to the enormous computational costs, resulting in a prohibitive latency for interactive applications. In this paper, we propose DistriFusion to tackle this problem by leveraging parallelism across multiple GPUs. Our method splits the model input into multiple patches and assigns each patch to a GPU. However, naively implementing such an algorithm breaks the interaction between patches and loses fidelity, while incorporating such an interaction will incur tremendous communication overhead. To overcome this dilemma, we observe the high similarity between the input from adjacent diffusion steps and propose displaced patch parallelism, which takes advantage of the sequential nature of the diffusion process by reusing the pre-computed feature maps from the previous timestep to provide context for the current step. Therefore, our method supports asynchronous communication, which can be pipelined by computation. Extensive experiments show that our method can be applied to recent Stable Diffusion XL with no quality degradation and achieve up to a 6.1$\times$ speedup on eight NVIDIA A100s compared to one. Our code is publicly available at https://github.com/mit-han-lab/distrifuser.


Summary

The paper introduces DistriFusion, a novel multi-GPU parallel inference method that uses displaced patch parallelism to reduce latency in high-resolution diffusion models.
The paper demonstrates up to 6.1x speedup on 8 NVIDIA A100 GPUs, achieving efficient synthesis of high-quality images up to 3840×3840 pixels.
The paper adds sparse operations on the fresh regions of each patch and a corrected asynchronous GroupNorm, which limit the quality loss caused by stale activations.

Accelerating High-Resolution Diffusion Models with DistriFusion: A Multi-GPU Parallel Inference Approach

Introduction to DistriFusion

Diffusion models that synthesize high-quality images are a major achievement in AI-generated content (AIGC). They underpin a range of applications, including the generation of photorealistic images from text descriptions. A primary obstacle, however, is the computational cost of generating high-resolution images, which limits their use in interactive applications. To address this, the paper introduces DistriFusion, a method that reduces the latency of high-resolution image generation by exploiting parallelism across multiple GPUs.

Problem Statement

Generating high-resolution images with diffusion models is computationally expensive, which makes real-time applications largely infeasible. Existing acceleration efforts either reduce the number of sampling steps or optimize neural network inference, and both have limitations. In particular, existing multi-GPU methods either incur significant communication overhead or fail to use GPU resources efficiently, which makes them unsuitable for accelerating single-sample generation.

DistriFusion Approach

DistriFusion uses distributed parallel inference to address the computational cost of diffusion models. Its core technique is displaced patch parallelism, which rests on the observation that inputs at adjacent denoising steps are highly similar. Each GPU can therefore reuse feature maps computed at the previous step as context for the current step, so communication becomes asynchronous and can be overlapped with computation. This reduces latency without compromising image quality.
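The idea in the paragraph above can be illustrated with a small toy simulation. This is a minimal sketch, not the authors' implementation: the "devices" run sequentially in one process, `toy_layer` is a hypothetical stand-in for a layer that needs global context, and treating the first step as fully synchronous is an assumption made here so that a stale buffer exists from the second step onward.

```python
def split_patches(x, n_devices):
    """Split a flat list into n_devices equal, contiguous patches.

    Assumes len(x) is divisible by n_devices.
    """
    size = len(x) // n_devices
    return [x[i * size:(i + 1) * size] for i in range(n_devices)]


def assemble_context(device, fresh_patch, stale_patches):
    """Build a full-size context: fresh values for our own patch, stale elsewhere."""
    context = []
    for j, stale in enumerate(stale_patches):
        context.extend(fresh_patch if j == device else stale)
    return context


def toy_layer(context, start, stop):
    """Hypothetical layer that needs global context (a mean over all patches)."""
    mean = sum(context) / len(context)
    return [context[k] - 0.1 * mean for k in range(start, stop)]


def run_displaced(x, n_devices, steps):
    """Simulate `steps` iterations in which each device sees stale remote patches."""
    patches = split_patches(x, n_devices)
    size = len(patches[0])
    # Assumption: the first step is synchronous, so "stale" equals the fresh input.
    stale = [list(p) for p in patches]
    for _ in range(steps):
        outputs = []
        for d in range(n_devices):
            context = assemble_context(d, patches[d], stale)
            outputs.append(toy_layer(context, d * size, (d + 1) * size))
        # Inputs of this step become the stale buffer used at the next step.
        stale = patches
        patches = outputs
    return [v for p in patches for v in p]


if __name__ == "__main__":
    x = [float(v) for v in range(1, 9)]
    exact = run_displaced(x, n_devices=1, steps=2)      # single device: always fresh
    displaced = run_displaced(x, n_devices=4, steps=2)  # four devices: stale context
    print(max(abs(a - b) for a, b in zip(exact, displaced)))
```

With a single device the context is always fresh, so comparing against that run shows how much error the stale context introduces. In this toy the error stays small because the values change only slightly between steps, which mirrors the similarity observation that the method relies on.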

Key Features of DistriFusion include:

Patch Parallelism: By dividing the model input into multiple patches and assigning each patch to a different GPU, DistriFusion allows for parallel operations across devices.
Activation Displacement: DistriFusion uses slightly outdated ("stale") activations from the previous step to provide inter-patch interaction, which removes the need for synchronous communication between GPUs.
Sparse Operations and Corrected Asynchronous GroupNorm: To further reduce cost, DistriFusion modifies convolutional, linear, and attention layers to compute only on the fresh regions of each patch. It also introduces a correction term for stale GroupNorm statistics, which mitigates the loss of image quality caused by asynchronous operation.
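The correction term for stale GroupNorm statistics can be sketched as follows. The summary above does not give the formula, so the form used here is an assumption: each device approximates a global moment as the stale global moment plus the change it observes on its own patch (fresh local moment minus stale local moment), and falls back to the local variance if the estimated variance is negative. The function names are hypothetical.

```python
from statistics import fmean


def corrected_moment(stale_global, fresh_local, stale_local):
    """Assumed correction: stale global value plus the locally observed change."""
    return stale_global + (fresh_local - stale_local)


def corrected_group_stats(stale_all, stale_patch, fresh_patch):
    """Estimate (mean, variance) of one normalization group on one device.

    stale_all:   previous-step values of the whole group (all patches)
    stale_patch: previous-step values of this device's patch
    fresh_patch: current-step values of this device's patch
    """
    mean = corrected_moment(
        fmean(stale_all), fmean(fresh_patch), fmean(stale_patch)
    )
    second = corrected_moment(
        fmean([v * v for v in stale_all]),
        fmean([v * v for v in fresh_patch]),
        fmean([v * v for v in stale_patch]),
    )
    var = second - mean * mean  # Var[A] = E[A^2] - E[A]^2
    if var < 0:
        # Assumed fallback: use the variance of the fresh local patch instead.
        local_mean = fmean(fresh_patch)
        var = fmean([(v - local_mean) ** 2 for v in fresh_patch])
    return mean, var


if __name__ == "__main__":
    mean, var = corrected_group_stats(
        stale_all=[1.0, 2.0, 3.0, 4.0],
        stale_patch=[1.0, 2.0],
        fresh_patch=[1.5, 2.5],
    )
    print(mean, var)
```

The point of this form is that only local quantities need to be fresh, so each device can normalize without waiting for the other devices to report their current statistics.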

Experimental Results

DistriFusion was evaluated using the Stable Diffusion XL model across various settings. The method demonstrated the capability to generate high-quality images with no observable degradation in visual fidelity compared to the original model. Notably, DistriFusion achieved speedups of up to 6.1x on 8 NVIDIA A100 GPUs compared to single-GPU operation. Furthermore, when tested on high-resolution image synthesis (up to 3840×3840 pixels), it maintained considerable speed improvements, showcasing its scalability and efficiency.
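To put the reported speedup in context, the short calculation below derives two standard scaling metrics from it. This is a sketch for interpretation only: parallel efficiency and the Karp-Flatt serial fraction are generic metrics introduced here, not figures reported in the paper, and the only inputs taken from the text are the 6.1x speedup and the 8 GPUs.

```python
def parallel_efficiency(speedup, n_devices):
    """Fraction of ideal linear scaling that was achieved."""
    return speedup / n_devices


def karp_flatt(speedup, n_devices):
    """Experimentally determined serial fraction (Karp-Flatt metric)."""
    return (1 / speedup - 1 / n_devices) / (1 - 1 / n_devices)


if __name__ == "__main__":
    speedup, gpus = 6.1, 8  # values reported above
    print(f"efficiency: {parallel_efficiency(speedup, gpus):.1%}")
    print(f"serial fraction: {karp_flatt(speedup, gpus):.1%}")
```

Under these definitions, a 6.1x speedup on 8 GPUs corresponds to roughly 76% parallel efficiency and an effective serial or overhead fraction of about 4.4%.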

Practical Implications and Future Directions

DistriFusion is a notable advance for AI-generated content, particularly for applications that require high-resolution output. Because it substantially reduces image synthesis time without degrading quality, it is a promising tool for interactive applications such as advanced image editing and video generation platforms.

Looking ahead, further exploration into methods for reducing communication overhead and enhancing device utilization could yield even greater efficiencies. Additionally, exploring the integration of advanced compilation techniques and expanding support for an even broader range of diffusion models and applications represent promising avenues for future research.

Conclusion

DistriFusion represents a significant step forward in addressing the computational challenges of high-resolution image generation with diffusion models. By harnessing the power of multi-GPU parallelism and introducing specialized optimizations, it opens new possibilities for the creation and interactive manipulation of AI-generated content, pushing the boundaries of what is achievable in the field.
