
Communication-Efficient Serving for Video Diffusion Models with Latent Parallelism

Published 8 Dec 2025 in cs.DC | arXiv:2512.07350v1

Abstract: Video diffusion models (VDMs) perform attention computation over the 3D spatio-temporal domain. Compared to LLMs processing 1D sequences, their memory consumption scales cubically, necessitating parallel serving across multiple GPUs. Traditional parallelism strategies partition the computational graph, requiring frequent high-dimensional activation transfers that create severe communication bottlenecks. To tackle this issue, we exploit the local spatio-temporal dependencies inherent in the diffusion denoising process and propose Latent Parallelism (LP), the first parallelism strategy tailored for VDM serving. LP decomposes the global denoising problem into parallelizable sub-problems by dynamically rotating the partitioning dimensions (temporal, height, and width) within the compact latent space across diffusion timesteps, substantially reducing the communication overhead compared to prevailing parallelism strategies. To ensure generation quality, we design a patch-aligned overlapping partition strategy that matches partition boundaries with visual patches and a position-aware latent reconstruction mechanism for smooth stitching. Experiments on three benchmarks demonstrate that LP reduces communication overhead by up to 97% over baseline methods while maintaining comparable generation quality. As a non-intrusive plug-in paradigm, LP can be seamlessly integrated with existing parallelism strategies, enabling efficient and scalable video generation services.
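
The sketch below is an illustrative reconstruction of the mechanism described in the abstract, not the authors' implementation: a latent tensor is partitioned along a dimension that rotates with the diffusion timestep, shard boundaries are snapped to an assumed visual-patch size with a small overlap, and overlapping regions are averaged back together as a simple stand-in for the paper's position-aware reconstruction. The function names, patch size, overlap width, and averaging scheme are all assumptions made for illustration.

```python
# Minimal sketch of the latent-partitioning idea, inferred from the abstract only.
# PATCH, OVERLAP, and all function names are hypothetical.
import numpy as np

PATCH = 4      # assumed visual-patch size; partition boundaries snap to it
OVERLAP = 1    # assumed overlap, in patches, between neighbouring shards

def partition_latent(latent, timestep, num_workers):
    """Split a latent of shape (T, H, W, C) along one dimension.

    The partition dimension rotates with the diffusion timestep
    (0 -> temporal, 1 -> height, 2 -> width), and boundaries are aligned
    to PATCH so each shard covers whole visual patches, with a small
    overlap so neighbouring shards can later be blended back together.
    """
    dim = timestep % 3                        # rotate T / H / W across timesteps
    patches = latent.shape[dim] // PATCH
    per_worker = int(np.ceil(patches / num_workers))
    shards, spans = [], []
    for w in range(num_workers):
        lo_p = max(w * per_worker - OVERLAP, 0)
        hi_p = min((w + 1) * per_worker + OVERLAP, patches)
        lo, hi = lo_p * PATCH, hi_p * PATCH   # patch-aligned boundaries
        idx = [slice(None)] * latent.ndim
        idx[dim] = slice(lo, hi)
        shards.append(latent[tuple(idx)])
        spans.append((lo, hi))
    return dim, shards, spans

def stitch_latent(shape, dim, shards, spans):
    """Reassemble shards into a full latent, averaging overlapping regions
    (a simple stand-in for the paper's position-aware reconstruction)."""
    out = np.zeros(shape, dtype=shards[0].dtype)
    weight = np.zeros(shape[dim])
    for shard, (lo, hi) in zip(shards, spans):
        idx = [slice(None)] * len(shape)
        idx[dim] = slice(lo, hi)
        out[tuple(idx)] += shard
        weight[lo:hi] += 1.0
    wshape = [1] * len(shape)
    wshape[dim] = shape[dim]
    return out / weight.reshape(wshape)

# Toy usage: each "worker" denoises its shard locally, so only small latent
# shards (not high-dimensional activations) would cross GPU boundaries.
latent = np.random.randn(16, 32, 32, 4)       # (T, H, W, C) latent video
for t in range(3):
    dim, shards, spans = partition_latent(latent, t, num_workers=4)
    shards = [s * 0.99 for s in shards]        # placeholder for a local denoising step
    latent = stitch_latent(latent.shape, dim, shards, spans)
```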
