RainFusion: Adaptive Video Generation Acceleration via Multi-Dimensional Visual Redundancy (2505.21036v2)

Published 27 May 2025 in cs.CV and cs.AI

Abstract: Video generation using diffusion models is highly computationally intensive, with 3D attention in Diffusion Transformer (DiT) models accounting for over 80% of the total compute. In this work, we introduce RainFusion, a novel training-free sparse attention method that exploits the inherent sparsity of visual data to accelerate attention computation while preserving video quality. Specifically, we identify three distinct sparse patterns in video-generation attention: the Spatial Pattern, the Temporal Pattern, and the Textural Pattern. The sparse pattern for each attention head is determined online during inference, with negligible overhead (~0.2%), by our proposed ARM (Adaptive Recognition Module). RainFusion is a plug-and-play method that can be seamlessly integrated into state-of-the-art 3D-attention video generation models without additional training or calibration. We evaluate our method on leading open-source models including HunyuanVideo, OpenSoraPlan-1.2, and CogVideoX-5B, demonstrating its broad applicability and effectiveness. Experimental results show that RainFusion achieves over 2x speedup in attention computation while maintaining video quality, with only a minimal impact on VBench scores (-0.2%).
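To make the mechanism described in the abstract concrete, here is a minimal PyTorch sketch of per-head adaptive sparse attention in the spirit of RainFusion. The mask definitions, the cheap scoring heuristic standing in for ARM, and the treatment of the Textural Pattern as dense attention are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (not the paper's code): per attention head, cheaply
# estimate which sparse pattern best matches its attention map, then compute
# attention only inside that pattern.

import torch
import torch.nn.functional as F


def build_masks(n_frames: int, tokens_per_frame: int, device):
    """Boolean attention masks over the flattened (frame, token) video sequence."""
    seq = n_frames * tokens_per_frame
    frame_id = torch.arange(seq, device=device) // tokens_per_frame
    token_id = torch.arange(seq, device=device) % tokens_per_frame
    return {
        # Spatial Pattern: attend only to tokens in the same frame.
        "spatial": frame_id[:, None] == frame_id[None, :],
        # Temporal Pattern: attend only to the same spatial location across frames.
        "temporal": token_id[:, None] == token_id[None, :],
        # Textural Pattern: kept dense here as a simplifying assumption.
        "textural": torch.ones(seq, seq, dtype=torch.bool, device=device),
    }


def pick_pattern(q, k, masks, n_samples: int = 64):
    """Score each mask on a subsampled attention map for one head
    (a stand-in for the paper's Adaptive Recognition Module)."""
    seq = q.shape[0]
    idx = torch.randperm(seq, device=q.device)[: min(n_samples, seq)]
    probs = F.softmax(q[idx] @ k.T / q.shape[-1] ** 0.5, dim=-1)
    best_name, best_score = None, float("-inf")
    for name, mask in masks.items():
        # Attention mass retained by the mask, penalized by mask density.
        retained = (probs * mask[idx].float()).sum(dim=-1).mean().item()
        score = retained - 0.5 * mask.float().mean().item()
        if score > best_score:
            best_name, best_score = name, score
    return best_name


def adaptive_sparse_attention(q, k, v, n_frames, tokens_per_frame):
    """q, k, v: (heads, seq, dim) for one sample; returns (heads, seq, dim)."""
    masks = build_masks(n_frames, tokens_per_frame, q.device)
    out = torch.empty_like(q)
    for h in range(q.shape[0]):
        name = pick_pattern(q[h], k[h], masks)
        scores = q[h] @ k[h].T / q.shape[-1] ** 0.5
        scores = scores.masked_fill(~masks[name], float("-inf"))
        out[h] = F.softmax(scores, dim=-1) @ v[h]
    return out


if __name__ == "__main__":
    heads, n_frames, tokens_per_frame, dim = 4, 8, 16, 64
    seq = n_frames * tokens_per_frame
    q, k, v = (torch.randn(heads, seq, dim) for _ in range(3))
    print(adaptive_sparse_attention(q, k, v, n_frames, tokens_per_frame).shape)
```

In a real DiT, the speedup comes from using block-sparse attention kernels that skip the masked regions entirely rather than materializing a dense score matrix as this toy version does.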

Authors (7)
  1. Aiyue Chen (2 papers)
  2. Bin Dong (111 papers)
  3. Jingru Li (5 papers)
  4. Jing Lin (52 papers)
  5. Yiwu Yao (11 papers)
  6. Gongyi Wang (5 papers)
  7. Kun Tian (19 papers)
