Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
89 tokens/sec
Gemini 2.5 Pro Premium
41 tokens/sec
GPT-5 Medium
23 tokens/sec
GPT-5 High Premium
19 tokens/sec
GPT-4o
96 tokens/sec
DeepSeek R1 via Azure Premium
88 tokens/sec
GPT OSS 120B via Groq Premium
467 tokens/sec
Kimi K2 via Groq Premium
197 tokens/sec
2000 character limit reached

Ouroboros-Diffusion: Exploring Consistent Content Generation in Tuning-free Long Video Diffusion (2501.09019v1)

Published 15 Jan 2025 in cs.CV

Abstract: The first-in-first-out (FIFO) video diffusion, built on a pre-trained text-to-video model, has recently emerged as an effective approach for tuning-free long video generation. This technique maintains a queue of video frames with progressively increasing noise, continuously producing clean frames at the queue's head while Gaussian noise is enqueued at the tail. However, FIFO-Diffusion often struggles to keep long-range temporal consistency in the generated videos due to the lack of correspondence modeling across frames. In this paper, we propose Ouroboros-Diffusion, a novel video denoising framework designed to enhance structural and content (subject) consistency, enabling the generation of consistent videos of arbitrary length. Specifically, we introduce a new latent sampling technique at the queue tail to improve structural consistency, ensuring perceptually smooth transitions among frames. To enhance subject consistency, we devise a Subject-Aware Cross-Frame Attention (SACFA) mechanism, which aligns subjects across frames within short segments to achieve better visual coherence. Furthermore, we introduce self-recurrent guidance. This technique leverages information from all previous cleaner frames at the front of the queue to guide the denoising of noisier frames at the end, fostering rich and contextual global information interaction. Extensive experiments of long video generation on the VBench benchmark demonstrate the superiority of our Ouroboros-Diffusion, particularly in terms of subject consistency, motion smoothness, and temporal consistency.

Summary

We haven't generated a summary for this paper yet.

Dice Question Streamline Icon: https://streamlinehq.com

Follow-up Questions

We haven't generated follow-up questions for this paper yet.

X Twitter Logo Streamline Icon: https://streamlinehq.com