Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
107 tokens/sec
Gemini 2.5 Pro Premium
58 tokens/sec
GPT-5 Medium
29 tokens/sec
GPT-5 High Premium
25 tokens/sec
GPT-4o
101 tokens/sec
DeepSeek R1 via Azure Premium
84 tokens/sec
GPT OSS 120B via Groq Premium
478 tokens/sec
Kimi K2 via Groq Premium
213 tokens/sec
2000 character limit reached

Text2Story: Advancing Video Storytelling with Text Guidance (2503.06310v1)

Published 8 Mar 2025 in cs.CV

Abstract: Generating coherent long-form video sequences from discrete input using only text prompts is a critical task in content creation. While diffusion-based models excel at short video synthesis, long-form storytelling from text remains largely unexplored and a challenge due to challenges pertaining to temporal coherency, preserving semantic meaning and action continuity across the video. We introduce a novel storytelling approach to enable seamless video generation with natural action transitions and structured narratives. We present a bidirectional time-weighted latent blending strategy to ensure temporal consistency between segments of the long-form video being generated. Further, our method extends the Black-Scholes algorithm from prompt mixing for image generation to video generation, enabling controlled motion evolution through structured text conditioning. To further enhance motion continuity, we propose a semantic action representation framework to encode high-level action semantics into the blending process, dynamically adjusting transitions based on action similarity, ensuring smooth yet adaptable motion changes. Latent space blending maintains spatial coherence between objects in a scene, while time-weighted blending enforces bidirectional constraints for temporal consistency. This integrative approach prevents abrupt transitions while ensuring fluid storytelling. Extensive experiments demonstrate significant improvements over baselines, achieving temporally consistent and visually compelling video narratives without any additional training. Our approach bridges the gap between short clips and extended video to establish a new paradigm in GenAI-driven video synthesis from text.

Summary

We haven't generated a summary for this paper yet.

Dice Question Streamline Icon: https://streamlinehq.com

Follow-up Questions

We haven't generated follow-up questions for this paper yet.