Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
119 tokens/sec
GPT-4o
56 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Causal-Story: Local Causal Attention Utilizing Parameter-Efficient Tuning For Visual Story Synthesis (2309.09553v4)

Published 18 Sep 2023 in cs.CV, cs.AI, and cs.MM

Abstract: The excellent text-to-image synthesis capability of diffusion models has driven progress in synthesizing coherent visual stories. The current state-of-the-art method combines the features of historical captions, historical frames, and the current captions as conditions for generating the current frame. However, this method treats each historical frame and caption as the same contribution. It connects them in order with equal weights, ignoring that not all historical conditions are associated with the generation of the current frame. To address this issue, we propose Causal-Story. This model incorporates a local causal attention mechanism that considers the causal relationship between previous captions, frames, and current captions. By assigning weights based on this relationship, Causal-Story generates the current frame, thereby improving the global consistency of story generation. We evaluated our model on the PororoSV and FlintstonesSV datasets and obtained state-of-the-art FID scores, and the generated frames also demonstrate better storytelling in visuals.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (5)
  1. Tianyi Song (1 paper)
  2. Jiuxin Cao (18 papers)
  3. Kun Wang (355 papers)
  4. Bo Liu (484 papers)
  5. Xiaofeng Zhang (63 papers)
Citations (5)

Summary

We haven't generated a summary for this paper yet.