Album Storytelling with Iterative Story-aware Captioning and Large Language Models (2305.12943v2)

Published 22 May 2023 in cs.CV

Abstract: This work studies how to transform an album into vivid and coherent stories, a task we refer to as "album storytelling". While this task can help preserve memories and facilitate experience sharing, it remains underexplored in the current literature. With recent advances in LLMs, it is now possible to generate lengthy, coherent text, opening up the opportunity to develop an AI assistant for album storytelling. One natural approach is to use caption models to describe each photo in the album and then use LLMs to summarize and rewrite the generated captions into an engaging story. However, we find that this often results in stories containing hallucinated information that contradicts the images, because each generated caption is "story-agnostic": it does not always describe what is relevant to the whole story, and it may miss necessary information. To address these limitations, we propose a new iterative album storytelling pipeline. Specifically, we start with an initial story and build a story-aware caption model that refines the captions using the whole story as guidance. The polished captions are then fed into the LLM to generate a new, refined story. This process is repeated iteratively until the story contains minimal factual errors while maintaining coherence. To evaluate our proposed pipeline, we introduce a new dataset of image collections from vlogs and a set of systematic evaluation metrics. Our results demonstrate that our method effectively generates more accurate and engaging stories for albums, with enhanced coherence and vividness.
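The iterative loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the caption model, the LLM, and the factual-error check are passed in as plain callables (`caption`, `write_story`, and `count_errors` are hypothetical names), an empty story string stands in for story-agnostic captioning, and the stopping rule (error estimate at or below `tolerance`, capped at `max_rounds`) is an assumed stand-in for "minimal factual errors". The toy stubs are also hypothetical and only exercise the control flow.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces (assumptions, not the paper's API):
#   caption(image_id, story)    -> caption text; story == "" means story-agnostic
#   write_story(captions)       -> story text produced by an LLM
#   count_errors(story, images) -> estimated number of factual errors
CaptionFn = Callable[[str, str], str]
StoryFn = Callable[[List[str]], str]
ErrorFn = Callable[[str, List[str]], int]


def iterative_album_story(
    images: List[str],
    caption: CaptionFn,
    write_story: StoryFn,
    count_errors: ErrorFn,
    max_rounds: int = 5,
    tolerance: int = 0,
) -> Tuple[str, int]:
    """Return the final story and the number of refinement rounds performed."""
    # Initial pass: story-agnostic captions, summarized into a first story.
    captions = [caption(img, "") for img in images]
    story = write_story(captions)
    rounds = 0
    # Assumed stopping rule: stop once the error estimate is within tolerance,
    # or after max_rounds refinements.
    while rounds < max_rounds and count_errors(story, images) > tolerance:
        # Story-aware captioning: the whole current story guides each caption.
        captions = [caption(img, story) for img in images]
        # Feed the refined captions back to the LLM for a new story.
        story = write_story(captions)
        rounds += 1
    return story, rounds


# Toy stand-ins (hypothetical) that only exercise the control flow.
def toy_caption(image_id: str, story: str) -> str:
    # Vague without guidance; specific once a story is available.
    return "a photo" if not story else f"a photo of the {image_id}"


def toy_story(captions: List[str]) -> str:
    return " then ".join(captions)


def toy_errors(story: str, images: List[str]) -> int:
    # Counts image ids that do not appear anywhere in the story text.
    return sum(1 for img in images if img not in story)


if __name__ == "__main__":
    story, rounds = iterative_album_story(
        ["beach", "sunset"], toy_caption, toy_story, toy_errors
    )
    print(rounds, story)
```

With the toy stubs, the first story ("a photo then a photo") mentions neither image, so one story-aware round runs and the script prints `1 a photo of the beach then a photo of the sunset`. Passing the models in as callables keeps the loop independent of any particular caption model or LLM.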

Authors (8)
  1. Munan Ning (19 papers)
  2. Yujia Xie (29 papers)
  3. Dongdong Chen (164 papers)
  4. Zeyin Song (4 papers)
  5. Lu Yuan (130 papers)
  6. Yonghong Tian (184 papers)
  7. Qixiang Ye (110 papers)
  8. Li Yuan (141 papers)
Citations (7)
