A Hierarchical Approach for Visual Storytelling Using Image Description (1909.12401v1)

Published 26 Sep 2019 in cs.CV, cs.CL, cs.LG, and stat.ML

Abstract: One of the primary challenges of visual storytelling is developing techniques that can maintain the context of the story over long event sequences to generate human-like stories. In this paper, we propose a hierarchical deep learning architecture based on encoder-decoder networks to address this problem. To better help our network maintain this context while also generating long and diverse sentences, we incorporate natural language image descriptions along with the images themselves to generate each story sentence. We evaluate our system on the Visual Storytelling (VIST) dataset and show that our method outperforms state-of-the-art techniques on a suite of different automatic evaluation metrics. The empirical results from this evaluation demonstrate the necessity of the different components of our proposed architecture and show the effectiveness of the architecture for visual storytelling.

Authors (5)
  1. Md Sultan Al Nahian (10 papers)
  2. Tasmia Tasrin (4 papers)
  3. Sagar Gandhi (5 papers)
  4. Ryan Gaines (1 paper)
  5. Brent Harrison (30 papers)
Citations (10)