Hierarchically Structured Reinforcement Learning for Topically Coherent Visual Story Generation (1805.08191v3)

Published 21 May 2018 in cs.CV, cs.AI, cs.LG, and cs.NE

Abstract: We propose a hierarchically structured reinforcement learning approach to address the challenges of planning for generating coherent multi-sentence stories for the visual storytelling task. Within our framework, the task of generating a story given a sequence of images is divided across a two-level hierarchical decoder. The high-level decoder constructs a plan by generating a semantic concept (i.e., topic) for each image in sequence. The low-level decoder generates a sentence for each image using a semantic compositional network, which effectively grounds the sentence generation conditioned on the topic. The two decoders are jointly trained end-to-end using reinforcement learning. We evaluate our model on the visual storytelling (VIST) dataset. Empirical results from both automatic and human evaluations demonstrate that the proposed hierarchically structured reinforced training achieves significantly better performance compared to a strong flat deep reinforcement learning baseline.
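The two-level decoding flow described in the abstract can be illustrated with a minimal sketch. It shows only the control flow: one topic per image from a high-level step, then one sentence per image conditioned on that topic. Every name here (`generate_story`, `plan_topic`, `write_sentence`, and the toy stand-ins) is hypothetical, passing the previous topic to the planner is an assumption, and the stand-ins are hand-written rules rather than the paper's learned decoders, semantic compositional network, or reinforcement learning training.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Features = Dict[str, float]  # assumed toy representation: label -> score


def generate_story(
    image_features: Sequence[Features],
    plan_topic: Callable[[Features, Optional[str]], str],
    write_sentence: Callable[[Features, str], str],
) -> Tuple[List[str], List[str]]:
    """Run a two-level decode: one topic, then one sentence, per image."""
    topics: List[str] = []
    sentences: List[str] = []
    prev_topic: Optional[str] = None
    for feat in image_features:
        # High-level step: choose a topic for this image.
        # Conditioning on the previous topic is an assumption of this sketch.
        topic = plan_topic(feat, prev_topic)
        # Low-level step: produce a sentence conditioned on the topic.
        sentence = write_sentence(feat, topic)
        topics.append(topic)
        sentences.append(sentence)
        prev_topic = topic
    return topics, sentences


def toy_plan_topic(feat: Features, prev_topic: Optional[str]) -> str:
    # Toy stand-in: pick the highest-scoring label as the topic.
    return max(feat, key=feat.get)


def toy_write_sentence(feat: Features, topic: str) -> str:
    # Toy stand-in: a fixed template in place of a learned decoder.
    return f"This part of the story is about the {topic}."


if __name__ == "__main__":
    images = [{"beach": 0.9, "dog": 0.1}, {"beach": 0.2, "dog": 0.8}]
    plan, story = generate_story(images, toy_plan_topic, toy_write_sentence)
    print(plan)
    print(" ".join(story))
```

Passing the two steps in as callables keeps the planning and sentence-generation levels separable, which mirrors the division of labor the abstract describes without committing to any particular model.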

Authors (6)
  1. Qiuyuan Huang
  2. Zhe Gan
  3. Asli Celikyilmaz
  4. Dapeng Wu
  5. Jianfeng Wang
  6. Xiaodong He
Citations (91)