Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
38 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
41 tokens/sec
o3 Pro
7 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Dense Relational Captioning: Triple-Stream Networks for Relationship-Based Captioning (1903.05942v4)

Published 14 Mar 2019 in cs.CV and cs.CL

Abstract: Our goal in this work is to train an image captioning model that generates more dense and informative captions. We introduce "relational captioning," a novel image captioning task which aims to generate multiple captions with respect to relational information between objects in an image. Relational captioning is a framework that is advantageous in both diversity and amount of information, leading to image understanding based on relationships. Part-of speech (POS, i.e. subject-object-predicate categories) tags can be assigned to every English word. We leverage the POS as a prior to guide the correct sequence of words in a caption. To this end, we propose a multi-task triple-stream network (MTTSNet) which consists of three recurrent units for the respective POS and jointly performs POS prediction and captioning. We demonstrate more diverse and richer representations generated by the proposed model against several baselines and competing methods.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (4)
  1. Dong-Jin Kim (167 papers)
  2. Jinsoo Choi (13 papers)
  3. Tae-Hyun Oh (75 papers)
  4. In So Kweon (156 papers)
Citations (78)