Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
102 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Entity-aware Image Caption Generation (1804.07889v2)

Published 21 Apr 2018 in cs.CL

Abstract: Current image captioning approaches generate descriptions which lack specific information, such as named entities that are involved in the images. In this paper we propose a new task which aims to generate informative image captions, given images and hashtags as input. We propose a simple but effective approach to tackle this problem. We first train a convolutional neural networks - long short term memory networks (CNN-LSTM) model to generate a template caption based on the input image. Then we use a knowledge graph based collective inference algorithm to fill in the template with specific named entities retrieved via the hashtags. Experiments on a new benchmark dataset collected from Flickr show that our model generates news-style image descriptions with much richer information. Our model outperforms unimodal baselines significantly with various evaluation metrics.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (5)
  1. Di Lu (37 papers)
  2. Spencer Whitehead (18 papers)
  3. Lifu Huang (92 papers)
  4. Heng Ji (266 papers)
  5. Shih-Fu Chang (131 papers)
Citations (79)