Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
119 tokens/sec
GPT-4o
56 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Guiding Long-Short Term Memory for Image Caption Generation (1509.04942v1)

Published 16 Sep 2015 in cs.CV

Abstract: In this work we focus on the problem of image caption generation. We propose an extension of the long short term memory (LSTM) model, which we coin gLSTM for short. In particular, we add semantic information extracted from the image as extra input to each unit of the LSTM block, with the aim of guiding the model towards solutions that are more tightly coupled to the image content. Additionally, we explore different length normalization strategies for beam search in order to prevent from favoring short sentences. On various benchmark datasets such as Flickr8K, Flickr30K and MS COCO, we obtain results that are on par with or even outperform the current state-of-the-art.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (4)
  1. Xu Jia (57 papers)
  2. Efstratios Gavves (101 papers)
  3. Basura Fernando (60 papers)
  4. Tinne Tuytelaars (150 papers)
Citations (101)

Summary

We haven't generated a summary for this paper yet.