A sequential guiding network with attention for image captioning (1811.00228v3)

Published 1 Nov 2018 in cs.CV and cs.CL

Abstract: Recent advances in deep learning for both computer vision (CV) and natural language processing (NLP) provide a new way of understanding semantics, making it possible to tackle more challenging tasks such as automatic description generation from natural images. For this task, the encoder-decoder framework has achieved promising performance when a convolutional neural network (CNN) is used as the image encoder and a recurrent neural network (RNN) as the decoder. In this paper, we introduce a sequential guiding network that guides the decoder during word generation. The new model extends the encoder-decoder framework with attention by adding a guiding long short-term memory (LSTM) network, and it can be trained end-to-end on image/description pairs. We validate our approach through extensive experiments on a benchmark dataset, MS COCO Captions. The proposed model achieves significant improvement compared to other state-of-the-art deep learning models.
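
To make the architecture described in the abstract more concrete, here is a minimal sketch of one decoding step in an attention-based captioner with an extra guiding vector. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how attention is scored or how the guiding LSTM's output enters the decoder, so the dot-product scoring, the concatenation in `decoder_input`, and all names (`attend`, `decoder_input`, `guide`) are hypothetical. The guiding LSTM itself is not modeled; `guide` stands in for its output at the current step.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(regions, hidden):
    # Soft attention: score each CNN region feature against the decoder's
    # hidden state, normalize the scores, and return the weighted average.
    # Dot-product scoring is an assumption chosen for brevity.
    weights = softmax([dot(r, hidden) for r in regions])
    dim = len(regions[0])
    context = [sum(w * r[i] for w, r in zip(weights, regions)) for i in range(dim)]
    return weights, context

def decoder_input(word_embedding, context, guide):
    # Assumption: the guiding vector is concatenated with the previous word
    # embedding and the attention context to form the decoder's input.
    return word_embedding + context + guide

# Toy example: three image regions with 2-dimensional features.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden = [2.0, 0.0]
weights, context = attend(regions, hidden)
step_input = decoder_input([0.5, 0.5], context, [0.1, 0.2])
```

In this toy example the first and third regions receive equal weight because they score identically against `hidden`, and the second region receives less. In a trained model the region features, hidden state, and guiding vector would all be learned jointly from image/description pairs.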

Authors (4)
  1. Daouda Sow (8 papers)
  2. Zengchang Qin (29 papers)
  3. Mouhamed Niasse (1 paper)
  4. Tao Wan (12 papers)
Citations (3)