
A Comparative Study of Pre-trained CNNs and GRU-Based Attention for Image Caption Generation (2310.07252v1)

Published 11 Oct 2023 in cs.CV and cs.LG

Abstract: Image captioning is a challenging task involving generating a textual description for an image using computer vision and natural language processing techniques. This paper proposes a deep neural framework for image caption generation using a GRU-based attention mechanism. Our approach employs multiple pre-trained convolutional neural networks as the encoder to extract features from the image and a GRU-based language model as the decoder to generate descriptive sentences. To improve performance, we integrate the Bahdanau attention model with the GRU decoder to enable learning to focus on specific image parts. We evaluate our approach using the MSCOCO and Flickr30k datasets and show that it achieves competitive scores compared to state-of-the-art methods. Our proposed framework can bridge the gap between computer vision and natural language and can be extended to specific domains.
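
As a rough illustration of the Bahdanau (additive) attention step the abstract mentions, the sketch below computes attention weights over a set of image-region features for a single decoder state. It is not the authors' implementation: the function name `additive_attention`, the parameter names `W1`, `W2`, and `v`, and all dimensions are assumptions chosen for illustration, and plain Python lists stand in for the tensors a real framework would use.

```python
import math
import random


def additive_attention(features, hidden, W1, W2, v):
    """Additive attention for one decoder step (illustrative names).

    features: list of region vectors, each of length feat_dim
    hidden:   decoder state vector of length hidden_dim
    W1:       attn_dim x feat_dim projection for region features
    W2:       attn_dim x hidden_dim projection for the decoder state
    v:        scoring vector of length attn_dim
    Returns (context, weights).
    """
    def matvec(matrix, x):
        return [sum(m * xi for m, xi in zip(row, x)) for row in matrix]

    # The decoder-state projection is shared by every region.
    h_proj = matvec(W2, hidden)

    # score_i = v . tanh(W1 f_i + W2 h)
    scores = []
    for f in features:
        f_proj = matvec(W1, f)
        scores.append(
            sum(vk * math.tanh(a + b) for vk, a, b in zip(v, f_proj, h_proj))
        )

    # Softmax over regions, shifted by the max for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Context vector: attention-weighted sum of the region features.
    feat_dim = len(features[0])
    context = [
        sum(w * f[d] for w, f in zip(weights, features))
        for d in range(feat_dim)
    ]
    return context, weights


def rand_matrix(rows, cols):
    """Random matrix standing in for learned parameters (hypothetical)."""
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
features = rand_matrix(4, 3)   # 4 image regions, 3-dim features (toy sizes)
hidden = [random.uniform(-1, 1) for _ in range(2)]
W1, W2 = rand_matrix(5, 3), rand_matrix(5, 2)
v = [random.uniform(-1, 1) for _ in range(5)]

context, weights = additive_attention(features, hidden, W1, W2, v)
```

In a captioning decoder of this kind, the context vector would typically be combined with the previous word embedding and fed to the GRU at each step, so the weights are recomputed for every generated word.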

Authors (5)
  1. Rashid Khan (8 papers)
  2. Bingding Huang (7 papers)
  3. Haseeb Hassan (5 papers)
  4. Asim Zaman (2 papers)
  5. Zhongfu Ye (4 papers)
Citations (1)