Bornon: Bengali Image Captioning with Transformer-based Deep learning approach (2109.05218v1)

Published 11 Sep 2021 in cs.CV

Abstract: Image captioning with an Encoder-Decoder approach, where a CNN serves as the encoder and a sequence generator such as an RNN serves as the decoder, has proven very effective. However, this method has the drawback that the sequence must be processed in order. To overcome this, some researchers have applied the Transformer model to generate captions from images using English datasets, but none have generated Bengali captions with a Transformer model. We therefore used three different Bengali datasets to generate Bengali captions from images with the Transformer model. Additionally, we compared the performance of the Transformer-based model with a visual attention-based Encoder-Decoder approach. Finally, we compared the results of the Transformer-based model with those of other models that used different Bengali image captioning datasets.
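The general architecture the abstract describes, CNN features acting as the encoder memory and a Transformer decoder generating caption tokens in parallel during training, can be sketched roughly as below. This is a minimal illustration in PyTorch, not the authors' implementation; the class name, dimensions, and the assumption of 2048-dimensional CNN region features are all hypothetical.

```python
import torch
import torch.nn as nn

class CaptionTransformer(nn.Module):
    """Hypothetical sketch: project CNN image features into the model
    dimension and decode caption tokens with a Transformer decoder."""

    def __init__(self, vocab_size, d_model=128, nhead=4, num_layers=2, feat_dim=2048):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, d_model)   # CNN features -> d_model
        self.embed = nn.Embedding(vocab_size, d_model)  # caption token embeddings
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        self.out = nn.Linear(d_model, vocab_size)       # per-token vocabulary logits

    def forward(self, features, tokens):
        # features: (batch, regions, feat_dim); tokens: (batch, seq_len)
        memory = self.feat_proj(features)
        tgt = self.embed(tokens)
        # Causal mask so each position attends only to earlier tokens.
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        return self.out(self.decoder(tgt, memory, tgt_mask=mask))

# Toy forward pass: 2 images with 49 regions each, captions of 7 tokens.
model = CaptionTransformer(vocab_size=100)
feats = torch.randn(2, 49, 2048)
toks = torch.randint(0, 100, (2, 7))
logits = model(feats, toks)
print(logits.shape)  # torch.Size([2, 7, 100])
```

Unlike an RNN decoder, all target positions are processed in one pass here; the causal mask, rather than sequential recurrence, enforces the left-to-right generation order, which is the drawback of RNN decoders the paper highlights.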

Authors (5)
  1. Faisal Muhammad Shah (14 papers)
  2. Mayeesha Humaira (2 papers)
  3. Md Abidur Rahman Khan Jim (1 paper)
  4. Amit Saha Ami (1 paper)
  5. Shimul Paul (1 paper)
Citations (16)