Semi-Autoregressive Transformer for Image Captioning (2106.09436v2)

Published 17 Jun 2021 in cs.CV

Abstract: Current state-of-the-art image captioning models adopt autoregressive decoders, i.e., they generate each word by conditioning on previously generated words, which leads to heavy latency during inference. To tackle this issue, non-autoregressive image captioning models have recently been proposed to significantly accelerate inference by generating all words in parallel. However, these non-autoregressive models inevitably suffer from large degradation in generation quality because they excessively remove word dependencies. To achieve a better trade-off between speed and quality, we introduce a semi-autoregressive model for image captioning (dubbed SATIC), which keeps the autoregressive property globally but generates words in parallel locally. Only a few modifications to the Transformer are needed to implement SATIC. Experimental results on the MSCOCO image captioning benchmark show that SATIC achieves a good trade-off without bells and whistles. Code is available at https://github.com/YuanEZhou/satic.
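The "autoregressive globally, parallel locally" scheme described in the abstract amounts to group-wise decoding: each step emits a fixed-size group of words in parallel, and the next group is conditioned on all groups generated so far. Below is a minimal greedy-decoding sketch of this idea; it assumes a Transformer decoder trained with group-shifted targets (each position predicts the token `group_size` steps ahead), and all names (`decoder`, `image_feats`, the special-token ids) are illustrative, not taken from the authors' repository.

```python
import torch

def semi_autoregressive_decode(decoder, image_feats, group_size=2,
                               max_len=20, bos_id=0, eos_id=2):
    """Greedy group-wise decoding sketch: autoregressive across groups,
    parallel within a group. `decoder(image_feats, prefix)` is assumed to
    return per-position vocabulary logits of shape (1, T, vocab)."""
    # Start with `group_size` <bos> placeholders so the first decoder call
    # already emits a full group.
    prefix = torch.full((1, group_size), bos_id, dtype=torch.long)
    while prefix.size(1) < max_len + group_size:
        logits = decoder(image_feats, prefix)               # (1, T, vocab)
        # One decoder call predicts the next group of tokens at once;
        # tokens inside the group do not condition on each other.
        next_group = logits[:, -group_size:, :].argmax(-1)  # greedy, in parallel
        prefix = torch.cat([prefix, next_group], dim=1)
        if (next_group == eos_id).any():                    # stop at <eos>
            break
    return prefix[0, group_size:]  # drop the <bos> placeholders
```

With `group_size=1` this reduces to ordinary autoregressive decoding; larger groups cut the number of decoder calls by roughly that factor, which is the source of the speedup the paper trades against caption quality.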

Authors (4)
  1. Yuanen Zhou (4 papers)
  2. Yong Zhang (660 papers)
  3. Zhenzhen Hu (16 papers)
  4. Meng Wang (1063 papers)
Citations (19)
