
A Review of Deep Learning for Video Captioning (2304.11431v1)

Published 22 Apr 2023 in cs.CV

Abstract: Video captioning (VC) is a fast-moving, cross-disciplinary area of research that bridges work in the fields of computer vision, NLP, linguistics, and human-computer interaction. In essence, VC involves understanding a video and describing it with language. Captioning is used in a host of applications, from creating more accessible interfaces (e.g., low-vision navigation) to video question answering (V-QA), video retrieval, and content generation. This survey covers deep learning-based VC, including, but not limited to, attention-based architectures, graph networks, reinforcement learning, adversarial networks, and dense video captioning (DVC). We discuss the datasets and evaluation metrics used in the field, as well as limitations, applications, challenges, and future directions for VC.

Authors (11)
  1. Moloud Abdar (17 papers)
  2. Meenakshi Kollati (1 paper)
  3. Swaraja Kuraparthi (1 paper)
  4. Farhad Pourpanah (14 papers)
  5. Daniel McDuff (88 papers)
  6. Mohammad Ghavamzadeh (97 papers)
  7. Shuicheng Yan (275 papers)
  8. Abduallah Mohamed (10 papers)
  9. Abbas Khosravi (43 papers)
  10. Erik Cambria (136 papers)
  11. Fatih Porikli (141 papers)
Citations (17)