
Exploration of Visual Features and their weighted-additive fusion for Video Captioning (2101.05806v1)

Published 14 Jan 2021 in cs.CV

Abstract: Video captioning is a popular task that challenges models to describe events in videos using natural language. In this work, we investigate the ability of various visual feature representations derived from state-of-the-art convolutional neural networks to capture high-level semantic context. We introduce the Weighted Additive Fusion Transformer with Memory Augmented Encoders (WAFTM), a captioning model that incorporates memory in a transformer encoder and uses a novel feature-fusion method that ensures due importance is given to more significant representations. We show the performance gain obtained by applying Word-Piece Tokenization and the popular REINFORCE algorithm. Finally, we benchmark our model on two datasets and obtain a CIDEr of 92.4 on MSVD and a METEOR of 0.091 on the ActivityNet Captions Dataset.
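
The abstract does not spell out the fusion rule, so the following is only a minimal sketch of what a weighted additive fusion of feature vectors could look like, not the WAFTM implementation. It assumes the feature vectors already share one dimension and that each comes with a scalar importance score, which is normalized with a softmax before summing. The names `softmax`, `weighted_additive_fusion`, and the toy inputs are hypothetical and chosen for illustration.

```python
import math


def softmax(scores):
    """Turn raw importance scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_additive_fusion(features, scores):
    """Fuse equal-length feature vectors as a weighted sum.

    Assumption (not from the paper): one scalar score per feature type,
    softmax-normalized. In a trained model the scores would be learned.
    """
    if len(features) != len(scores):
        raise ValueError("need exactly one score per feature vector")
    dim = len(features[0])
    weights = softmax(scores)
    fused = [0.0] * dim
    for weight, feature in zip(weights, features):
        if len(feature) != dim:
            raise ValueError("all feature vectors must share one dimension")
        for i, value in enumerate(feature):
            fused[i] += weight * value
    return fused, weights


# Toy stand-ins for two visual feature types (hypothetical values).
appearance = [1.0, 2.0]
motion = [3.0, 4.0]
fused, weights = weighted_additive_fusion([appearance, motion], [0.0, 0.0])
print(fused, weights)
```

With equal scores the fusion reduces to a plain average of the inputs; raising one score shifts the fused vector toward that feature type, which is the sense in which a more significant representation would receive more weight.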

Authors (7)
  1. Praveen S V (2 papers)
  2. Akhilesh Bharadwaj (1 paper)
  3. Harsh Raj (10 papers)
  4. Janhavi Dadhania (2 papers)
  5. Ganesh Samarth C. A (2 papers)
  6. Nikhil Pareek (2 papers)
  7. S R M Prasanna (4 papers)
Citations (1)
