Grounded Objects and Interactions for Video Captioning (1711.06354v1)

Published 16 Nov 2017 in cs.CV

Abstract: We address the problem of video captioning by grounding language generation on object interactions in the video. Existing work mostly focuses on overall scene understanding, with limited or no emphasis on object interactions. In this paper, we propose SINet-Caption, which learns to generate captions grounded on higher-order interactions between arbitrary groups of objects for fine-grained video understanding. We discuss the challenges and benefits of this approach and demonstrate that SINet-Caption achieves state-of-the-art results on the ActivityNet Captions dataset.
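
To make the core idea concrete, below is a minimal, hypothetical sketch of grounding caption generation on object-interaction features: pairwise object features per frame are pooled into interaction vectors, and an LSTM decoder attends over them at each word step. The class name `InteractionGroundedCaptioner`, the pairwise mean-pooling, and all dimensions are illustrative assumptions, not the paper's actual SINet-Caption architecture.

```python
import torch
import torch.nn as nn

class InteractionGroundedCaptioner(nn.Module):
    """Illustrative sketch (not the paper's SINet-Caption): an LSTM decoder
    that attends over per-frame object-interaction features when emitting
    each caption word."""

    def __init__(self, obj_feat_dim=2048, hidden_dim=512, vocab_size=10000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_dim)
        # Pairwise object interactions are scored and pooled into one
        # interaction vector per frame (hypothetical simplification).
        self.pair_mlp = nn.Sequential(
            nn.Linear(2 * obj_feat_dim, hidden_dim), nn.ReLU())
        # Attention over frames, conditioned on the decoder state.
        self.attn = nn.Linear(hidden_dim * 2, 1)
        self.decoder = nn.LSTMCell(hidden_dim * 2, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, obj_feats, captions):
        # obj_feats: (T, N, obj_feat_dim) object features per frame
        # captions:  (L,) ground-truth word ids (teacher forcing)
        T, N, D = obj_feats.shape
        # Form all object pairs per frame and pool to a frame-level vector.
        a = obj_feats.unsqueeze(2).expand(T, N, N, D)
        b = obj_feats.unsqueeze(1).expand(T, N, N, D)
        pair = self.pair_mlp(torch.cat([a, b], dim=-1))    # (T, N, N, H)
        frame_ctx = pair.mean(dim=(1, 2))                  # (T, H)

        h = torch.zeros(1, frame_ctx.size(1))
        c = torch.zeros_like(h)
        logits = []
        for w in captions:
            # Attend over frames using the current decoder state.
            scores = self.attn(torch.cat(
                [frame_ctx, h.expand(T, -1)], dim=-1))     # (T, 1)
            ctx = (scores.softmax(dim=0) * frame_ctx).sum(0, keepdim=True)
            inp = torch.cat([self.embed(w.view(1)), ctx], dim=-1)
            h, c = self.decoder(inp, (h, c))
            logits.append(self.out(h))
        return torch.stack(logits)                         # (L, 1, vocab)
```

The sketch only conveys the general pattern of attending over interaction features while decoding; the paper's model captures higher-order interactions between arbitrary groups of objects rather than simple pairwise mean pooling.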

Authors (6)
  1. Chih-Yao Ma
  2. Asim Kadav
  3. Iain Melvin
  4. Zsolt Kira
  5. Ghassan AlRegib
  6. Hans Peter Graf
Citations (6)